# Webscraping weather.gov.sg
This guide shows how to scrape town and weather information from the [SG Government's Weather Website](http://www.weather.gov.sg/weather-forecast-2hrnowcast-2/). There is an API for this, but scraping is straightforward and doesn't require an API key.

## Load the soup

In [67]:
import bs4 as bs
import pandas as pd
import numpy as np
import urllib.request

source = urllib.request.urlopen('http://www.weather.gov.sg/weather-forecast-2hrnowcast-2/').read()
soup = bs.BeautifulSoup(source,'html.parser')
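
The bare `urlopen` call above has no timeout and sends urllib's default `User-Agent`. As a minimal sketch, the request could be wrapped in a small helper. The name `fetch_html`, the 10-second timeout, and the `User-Agent` string are assumptions for illustration; whether weather.gov.sg actually requires a custom agent is not confirmed here.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the raw bytes of `url` (hypothetical helper, not part of the original notebook)."""
    request = urllib.request.Request(
        url,
        # Assumed User-Agent value; some servers reject urllib's default one.
        headers={'User-Agent': 'Mozilla/5.0 (weather-scraper sketch)'},
    )
    # The context manager closes the connection once the body has been read.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (requires network access):
# source = fetch_html('http://www.weather.gov.sg/weather-forecast-2hrnowcast-2/')
```

The returned bytes can be passed to `bs.BeautifulSoup(source, 'html.parser')` exactly as before.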

## Find the target data in the tables
Visit the website first to get a clue about where the data is contained.

After some initial digging through the website's elements, we know that the data is stored in 2 tables. Below is the soup for the first one.

In [87]:
soup.find_all('table')[0]

<table class="table"> <thead>
<tr>
<th class="col-xs-6">Town</th>
<th class="col-xs-6">Weather</th>
</tr>
</thead>
<tbody> <tr> <td>Ang Mo Kio</td> <td><span><img alt="icon-moderate-rain-sm" src="http://www.weather.gov.sg/wp-content/themes/wiptheme/assets/img/icon-moderate-rain-sm.png"/> Moderate Rain</span></td> </tr> <tr> <td>Bedok</td> <td><span><img alt="icon-cloudy-sm" src="http://www.weather.gov.sg/wp-content/themes/wiptheme/assets/img/icon-cloudy-sm.png"/> Cloudy</span></td> </tr> <tr> <td>Bishan</td> <td><span><img alt="icon-moderate-rain-sm" src="http://www.weather.gov.sg/wp-content/themes/wiptheme/assets/img/icon-moderate-rain-sm.png"/> Moderate Rain</span></td> </tr> <tr> <td>Boon Lay</td> <td><span><img alt="icon-cloudy-sm" src="http://www.weather.gov.sg/wp-content/themes/wiptheme/assets/img/icon-cloudy-sm.png"/> Cloudy</span></td> </tr> <tr> <td>Bukit Batok</td> <td><span><img alt="icon-cloudy-sm" src="http://www.weather.gov.sg/wp-content/themes/wiptheme/assets/img/icon-cl

## Town data

In [25]:
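tables = soup.find_all('table')  # the 2 forecast tables
# Row 0 is the header row, so row 1 is the first town in the second table
tables[1].find_all('tr')[1].find_all('td')[0].text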

'Pasir Ris'

## Weather data

In [32]:
# .text is already a str; strip the non-breaking space that precedes the label
tables[1].find_all('tr')[1].find_all('td')[1].text.replace('\xa0', '')

'Showers'
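
The `replace('\xa0', '')` call is there because the weather cell appears to contain a non-breaking space between the icon and the label, which BeautifulSoup returns as `\xa0`. The sketch below reproduces this on a small hand-written HTML fragment (the `&nbsp;` in the sample is an assumption about the page's markup, not copied from the live site) and shows `get_text(strip=True)` as an alternative.

```python
import bs4 as bs

# Hand-written sample row; the &nbsp; is assumed, not copied from the live page.
sample = (
    '<table><tr><td>Pasir Ris</td>'
    '<td><span><img alt="icon" src="icon.png"/>&nbsp;Showers</span></td></tr></table>'
)
cell = bs.BeautifulSoup(sample, 'html.parser').find_all('td')[1]

raw = cell.text                       # keeps the non-breaking space
replaced = raw.replace('\xa0', '')    # the approach used above
stripped = cell.get_text(strip=True)  # strips surrounding whitespace, including \xa0

print(repr(raw), repr(replaced), repr(stripped))
```

Both approaches give the same label here; `get_text(strip=True)` also removes ordinary leading and trailing spaces.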

## Get the forecast period

In [27]:
soup.find('span', {'class':'time'}).text

'8.00 am to 10.00 am'
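
Note that `soup.find` returns `None` when no matching element exists, so calling `.text` directly would raise an `AttributeError` if the page layout changes. A minimal sketch of a guarded lookup follows; the helper name `get_forecast_period` and the sample fragments are hypothetical.

```python
import bs4 as bs


def get_forecast_period(soup):
    """Return the text of the first <span class="time">, or None if it is absent."""
    span = soup.find('span', {'class': 'time'})
    return span.get_text(strip=True) if span is not None else None


# Hand-written fragments standing in for the real page.
with_time = bs.BeautifulSoup('<span class="time">8.00 am to 10.00 am</span>', 'html.parser')
without_time = bs.BeautifulSoup('<p>no forecast</p>', 'html.parser')

print(get_forecast_period(with_time))
print(get_forecast_period(without_time))
```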

## Gather all weather data in a dataframe
Use loops to walk through every row of both tables and extract each town and its weather.

In [63]:
ls_town = []
ls_weather = []
for table in soup.find_all('table'):
    for tr in table.find_all('tr'):
        ls_td = tr.find_all('td')
        if ls_td:
            town = ls_td[0].text
            weather = ls_td[1].text.replace('\xa0', '')
            ls_town.append(town)
            ls_weather.append(weather)

# Create dataframe
df = pd.DataFrame({
    'town': ls_town,
    'weather': ls_weather,
})

# Show the first 5 rows
df.head()

|   | town | weather |
|---|---|---|
| 0 | Ang Mo Kio | Moderate Rain |
| 1 | Bedok | Cloudy |
| 2 | Bishan | Moderate Rain |
| 3 | Boon Lay | Cloudy |
| 4 | Bukit Batok | Cloudy |
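
The same loop can be packaged as a function so it can be tried on a small HTML string without hitting the website. This is only a sketch: `parse_forecast` and the sample markup are hypothetical, and the sample is a simplified guess at the page's structure based on the table shown earlier.

```python
import bs4 as bs


def parse_forecast(html):
    """Return a list of {'town', 'weather'} dicts from every table in `html`."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    rows = []
    for table in soup.find_all('table'):
        for tr in table.find_all('tr'):
            cells = tr.find_all('td')
            if len(cells) < 2:
                continue  # skip header rows (they use <th>) and malformed rows
            rows.append({
                'town': cells[0].get_text(strip=True),
                'weather': cells[1].get_text(strip=True),
            })
    return rows


# Simplified, hand-written stand-in for the two tables on the page.
sample = """
<table><thead><tr><th>Town</th><th>Weather</th></tr></thead>
<tbody><tr><td>Ang Mo Kio</td><td><span>&nbsp;Moderate Rain</span></td></tr>
<tr><td>Bedok</td><td><span>&nbsp;Cloudy</span></td></tr></tbody></table>
<table><tbody><tr><td>Pasir Ris</td><td><span>&nbsp;Showers</span></td></tr></tbody></table>
"""
rows = parse_forecast(sample)
print(rows)
```

Passing the result to `pd.DataFrame(rows)` would give the same `town` and `weather` columns as the dataframe built above.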


## Add is_rain indicator column

In [80]:
ls_rain = ['rain', 'showers']

# Flag rows whose weather mentions any keyword in ls_rain (case-insensitive)
pattern = '|'.join(ls_rain)
df['is_rain'] = df['weather'].str.contains(pattern, case=False).astype(int)
df.head()

|   | town | weather | is_rain |
|---|---|---|---|
| 0 | Ang Mo Kio | Moderate Rain | 1 |
| 1 | Bedok | Cloudy | 0 |
| 2 | Bishan | Moderate Rain | 1 |
| 3 | Boon Lay | Cloudy | 0 |
| 4 | Bukit Batok | Cloudy | 0 |
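
Once the indicator exists, it can be used to filter or summarise the forecast. A minimal sketch, rebuilding by hand only the five rows shown above (the variable names `rainy_towns` and `rain_share` are illustrative):

```python
import pandas as pd

# The five rows shown in the output above, typed in by hand.
df = pd.DataFrame({
    'town': ['Ang Mo Kio', 'Bedok', 'Bishan', 'Boon Lay', 'Bukit Batok'],
    'weather': ['Moderate Rain', 'Cloudy', 'Moderate Rain', 'Cloudy', 'Cloudy'],
    'is_rain': [1, 0, 1, 0, 0],
})

# Towns currently flagged as rainy
rainy_towns = df.loc[df['is_rain'] == 1, 'town'].tolist()
# Fraction of the listed towns with rain
rain_share = df['is_rain'].mean()

print(rainy_towns, rain_share)
```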
