# Week 5 applications
## Web scraping

Beyond `pandas`, this section uses the `BeautifulSoup` (package `beautifulsoup4`) and `requests` libraries.

In [1]:
%pip install beautifulsoup4 requests

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import warnings
warnings.simplefilter(action = "ignore", category = FutureWarning)

### Demographics in Singapore

In the first exercise, we obtain two tables from this Wikipedia page: https://en.wikipedia.org/wiki/Demographics_of_Singapore

The tables contain information on

+ Gender composition of the resident population, and

+ Household income from work.

In [3]:
url = "https://en.wikipedia.org/wiki/Demographics_of_Singapore"

# Send a request and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Extract all tables with the "wikitable" class
tables = soup.find_all("table", class_= "wikitable")

# Convert each table into a data frame
dfs = [pd.read_html(str(table))[0] for table in tables]

# Select specific tables based on index
gender_compo = dfs[4]
household_income = dfs[20]
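
The positions `4` and `20` depend on the current layout of the page and will shift whenever a table is added or removed. As a possible safeguard, the sketch below looks a table up by its content instead. The helper `find_table` and the toy list `toy_dfs` are illustrative names, not part of the original notebook; with the real data one would pass `dfs`.

```python
import pandas as pd

def find_table(frames, keyword):
    """Return the first data frame whose column labels or cells mention `keyword`."""
    keyword = keyword.lower()
    for frame in frames:
        labels = " ".join(str(col) for col in frame.columns)
        cells = " ".join(frame.astype(str).to_numpy().ravel())
        if keyword in (labels + " " + cells).lower():
            return frame
    raise LookupError(f"No table mentions {keyword!r}")

# Toy stand-ins for the `dfs` list (assumption: a few values copied from the outputs)
toy_dfs = [
    pd.DataFrame({"Year": ["Total", "Males", "Females"], "2020": [4044.2, 1977.6, 2066.7]}),
    pd.DataFrame({"Year": ["Average income", "Median income"], "2017": [11589, 8846]}),
]

income_table = find_table(toy_dfs, "median income")
print(income_table)
```

The keyword should be specific enough to match only one table, since the first match wins.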

In [4]:
gender_compo.head()


Unnamed: 0,Year,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015,2020
0,Total,1646.4,1886.9,2013.6,2262.6,2282.1,2482.6,2735.9,3013.5,3273.4,3467.8,3771.7,3902.7,4044.2
1,Males,859.6,973.8,1030.8,1156.1,1159.0,1258.5,1386.3,1514.0,1634.7,1721.1,1861.1,1916.6,1977.6
2,Females,786.8,913.1,982.8,1106.5,1123.1,1224.2,1349.6,1499.5,1638.7,1746.7,1910.6,1986.1,2066.7
3,"Sex ratio (males per 1,000 females)",1093.0,1066.0,1049.0,1045.0,1032.0,1028.0,1027.0,1010.0,998.0,985.0,974.0,965.0,957.0


In [5]:
household_income.head()

Unnamed: 0,Year,1990,1995,1997,1998,1999,2000,2010,2011,2017
0,Average income,3076,4107,4745,4822,4691,4943,8726,9618,11589
1,Median income,2296,3135,3617,3692,3500,3607,5600,6307,8846
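
Both tables are in wide format, with one column per year. For plotting or merging, a long format with one row per measure and year is often easier to work with. The sketch below reshapes a small copy of the household income table with `melt`; the names `wide`, `long`, `measure`, `year` and `income` are chosen here for illustration, and the year labels are assumed to be strings.

```python
import pandas as pd

# A few columns copied from the household income output above
wide = pd.DataFrame({
    "Year": ["Average income", "Median income"],
    "1990": [3076, 2296],
    "2000": [4943, 3607],
    "2017": [11589, 8846],
})

long = (
    wide.rename(columns={"Year": "measure"})  # this column holds measure names, not years
        .melt(id_vars="measure", var_name="year", value_name="income")
        .astype({"year": int})
        .sort_values(["measure", "year"], ignore_index=True)
)
print(long)
```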


### Gross monthly income in Singapore

Next, we obtain the table on median gross monthly income from work from the following page: https://stats.mom.gov.sg/Pages/Income-Summary-Table.aspx

In [6]:
url = "https://stats.mom.gov.sg/Pages/Income-Summary-Table.aspx"

# Send a request and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Extract the table by its id, using a CSS selector
table = soup.select_one("#iMAS_SP_Summ")

# Convert the table into a data frame, using the second row (index 1) as the header
df = pd.read_html(str(table), header = 1)[0]

In [7]:
df.head()

Unnamed: 0,Mid-Year,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Levels ($),3770,3949,4056,4232,4437,4563,4534,4680,5070,5197,5500
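
Both scraping examples call `requests.get` without a timeout or a status check, so a slow server can hang the notebook and an error page would be parsed as if it were the real content. A minimal sketch of a more defensive fetch follows; the helper name `fetch_html`, the optional `session` argument and the 10-second default are assumptions made for illustration.

```python
import requests

def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, raising on HTTP errors or timeouts."""
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

With this helper, `BeautifulSoup(fetch_html(url), "html.parser")` would replace the two-step request and parse used above. Passing a `requests.Session` reuses the connection across several requests.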


## Working with APIs
### Real median household income

In this example, we query the FRED API of the Federal Reserve Bank of St. Louis. In the code below, the API token is stored in a `.env` file as `FRED_KEY`.

Besides `requests`, which is already installed, this section needs the `python-dotenv` library.

In [8]:
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [9]:
import os
from dotenv import load_dotenv
import requests

In [10]:
load_dotenv()
api_key = os.getenv("FRED_KEY")
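
`os.getenv` returns `None` when the variable is missing, so a typo in the `.env` file only surfaces later, when the API rejects the request. One possible guard is to fail early with a clear message. The helper `require_env` and the variable `DEMO_KEY` below are hypothetical names used only for this sketch.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file or the environment")
    return value

# Demonstration with a throwaway variable (assumption: not a real credential)
os.environ["DEMO_KEY"] = "abc123"
print(require_env("DEMO_KEY"))
```

In the notebook, this would be called as `api_key = require_env("FRED_KEY")` right after `load_dotenv()`.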

In [11]:
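# Endpoint that returns the observations of a FRED series
resource_url = "https://api.stlouisfed.org/fred/series/observations"
query_params = {
    "api_key": api_key,
    "series_id": "MEHOINUSA672N",  # real median household income in the United States
    "file_type": "json"
}

# Send the request and load the observations into a data frame
res = requests.get(resource_url, params = query_params)
res_json = res.json()
df_income = pd.DataFrame(res_json["observations"])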

In [12]:
df_income.head()

Unnamed: 0,realtime_start,realtime_end,date,value
0,2025-01-28,2025-01-28,1984-01-01,58930
1,2025-01-28,2025-01-28,1985-01-01,60050
2,2025-01-28,2025-01-28,1986-01-01,62280
3,2025-01-28,2025-01-28,1987-01-01,63060
4,2025-01-28,2025-01-28,1988-01-01,63530
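
The JSON response delivers `date` and `value` as text, and FRED marks a missing observation with a single dot (`"."`). Before computing anything, it likely helps to convert both columns. The sketch below does this on a small sample; the names `sample` and `clean` are illustrative, and the last row is a made-up placeholder added to show how a missing value is handled.

```python
import pandas as pd

sample = pd.DataFrame({
    "date": ["1984-01-01", "1985-01-01", "1986-01-01"],
    "value": ["58930", "60050", "."],  # last value is a hypothetical missing observation
})

clean = sample.assign(
    date=pd.to_datetime(sample["date"]),
    value=pd.to_numeric(sample["value"], errors="coerce"),  # "." becomes NaN
)
print(clean.dtypes)
```

The same two conversions can be applied to `df_income` directly.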


### Real-time carpark availability

Next, we query data from the LTA DataMall. Again, we need to obtain an API token and store it safely, here in the `.env` file as `LTA_KEY`.

In [13]:
load_dotenv()
api_key = os.getenv("LTA_KEY")

In [14]:
import os
import requests
import pandas as pd

# Use HTTPS so that the account key is not sent in plain text
resource_url = "https://datamall2.mytransport.sg/ltaodataservice/CarParkAvailabilityv2"
headers = {
    "AccountKey": api_key,  # API Key for authentication
    "Accept": "application/json"
}

response = requests.get(resource_url, headers = headers)

# Parse JSON response
res_json = response.json()
df_carpark = pd.DataFrame(res_json["value"])
df_carpark.head()

Unnamed: 0,CarParkID,Area,Development,Location,AvailableLots,LotType,Agency
0,1,Marina,Suntec City,1.29375 103.85718,557,C,LTA
1,2,Marina,Marina Square,1.29115 103.85728,1239,C,LTA
2,3,Marina,Raffles City,1.29382 103.85319,515,C,LTA
3,4,Marina,The Esplanade,1.29011 103.85561,577,C,LTA
4,5,Marina,Millenia Singapore,1.29251 103.86009,562,C,LTA
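
The `Location` column stores latitude and longitude as a single space-separated string. To map the carparks or compute distances, the two values need to be split into numeric columns. A minimal sketch on two rows copied from the output above follows; the names `sample`, `coords`, `tidy`, `Latitude` and `Longitude` are assumptions for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "CarParkID": ["1", "2"],
    "Development": ["Suntec City", "Marina Square"],
    "Location": ["1.29375 103.85718", "1.29115 103.85728"],
    "AvailableLots": [557, 1239],
})

# Split "lat lng" into two columns and convert them to floats;
# values that cannot be parsed become NaN instead of raising an error
coords = sample["Location"].str.split(" ", n=1, expand=True)
coords.columns = ["Latitude", "Longitude"]
coords = coords.apply(pd.to_numeric, errors="coerce")

tidy = sample.join(coords)
print(tidy[["Development", "Latitude", "Longitude", "AvailableLots"]])
```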
