# Analyzing Housing and Development Board (HDB) datasets

## Details
Name: Reuben Goh  
Adm Num: P2205711  
Class: EP0302 04

## URLs of Datasets Chosen
1. [HDB Property Information](https://beta.data.gov.sg/datasets/d_17f5382f26140b1fdae0ba2ef6239d2f/view)
2. [Housing And Development Board Resale Price Index (1Q2009 = 100), Quarterly](https://beta.data.gov.sg/datasets/d_14f63e595975691e7c24a27ae4c07c79/view)
3. [Resale Flat Prices](https://beta.data.gov.sg/collections/189/view) (used all datasets in this collection to obtain data from 1990-2024)

# NOTES (to be deleted at the end)

Must include a missing-value analysis.

Urban Planning:

- High-density towns may require more resources and infrastructure development.
- Towns with lower densities could be targeted for future housing projects.

Market Trends:

- Historical trends in resale prices can inform investment strategies and policy decisions.
- Significant price shifts may indicate market cycles or impacts of policy changes.

Housing Preferences:

- Understanding the distribution of flat types can guide future construction projects to match demand.
- Policies can be tailored to ensure a balanced supply of different flat types.

Property Value:

- Insights into how property age affects value can assist buyers and investors in making informed decisions.
- Highlighting outliers in resale prices can identify opportunities or risks in the market.

Investment Strategies:

- Detailed resale price distributions by flat type can help buyers and sellers understand market expectations.
- Identifying trends in resale prices can guide strategic investment in specific flat types.

In [58]:
# Imports and Setup
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

dataset_path = "./datasets"

### Dataset 1 (HDB Property Information):

In [59]:
try:
  property_info = pd.read_csv(os.path.join(dataset_path, "HDBPropertyInformation.csv"))

  print(f"Shape of dataset:\n{property_info.shape}\n")
  print(f"Index of dataset:\n{property_info.index}\n")
  print(f"List of columns:\n{property_info.columns}\n")
  print(f"Total number of non-NA values in dataset:\n{property_info.count()}\n")

  # Summary statistics for the numeric columns
  print(f"Dataset summary:\n{property_info.describe()}\n")

except Exception as e:
  print("Error while reading dataset:")
  print(e)


Shape of dataset:
(12877, 24)

Index of dataset:
RangeIndex(start=0, stop=12877, step=1)

List of columns:
Index(['blk_no', 'street', 'max_floor_lvl', 'year_completed', 'residential',
       'commercial', 'market_hawker', 'miscellaneous', 'multistorey_carpark',
       'precinct_pavilion', 'bldg_contract_town', 'total_dwelling_units',
       '1room_sold', '2room_sold', '3room_sold', '4room_sold', '5room_sold',
       'exec_sold', 'multigen_sold', 'studio_apartment_sold', '1room_rental',
       '2room_rental', '3room_rental', 'other_room_rental'],
      dtype='object')

Total number of non-NA values in dataset:
blk_no                   12877
street                   12877
max_floor_lvl            12877
year_completed           12877
residential              12877
commercial               12877
market_hawker            12877
miscellaneous            12877
multistorey_carpark      12877
precinct_pavilion        12877
bldg_contract_town       12877
total_dwelling_units     12877
1room_sold 
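
The notes call for a missing-value analysis. For the columns shown in the output above, the non-NA count equals the row count (12877), so those columns have no missing values. A minimal sketch of a per-column check follows. The `sample` frame and its values are invented stand-ins so the cell runs on its own; in the notebook, `property_info` would take its place.

```python
import pandas as pd

# Hypothetical stand-in rows (invented values); use property_info in the notebook.
sample = pd.DataFrame({
    "blk_no": ["1", "2", "3", "4"],
    "year_completed": [1970, None, 1985, 1999],
    "total_dwelling_units": [120, 96, None, None],
})

# Number and percentage of missing values in each column.
missing_count = sample.isna().sum()
missing_pct = (sample.isna().mean() * 100).round(2)

# Combine both measures, with the most incomplete columns first.
missing_report = pd.DataFrame({
    "missing_count": missing_count,
    "missing_pct": missing_pct,
}).sort_values("missing_count", ascending=False)

print(missing_report)
```

Columns with a non-zero `missing_count` are the ones to inspect before deciding whether to drop or fill them.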

### Dataset 2 (Housing and Development Board Resale Price Index):

### Dataset 3 (Resale Flat Prices):

### Urban Planning (Bar Graph)

In [60]:
# Bar Graph

### Market Trends (Line Graph)

In [61]:
# Line Graph
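
A sketch of the trend line for the Resale Price Index might look like the following. Dataset 2 has not been loaded yet, so the column names `quarter` and `index`, the `YYYY-Qn` label format, and every value in `rpi` are assumptions made for illustration (only the 1Q2009 = 100 base comes from the dataset title).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Dataset 2: assumed column names and invented values.
rpi = pd.DataFrame({
    "quarter": ["2009-Q2", "2009-Q1", "2009-Q3", "2009-Q4"],
    "index": [103.0, 100.0, 108.0, 111.0],
})

# Turn labels such as "2009-Q1" into quarterly periods, then into quarter-start dates.
labels = rpi["quarter"].str.replace("-", "", regex=False).tolist()
periods = pd.PeriodIndex(labels, freq="Q")

# Index the values by date and sort chronologically before plotting.
rpi_series = pd.Series(rpi["index"].to_numpy(), index=periods.to_timestamp()).sort_index()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(rpi_series.index, rpi_series.to_numpy(), marker="o")
ax.set_xlabel("Quarter")
ax.set_ylabel("Resale Price Index (1Q2009 = 100)")
ax.set_title("HDB Resale Price Index over time")
fig.tight_layout()
```

Sorting before plotting matters because a line chart connects points in row order, not in date order.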

### Housing Preferences (Pie Chart)

In [62]:
# Pie Chart
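
One way to show the mix of flat types is to sum the `*_sold` columns listed in Dataset 1 across all blocks. The sketch below does that on an invented two-row `sample` frame; in the notebook, `property_info` would replace it. Flat types with a zero total are dropped so they do not produce empty wedges.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sold-unit columns taken from the Dataset 1 column list.
sold_columns = [
    "1room_sold", "2room_sold", "3room_sold", "4room_sold",
    "5room_sold", "exec_sold", "multigen_sold", "studio_apartment_sold",
]

# Hypothetical stand-in rows (invented values); use property_info in the notebook.
sample = pd.DataFrame(
    [[0, 10, 40, 60, 20, 5, 0, 0],
     [0, 0, 30, 50, 30, 10, 0, 4]],
    columns=sold_columns,
)

# Total units sold per flat type, ignoring types with no units.
units_by_type = sample[sold_columns].sum()
units_by_type = units_by_type[units_by_type > 0]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(units_by_type.to_numpy(), labels=list(units_by_type.index), autopct="%1.1f%%")
ax.set_title("Share of sold units by flat type")
fig.tight_layout()
```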

### Property Value (Scatter Plot)

In [63]:
# Scatter Plot
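
To relate property age to value, a flat's age at sale can be derived as the sale year minus the lease start year. Dataset 3 has not been loaded yet, so the column names `month`, `lease_commence_date`, and `resale_price`, and all rows in `resale`, are assumptions made for this sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Dataset 3: assumed column names and invented values.
resale = pd.DataFrame({
    "month": ["2020-01", "2021-06", "2023-03", "2024-02"],
    "lease_commence_date": [1980, 2000, 2015, 1995],
    "resale_price": [300000.0, 450000.0, 600000.0, 420000.0],
})

# Age at sale = year of the transaction minus the year the lease started.
sale_year = pd.to_datetime(resale["month"], format="%Y-%m").dt.year
resale["age_at_sale"] = sale_year - resale["lease_commence_date"]

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(resale["age_at_sale"], resale["resale_price"], alpha=0.6)
ax.set_xlabel("Age at sale (years)")
ax.set_ylabel("Resale price")
ax.set_title("Resale price against property age")
fig.tight_layout()
```

With the full 1990-2024 data, a low `alpha` value helps keep dense regions readable.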

### Investment Strategies (Box Plot)

In [64]:
# Box Plot
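
A box plot per flat type shows the median, spread, and outliers of resale prices side by side. The sketch below assumes Dataset 3 exposes `flat_type` and `resale_price` columns; those names and all rows in `resale` are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Dataset 3: assumed column names and invented values.
resale = pd.DataFrame({
    "flat_type": ["3 ROOM", "3 ROOM", "3 ROOM", "4 ROOM", "4 ROOM", "4 ROOM"],
    "resale_price": [300000.0, 320000.0, 340000.0, 450000.0, 480000.0, 700000.0],
})

# Median price per flat type, useful as a numeric companion to the plot.
medians = resale.groupby("flat_type")["resale_price"].median()
print(medians)

fig, ax = plt.subplots(figsize=(8, 4))
# pandas draws one box per flat_type on the supplied axes.
resale.boxplot(column="resale_price", by="flat_type", ax=ax)
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
ax.set_xlabel("Flat type")
ax.set_ylabel("Resale price")
ax.set_title("Resale price distribution by flat type")
fig.tight_layout()
```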