# **Predictive Modelling of HDB Resale Prices in Relation to Urban Accessibility and Macroeconomic Trends**
## **SC3021 Project - AY25/26 Semester 2**
## **BACF1 Team 1**
Kwok Weng Jian (U2510454J)  
Chew En Yu (U2510555H)  
Nguyen Hoang Duong (U2510950H)

# **Introduction**
The public housing market in Singapore serves as a critical pillar of national stability, housing over 80% of the resident population. However, the HDB resale market operates on open market principles, where transaction prices fluctuate significantly based on intrinsic property attributes, locational convenience, and macroeconomic conditions. Understanding the valuation dynamics of resale flats is essential for prospective homeowners, urban planners, and policymakers to navigate housing affordability and asset progression.

# **Project Goal**
This project aims to engineer a predictive model for HDB resale prices by analyzing how they relate to three primary dimensions:

* Structural Attributes (Age, floor area, lease remainder)
* Locational Utility (Proximity to transport and top-tier education)
* Macroeconomic Trends (Inflation and purchasing power)

## **Our Hypotheses**
* Locational accessibility, such as proximity to MRT stations and primary schools, is positively correlated with resale price.

* Remaining lease tenure is positively correlated with price.

* Inflation (measured by the CPI) distorts nominal historical prices, so prices must be normalized for inflation before value can be compared accurately over time.
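The first two hypotheses each predict the sign of a relationship, so a quick first screen is the sign of a rank correlation between each feature and price. The sketch below assumes engineered columns named `dist_mrt_km` and `remaining_lease_yr` (hypothetical names) and uses a handful of toy rows, not real transactions. Spearman's rho is computed as the Pearson correlation of ranks, which avoids a SciPy dependency.

```python
import pandas as pd

# Toy rows for illustration only, NOT real HDB transactions
toy = pd.DataFrame({
    "dist_mrt_km":        [0.2, 0.5, 0.9, 1.4, 2.0],
    "remaining_lease_yr": [90, 95, 75, 60, 70],
    "resale_price":       [620_000, 560_000, 540_000, 430_000, 450_000],
})

# Sign each hypothesis predicts for the feature-price correlation:
# nearer MRT (smaller distance) -> higher price, so negative for distance;
# longer remaining lease -> higher price, so positive
EXPECTED_SIGN = {"dist_mrt_km": -1, "remaining_lease_yr": 1}


def check_hypotheses(df, target="resale_price", expected=EXPECTED_SIGN):
    """Spearman rho per feature and whether its sign matches the hypothesis."""
    results = {}
    for feature, sign in expected.items():
        # Spearman's rho = Pearson correlation of the ranks
        rho = float(df[feature].rank().corr(df[target].rank()))
        results[feature] = (round(rho, 3), bool(rho * sign > 0))
    return results


print(check_hypotheses(toy))
# {'dist_mrt_km': (-0.9, True), 'remaining_lease_yr': (0.9, True)}
```

A matching sign is only a screen: confounders such as town or flat type can drive both a feature and price, so the regression model itself remains the real test.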

To validate these hypotheses and construct a robust pricing model, we curated six data sources. They were selected so that our model captures both the physical characteristics of the housing units and the economic context in which they are traded.


### **Dataset 1: The Core Dataset**
This dataset forms the foundation of our regression analysis. It contains granular transaction-level data, including Price, Floor Area, Flat Type, Storey Range and Lease Commencement Date. The full collection on data.gov.sg goes back to 1990; the file loaded in this notebook covers January 2017 onwards.

We chose this dataset because it provides the dependent variable (Resale Price) needed to train our predictive model.

Source: [Resale Flat Prices](https://data.gov.sg/datasets?query=Resale+Flat+Prices&resultId=d_8b84c4ee58e3cfc0ece0d773c8ca6abc)

### **Dataset 2: Physical Building Details**
The core transaction data lacks the block-level structural metadata needed for a granular assessment. We therefore chose this dataset, which profiles every HDB block in Singapore with fields such as "Year Completed", "Max Floor Level" and "Total Dwelling Units".

By merging it with the core dataset on block number and street name, and combining it with the lease commencement date and storey range already recorded there, we can engineer features such as Remaining Lease and relative floor height (a unit's storey range against its block's maximum floor level, e.g., to capture high-rise premiums), which we expect to improve prediction accuracy.
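A minimal sketch of the merge and the derived features, using toy rows with the Dataset 1 and Dataset 2 column names. It assumes that `block`/`street_name` in Dataset 1 match `blk_no`/`street` in Dataset 2 exactly (street abbreviations may need normalizing in practice), that every flat carries a 99-year lease, and it computes remaining lease in whole years from the sale year only. The helper `engineer_features` and all values are hypothetical.

```python
import pandas as pd

HDB_LEASE_YEARS = 99  # HDB flats are sold on 99-year leases

# Toy rows using Dataset 1 / Dataset 2 column names (hypothetical values)
transactions = pd.DataFrame({
    "month": ["2017-01", "2024-06"],
    "block": ["123", "456"],
    "street_name": ["EXAMPLE ST 1", "EXAMPLE AVE 2"],
    "storey_range": ["10 TO 12", "04 TO 06"],
    "lease_commence_date": [1979, 1988],
})
blocks = pd.DataFrame({
    "blk_no": ["123", "456"],
    "street": ["EXAMPLE ST 1", "EXAMPLE AVE 2"],
    "max_floor_lvl": [12, 15],
})


def engineer_features(tx, blk):
    """Merge block details onto transactions and derive lease/height features."""
    # Assumes block and street strings match exactly across the two datasets
    df = tx.merge(blk, left_on=["block", "street_name"],
                  right_on=["blk_no", "street"], how="left")
    sale_year = pd.to_datetime(df["month"], format="%Y-%m").dt.year
    # Remaining lease at sale, in whole years (ignores months)
    df["remaining_lease_yrs"] = HDB_LEASE_YEARS - (sale_year - df["lease_commence_date"])
    # "10 TO 12" -> lower 10, upper 12 -> midpoint 11.0
    bounds = df["storey_range"].str.split(" TO ", expand=True).astype(int)
    df["storey_mid"] = (bounds[0] + bounds[1]) / 2
    # Share of the block's height (near 1.0 = top floors)
    df["relative_height"] = df["storey_mid"] / df["max_floor_lvl"]
    return df


features = engineer_features(transactions, blocks)
print(features[["remaining_lease_yrs", "storey_mid", "relative_height"]])
```

A left merge keeps every transaction; rows whose block fails to match end up with a missing `max_floor_lvl`, which is a quick way to measure how well the two address formats line up.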

Source: [HDB Property Information](https://data.gov.sg/datasets?query=HDB+Property+Information&resultId=d_17f5382f26140b1fdae0ba2ef6239d2f)

### **Dataset 3: Transport Accessibility**
In Singapore's dense urban fabric, connectivity is a primary driver of real estate value. We chose this dataset because it provides the precise geospatial coordinates of all MRT station exits across the island.

This allows us to calculate the straight-line distance (in km) between each HDB block and its nearest MRT station exit, quantifying how accessible the MRT network is from that block.
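Because the GeoJSON stores each exit as a `POINT (longitude latitude)`, the distance to the nearest exit can be sketched with the haversine (great-circle) formula. This gives straight-line distance, not a routed walking distance. The helper names are hypothetical; the exit coordinates are the first rows printed from Dataset 3, and the block coordinate is a made-up point 0.01° of longitude east of Bright Hill Exit 1.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in decimal degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_exit_km(block_lat, block_lon, exit_lat, exit_lon):
    """For each block, distance (km) to the closest exit and that exit's index."""
    # Broadcasting builds an (n_blocks, n_exits) distance matrix
    d = haversine_km(np.asarray(block_lat)[:, None], np.asarray(block_lon)[:, None],
                     np.asarray(exit_lat)[None, :], np.asarray(exit_lon)[None, :])
    return d.min(axis=1), d.argmin(axis=1)


# Exits from Dataset 3: Bright Hill Exit 1, Bright Hill Exit 2, Upper Thomson Exit 2
exit_lat = [1.36404, 1.36328, 1.35574]
exit_lon = [103.83364, 103.83351, 103.83183]

# Hypothetical block about 1.1 km east of Bright Hill Exit 1
dist, idx = nearest_exit_km([1.36404], [103.84364], exit_lat, exit_lon)
print(round(float(dist[0]), 3), int(idx[0]))
```

The matrix has one row per block, so it is cheaper to compute it once per unique block than once per transaction. For much larger inputs, scikit-learn's `BallTree` with `metric="haversine"` is an alternative to the full pairwise matrix.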

Source: [LTA MRT Station Exit (GeoJSON)](https://data.gov.sg/datasets?query=LTA+MRT+Station+Exit&resultId=d_b39d3a0871985372d7e1637193335da5)


### **Dataset 4: Education Accessibility**
Proximity to schools is expected to influence housing demand, particularly because of the 1 km rule in Primary One registration, which gives priority to children living within 1 km of a school when a registration phase is oversubscribed. This dataset lists the official addresses and postal codes of all primary schools, secondary schools and junior colleges.

After geocoding these postal codes, we can create a "School Proximity" feature that measures the distance from each block to its nearest primary school, the level to which the 1 km rule applies.
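Once the school postal codes are geocoded, the school features can be sketched in the same way as the MRT distance. This assumes the school table has a `mainlevel_code` column marking primary schools (check the actual column names in Dataset 4) and `lat`/`lon` columns added by geocoding; all coordinates below are made up. Straight-line distance between geocoded points only approximates how distance is determined for Primary One registration.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def primary_school_features(blocks, schools, radius_km=1.0):
    """Distance to the nearest primary school and count of primary schools within radius_km."""
    # Assumption: 'mainlevel_code' identifies primary schools, and 'lat'/'lon'
    # were added by geocoding each school's postal code
    primary = schools[schools["mainlevel_code"] == "PRIMARY"]
    d = haversine_km(blocks["lat"].to_numpy()[:, None], blocks["lon"].to_numpy()[:, None],
                     primary["lat"].to_numpy()[None, :], primary["lon"].to_numpy()[None, :])
    out = blocks.copy()
    out["nearest_pri_km"] = d.min(axis=1)
    out["pri_within_radius"] = (d <= radius_km).sum(axis=1)
    return out


# Made-up coordinates for illustration
blocks = pd.DataFrame({"lat": [1.350], "lon": [103.850]})
schools = pd.DataFrame({
    "mainlevel_code": ["PRIMARY", "PRIMARY", "SECONDARY"],
    "lat": [1.355, 1.360, 1.3505],
    "lon": [103.850, 103.850, 103.850],
})
res = primary_school_features(blocks, schools)
print(res)
```

Here the secondary school is closest but is excluded, the first primary school is about 0.56 km away, and the second (about 1.11 km) falls outside the 1 km radius.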

Source: [General information of schools](https://data.gov.sg/datasets?query=General+Information+of+Schools&resultId=d_688b934f82c1059ed0a6993d2a829089)


### **Dataset 5: Economic Context**
Comparing a flat sold in 2010 with one sold in 2024 requires adjusting for the changing value of money. This dataset tracks the weighted average change in prices of a basket of consumer goods and services, making it a standard measure of inflation.

We intend to use it to deflate nominal resale prices into constant dollars, allowing a fair comparison of value across different economic periods.
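Deflating to constant dollars rescales each price by the ratio of a base-period CPI to the CPI in its transaction month: $P^{\text{real}}_t = P^{\text{nominal}}_t \times \text{CPI}_{\text{base}} / \text{CPI}_t$. The sketch assumes the SingStat table has already been reshaped into a Series indexed by `YYYY-MM`; the helper `deflate` is hypothetical and the CPI values are invented for illustration, not SingStat figures.

```python
import pandas as pd


def deflate(prices, cpi, base_period):
    """Express nominal resale prices in constant base-period dollars.

    prices: DataFrame with 'month' ('YYYY-MM') and 'resale_price' columns
    cpi: Series of CPI values indexed by 'YYYY-MM'
    """
    cpi_at_sale = prices["month"].map(cpi)  # NaN if a month is missing from cpi
    return prices["resale_price"] * cpi[base_period] / cpi_at_sale


# Invented CPI values, NOT SingStat figures
cpi = pd.Series({"2010-01": 80.0, "2024-01": 100.0})
prices = pd.DataFrame({"month": ["2010-01", "2024-01"],
                       "resale_price": [400_000, 600_000]})
real = deflate(prices, cpi, base_period="2024-01")
print(real.tolist())  # [500000.0, 600000.0]
```

With these toy numbers, a $400,000 sale in a month whose CPI is 80 is worth $500,000 in base-period (CPI = 100) dollars.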

Source: [Consumer Price Index - Singstat](https://tablebuilder.singstat.gov.sg/table/TS/M213751)


### **Dataset 6: Geospatial Mapping**
Distance calculations require coordinates, but Datasets 1, 2 and 4 provide only textual addresses or postal codes. To bridge this gap, we integrate the OneMap API, the Singapore Land Authority's national map platform, whose Search API returns coordinates for Singapore addresses and postal codes.

We chose this source to satisfy the project's data-engineering requirement. We intend to use the Search API to convert HDB street addresses (e.g., "Block 105 Ang Mo Kio") into coordinates, enabling the distance calculations required for the transport and education features.
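Since many transactions share the same block, it is cheaper to geocode each distinct address once. A minimal sketch that builds one search string per block from Dataset 1's `block` and `street_name` columns; the sample rows are taken from the Dataset 1 output, and `unique_search_values` is a hypothetical helper. Whether OneMap resolves abbreviated street names such as "AVE" or "CTRL" reliably is an assumption worth checking on a sample.

```python
import pandas as pd


def unique_search_values(df, block_col="block", street_col="street_name"):
    """Return one search string per distinct HDB block, e.g. '406 ANG MO KIO AVE 10'."""
    queries = (df[block_col].astype(str).str.strip() + " "
               + df[street_col].astype(str).str.strip())
    return sorted(queries.unique())


# Rows copied from the Dataset 1 output (block and street only)
sample = pd.DataFrame({
    "block": ["406", "325", "325", "643", "643"],
    "street_name": ["ANG MO KIO AVE 10", "YISHUN CTRL", "YISHUN CTRL",
                    "YISHUN ST 61", "YISHUN ST 61"],
})
print(unique_search_values(sample))
# ['325 YISHUN CTRL', '406 ANG MO KIO AVE 10', '643 YISHUN ST 61']
```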

Source: [OneMap API](https://www.onemap.gov.sg/apidocs/maps)

In [None]:
# Importing Dataset 1 - The Core Dataset
import pandas as pd

df_dataset1 = pd.read_csv('/Dataset 1.csv')
print(df_dataset1)

          month        town  flat_type block        street_name storey_range  \
0       2017-01  ANG MO KIO     2 ROOM   406  ANG MO KIO AVE 10     10 TO 12   
1       2017-01  ANG MO KIO     3 ROOM   108   ANG MO KIO AVE 4     01 TO 03   
2       2017-01  ANG MO KIO     3 ROOM   602   ANG MO KIO AVE 5     01 TO 03   
3       2017-01  ANG MO KIO     3 ROOM   465  ANG MO KIO AVE 10     04 TO 06   
4       2017-01  ANG MO KIO     3 ROOM   601   ANG MO KIO AVE 5     01 TO 03   
...         ...         ...        ...   ...                ...          ...   
224215  2026-01      YISHUN  EXECUTIVE   325        YISHUN CTRL     10 TO 12   
224216  2026-01      YISHUN  EXECUTIVE   325        YISHUN CTRL     04 TO 06   
224217  2026-01      YISHUN  EXECUTIVE   360     YISHUN RING RD     07 TO 09   
224218  2026-01      YISHUN  EXECUTIVE   643       YISHUN ST 61     10 TO 12   
224219  2026-01      YISHUN  EXECUTIVE   643       YISHUN ST 61     04 TO 06   

        floor_area_sqm      flat_model 

In [None]:
# Importing Dataset 2 - Physical Building Details
df_dataset2 = pd.read_csv('/Dataset 2.csv')
print(df_dataset2)

      blk_no             street  max_floor_lvl  year_completed residential  \
0          1           BEACH RD             16            1970           Y   
1          1    BEDOK STH AVE 1             14            1975           Y   
2          1      CANTONMENT RD              2            2010           N   
3          1       CHAI CHEE RD             15            1982           Y   
4          1  CHANGI VILLAGE RD              4            1975           Y   
...      ...                ...            ...             ...         ...   
13262   998A      BUANGKOK CRES             18            2018           Y   
13263   998B      BUANGKOK CRES             17            2018           Y   
13264    999      BUANGKOK CRES              2            2018           N   
13265   999A      BUANGKOK CRES             18            2018           Y   
13266   999B      BUANGKOK CRES             17            2018           Y   

      commercial market_hawker miscellaneous multistorey_carpar

In [None]:
# Importing Dataset 3 - Transport Accessibility (GeoJSON)
import sys
!{sys.executable} -m pip install geopandas fiona pyproj shapely rtree

import geopandas as gpd
df_dataset3 = gpd.read_file('/Dataset 3.geojson')
print(df_dataset3.head())

   OBJECTID                 STATION_NA EXIT_CODE           INC_CRC  \
0     17885    BRIGHT HILL MRT STATION    Exit 1  CFB9350F44E8991A   
1     17886    BRIGHT HILL MRT STATION    Exit 2  F273E9D550B06062   
2     17887    BRIGHT HILL MRT STATION    Exit 4  BC4A442B12F99E85   
3     17888    BRIGHT HILL MRT STATION    Exit 3  2A45076ED986B275   
4     17889  UPPER THOMSON MRT STATION    Exit 2  3EA737829473A972   

       FMEL_UPD_D                   geometry  
0  20251202172807  POINT (103.83364 1.36404)  
1  20251202172807  POINT (103.83351 1.36328)  
2  20251202172807  POINT (103.83195 1.36316)  
3  20251202172807  POINT (103.83238 1.36218)  
4  20251202172807  POINT (103.83183 1.35574)  


In [None]:
# Importing Dataset 4 - Education Accessibility
df_dataset4 = pd.read_csv('/Dataset 4.csv')
print(df_dataset4)

                        school_name                            url_address  \
0          ADMIRALTY PRIMARY SCHOOL       https://admiraltypri.moe.edu.sg/   
1        ADMIRALTY SECONDARY SCHOOL     http://www.admiraltysec.moe.edu.sg   
2      AHMAD IBRAHIM PRIMARY SCHOOL  http://www.ahmadibrahimpri.moe.edu.sg   
3    AHMAD IBRAHIM SECONDARY SCHOOL  http://www.ahmadibrahimsec.moe.edu.sg   
4                    AI TONG SCHOOL           http://www.aitong.moe.edu.sg   
..                              ...                                    ...   
332          ZHANGDE PRIMARY SCHOOL      http://www.zhangdepri.moe.edu.sg/   
333         ZHENGHUA PRIMARY SCHOOL      http://www.zhenghuapri.moe.edu.sg   
334       ZHENGHUA SECONDARY SCHOOL      http://www.zhenghuasec.moe.edu.sg   
335         ZHONGHUA PRIMARY SCHOOL      http://www.zhonghuapri.moe.edu.sg   
336       ZHONGHUA SECONDARY SCHOOL     https://www.zhonghuasec.moe.edu.sg   

                           address  postal_code telephone_no te

In [None]:
# Importing Dataset 5 - Economic Context
df_dataset5 = pd.read_excel('/Dataset 5.xlsx')
print(df_dataset5)

   Unnamed: 0            Unnamed: 1  \
0         NaN                   NaN   
1         NaN                   NaN   
2         NaN                   NaN   
3     Content                   NaN   
4         No.               Subject   
5           1  Consumer Price Index   
6           2  Consumer Price Index   
7           3  Consumer Price Index   
8           4  Consumer Price Index   
9           5  Consumer Price Index   
10          6  Consumer Price Index   
11          7  Consumer Price Index   
12          8  Consumer Price Index   
13          9  Consumer Price Index   
14         10  Consumer Price Index   
15         11  Consumer Price Index   
16         12  Consumer Price Index   
17         13  Consumer Price Index   
18         14  Consumer Price Index   
19         15  Consumer Price Index   

                                           Unnamed: 2  
0                                                 NaN  
1                                                 NaN  
2           

In [None]:
# Importing Dataset 6 - OneMap API
import requests
import time

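def get_coordinates(postal_code):
    """
    Get lat/long from the OneMap Search API using a postal code (or any search string).
    Returns None if nothing is found or the request fails.
    """
    url = "https://www.onemap.gov.sg/api/common/elastic/search"
    params = {
        'searchVal': postal_code,
        'returnGeom': 'Y',       # include LATITUDE/LONGITUDE in the results
        'getAddrDetails': 'Y'    # include the full ADDRESS string
    }

    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()

        if data['found'] > 0:
            # Take the top-ranked match
            result = data['results'][0]
            return {
                'latitude': float(result['LATITUDE']),
                'longitude': float(result['LONGITUDE']),
                'address': result['ADDRESS']
            }
        return None
    except (requests.RequestException, ValueError, KeyError):
        # Network/HTTP errors, malformed JSON, or an unexpected response shape
        return None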

# Example
postal = "520123"
coords = get_coordinates(postal)
print(coords)

{'latitude': 1.34530661835845, 'longitude': 103.953187956159, 'address': '123 SIMEI STREET 1 SINGAPORE 520123'}
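To geocode many addresses with the function above, a small wrapper can call it once per distinct search value, pause between requests, and collect the values that failed. The `geocode_all` helper and the 0.2 s pause are hypothetical choices, not documented OneMap limits; the demo uses an offline stub so it runs without network access. In the notebook, `get_coordinates` would be passed as `geocoder`.

```python
import time


def geocode_all(search_values, geocoder, pause_s=0.2):
    """Geocode each distinct search value once, pausing between requests.

    geocoder: a function returning a dict or None (e.g. get_coordinates).
    Returns (results, failed), where failed lists values that returned None.
    """
    results = {}
    for value in dict.fromkeys(search_values):  # de-duplicate, keep first-seen order
        results[value] = geocoder(value)
        time.sleep(pause_s)  # hypothetical pacing, not a documented OneMap limit
    failed = [v for v, r in results.items() if r is None]
    return results, failed


def fake_geocoder(value):
    """Offline stand-in for get_coordinates, used only for this demo."""
    return {"latitude": 1.0, "longitude": 103.0} if value == "520123" else None


results, failed = geocode_all(["520123", "000000", "520123"], fake_geocoder, pause_s=0)
print(len(results), failed)  # 2 ['000000']
```

Keeping the failed values separate makes it easy to inspect unmatched addresses and retry them with a different search string, such as a postal code.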
