# **Predictive Modelling of HDB Resale Prices in Relation to Urban Accessibility and Macroeconomic Trends**
## **SC3021 Project - AY25/26 Semester 2**
## **BACF1 Team 1**
Kwok Weng Jian (U2510454J)  
Chew En Yu (U2510555H)  
Nguyen Hoang Duong (U2510950H)

# **Introduction**
The public housing market in Singapore serves as a critical pillar of national stability, housing over 80% of the resident population. However, the HDB resale market operates on open market principles, where transaction prices fluctuate significantly based on intrinsic property attributes, locational convenience, and macroeconomic conditions. Understanding the valuation dynamics of resale flats is essential for prospective homeowners, urban planners, and policymakers to navigate housing affordability and asset progression.

# **Project Goal**
This project aims to engineer a predictive model for HDB resale prices by analyzing how price relates to three primary dimensions:

* Structural Attributes (Age, floor area, lease remainder)

* Locational Utility (Proximity to transport and top-tier education)
* Macroeconomic Trends (Inflation and purchasing power)

## **Our Hypotheses**
* Locational accessibility such as proximity to MRT nodes and primary schools is positively correlated with resale valuation.

* Remaining lease tenure is positively correlated with price.

* Inflationary pressure (CPI) significantly distorts historical pricing, requiring normalization to compare value over time accurately.

<br>

To validate these hypotheses and construct a robust pricing model, we have curated a comprehensive data architecture comprising six distinct components. These datasets were selected to ensure our model captures both the physical reality of the housing units and the economic context in which they are traded.<br>
<br>


### **Dataset 1: The Core Dataset**
This dataset forms the foundation of our regression analysis. It contains granular transaction data from 1990 to the present, including Price, Floor Area, Flat Type, and Lease Commencement Date.

We chose this dataset as it provides the Dependent Variable (Resale Price) essential for training our predictive model.

Source: [Resale Flat Prices](https://data.gov.sg/datasets?query=Resale+Flat+Prices&resultId=d_8b84c4ee58e3cfc0ece0d773c8ca6abc)

### **Dataset 2: Physical Building Details**
The core transaction data lacks specific structural metadata necessary for a granular assessment. Therefore, we chose this dataset as it offers a structural profile for every HDB block in Singapore, including critical features such as "Year Completed," "Max Floor Level," and "Total Dwelling Units."

By merging this with the core dataset, we can engineer features such as Remaining Lease and Storey Range (e.g., differentiating high-rise premiums), which we expect to improve prediction accuracy.
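
As a minimal sketch of this feature engineering, the functions below derive a remaining-lease figure and a relative-height figure for one transaction. It assumes the standard 99-year HDB lease, a storey range written as `"10 TO 12"`, and a maximum floor level taken from the building dataset; the function and parameter names are hypothetical and should be matched to the actual column names.

```python
LEASE_YEARS = 99  # assumption: standard 99-year HDB lease


def remaining_lease_years(lease_commence_year: int, sale_year: int) -> int:
    """Approximate remaining lease in whole years at the time of sale."""
    elapsed = sale_year - lease_commence_year
    return max(LEASE_YEARS - elapsed, 0)


def storey_midpoint(storey_range: str) -> float:
    """Convert a range such as '10 TO 12' into its midpoint (11.0)."""
    low, high = storey_range.split(" TO ")
    return (int(low) + int(high)) / 2


def relative_height(storey_range: str, max_floor_level: int) -> float:
    """Position of the flat within its block, from near 0 (low) to 1 (top)."""
    return storey_midpoint(storey_range) / max_floor_level


# Illustrative values only, not real records.
print(remaining_lease_years(1980, 2024))
print(relative_height("10 TO 12", 12))
```

The relative-height figure is one possible way to use "Max Floor Level": a flat on the 11th storey of a 12-storey block may be valued differently from one on the 11th storey of a 40-storey block.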

Source: [HDB Property Information](https://data.gov.sg/datasets?query=HDB+Property+Information&resultId=d_17f5382f26140b1fdae0ba2ef6239d2f)

### **Dataset 3: Transport Accessibility**
In Singapore's dense urban fabric, connectivity is a primary driver of real estate value. We chose this dataset because it provides the precise geospatial coordinates of all MRT station exits across the island.

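This allows us to calculate the distance (in km) between a target HDB block and its nearest MRT exit, quantifying how accessible the rail network is from that block. Because the dataset contains coordinates only, this distance is a straight-line approximation of the walking distance.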
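
The distance calculation can be sketched with the haversine formula, which gives the great-circle distance between two latitude/longitude points. This is a straight-line approximation under our own assumptions; a true walking distance would need a routing service. The function names are hypothetical, and note that GeoJSON stores each point as `[longitude, latitude]`, so the order must be swapped when reading the file.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_exit_km(block: tuple, exits: list) -> float:
    """Distance from one block (lat, lon) to the closest exit in a list of (lat, lon)."""
    return min(haversine_km(block[0], block[1], lat, lon) for lat, lon in exits)


# Illustrative coordinates only, not real station exits.
block = (1.3700, 103.8500)
exits = [(1.3750, 103.8500), (1.3400, 103.9000)]
print(round(nearest_exit_km(block, exits), 3))
```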

Source: [LTA MRT Station Exit (GeoJSON)](https://data.gov.sg/datasets?query=LTA+MRT+Station+Exit&resultId=d_b39d3a0871985372d7e1637193335da5)


### **Dataset 4: Education Accessibility**
Proximity to educational institutions, particularly the "1 km radius" rule for primary school registration, exerts strong pressure on housing demand. The dataset contains the official addresses and postal codes for all Primary, Secondary, and Junior College institutions.

This enables the creation of a "School Proximity" feature, quantifying the distance to the nearest educational institution.
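
One possible shape for this feature is sketched below: it keeps only primary schools, then reports the nearest one and how many fall within 1 km of a block. The `mainlevel_code`, `lat`, and `lon` keys are hypothetical names for illustration, and the point-to-point distance is only a proxy, not the official home-school distance measurement.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def primary_school_features(block, schools, radius_km=1.0):
    """Nearest primary school distance and count of primary schools within radius_km."""
    distances = [
        haversine_km(block[0], block[1], s["lat"], s["lon"])
        for s in schools
        if s["mainlevel_code"] == "PRIMARY"  # hypothetical field name and value
    ]
    if not distances:
        return {"nearest_primary_km": None, "primary_within_radius": 0}
    return {
        "nearest_primary_km": min(distances),
        "primary_within_radius": sum(d <= radius_km for d in distances),
    }


# Illustrative coordinates only, not real schools.
schools = [
    {"mainlevel_code": "PRIMARY", "lat": 1.3550, "lon": 103.8500},
    {"mainlevel_code": "SECONDARY", "lat": 1.3501, "lon": 103.8500},
    {"mainlevel_code": "PRIMARY", "lat": 1.4000, "lon": 103.8500},
]
print(primary_school_features((1.3500, 103.8500), schools))
```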

Source: [General information of schools](https://data.gov.sg/datasets?query=General+Information+of+Schools&resultId=d_688b934f82c1059ed0a6993d2a829089)


### **Dataset 5: Economic Context**
Comparing a flat sold in 2010 to one sold in 2024 requires adjusting for the changing value of money. This dataset includes the weighted average change in prices of a basket of consumer goods and services, which serves as a key indicator of inflation.

We intend to use this dataset to adjust transaction prices for inflation, allowing for a fair comparison of value across different economic eras.
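
A common way to make this adjustment is to rescale each nominal price by the ratio of the CPI in a chosen base period to the CPI in the month of sale, i.e. real price = nominal price × CPI_base / CPI_sale. The sketch below assumes that approach; the function name, the `"YYYY-MM"` keys, and the index values are illustrative, not taken from the SingStat table.

```python
def to_real_price(nominal_price: float, cpi_at_sale: float, cpi_base: float) -> float:
    """Express a nominal price in base-period dollars using a CPI ratio."""
    if cpi_at_sale <= 0:
        raise ValueError("CPI at sale must be positive")
    return nominal_price * cpi_base / cpi_at_sale


# Illustrative index values only, keyed by month of sale.
cpi = {"2010-06": 80.0, "2024-06": 100.0}
base = cpi["2024-06"]

# A flat sold for 400,000 in 2010-06, expressed in 2024-06 dollars.
print(to_real_price(400_000, cpi["2010-06"], base))
```

Training on real rather than nominal prices is one design option; another is to keep nominal prices and feed the CPI to the model as a feature.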

Source: [Consumer Price Index - Singstat](https://tablebuilder.singstat.gov.sg/table/TS/M213751)


### **Dataset 6: Geospatial Mapping**
To bridge the gap between textual addresses and mathematical distance calculations, we leverage dynamic API integration. OneMap is the Singapore Land Authority's national map service and provides geocoding for Singapore addresses.

We chose this source partly to demonstrate the data engineering skills required by the project. We intend to use the Search API to convert HDB street addresses (e.g., "Block 105 Ang Mo Kio") into coordinates, enabling the precise distance calculations required for the transport and education features.
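
The geocoding step might look roughly like the sketch below, which builds a search URL and extracts coordinates from a response. The endpoint path, the query parameter names, and the `results` / `LATITUDE` / `LONGITUDE` response fields are assumptions that should be checked against the OneMap API documentation, which may also require an access token. No network call is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the OneMap API documentation.
SEARCH_ENDPOINT = "https://www.onemap.gov.sg/api/common/elastic/search"


def build_search_url(address: str, page: int = 1) -> str:
    """Build a search URL for one address (parameter names are assumptions)."""
    params = {
        "searchVal": address,
        "returnGeom": "Y",
        "getAddrDetails": "Y",
        "pageNum": page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def first_coordinates(payload: dict):
    """Return (lat, lon) of the first result, or None if nothing matched."""
    results = payload.get("results") or []
    if not results:
        return None
    top = results[0]
    return float(top["LATITUDE"]), float(top["LONGITUDE"])


# Hand-written sample payload in the assumed response shape.
sample = {"results": [{"LATITUDE": "1.37", "LONGITUDE": "103.85"}]}
print(build_search_url("Block 105 Ang Mo Kio"))
print(first_coordinates(sample))
```

Since many transactions share the same block, caching results per unique address would keep the number of API requests manageable.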

Source: [OneMap API](https://www.onemap.gov.sg/apidocs/maps)