## 📘 HDB Resale Flat Prices

### 📌 Notebook Description

- **Team:** Team A  
- **Members:** Ben, Shazlin, Alan  
- **Project Name:** HDB Resale Flat Data Engineering Pipeline
- **Description:** Implements automated data ingestion from data.gov.sg and performs dataset merging to produce a unified, analysis-ready dataset.
- **Data Artifacts:**  
    - `/DataLake/<raw files>`  
    - `/Staging/Main.csv`

### 📦 Import Required Libraries

In [1]:
from PSQL import PSQL
from sqlalchemy import text
import pandas as pd

#---Customized-----------------------------------------
import control_output
pd.set_option("display.float_format", "{:,.2f}".format)
control_output.css

### 🧩 Initialize Class Instance: PSQL

In [2]:
psql = PSQL()

Connected successfully!


### ❗ Available Tables

In [3]:
sql = text("""SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;
""")

df_tables = psql.query(sql)
df_tables

Total Rows: 9


Unnamed: 0,table_name
1,births
2,divorces
3,gdp
4,inflation
5,main
6,marriages
7,stat_monthly
8,stat_yearly
9,unemployment


### ❗ Available Columns in Table: main

In [4]:
sql = text("""
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'main'
ORDER BY ordinal_position;
""")

psql.query(sql)

Total Rows: 25


Unnamed: 0,column_name,data_type,is_nullable
1,year_month,timestamp without time zone,YES
2,town,text,YES
3,flat_type,text,YES
4,flat_model,text,YES
5,floor_area_sqm,double precision,YES
6,street_name,text,YES
7,resale_price,double precision,YES
8,lease_commence_date,bigint,YES
9,storey_range,text,YES
10,block,text,YES


### 📋 Sample Data: Main Table

In [5]:
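sql = text("SELECT * FROM main;")

df = psql.query(sql)
# year_month is stored as a timestamp column in PostgreSQL, so no parsing is needed:
# df["year_month"] = pd.to_datetime(df["year_month"], format="%Y-%m")
df = df.sort_values("year_month", ascending=True)  # oldest transactions first
df = df.set_index("year_month")

df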

Total Rows: 602221


year_month,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,lease_commence_date,storey_range,block,remaining_lease,...,nearest_mrt,nearest_distance_to_mrt,remaining_years,price_per_sqm,birth,marriages,divorces,unemployment,inflation,gdp
2000-01-01,ANG MO KIO,3RM,IMPROVED,69.00,ANG MO KIO AVE 4,147000.00,1986,07 TO 09,170,85,...,mayflower,0.28,85,2130.43,3585,1602,396,3.60,1.34,96076539925.74
2000-01-01,TAMPINES,EX,PREMIUM APARTMENT,144.00,TAMPINES ST 45,550000.00,1996,10 TO 12,497F,95,...,tampines east,0.63,95,3819.44,3585,1602,396,3.60,1.34,96076539925.74
2000-01-01,TAMPINES,EX,Maisonette,146.00,TAMPINES ST 83,482000.00,1988,01 TO 03,857,87,...,tampines west,1.02,87,3301.37,3585,1602,396,3.60,1.34,96076539925.74
2000-01-01,ANG MO KIO,3RM,NEW GENERATION,67.00,ANG MO KIO AVE 10,161000.00,1979,10 TO 12,404,78,...,ang mo kio,1.08,78,2402.99,3585,1602,396,3.60,1.34,96076539925.74
2000-01-01,TAMPINES,EX,Maisonette,146.00,TAMPINES ST 82,580000.00,1995,07 TO 09,856D,94,...,tampines,0.72,94,3972.60,3585,1602,396,3.60,1.34,96076539925.74
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-01-01,BUKIT PANJANG,4RM,MODEL A,103.00,PENDING RD,450000.00,1988,01 TO 03,115,64 years 03 months,...,petir,0.29,64,4368.93,2858,1891,597,2.70,-1.30,505439514078.02
2023-01-01,BUKIT MERAH,3RM,IMPROVED,59.00,TELOK BLANGAH CRES,346000.00,1975,07 TO 09,22,51 years 06 months,...,harbourfront,1.26,51,5864.41,2858,1891,597,2.70,-1.30,505439514078.02
2023-01-01,BUKIT BATOK,4RM,Model A2,90.00,BT BATOK ST 25,420000.00,1998,01 TO 03,288C,74 years 08 months,...,bukit batok,1.20,74,4666.67,2858,1891,597,2.70,-1.30,505439514078.02
2023-01-01,BUKIT MERAH,4RM,MODEL A,100.00,JLN RUMAH TINGGI,690000.00,1997,16 TO 18,40,73 years 07 months,...,redhill,0.70,73,6900.00,2858,1891,597,2.70,-1.30,505439514078.02
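
The sample rows above show `remaining_lease` in two formats: a bare number of years (`85`) in the 2000 rows and a phrase (`64 years 03 months`) in the 2023 rows. A minimal sketch of one way to normalize both, assuming these are the only two shapes in the column; `parse_remaining_lease` is a hypothetical helper, not part of this notebook's pipeline:

```python
import re

# Accepts "85", "85 years", or "64 years 03 months" (assumed to be the only shapes).
_LEASE_PATTERN = re.compile(
    r"^\s*(\d+)(?:\s*years?)?(?:\s+(\d+)\s*months?)?\s*$",
    re.IGNORECASE,
)


def parse_remaining_lease(value):
    """Return (years, months) parsed from a remaining_lease value."""
    match = _LEASE_PATTERN.match(str(value))
    if match is None:
        raise ValueError(f"Unrecognized remaining_lease value: {value!r}")
    years = int(match.group(1))
    months = int(match.group(2) or 0)  # a missing month part counts as 0
    return years, months


print(parse_remaining_lease("85"))
print(parse_remaining_lease("64 years 03 months"))
```

In the rows displayed, the parsed year component agrees with the existing `remaining_years` column (85 and 64), so a call such as `df["remaining_lease"].map(lambda v: parse_remaining_lease(v)[0])` could serve as a consistency check on the full table.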


### ❗ Ensure Index (year_month) Is Sorted

In [6]:
print(f"Minimum Date: {df.index.min()}")
print(f"Maximum Date: {df.index.max()}")
print()
for d in df.index[:5]:
    print(d)
print()
for d in df.index[-5:]:
    print(d)

Minimum Date: 2000-01-01 00:00:00
Maximum Date: 2023-01-01 00:00:00

2000-01-01 00:00:00
2000-01-01 00:00:00
2000-01-01 00:00:00
2000-01-01 00:00:00
2000-01-01 00:00:00

2023-01-01 00:00:00
2023-01-01 00:00:00
2023-01-01 00:00:00
2023-01-01 00:00:00
2023-01-01 00:00:00
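
Printing the first and last five index values confirms the ends of the range but not the ordering in between. A small sketch of a programmatic check using the pandas `Index.is_monotonic_increasing` property; the function name and the three-row demo frame are illustrative assumptions, not the notebook's data:

```python
import pandas as pd


def assert_sorted_datetime_index(frame):
    """Raise if the index is not a non-decreasing DatetimeIndex; return (min, max)."""
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError("index is not a DatetimeIndex")
    # Non-strict check: repeated months are allowed, as in the main table.
    if not frame.index.is_monotonic_increasing:
        raise ValueError("index is not sorted in ascending order")
    return frame.index.min(), frame.index.max()


# Illustrative stand-in for df (values borrowed from the sample rows above).
demo = pd.DataFrame(
    {"resale_price": [147000.0, 550000.0, 690000.0]},
    index=pd.to_datetime(["2000-01-01", "2000-01-01", "2023-01-01"]),
)
demo.index.name = "year_month"

first, last = assert_sorted_datetime_index(demo)
print(first, last)
```

Against the real data the call would simply be `assert_sorted_datetime_index(df)`.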


### ❗ DataFrame Info

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 602221 entries, 2000-01-01 to 2023-01-01
Data columns (total 24 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   town                     602221 non-null  object 
 1   flat_type                602221 non-null  object 
 2   flat_model               602221 non-null  object 
 3   floor_area_sqm           602221 non-null  float64
 4   street_name              602221 non-null  object 
 5   resale_price             602221 non-null  float64
 6   lease_commence_date      602221 non-null  int64  
 7   storey_range             602221 non-null  object 
 8   block                    602221 non-null  object 
 9   remaining_lease          602221 non-null  object 
 10  address                  602221 non-null  object 
 11  full_address             602221 non-null  object 
 12  lat                      602221 non-null  float64
 13  long                     602221 non-null  f

### ❗ Indexes Created (Not Important)

In [8]:
sql = text("""
SELECT
    indexname,
    tablename,
    indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
""")

psql.query(sql)

Total Rows: 9


Unnamed: 0,indexname,tablename,indexdef
1,ix_births_year_month,births,CREATE INDEX ix_births_year_month ON public.bi...
2,ix_divorces_year,divorces,CREATE INDEX ix_divorces_year ON public.divorc...
3,ix_gdp_year,gdp,CREATE INDEX ix_gdp_year ON public.gdp USING b...
4,ix_inflation_year,inflation,CREATE INDEX ix_inflation_year ON public.infla...
5,ix_main_year_month,main,CREATE INDEX ix_main_year_month ON public.main...
6,ix_marriages_year,marriages,CREATE INDEX ix_marriages_year ON public.marri...
7,ix_stat_monthly_year_month,stat_monthly,CREATE INDEX ix_stat_monthly_year_month ON pub...
8,ix_stat_yearly_year,stat_yearly,CREATE INDEX ix_stat_yearly_year ON public.sta...
9,ix_unemployment_year,unemployment,CREATE INDEX ix_unemployment_year ON public.un...


### ❗ Drop Table (Test Working)

In [9]:
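def drop_all_tables():
    """Drop every table listed in df_tables. Destructive and irreversible."""
    for table in df_tables.table_name.values:
        # Quote the identifier so unusual table names cannot break the statement
        sql = text(f'DROP TABLE IF EXISTS "{table}"')
        print(sql)
        psql.execute(sql)

# WARNING: this permanently drops all tables listed above.
# Uncomment only when a full reset is intended.
# drop_all_tables()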
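
Because the drop helper in this cell is guarded only by a commented-out call, a dry-run default may be a safer pattern. The sketch below is one possible shape, not the notebook's implementation: `quote_identifier` and `drop_tables` are hypothetical names, and `execute` is any callable that runs a single SQL string.

```python
def quote_identifier(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + str(name).replace('"', '""') + '"'


def drop_tables(table_names, execute, confirm=False):
    """Build DROP TABLE statements; run them only when confirm=True.

    Returns the list of statements so a dry run can be reviewed first.
    """
    statements = [
        f"DROP TABLE IF EXISTS {quote_identifier(name)}" for name in table_names
    ]
    if not confirm:
        return statements  # dry run: nothing is executed
    for statement in statements:
        execute(statement)
    return statements


# Dry run with a stand-in executor that only records what it is given.
executed = []
planned = drop_tables(["main", "gdp"], executed.append)
for statement in planned:
    print(statement)
```

With the objects from this notebook, an actual run might look like `drop_tables(df_tables.table_name, lambda s: psql.execute(text(s)), confirm=True)`.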