## 📘 HDB Resale Flat Prices

### 📌 Notebook Description

- **Team:** Team A  
- **Members:** Ben, Shazlin, Alan  
- **Project Name:** HDB Resale Flat Data Engineering Pipeline
- **Description:** Implements automated data ingestion from data.gov.sg and performs dataset merging to produce a unified, analysis-ready dataset.
- **Data Artifacts:**  
    - `/DataLake/<raw files>`  
    - `/Staging/Main.csv`

### 🛠️ Installation

In [1]:
#!pip install sqlalchemy psycopg2-binary

### 📦 Import Required Libraries

In [2]:
from sqlalchemy import create_engine, text
import pandas as pd
from PSQL import PSQL

#---Customized-----------------------------------------
import control_output
pd.set_option("display.float_format", "{:,.2f}".format)
control_output.css

### 🧩 Initialize Class Instance: PSQL

In [3]:
psql = PSQL()

#sql=text("CREATE DATABASE hdb;")
#psql.execute(sql)

Connected successfully!


### ⚙️ Define Function

In [4]:
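def populate_table(data_file, info):
    """Load a CSV file into a database table and report the row count.

    data_file: a bare file name (resolved against the staging folder)
               or a path that already contains "/".
    info:      dict with 'table_name' (target table) and
               'index_name' (date column used as the index).
    """
    index_name = info['index_name']
    table_name = info['table_name']

    # Bare file names are resolved against the staging folder
    if "/" not in data_file:
        data_file = f"../Project-HDB-Store/staging/{data_file}"

    df = pd.read_csv(data_file,
                     low_memory=False,
                     parse_dates=[index_name],   # convert to datetime during read
                     index_col=index_name)

    # Create engine
    engine = create_engine(psql.connection_url)

    # Insert into table, replacing it if it already exists
    df.to_sql(table_name, engine, if_exists="replace", index=True, index_label=index_name)

    # Release pooled connections, since a new engine is created on every call
    engine.dispose()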

    #sql = f"CREATE INDEX idx_{table_name}_{index_name} ON {table_name} ({index_name})";
    #sql = text(sql)
    #psql.execute(sql)
    #print(sql)

    sql = text(f"SELECT count(*) FROM {table_name}")
    result = psql.query(sql)
    counts = result.iloc[0].values[0]

    print(f"CSV: {data_file:24} imported successfully. Number of records : {counts:>6}, {table_name}:{index_name}")
    print("-" * 125)
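
`populate_table` builds its row-count query by interpolating `table_name` into an f-string. Table names cannot be passed as bound parameters, so one possible safeguard is to validate the name before it reaches the SQL text. The sketch below is an assumption, not part of the original notebook: `safe_identifier` and `count_query` are hypothetical helper names, and the allowed pattern (letters, digits, underscores, no leading digit) is an illustrative choice.

```python
import re

# Hypothetical helpers for illustration; neither exists in the original notebook.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(name):
    """Return name unchanged if it is a plain SQL identifier, else raise ValueError."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def count_query(table_name):
    """Build the row-count statement, rejecting names that are not plain identifiers."""
    return f"SELECT count(*) FROM {safe_identifier(table_name)}"
```

With helpers like these, the count statement could be written as `text(count_query(table_name))`, so a malformed entry in the dataset dictionary fails early instead of producing unexpected SQL.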


### ▶️ Execute File Processor 1: **Populate Tables**

In [5]:
datasets = {
    'stat_monthly.csv': {'table_name': 'stat_monthly', 'index_name': 'year_month'},
    'stat_yearly.csv':  {'table_name': 'stat_yearly', 'index_name': 'year'},
    'Main_final.csv': {'table_name': 'main', 'index_name': 'year_month'}
}

for datafile, info in datasets.items():
    populate_table(datafile, info)

Total Rows: 1
CSV: ../Project-HDB-Store/staging/stat_monthly.csv imported successfully. Number of records :    288, stat_monthly:year_month
-----------------------------------------------------------------------------------------------------------------------------
Total Rows: 1
CSV: ../Project-HDB-Store/staging/stat_yearly.csv imported successfully. Number of records :     24, stat_yearly:year
-----------------------------------------------------------------------------------------------------------------------------
Total Rows: 1
CSV: ../Project-HDB-Store/staging/Main_final.csv imported successfully. Number of records : 602221, main:year_month
-----------------------------------------------------------------------------------------------------------------------------
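
The record counts printed above come from the database after the import. To cross-check them against the source files, the data rows in each CSV can be counted independently. The following is a minimal sketch using only the standard library: `count_csv_records` and `check_import` are hypothetical names, and it assumes each file has a single header row, is UTF-8 encoded, and that blank lines should be ignored (as `pd.read_csv` does by default).

```python
import csv

# Hypothetical helpers for illustration; neither exists in the original notebook.


def count_csv_records(path):
    """Count data rows in a CSV file, excluding the header row and blank lines."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if any
        return sum(1 for row in reader if row)


def check_import(path, db_count):
    """Raise ValueError if the database row count differs from the CSV row count."""
    expected = count_csv_records(path)
    if expected != db_count:
        raise ValueError(f"{path}: CSV has {expected} rows, table has {db_count}")
    return expected
```

Inside `populate_table`, a call such as `check_import(data_file, counts)` after the count query would flag a partial import.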


### ▶️ Execute File Processor 2: **Populate Tables**

In [6]:
files = {
    'Births.csv': {'table_name': 'births', 'index_name': 'year_month'},
    'gdp.csv': {'table_name': 'gdp', 'index_name': 'year'},
    'Marriages.csv': {'table_name': 'marriages', 'index_name': 'year'},
    'Divorces.csv': {'table_name': 'divorces', 'index_name': 'year'},
    'Inflation.csv': {'table_name': 'inflation', 'index_name': 'year'},
    'unemployment.csv': {'table_name': 'unemployment', 'index_name': 'year'},
}

for datafile, info in files.items():
    filename = f'../Project-HDB-Store/working/{datafile}'
    print(filename, info)
    populate_table(filename, info)

../Project-HDB-Store/working/Births.csv {'table_name': 'births', 'index_name': 'year_month'}
Total Rows: 1
CSV: ../Project-HDB-Store/working/Births.csv imported successfully. Number of records :    288, births:year_month
-----------------------------------------------------------------------------------------------------------------------------
../Project-HDB-Store/working/gdp.csv {'table_name': 'gdp', 'index_name': 'year'}
Total Rows: 1
CSV: ../Project-HDB-Store/working/gdp.csv imported successfully. Number of records :     24, gdp:year
-----------------------------------------------------------------------------------------------------------------------------
../Project-HDB-Store/working/Marriages.csv {'table_name': 'marriages', 'index_name': 'year'}
Total Rows: 1
CSV: ../Project-HDB-Store/working/Marriages.csv imported successfully. Number of records :     24, marriages:year
-------------------------------------------------------------------------------------------------------------