# **Generation SG Junior Data Engineer Programme**
### **Interim Project presented by DPPS Team (5)**<br><span style="color:darkblue; font-weight:bold;">Members: Daniel | Pin Pin, Yvonne | Pin Yeen, Erica | Shawn</span>


<br /> <br />
## Data Source and Data Dictionary
___

### Data Source

|   | Data Source | Agency | Data Source Details                           | Format | Data Period     | Subsequent Frequency | URL                                                                          |
| - | ----------- | ------ | --------------------------------------------- | ------ | --------------- | -------------------- | ---------------------------------------------------------------------------- |
| 1 | data.gov.sg | NEA    | Air Temperature across Singapore              | API    | Oct'23 - Sep'24 | Monthly              | [Link](https://data.gov.sg/datasets/d_66b77726bbae1b33f218db60ff5861f0/view) |
| 2 | data.gov.sg | NEA    | Relative Humidity across Singapore            | API    | Oct'23 - Sep'24 | Monthly              | [Link](https://data.gov.sg/datasets/d_2d3b0c4da128a9a59efca806441e1429/view) |
| 3 | data.gov.sg | HDB    | Resale Flat Prices Based on Registration Date | CSV    | Oct'23 - Sep'24 | Monthly              | [Link](https://data.gov.sg/datasets/d_8b84c4ee58e3cfc0ece0d773c8ca6abc/view) |

<br /> <br />
### Data Dictionary

#### Table Definition

| Table name      | Table description                                                                                                                                                                | Source                                | Ingestion Remarks                                                             |
| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------- | ----------------------------------------------------------------------------- |
| air_temp        | Stores air temperature timestamps and readings                                                                                                                                   | Air Temperature API                   | Periodic ingestion via API                                                    |
| humidity        | Stores humidity timestamps and readings                                                                                                                                          | Relative Humidity API                 | Periodic ingestion via API                                                    |
| station         | Normalized and newly created dimension table to store air temperature and humidity station-related information. Attributes include: station_name, latitude, longitude | Air Temperature/Relative Humidity API | Updated only when a new weather station is added                              |
| locations       | Newly created table that maps each HDB town to a weather station                                                                                                                 | \--                                   | Updated only when a new weather station is added. Manual mapping required     |
| resale_flat_txn | Stores resale flat price information                                                                                                                                             | Resale Flat Price CSV                 | Periodic ingestion via CSV                                                    |

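The normalization behind the `station` table can be sketched as below. This is a minimal illustration, not the team's actual pipeline: the payload shape (`metadata.stations` and `items[].readings`) is an assumption about the API response, and `normalize_readings`, `sample` and all sample values are hypothetical.

```python
from typing import Any


def normalize_readings(payload: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Split one response into station (dimension) rows and reading (fact) rows.

    Assumes the payload carries station metadata and timestamped readings
    in the shape shown in `sample` below.
    """
    # One row per station, de-duplicated on the station id.
    stations: dict[str, dict] = {}
    for s in payload.get("metadata", {}).get("stations", []):
        stations[s["id"]] = {
            "station_id": s["id"],
            "station_name": s["name"],
            "latitude": s["location"]["latitude"],
            "longitude": s["location"]["longitude"],
        }

    # One row per (timestamp, station) reading; station details are not repeated.
    readings = []
    for item in payload.get("items", []):
        for r in item.get("readings", []):
            readings.append(
                {
                    "station_id": r["station_id"],
                    "reading_time": item["timestamp"],
                    "value": r["value"],
                }
            )
    return list(stations.values()), readings


# Hypothetical payload for illustration only; the values are made up.
sample = {
    "metadata": {
        "stations": [
            {
                "id": "S001",
                "name": "Example Station",
                "location": {"latitude": 1.30, "longitude": 103.80},
            }
        ]
    },
    "items": [
        {
            "timestamp": "2023-10-01T00:00:00+08:00",
            "readings": [{"station_id": "S001", "value": 27.5}],
        },
        {
            "timestamp": "2023-10-01T01:00:00+08:00",
            "readings": [{"station_id": "S001", "value": 27.1}],
        },
    ],
}

station_rows, reading_rows = normalize_readings(sample)
```

In this sketch the station rows would feed the `station` table, while the reading rows would feed `air_temp` or `humidity`, depending on which API was called.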

<br /> <br />
#### Attribute Definition


| Table Name      | Column Name         | Description                                                          | UDT Name  | Max.Length | Data Type                   | Primary Key | Source                                |
| --------------- | ------------------- | -------------------------------------------------------------------- | --------- | ---------- | --------------------------- | ----------- | ------------------------------------- |
| air_temp        | airtemp_id          | id for air temperature                                               | int4      | NULL       | integer                     | YES         | Surrogate Key                         |
| air_temp        | station_id          | id for weather station                                               | varchar   | 4          | character varying           | NO          | Air Temperature API                   |
| air_temp        | temperature         | Air temperature in degrees Celsius                                   | numeric   | NULL       | numeric                     | NO          | Air Temperature API                   |
| air_temp        | airtemp_date        | Timestamp (per hour) of air temperature                              | timestamp | NULL       | timestamp without time zone | NO          | Air Temperature API                   |
| humidity        | humidity_id         | id for humidity reading                                              | int4      | NULL       | integer                     | YES         | Surrogate Key                         |
| humidity        | station_id          | id for weather station                                               | varchar   | 4          | character varying           | NO          | Relative Humidity API                 |
| humidity        | humidity_date       | Timestamp (per hour) of humidity reading                             | timestamp | NULL       | timestamp without time zone | NO          | Relative Humidity API                 |
| humidity        | humidity_readings   | Relative humidity (%)                                                | numeric   | NULL       | numeric                     | NO          | Relative Humidity API                 |
| locations       | town_name           | town name of resale flat                                             | varchar   | 20         | character varying           | YES         | Resale Flat Price CSV                 |
| locations       | station_id          | id for weather station                                               | varchar   | 4          | character varying           | NO          | Air Temperature/Relative Humidity API |
| resale_flat_txn | resale_id           | id for resale flat                                                   | int4      | NULL       | integer                     | YES         | Surrogate Key                         |
| resale_flat_txn | resale_date         | transaction date (YYYY-MM) of resale flat                            | date      | NULL       | date                        | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | town_name           | town name of resale flat                                             | varchar   | 20         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | flat_type           | flat type of resale flat                                             | varchar   | 20         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | block_no            | block number of resale flat                                          | varchar   | 5          | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | street_name         | street name of resale flat                                           | varchar   | 30         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | storey_range        | storey range of resale flat                                          | varchar   | 10         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | floor_area_sqm      | floor area in square meters                                          | float8    | NULL       | double precision            | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | flat_model          | flat model of resale flat                                            | varchar   | 30         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | lease_commence_year | lease commence year                                                  | int4      | NULL       | integer                     | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | remaining_lease     | remaining lease (years and months)                                   | varchar   | 20         | character varying           | NO          | Resale Flat Price CSV                 |
| resale_flat_txn | resale_price        | pricing of resale flat                                               | float8    | NULL       | double precision            | NO          | Resale Flat Price CSV                 |
| station         | station_id          | id for weather station                                               | varchar   | 4          | character varying           | YES         | Air Temperature/Relative Humidity API |
| station         | station_name        | name of weather station                                              | varchar   | 30         | character varying           | NO          | Air Temperature/Relative Humidity API |
| station         | latitude            | latitude of weather station                                          | numeric   | NULL       | numeric                     | NO          | Air Temperature/Relative Humidity API |
| station         | longitude           | longitude of weather station                                         | numeric   | NULL       | numeric                     | NO          | Air Temperature/Relative Humidity API |

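The type conversions implied by the `resale_flat_txn` rows above can be sketched as follows. This is an illustration under stated assumptions: the CSV header names (`month`, `town`, `block`, `lease_commence_date` and so on) are assumed rather than taken from this document, `to_resale_row` is a hypothetical helper, and the sample record is made up. Storing a `YYYY-MM` value in a `date` column requires choosing a day; the first of the month is assumed here.

```python
from datetime import datetime


def to_resale_row(csv_row: dict[str, str]) -> dict:
    """Convert one CSV record (all strings) into typed resale_flat_txn columns."""
    return {
        # Parsing with "%Y-%m" leaves the day at 1 (assumed convention).
        "resale_date": datetime.strptime(csv_row["month"], "%Y-%m").date(),
        "town_name": csv_row["town"],
        "flat_type": csv_row["flat_type"],
        "block_no": csv_row["block"],
        "street_name": csv_row["street_name"],
        "storey_range": csv_row["storey_range"],
        "floor_area_sqm": float(csv_row["floor_area_sqm"]),
        "flat_model": csv_row["flat_model"],
        "lease_commence_year": int(csv_row["lease_commence_date"]),
        # Kept as text, matching the varchar column in the data dictionary.
        "remaining_lease": csv_row["remaining_lease"],
        "resale_price": float(csv_row["resale_price"]),
    }


# Hypothetical record for illustration only.
sample_row = {
    "month": "2023-10",
    "town": "ANG MO KIO",
    "flat_type": "3 ROOM",
    "block": "101",
    "street_name": "EXAMPLE STREET 1",
    "storey_range": "04 TO 06",
    "floor_area_sqm": "67",
    "flat_model": "New Generation",
    "lease_commence_date": "1980",
    "remaining_lease": "55 years 06 months",
    "resale_price": "350000",
}

row = to_resale_row(sample_row)
```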
<br /> <br />
### Locations Table Creation
___
To link the air temperature and humidity tables (keyed by weather station) with the resale flat price table (keyed by HDB town), we refer to a Singapore district guide and manually map each HDB town to a weather station. <br />
Reference: [https://sharonanngoh.com/useful-info/singapore-district-guide/](https://sharonanngoh.com/useful-info/singapore-district-guide/)

##### Mapping considerations:
1. An HDB town is mapped to a weather station in the same district.
2. If no weather station is available in the same district, the town is mapped to a station in a nearby district based on estimated distance.

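The second consideration was applied manually in this project. One way such a distance estimate could be automated is a great-circle (haversine) distance from a town's reference point to each station, picking the nearest. The sketch below is not the team's method; `haversine_km`, `nearest_station`, the station ids and all coordinates are hypothetical.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_station(town_lat: float, town_lon: float, stations: list[dict]) -> str:
    """Return the station_id of the station closest to the given point."""
    closest = min(
        stations,
        key=lambda s: haversine_km(town_lat, town_lon, s["latitude"], s["longitude"]),
    )
    return closest["station_id"]


# Hypothetical stations and town reference point, for illustration only.
example_stations = [
    {"station_id": "S00A", "latitude": 1.30, "longitude": 103.80},
    {"station_id": "S00B", "latitude": 1.40, "longitude": 103.90},
]
chosen = nearest_station(1.31, 103.81, example_stations)
```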
| town_name       | District of HDB town | District of weather station | station_name            |
| --------------- | -------------------- | --------------------------- | ----------------------- |
| ANG MO KIO      | 20                   | 20                          | Ang Mo Kio Avenue 5     |
| BEDOK           | 16                   | 16                          | East Coast Parkway      |
| BISHAN          | 20                   | 20                          | Ang Mo Kio Avenue 5     |
| BUKIT BATOK     | 23                   | (n/a) → map to 24           | Old Choa Chu Kang Road  |
| BUKIT MERAH     | 3                    | (n/a) → map to 4            | Sentosa                 |
| BUKIT PANJANG   | 23                   | (n/a) → map to 24           | Old Choa Chu Kang Road  |
| BUKIT TIMAH     | 10                   | (n/a) → map to 9            | Scotts Road             |
| CENTRAL AREA    | 1                    | 1                           | Marina Gardens Drive    |
| CHOA CHU KANG   | 24                   | 24                          | Old Choa Chu Kang Road  |
| CLEMENTI        | 21                   | 21                          | Clementi Road           |
| GEYLANG         | 14                   | (n/a) → map to 19           | Kim Chuan Road          |
| HOUGANG         | 19                   | 19                          | Kim Chuan Road          |
| JURONG EAST     | 22                   | 22                          | Banyan Road             |
| JURONG WEST     | 22                   | 22                          | Nanyang Avenue          |
| KALLANG/WHAMPOA | 15                   | (n/a) → map to 1            | Marina Gardens Drive    |
| MARINE PARADE   | 15                   | (n/a) → map to 1            | Marina Gardens Drive    |
| PASIR RIS       | 18                   | (n/a) → map to 17           | Upper Changi Road North |
| PUNGGOL         | 19                   | 19                          | Kim Chuan Road          |
| QUEENSTOWN      | 3                    | (n/a) → map to 5            | West Coast Highway      |
| SEMBAWANG       | 27                   | (n/a) → map to 25           | Woodlands Avenue 9      |
| SENGKANG        | 19                   | 19                          | Kim Chuan Road          |
| SERANGOON       | 8                    | (n/a) → map to 19           | Kim Chuan Road          |
| TAMPINES        | 18                   | (n/a) → map to 17           | Upper Changi Road North |
| TOA PAYOH       | 12                   | (n/a) → map to 19           | Kim Chuan Road          |
| WOODLANDS       | 25                   | 25                          | Woodlands Avenue 9      |
| YISHUN          | 27                   | (n/a) → map to 25           | Woodlands Avenue 9      |


Note: The Tuas Avenue 3, Semakau Landfill and Pulau Ubin stations are not mapped to any HDB town.
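
The mapping table above is what lets a resale transaction be linked to weather readings through its town. A minimal sketch of that lookup follows, using a subset of the rows above. The `locations` table stores `station_id`, but station ids are not listed in this section, so station names are used instead; `attach_station` and the sample transactions are hypothetical.

```python
# Subset of the town-to-station mapping table, keyed by town_name.
TOWN_TO_STATION = {
    "ANG MO KIO": "Ang Mo Kio Avenue 5",
    "BISHAN": "Ang Mo Kio Avenue 5",
    "WOODLANDS": "Woodlands Avenue 9",
    "YISHUN": "Woodlands Avenue 9",
}


def attach_station(txns: list[dict]) -> list[dict]:
    """Add the mapped station name to each transaction (None if the town is unmapped)."""
    return [{**t, "station_name": TOWN_TO_STATION.get(t["town_name"])} for t in txns]


# Hypothetical transactions for illustration only; the prices are made up.
sample_txns = [
    {"town_name": "BISHAN", "resale_price": 500000.0},
    {"town_name": "YISHUN", "resale_price": 450000.0},
]
enriched = attach_station(sample_txns)
```

Because several towns share one station, joining on the mapped station attaches the same temperature and humidity readings to every town served by that station.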