# **Generation SG Junior Data Engineer Programme**
### **Interim Project presented by DPPS Team (5)**<br><span style="color:darkblue; font-weight:bold;">Members: Daniel | Pin Pin, Yvonne | Pin Yeen, Erica | Shawn</span>


<br /> <br />
## Data Preparation | Data Loading
___

### Data Preparation
After designing our database model, we create the tables in PostgreSQL, declaring column data types and foreign keys so that the database itself enforces data-type and referential integrity.

#### Create Tables

```sql
-- Weather stations (parent table for locations, air_temp and humidity)
CREATE TABLE station (
    station_id VARCHAR(4) PRIMARY KEY,
    station_name VARCHAR(30) NOT NULL,
    latitude DECIMAL NOT NULL,
    longitude DECIMAL NOT NULL
);

-- Maps each town to a weather station
CREATE TABLE locations (
    town_name VARCHAR(20) PRIMARY KEY,
    station_id VARCHAR(4) NOT NULL,
    FOREIGN KEY (station_id) REFERENCES station(station_id)
);

-- Air temperature readings per station
CREATE TABLE air_temp (
    airtemp_id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    station_id VARCHAR(4) NOT NULL,
    temperature DECIMAL(3,1) NOT NULL,
    airtemp_date TIMESTAMP NOT NULL,
    FOREIGN KEY (station_id) REFERENCES station(station_id)
);

-- Resale flat transactions, linked to a town in locations
CREATE TABLE resale_flat_txn (
    resale_id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    resale_date DATE NOT NULL,
    town_name VARCHAR(20) NOT NULL,
    flat_type VARCHAR(20) NOT NULL,
    block_no VARCHAR(5) NOT NULL,
    street_name VARCHAR(30) NOT NULL,
    storey_range VARCHAR(10) NOT NULL,
    floor_area_sqm FLOAT NOT NULL,
    flat_model VARCHAR(30) NOT NULL,
    lease_commence_year INT NOT NULL,
    remaining_lease VARCHAR(20) NOT NULL,
    resale_price FLOAT NOT NULL,
    FOREIGN KEY (town_name) REFERENCES locations(town_name)
);

-- DECIMAL(4,1) rather than DECIMAL(3,1), whose maximum of 99.9 cannot hold a reading of 100.0
CREATE TABLE humidity (
    humidity_id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    station_id VARCHAR(4) NOT NULL,
    humidity_date TIMESTAMP NOT NULL,
    humidity_readings DECIMAL(4,1) NOT NULL,
    FOREIGN KEY (station_id) REFERENCES station(station_id)
);
```
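Because the reading columns use fixed-precision `DECIMAL` types, PostgreSQL rejects any value outside the declared range at insert time; for example, `temperature DECIMAL(3,1)` holds magnitudes up to 99.9. The following is a minimal sketch of a pre-load check that could catch such values in Python before they reach the database. The helper name `fits_decimal` is hypothetical and not part of the project's code, and treating NaN and infinity as invalid is an assumption of this sketch.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def fits_decimal(value, precision, scale):
    """Return True if `value` can be stored in a DECIMAL(precision, scale) column."""
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        return False  # not a number at all, e.g. "n/a"
    if not number.is_finite():
        return False  # assumption: treat NaN and infinity as invalid readings
    # PostgreSQL rounds to `scale` decimal places first, then checks the range.
    step = Decimal(1).scaleb(-scale)
    rounded = number.quantize(step, rounding=ROUND_HALF_UP)
    return abs(rounded) < Decimal(10) ** (precision - scale)


readings = [28.4, 99.94, 100, "n/a"]
print([fits_decimal(r, 3, 1) for r in readings])  # [True, True, False, False]
```

Rows that fail the check can be logged or corrected before loading, instead of aborting the whole insert with a numeric overflow error.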

#### Insert Data into station and locations Tables

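The `station` dimension table and the `locations` mapping table are referenced by foreign keys in the other tables, so their rows must exist before any dependent data is loaded. We therefore populate both with a one-time insertion during the initial setup, which avoids dependency issues during the load stage.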

```sql
INSERT INTO station VALUES ('S109','Ang Mo Kio Avenue 5',1.3764,103.8492);
INSERT INTO station VALUES ('S117','Banyan Road',1.256,103.679);
INSERT INTO station VALUES ('S50','Clementi Road',1.3337,103.7768);
INSERT INTO station VALUES ('S107','East Coast Parkway',1.3135,103.9625);
INSERT INTO station VALUES ('S43','Kim Chuan Road',1.3399,103.8878);
INSERT INTO station VALUES ('S108','Marina Gardens Drive',1.2799,103.8703);
INSERT INTO station VALUES ('S44','Nanyang Avenue',1.34583,103.68166);
INSERT INTO station VALUES ('S121','Old Choa Chu Kang Road',1.37288,103.72244);
INSERT INTO station VALUES ('S111','Scotts Road',1.31055,103.8365);
INSERT INTO station VALUES ('S60','Sentosa',1.25,103.8279);
INSERT INTO station VALUES ('S24','Upper Changi Road North',1.3678,103.9826);
INSERT INTO station VALUES ('S116','West Coast Highway',1.281,103.754);
INSERT INTO station VALUES ('S104','Woodlands Avenue 9',1.44387,103.78538);
INSERT INTO station VALUES ('S102','Semakau Landfill',1.20525,103.77117);
INSERT INTO station VALUES ('S115','Tuas South Avenue 3',1.29377,103.68166);
INSERT INTO station VALUES ('S106','Pulau Ubin',1.412382,103.958509);

INSERT INTO locations VALUES ('ANG MO KIO', 'S109');
INSERT INTO locations VALUES ('BEDOK', 'S107');
INSERT INTO locations VALUES ('BISHAN', 'S109');
INSERT INTO locations VALUES ('BUKIT BATOK', 'S121');
INSERT INTO locations VALUES ('BUKIT MERAH', 'S60');
INSERT INTO locations VALUES ('BUKIT PANJANG', 'S121');
INSERT INTO locations VALUES ('BUKIT TIMAH', 'S111');
INSERT INTO locations VALUES ('CENTRAL AREA', 'S108');
INSERT INTO locations VALUES ('CHOA CHU KANG', 'S121');
INSERT INTO locations VALUES ('CLEMENTI', 'S50');
INSERT INTO locations VALUES ('GEYLANG', 'S43');
INSERT INTO locations VALUES ('HOUGANG', 'S43');
INSERT INTO locations VALUES ('JURONG EAST', 'S117');
INSERT INTO locations VALUES ('JURONG WEST', 'S44');
INSERT INTO locations VALUES ('KALLANG/WHAMPOA', 'S108');
INSERT INTO locations VALUES ('MARINE PARADE', 'S108');
INSERT INTO locations VALUES ('PASIR RIS', 'S24');
INSERT INTO locations VALUES ('PUNGGOL', 'S43');
INSERT INTO locations VALUES ('QUEENSTOWN', 'S116');
INSERT INTO locations VALUES ('SEMBAWANG', 'S104');
INSERT INTO locations VALUES ('SENGKANG', 'S43');
INSERT INTO locations VALUES ('SERANGOON', 'S43');
INSERT INTO locations VALUES ('TAMPINES', 'S24');
INSERT INTO locations VALUES ('TOA PAYOH', 'S43');
INSERT INTO locations VALUES ('WOODLANDS', 'S104');
INSERT INTO locations VALUES ('YISHUN', 'S104');
```

### Data Loading

Once all tables are set up, the fact table data (air temperature, humidity, and resale flat transactions) can be loaded into PostgreSQL using Python, as detailed in the EXTRACT section of the documentation.
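The project's actual loader is described in the EXTRACT section. As a rough illustration only, the sketch below shows one way such a load could look for the `air_temp` table. It assumes a DB-API 2.0 connection (for example, one returned by `psycopg2.connect`) and readings supplied as dicts with the hypothetical keys `station_id`, `value`, and `timestamp`; the function names are likewise illustrative and not taken from the project.

```python
from datetime import datetime

# Column order matches the air_temp table; airtemp_id is generated by the database.
INSERT_AIR_TEMP = (
    "INSERT INTO air_temp (station_id, temperature, airtemp_date) "
    "VALUES ({p}, {p}, {p})"
)


def to_air_temp_rows(readings, known_stations=None):
    """Convert reading dicts (hypothetical keys) into tuples ready for insertion."""
    rows = []
    for reading in readings:
        station_id = reading["station_id"]
        # Skip stations missing from the station table; the foreign key would reject them.
        if known_stations is not None and station_id not in known_stations:
            continue
        rows.append(
            (station_id, reading["value"], datetime.fromisoformat(reading["timestamp"]))
        )
    return rows


def load_air_temp(conn, rows, placeholder="%s"):
    """Insert rows through a DB-API 2.0 connection and return the number of rows sent."""
    cursor = conn.cursor()
    try:
        # Values are passed as parameters, never formatted into the SQL text.
        cursor.executemany(INSERT_AIR_TEMP.format(p=placeholder), rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return len(rows)
```

The default `%s` placeholder is the parameter style used by psycopg2; other drivers may need a different one. Filtering on `known_stations` keeps the load consistent with the foreign key on `station_id`, so a reading from an unregistered station is skipped instead of failing the whole batch.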