Mahdi-Moosa/US_Housing_Price_ETL

ETL pipeline for US Housing Price Data Warehouse (AWS S3 & Redshift)

Project goals & data source

This pipeline builds a data warehouse for analytics on US housing prices. The warehouse is designed for zip code-based analytics. Its data comes from five sources:

  1. FHFA appraisal data (UAD Dataset).
  2. Realtor research data.
  3. Redfin research data.
  4. Zillow housing index.
  5. Zip code data from GeoNames.

Steps

  • Step 1: Prepare appraisal data for zip code-based queries (extract data from different sources, transform, and join/merge).
  • Step 2: Extract and transform Realtor, Redfin, Zillow and zip code data.
  • Step 3: Merge/join the transformed FHFA appraisal, Realtor, Redfin and Zillow data to prepare the master house price dataset.
  • Step 4: Prepare the zip code dimension table.
  • Step 5: Load tables to an AWS S3 bucket.
  • Step 6: Load tables to the AWS Redshift database.
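The merge in Step 3 can be sketched as an outer join keyed on zip code. The snippet below is a minimal, dependency-free illustration, not the repository's actual code: the function names (`normalize_zip`, `merge_on_zip`), the column names, and the sample rows are all assumptions made for the example. It also zero-pads zip codes to five characters, because zips read from CSV files as integers lose their leading zeros.

```python
from collections import defaultdict


def normalize_zip(raw):
    """Return a 5-character zip code string, restoring leading zeros."""
    return str(raw).strip().zfill(5)


def merge_on_zip(**sources):
    """Outer-join several lists of row dicts on their 'zip' key.

    Each keyword argument maps a source name to a list of dicts.
    Non-key columns are prefixed with the source name to avoid collisions.
    If a source has several rows for one zip, the last row wins.
    """
    merged = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            zip_code = normalize_zip(row["zip"])
            merged[zip_code]["zip"] = zip_code
            for column, value in row.items():
                if column != "zip":
                    merged[zip_code][f"{source_name}_{column}"] = value
    # Return rows in zip code order for reproducible output.
    return [merged[z] for z in sorted(merged)]


# Hypothetical sample rows; column names and values are illustrative only.
appraisal = [{"zip": 2139, "median_appraised_value": 850000}]
realtor = [
    {"zip": "02139", "median_listing_price": 899000},
    {"zip": "10001", "median_listing_price": 1250000},
]

master = merge_on_zip(appraisal=appraisal, realtor=realtor)
for row in master:
    print(row)
```

A zip code present in only one source still produces a row, so gaps in any single dataset stay visible in the master table rather than dropping records silently.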

The database is primarily designed for queries (and joins) on the "zip" column of the respective tables.
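A zip-keyed join of this kind might look like the sketch below. It uses an in-memory SQLite database as a local stand-in for Redshift so it runs without any AWS setup. The table names (`house_prices`, `zip_dim`), the columns, and the sample values are assumptions for illustration and may not match the warehouse schema.

```python
import sqlite3

# In-memory SQLite database standing in for the Redshift warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; the real schema may differ.
    CREATE TABLE zip_dim (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE house_prices (zip TEXT, median_listing_price INTEGER);
    INSERT INTO zip_dim VALUES ('02139', 'Cambridge', 'MA');
    INSERT INTO house_prices VALUES ('02139', 899000);
    """
)

# Join the fact table to the zip code dimension on the shared "zip" column.
query = """
    SELECT p.zip, d.city, d.state, p.median_listing_price
    FROM house_prices AS p
    JOIN zip_dim AS d ON p.zip = d.zip
    WHERE p.zip = ?
"""
rows = conn.execute(query, ("02139",)).fetchall()
conn.close()

for row in rows:
    print(row)
```

Storing "zip" as text in every table keeps leading zeros intact and lets the join compare values without casting.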

Tools/resources used in the project:

Schematic outline of the ETL process:

[Figure: schematic outline of the ETL process]
