This pipeline generates a Data Warehouse to perform analytics on US housing. The warehouse is designed to perform analytics based on zip codes. Data for the warehouse comes from five different sources:
- FHFA appraisal data (UAD Dataset).
- Realtor research data.
- Redfin research data.
- Zillow housing index.
- Zip code data from GeoNames.
- Step 1: Prepare appraisal data for zip code-based query (extract data from different sources, transform and join/merge).
- Step 2: Extract and transform realtor, redfin, zillow & zip code data.
- Step 3: Merge/ join transformed FHFA appraisal data, realtor data, redfin data and zillow data to prepare master house price dataset.
- Step 4: Prepare zip code dimension table.
- Step 4: Load tables to AWS S3 bucket.
- Step 5: Load tables to AWS redshift database.
Database is primarily designed to perform queries(and join) on the "zip" column of respective tables.