Skip to content

Crowd-Quest: ETL Journey for Crowdfunding Data is a repository showcasing the ETL (Extract, Transform, Load) process. It involves extracting data from Excel files, transforming it into CSV format, designing an ERD and database schema, and loading the data into PostgreSQL. Tools used: Jupyter Notebook, VSCode, PostgreSQL, Quick DBD, Excel.

Notifications You must be signed in to change notification settings

AnushDeCosta/Crowdfunding_ETL

Repository files navigation

Crowd-Quest: An ETL Journey for Crowdfunding Data

CrowdFund

Introduction

The Crowd-Quest is a concise venture that illustrates the ETL (Extract, Transform, Load) process, a prevalent data movement methodology. In this project, data will be extracted from two Excel files, transformed to yield four CSV files, an Entity Relationship Diagram (ERD) will be sketched, a database schema will be designed, and finally, data will be loaded into a PostgreSQL database.

Project Outline

The structure of this project can be segmented into four major components:

  • Data Extraction
  • Data Transformation
  • Data Loading
  • Drafting of Entity Relationship Diagram (ERD) and Database Schema Design

Data Extraction and Transformation

The initial step of the ETL process involves extracting data from the source. Here, we have two Excel files at hand: crowdfunding.xlsx and contacts.xlsx. The crowdfunding.xlsx file offers details on crowdfunding campaigns, while the contacts.xlsx file provides information about campaign contacts.

Data Transformation

Data extracted from the crowdfunding.xlsx and contacts.xlsx files is then transformed to the required format. The transformation of the crowdfunding data results in three dataframes: category, subcategory, and campaign. Unique category and subcategory values from the crowdfunding data are used to create the category and subcategory dataframes. The campaign dataframe is assembled by merging data from several columns in the crowdfunding data and changing the datatype of some columns. Transformation of contacts data is handled in two alternative ways. The first option iterates through each row of the contacts data, extracting values and building a new dataframe from them. The second option employs regular expressions to pick specific columns from the contacts data and generates a new dataframe from those columns.

Data Loading

In this project phase, the transformed data is funneled into a PostgreSQL database. A new database named "crowdfunding_db" is brought into existence, and the schema defined in the previous stage is employed to build the necessary tables. CSV files housing the transformed data are loaded into their matching tables via the COPY command in PostgreSQL. To verify the successful data loading into the database tables, a series of queries are executed.

Crafting of Entity Relationship Diagram (ERD) and Database Schema Design

This project segment involves the creation of an ERD based on the data extracted and transformed from Excel files, followed by the design of a database schema conforming to the ERD. The schema is subsequently used to create tables in a Postgres database, ready for the loading of transformed data.

Tools

  • Jupyter Notebook
  • VSCode
  • Quick DBD
  • PostgreSQL
  • pgAdmin4
  • Microsoft Excel

Files

Notes

  • The Tables were populated using the Import function in PostgreSQL

About

Crowd-Quest: ETL Journey for Crowdfunding Data is a repository showcasing the ETL (Extract, Transform, Load) process. It involves extracting data from Excel files, transforming it into CSV format, designing an ERD and database schema, and loading the data into PostgreSQL. Tools used: Jupyter Notebook, VSCode, PostgreSQL, Quick DBD, Excel.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published