Developed and automated data pipelines that ingest data from PostgreSQL, transform it with PySpark, and generate daily revenue reports, landing raw data in an HDFS data lake and curated tables in a Hive data warehouse, orchestrated with Apache Airflow.
Set up the Postgres database and create the users, user_details, orders, order_details, products, and product_inventories tables.
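The actual DDL is not shown in this document, so the following is only an illustrative sketch of what those six tables might look like; every column name, type, and constraint here is an assumption, not the project's real schema.

```sql
-- Illustrative schema sketch; columns and types are assumptions.
CREATE TABLE users (
    user_id      SERIAL PRIMARY KEY,
    username     TEXT NOT NULL
);

CREATE TABLE user_details (
    user_id      INT REFERENCES users (user_id),
    full_name    TEXT,
    email        TEXT
);

CREATE TABLE products (
    product_id   SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    unit_price   NUMERIC(10, 2) NOT NULL
);

CREATE TABLE product_inventories (
    product_id   INT REFERENCES products (product_id),
    quantity     INT NOT NULL DEFAULT 0
);

CREATE TABLE orders (
    order_id     SERIAL PRIMARY KEY,
    user_id      INT REFERENCES users (user_id),
    order_date   DATE NOT NULL
);

CREATE TABLE order_details (
    order_id     INT REFERENCES orders (order_id),
    product_id   INT REFERENCES products (product_id),
    quantity     INT NOT NULL,
    unit_price   NUMERIC(10, 2) NOT NULL
);
```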
Copy the example environment file in the project and update it with your own connection settings.
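The contents of the project's example environment file are not reproduced here, so the fragment below is a hypothetical sketch of the kind of variables such a pipeline typically needs; every name and value is a placeholder.

```
# Hypothetical .env sketch -- replace every value with your own settings.
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=shop
POSTGRES_USER=etl
POSTGRES_PASSWORD=changeme
HDFS_URI=hdfs://namenode:8020
HIVE_DB=reports
```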
Ingest a single table for a given run date:
python src/jobs/ingest.py --table_name=users --execution_date=2024-06-01
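The ingestion job itself is not shown in this document, so here is a minimal sketch of what src/jobs/ingest.py could look like given the CLI flags above. The JDBC URL, credentials, and HDFS paths are assumptions (in practice they would come from the .env file), and the partition layout is a guess.

```python
# Sketch of an ingest job; connection details and paths are assumptions.
import argparse


def hdfs_path(base: str, table: str, execution_date: str) -> str:
    """Build the partitioned landing path for one table and run date."""
    return f"{base}/{table}/execution_date={execution_date}"


def run(table: str, execution_date: str) -> None:
    # pyspark is imported lazily so the pure helper above can be used
    # without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(f"ingest_{table}").getOrCreate()

    # JDBC options are placeholders; real values belong in the .env file.
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/shop")  # assumed DSN
        .option("dbtable", table)
        .option("user", "etl")         # assumed credentials
        .option("password", "changeme")
        .load()
    )

    # Land the raw extract in the HDFS data lake, partitioned by run date.
    df.write.mode("overwrite").parquet(
        hdfs_path("hdfs://namenode:8020/lake/raw", table, execution_date)
    )
    spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--table_name")
    parser.add_argument("--execution_date")
    args = parser.parse_args()
    # Run only when both flags are supplied on the command line.
    if args.table_name and args.execution_date:
        run(args.table_name, args.execution_date)
```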
Transform the ingested data and build the daily revenue report:
python src/jobs/transform.py --execution_date=2024-06-01
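Likewise, the transform job is not shown, so this is a hedged sketch of how src/jobs/transform.py might compute daily revenue: join orders with order_details, multiply quantity by unit price, and aggregate by order date into a Hive table. All table names, column names, and paths are assumptions; the pure daily_revenue helper only illustrates the aggregation logic.

```python
# Sketch of a daily-revenue transform; names and paths are assumptions.
import argparse


def daily_revenue(rows):
    """Illustrative pure aggregation: sum quantity * unit_price per order date
    over plain dict rows (mirrors the Spark aggregation below)."""
    totals = {}
    for r in rows:
        key = r["order_date"]
        totals[key] = totals.get(key, 0) + r["quantity"] * r["unit_price"]
    return totals


def run(execution_date: str) -> None:
    # pyspark is imported lazily so the helper above works without Spark.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.appName("daily_revenue")
        .enableHiveSupport()
        .getOrCreate()
    )
    base = "hdfs://namenode:8020/lake/raw"  # assumed lake location
    orders = spark.read.parquet(f"{base}/orders/execution_date={execution_date}")
    details = spark.read.parquet(f"{base}/order_details/execution_date={execution_date}")

    report = (
        orders.join(details, "order_id")
        .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
        .groupBy("order_date")
        .agg(F.sum("revenue").alias("daily_revenue"))
    )
    # Publish the curated report to the Hive warehouse (assumed table name).
    report.write.mode("overwrite").saveAsTable("reports.daily_revenue")
    spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--execution_date")
    args = parser.parse_args()
    if args.execution_date:
        run(args.execution_date)
```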
The Airflow DAG that schedules and orchestrates these jobs is defined in src/flows/dag.py.
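The DAG file itself is not reproduced here, so the following is an assumed sketch of how src/flows/dag.py might wire the two jobs together with Airflow's BashOperator: one ingest task per table, all feeding a single transform task, run once per day. The DAG id, schedule, and task layout are guesses, and the `schedule` parameter assumes Airflow 2.4 or later (earlier versions use `schedule_interval`).

```python
# Sketch of an orchestration DAG; ids, schedule, and layout are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

TABLES = ["users", "user_details", "orders", "order_details",
          "products", "product_inventories"]

with DAG(
    dag_id="daily_revenue_pipeline",
    start_date=datetime(2024, 6, 1),
    schedule="@daily",  # one run per execution date
    catchup=False,
) as dag:
    ingest_tasks = [
        BashOperator(
            task_id=f"ingest_{table}",
            # {{ ds }} is Airflow's templated run date (YYYY-MM-DD).
            bash_command=(
                f"python src/jobs/ingest.py --table_name={table} "
                "--execution_date={{ ds }}"
            ),
        )
        for table in TABLES
    ]

    transform = BashOperator(
        task_id="transform",
        bash_command="python src/jobs/transform.py --execution_date={{ ds }}",
    )

    # All ingests must finish before the transform runs.
    ingest_tasks >> transform
```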