The objective of this project is to orchestrate a data pipeline with Airflow, running in Docker, that acquires data from the r/dataengineering subreddit through Reddit's API, cleanses the acquired data, and finally loads reporting-level data into an Amazon RDS MySQL table. The pipeline uses the following tools and services:
- Airflow: Workflow orchestration and management platform
- AWS S3: Object storage service that stores the raw, cleansed, and aggregated versions of the data
- AWS RDS: Relational Database Service that stores the final aggregated (reporting-layer) data in a MySQL table
- AWS IAM: Identity and Access Management service used to create roles for accessing AWS S3
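The cleanse and aggregate stages can be illustrated with a minimal, self-contained sketch. It assumes posts arrive in the shape of a Reddit listing response (`data.children[*].data`, with `t3` marking posts); the sample payload, the chosen fields, the daily-per-flair rollup, and the table name `reddit_daily_report` are all hypothetical and may differ from this repository's actual transformations. `sqlite3` stands in for RDS MySQL so the sketch runs without AWS, and intermediate results are passed in memory rather than written to the S3 raw/cleansed/aggregated layers.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample shaped like a Reddit listing response; "a1" is duplicated on purpose.
SAMPLE_LISTING = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t3", "data": {"id": "a1", "title": "  Airflow vs Dagster? ", "score": 10,
                                "num_comments": 4, "created_utc": 1709337600.0,
                                "link_flair_text": "Discussion"}},
        {"kind": "t3", "data": {"id": "a2", "title": "Career advice", "score": 3,
                                "num_comments": 1, "created_utc": 1709341200.0,
                                "link_flair_text": None}},
        {"kind": "t3", "data": {"id": "a1", "title": "  Airflow vs Dagster? ", "score": 10,
                                "num_comments": 4, "created_utc": 1709337600.0,
                                "link_flair_text": "Discussion"}},
    ]},
}


def extract_posts(listing):
    """Return the post dicts (kind 't3') from a Reddit listing payload."""
    return [c["data"] for c in listing["data"]["children"] if c.get("kind") == "t3"]


def cleanse(posts):
    """Drop duplicate ids, trim titles, fill missing flair, and convert epoch seconds to a UTC date."""
    seen, rows = set(), []
    for post in posts:
        if post["id"] in seen:
            continue
        seen.add(post["id"])
        rows.append({
            "id": post["id"],
            "title": post["title"].strip(),
            "flair": post.get("link_flair_text") or "unflaired",
            "score": int(post.get("score", 0)),
            "num_comments": int(post.get("num_comments", 0)),
            "created_date": datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date().isoformat(),
        })
    return rows


def aggregate(rows):
    """Reporting-level rollup (hypothetical): post count, total score and comments per day and flair."""
    totals = defaultdict(lambda: {"posts": 0, "score": 0, "comments": 0})
    for row in rows:
        t = totals[(row["created_date"], row["flair"])]
        t["posts"] += 1
        t["score"] += row["score"]
        t["comments"] += row["num_comments"]
    return [(d, f, t["posts"], t["score"], t["comments"]) for (d, f), t in sorted(totals.items())]


def load(conn, report_rows):
    """Upsert report rows; REPLACE INTO is accepted by both SQLite and MySQL, so reruns stay idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reddit_daily_report ("
        "report_date TEXT, flair TEXT, post_count INTEGER, total_score INTEGER, "
        "total_comments INTEGER, PRIMARY KEY (report_date, flair))"
    )
    # sqlite3 uses "?" placeholders; a MySQL driver such as PyMySQL would use "%s".
    conn.executemany("REPLACE INTO reddit_daily_report VALUES (?, ?, ?, ?, ?)", report_rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, aggregate(cleanse(extract_posts(SAMPLE_LISTING))))
    print(conn.execute("SELECT * FROM reddit_daily_report ORDER BY flair").fetchall())
    # → [('2024-03-02', 'Discussion', 1, 10, 4), ('2024-03-02', 'unflaired', 1, 3, 1)]
```

In an Airflow DAG, each of these functions could be wrapped as its own task (for example with a `PythonOperator`), with each task reading its input from and writing its output to the matching S3 layer, so a failed stage can be retried without re-calling Reddit's API.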
![Reddit_datapipeline](https://private-user-images.githubusercontent.com/64268620/309633873-fdd383da-60c6-4b45-9c2c-f3f3b4053764.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1NDgzOTYsIm5iZiI6MTcyMTU0ODA5NiwicGF0aCI6Ii82NDI2ODYyMC8zMDk2MzM4NzMtZmRkMzgzZGEtNjBjNi00YjQ1LTljMmMtZjNmM2I0MDUzNzY0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIxVDA3NDgxNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc2NDY4MGRmMzVhZWYzZDI4MDgwZWRmMTEwNDIwZjUwMTgzZmE2NDM0MmIyZjdlNjE5YTU0YmE1NDBhNzY4ZGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.IgHgQ0tfJR1-JDBjg4C9LQrKOcAmGKLzw1Xal9gmR5I)
![Screenshot 2024-03-02 at 12 23 57 PM](https://private-user-images.githubusercontent.com/64268620/309633895-15ce5c5c-a39a-46d7-95b4-ddf28ce659bc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1NDgzOTYsIm5iZiI6MTcyMTU0ODA5NiwicGF0aCI6Ii82NDI2ODYyMC8zMDk2MzM4OTUtMTVjZTVjNWMtYTM5YS00NmQ3LTk1YjQtZGRmMjhjZTY1OWJjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIxVDA3NDgxNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRiYjEyMmI4OGRmNjc1YWJhNjY4OGZlNTY0MjQyZDA2M2NlNDczY2NhNjZlM2U2MWEyZjgxYzU2MTEzMDMyZDAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Z6SdZFRRw7ICOK2wcsn1zCuZMe-HubnGMkOCEyo27N4)
![image](https://private-user-images.githubusercontent.com/64268620/309634124-b6a726b3-a8d2-43c8-8d4a-aab1e7e1c174.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1NDgzOTYsIm5iZiI6MTcyMTU0ODA5NiwicGF0aCI6Ii82NDI2ODYyMC8zMDk2MzQxMjQtYjZhNzI2YjMtYThkMi00M2M4LThkNGEtYWFiMWU3ZTFjMTc0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIxVDA3NDgxNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJiNmNhNWZiOWI1OWVlY2ViMmNiYjNhZTA2YTExZDkxMmQzOGY5ODc4ZDNmZTBlZTg4MDhkODFiMzEzYTExY2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.m46tNHaQUEfv8r-lpggLH2mYTXVmo0BWZnX660sdIJY)