# Data Engineering Project: Formula 1 Stats Exploration
## Objective
This project demonstrates data engineering skills by collecting, storing, processing, and analyzing Formula 1 statistics using data sourced from a REST API.
## Data Sources
RapidAPI: JSON data fetched with Postman.
## Architecture Overview
### Tools and Services Used
* Postman: For fetching data from RapidAPI.
* Amazon S3: For storing raw and processed data.
* AWS Glue: For ETL tasks.
* Amazon Athena: For data analysis.
* Amazon RDS: For hosting the MySQL database.
* MySQL Workbench: For managing and querying the database.
* Apache Spark: For data processing and transformation.
## Steps
### Data Collection
* Fetch JSON data from RapidAPI using Postman.
* Upload JSON data to S3 buckets.
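
The collection steps above can also be scripted instead of being run by hand in Postman. The following is a minimal sketch, not the project's actual code: the bucket name, key layout, and any endpoint URL you pass in are hypothetical placeholders, and the upload step assumes `boto3` is installed and AWS credentials are configured. RapidAPI requests conventionally carry the `X-RapidAPI-Key` and `X-RapidAPI-Host` headers.

```python
import json
import urllib.request
from datetime import date


def build_headers(api_key: str, api_host: str) -> dict:
    """Return the headers RapidAPI expects on each request."""
    return {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": api_host}


def raw_key(resource: str, day: date) -> str:
    """Hypothetical S3 key layout: raw/<resource>/<YYYY-MM-DD>.json."""
    return f"raw/{resource}/{day.isoformat()}.json"


def fetch_json(url: str, headers: dict):
    """Send a GET request and return the parsed JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def upload_json(payload, bucket: str, key: str) -> None:
    """Store the payload in S3 as a single JSON object."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
```

A typical call chain would be `fetch_json(url, build_headers(key, host))` followed by `upload_json(data, "my-f1-bucket", raw_key("drivers", date.today()))`, where the bucket and resource names are illustrative only.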
### Data Processing
* Process and transform JSON data using Apache Spark.
* Convert JSON files to Parquet format and store them in S3.
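
The conversion described above might look roughly like the sketch below. It assumes PySpark is installed and that the Spark session can already read from and write to S3 (for example through the `s3a://` connector). The `raw/` to `processed/` path convention is a hypothetical layout, not one taken from the project. Nested API responses may also need flattening before they are written, which depends on the actual schema and is left out here.

```python
def parquet_path(raw_path: str) -> str:
    """Map .../raw/<name>.json to .../processed/<name> (hypothetical layout)."""
    out = raw_path.replace("/raw/", "/processed/", 1)
    return out[: -len(".json")] if out.endswith(".json") else out


def json_to_parquet(raw_path: str) -> str:
    """Read one JSON file with Spark and write it back out as Parquet."""
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    spark = SparkSession.builder.appName("f1-json-to-parquet").getOrCreate()

    # multiLine lets Spark parse a file that holds one pretty-printed
    # JSON document instead of one JSON object per line.
    df = spark.read.option("multiLine", True).json(raw_path)

    out = parquet_path(raw_path)
    # Overwrite so that re-running the job replaces earlier output.
    df.write.mode("overwrite").parquet(out)
    return out
```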
### Data Cataloging
* Use AWS Glue to create tables from Parquet files.
* Query and verify tables using Amazon Athena.
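
One way to verify a cataloged table is a simple row count in Athena. The sketch below runs such a query through `boto3` instead of the Athena console. The database name, table name, and S3 output location are placeholders you would supply, and it assumes credentials with Athena and S3 access.

```python
import time


def row_count_sql(table: str) -> str:
    """Build a row-count query for one table."""
    return f'SELECT COUNT(*) AS row_count FROM "{table}"'


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 2.0) -> str:
    """Start an Athena query, wait for it to finish, and return its final state."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
```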
### Database Integration
* Create an Amazon RDS MySQL instance and connect to it from MySQL Workbench.
* Load the data behind the AWS Glue tables (the Parquet files in S3) into RDS using ETL jobs in AWS Glue Studio.
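
AWS Glue Studio generates a PySpark script for each visual job. A hand-written equivalent of the catalog-to-RDS load might look like the sketch below. It is an illustration under assumptions, not the script used in this project: all database, table, and connection names are hypothetical, and it assumes a Glue connection to the RDS instance already exists. The `awsglue` module is only available inside the Glue job runtime.

```python
def jdbc_options(database: str, table: str) -> dict:
    """Connection options naming the target MySQL database and table."""
    return {"database": database, "dbtable": table}


def run_job(catalog_db: str, catalog_table: str, connection_name: str,
            rds_db: str, rds_table: str) -> None:
    """Copy one Glue Data Catalog table into a MySQL table on RDS."""
    # These imports resolve only inside an AWS Glue job.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the Parquet-backed table through the Data Catalog.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=catalog_db, table_name=catalog_table
    )

    # Write to RDS through a pre-configured Glue JDBC connection.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=connection_name,
        connection_options=jdbc_options(rds_db, rds_table),
    )
```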
### Data Analysis
* Use MySQL Workbench to query the data.
* Optionally connect to other visualization tools (e.g., Tableau, Power BI) for advanced analytics.
## How to Run the Project
* Set up AWS credentials.
* Configure and run the data collection requests in Postman.
* Upload JSON data to S3.
* Process and transform data using Apache Spark.
* Use AWS Glue to catalog the data and create tables.
* Set up an RDS instance and configure MySQL Workbench.
* Migrate data from AWS Glue to RDS.
* Analyze the data using MySQL Workbench or other visualization tools.
## Acknowledgments
RapidAPI: Source of JSON data.
## Conclusion
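This project showcases an end-to-end pipeline that collects Formula 1 data from a REST API, stores it in Amazon S3, processes it with Apache Spark, catalogs it with AWS Glue, and analyzes it through Amazon Athena and MySQL on RDS, highlighting skills in data engineering and cloud services integration.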