
Real World Project on Formula1 Racing using Azure Databricks, Delta Lake, Unity Catalog, Azure Data Factory [DP203]


Muhyd33n/Formula1RacingProject


Cloud Data Platform: Formula 1 Race Data Analysis with Azure Databricks

This project showcases my journey learning Azure Databricks and the data engineering lifecycle.

Project Overview

In this project, I'm learning the data engineering lifecycle, which covers data generation, storage, ingestion, transformation, and serving. Here are the main components and steps I've undertaken:

Data Ingestion: Currently, the dataset is ingested manually into the data lake object storage. Later, I will integrate an external API, specifically the Ergast API, to pull Formula 1 race data. This data will be ingested into Azure Data Lake Storage using Azure Data Factory pipelines.
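The planned API ingestion step could look roughly like the Python sketch below. It assumes the Ergast endpoint pattern `http://ergast.com/api/f1/<season>/results.json` and lands each response unchanged in a date-partitioned raw folder. The folder layout (`raw/<entity>/<ingestion date>/`) and all helper names are hypothetical, and in the project itself Azure Data Factory, not a script like this, would perform the copy.

```python
import json
import tempfile
import urllib.request
from datetime import date
from pathlib import Path

# Assumed Ergast endpoint pattern; verify against the API before relying on it.
ERGAST_URL = "http://ergast.com/api/f1/{season}/results.json?limit={limit}"


def landing_path(root: Path, entity: str, ingestion_date: date) -> Path:
    """Build a date-partitioned raw-zone folder, e.g. root/raw/results/2023-08-22."""
    return root / "raw" / entity / ingestion_date.isoformat()


def fetch_season_results(season: int, limit: int = 1000) -> dict:
    """Download one season of race results and parse the JSON body (network call)."""
    url = ERGAST_URL.format(season=season, limit=limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def land_raw(payload: dict, root: Path, entity: str, ingestion_date: date) -> Path:
    """Write the payload unchanged to the raw zone and return the file path."""
    target_dir = landing_path(root, entity, ingestion_date)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{entity}.json"
    target.write_text(json.dumps(payload), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Offline demo: land a tiny stand-in payload instead of calling the API.
    sample = {"MRData": {"RaceTable": {"Races": []}}}
    with tempfile.TemporaryDirectory() as tmp:
        path = land_raw(sample, Path(tmp), "results", date(2023, 8, 22))
        print(path.relative_to(tmp))  # e.g. raw/results/2023-08-22/results.json
```

Keeping the raw copy byte-for-byte unchanged means later transformation steps can always be re-run from the original response.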

Data Transformation: Raw race data is transformed into a structured format suitable for analysis. I've employed data cleansing techniques and performed enrichment to enhance its quality.
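As an illustration of the cleansing and enrichment step, the sketch below works on plain Python dictionaries rather than Spark DataFrames (in Databricks the same ideas map onto DataFrame methods such as `withColumnRenamed` and `withColumn`). The camelCase source fields (`raceId`, `driverRef`), the dropped `url` column, and the `ingestion_date` audit column are assumptions chosen for the example.

```python
import re
from datetime import date


def to_snake_case(name: str) -> str:
    """Convert camelCase keys such as 'raceId' into snake_case ('race_id')."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def cleanse(records: list[dict], ingestion_date: date, drop: tuple = ("url",)) -> list[dict]:
    """Rename keys, drop unwanted columns, cast types, and add an audit column."""
    cleaned = []
    for rec in records:
        row = {to_snake_case(k): v for k, v in rec.items() if k not in drop}
        # Skip rows without a primary key instead of loading partial records.
        if row.get("race_id") is None:
            continue
        row["race_id"] = int(row["race_id"])
        if "points" in row:
            row["points"] = float(row["points"])
        # Enrichment: record when the row was processed.
        row["ingestion_date"] = ingestion_date.isoformat()
        cleaned.append(row)
    return cleaned


raw = [
    {"raceId": "1", "driverRef": "hamilton", "points": "25", "url": "http://example.com"},
    {"driverRef": "unknown"},  # no raceId, so it is filtered out
]
print(cleanse(raw, date(2023, 8, 22)))
# [{'race_id': 1, 'driver_ref': 'hamilton', 'points': 25.0, 'ingestion_date': '2023-08-22'}]
```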

Presentation Layer: The transformed data is organized in a way that simplifies access and analysis, providing a solid foundation for insights generation.
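A typical presentation-layer output is a season standings table. The sketch below aggregates per-race result rows into total points and wins per driver and season, then ranks drivers within each season. The input schema (`race_year`, `driver_name`, `points`, `position`) is hypothetical, and tied drivers receive consecutive positional ranks rather than a shared rank.

```python
from collections import defaultdict


def driver_standings(results: list[dict]) -> list[dict]:
    """Aggregate per-race result rows into ranked season standings per driver."""
    totals = defaultdict(lambda: {"total_points": 0.0, "wins": 0})
    for row in results:
        key = (row["race_year"], row["driver_name"])
        totals[key]["total_points"] += row["points"]
        if row["position"] == 1:
            totals[key]["wins"] += 1

    standings = [
        {"race_year": year, "driver_name": name, **agg}
        for (year, name), agg in totals.items()
    ]
    # Order by season, then more points first, with wins breaking ties.
    standings.sort(key=lambda r: (r["race_year"], -r["total_points"], -r["wins"]))

    # Assign a positional rank that restarts at 1 for each season.
    rank, prev_year = 0, None
    for row in standings:
        rank = rank + 1 if row["race_year"] == prev_year else 1
        row["rank"] = rank
        prev_year = row["race_year"]
    return standings
```

Pre-computing a table like this keeps dashboard queries simple, since the reporting tool only needs to filter and display it.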

Interactive Dashboards: I will create interactive, visually appealing dashboards using Power BI, allowing users to explore the analyzed data intuitively.

Data Lakehouse Architecture with Delta Lake: I will be exploring the emerging data lakehouse architecture and implementing it using Delta Lake.
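One capability Delta Lake adds on top of a plain data lake is a transactional MERGE, which lets an incremental load update existing rows and insert new ones in a single step. The sketch below only mimics that upsert behaviour on in-memory rows to show the semantics; with the delta-spark Python API the equivalent is built from `DeltaTable.forPath(...)`, `.merge(...)`, `.whenMatchedUpdateAll()`, `.whenNotMatchedInsertAll()` and `.execute()`. The key name `result_id` is a hypothetical example.

```python
def merge_upsert(target: list[dict], source: list[dict], key: str) -> list[dict]:
    """Mimic a Delta Lake MERGE upsert on lists of dict rows.

    Matched keys are replaced by the source row (like whenMatchedUpdateAll);
    unmatched keys are appended (like whenNotMatchedInsertAll).
    Inputs are not modified.
    """
    merged = {row[key]: dict(row) for row in target}
    seen = set()
    for row in source:
        k = row[key]
        # Delta rejects a merge where several source rows hit the same target
        # row; this sketch is stricter and rejects any duplicate source key.
        if k in seen:
            raise ValueError(f"duplicate source key: {k!r}")
        seen.add(k)
        merged[k] = dict(row)  # update if matched, insert if not
    return list(merged.values())


current = [{"result_id": 1, "points": 10.0}, {"result_id": 2, "points": 8.0}]
incoming = [{"result_id": 2, "points": 12.0}, {"result_id": 3, "points": 6.0}]
print(merge_upsert(current, incoming, "result_id"))
# [{'result_id': 1, 'points': 10.0}, {'result_id': 2, 'points': 12.0}, {'result_id': 3, 'points': 6.0}]
```

Because re-running the same incoming batch produces the same result, an upsert like this makes incremental loads safe to retry.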

Security: Recognizing the importance of data security, I've implemented access controls and security measures to ensure data integrity and compliance.

Technologies and Tools

This project lets me gain hands-on, production-style experience with a variety of technologies and tools, including:

Microsoft Azure Portal: For managing cloud resources and services.

Ergast API: The source of Formula 1 race data.

Azure Data Factory: For orchestrating data pipelines and automating data ingestion.

Azure Databricks: For advanced data processing and transformation.

Power BI: For creating insightful visualizations and dashboards.

