This project implements a complete pipeline that extracts football data from Wikipedia with Apache Airflow, stores it in Azure Data Lake Storage Gen2, transforms it with Azure Databricks, queries it through Azure Synapse Analytics, and visualizes the results in Tableau. It is designed to provide comprehensive football analytics for enthusiasts and analysts.
- Apache Airflow
- Azure Data Lake Storage Gen2 account
- Azure Databricks workspace
- Azure Synapse Analytics workspace
- Tableau Desktop or Tableau Public account
- Clone the Repository: `git clone https://github.com/AnishmMore/Football-Data-Analytics.git`
- Azure Setup:
- Set up Azure Data Lake Storage Gen2.
- Configure Azure Databricks workspace.
- Initialize Azure Synapse Analytics workspace.
- Airflow Setup: Install Apache Airflow (e.g. `pip install apache-airflow`), initialize its metadata database (`airflow db init`), and copy the DAG file into your Airflow `dags` directory before starting the scheduler and webserver (e.g. with `airflow standalone`).
- File: `wikipedia_azure.py`, located in the `dags` directory.
- Description: This is the primary DAG file containing the Apache Airflow code.
- Execution:
- Run Airflow on localhost.
- Initiate the DAG to begin data extraction from Wikipedia.
- Data is subsequently stored in Azure Data Lake Storage Gen2.
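The core of the extraction step is parsing the tables on a Wikipedia page into rows. As a rough illustration of that idea (the actual logic lives in `wikipedia_azure.py` and may instead use a library such as BeautifulSoup or `pandas.read_html`), a minimal standard-library table parser could look like this:

```python
from html.parser import HTMLParser

class WikiTableParser(HTMLParser):
    """Collect the cell text of every HTML table row into a list of rows.

    A simplified stand-in for the parsing step of the extraction DAG;
    not the project's actual implementation.
    """
    def __init__(self):
        super().__init__()
        self.rows = []     # parsed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

def parse_tables(html: str) -> list[list[str]]:
    """Return every table row found in the given HTML as a list of cell strings."""
    parser = WikiTableParser()
    parser.feed(html)
    return parser.rows
```

The rows returned by `parse_tables` would then be serialized (e.g. to CSV) and uploaded to Azure Data Lake Storage Gen2 by a downstream task in the DAG.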
- File: `Football Analytics.ipynb`
- Process:
  - Data is retrieved from Azure Data Lake Storage Gen2.
  - Transformation is executed using the Azure Databricks compute engine.
  - Transformed data is then stored back in Azure Data Lake Storage Gen2.
- Usage: Execute the notebook on Azure Databricks to transform the `raw_data`.
- File: `Synapse.sql`
- Functionality: Contains a collection of SQL queries used for data analysis.
- Utility: Use these queries in Azure Synapse to derive insights and prepare data for visualization.
- File: `Football_Analytics.twb`
- Tool: Tableau is employed for creating visual representations of the data.
- Visualization: The dashboard within the Tableau workbook provides an interactive view of the football data.