Azure Data Factory is a fully managed Platform as a Service (PaaS) offering for both Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) workloads. It lets users efficiently orchestrate, automate, and manage the movement and transformation of data across various sources and destinations in the cloud.
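The only difference between ETL and ELT is where the transform step runs relative to the load. The minimal, tool-agnostic Python sketch below makes that concrete. It is not Data Factory code: the `clean`, `etl`, and `elt` functions and the in-memory `store` dictionary standing in for a destination are hypothetical illustrations.

```python
# Hypothetical illustration of ETL vs ELT; not Azure Data Factory code.
Record = dict[str, object]


def clean(rows: list[Record]) -> list[Record]:
    """Example transform: drop rows with no count, normalise country names."""
    return [
        {"country": str(r["country"]).strip().title(), "cases": int(r["cases"])}
        for r in rows
        if r.get("cases") is not None
    ]


def etl(source: list[Record], store: dict[str, list[Record]]) -> None:
    # ETL: transform in flight, then load only the cleaned result.
    store["cases"] = clean(source)


def elt(source: list[Record], store: dict[str, list[Record]]) -> None:
    # ELT: load the raw data first, then transform inside the destination.
    store["raw_cases"] = list(source)
    store["cases"] = clean(store["raw_cases"])


raw = [
    {"country": " austria ", "cases": "12"},
    {"country": "belgium", "cases": None},
]
etl_store: dict[str, list[Record]] = {}
elt_store: dict[str, list[Record]] = {}
etl(raw, etl_store)
elt(raw, elt_store)
print(etl_store["cases"])  # [{'country': 'Austria', 'cases': 12}]
print(sorted(elt_store))   # ['cases', 'raw_cases']
```

Both approaches end with the same cleaned table; ELT additionally keeps the raw copy in the destination, which is useful when transformations need to be rerun later.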
This project addresses the pressing need to understand the impact of COVID-19 on society by building an end-to-end data platform for reporting on and predicting COVID-19 outbreaks.
Our organization seeks comprehensive insights into COVID-19, including historical analysis of mortality rates and the spread of the virus. To achieve this, we aim to:
- Establish a data platform for reporting and predicting COVID-19 outbreaks.
- Create a Data Lake to aggregate data from authoritative sources such as the European Centre for Disease Prevention and Control (ECDC) and Eurostat.
- Utilize tools such as Databricks, HDInsight, and Data Flows for data transformation.
- Ingest transformed data into the Data Lake to facilitate predictive analytics.
- Utilize a Data Warehouse to store data for trend analysis and reporting.
- Integrate disparate datasets seamlessly using Azure Data Factory.
- Implement Business Intelligence (BI) tools to analyze trends and the effectiveness of COVID-19 testing.
- Develop robust monitoring pipelines with alerting capabilities.
Our approach encompasses several key phases:
- Ingestion: Gathering data from multiple sources into a centralized repository.
- Transformation: Structuring and cleansing data using various Azure tools.
- Preparation of Reporting Data: Aggregating and formatting data for reporting and analysis.
- Orchestration and Monitoring: Managing data pipelines and ensuring operational efficiency.
- Reporting Trends: Utilizing Power BI to visualize and analyze trends in COVID-19 data.
- DevOps: Implementing Continuous Integration (CI) and Continuous Deployment (CD) practices for maintaining the solution.
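To show how the first four phases hand data to one another, here is a plain-Python sketch under stated assumptions. It is not Data Factory, Databricks, or HDInsight code: the function names, CSV columns, and sample rows are hypothetical, and the sample values are invented rather than real ECDC figures. In the actual solution, copy activities, the transformation tools listed above, and Data Factory triggers with alerting would play these roles.

```python
import csv
import io
from collections import defaultdict

# Made-up sample input for illustration only; these are NOT real ECDC figures.
SAMPLE_CSV = """country,date,indicator,daily_count
Austria,2020-03-01,cases,10
Austria,2020-03-02,cases,15
Austria,2020-03-02,deaths,1
Belgium,2020-03-01,cases,
"""


def ingest(text: str) -> list[dict[str, str]]:
    """Ingestion: parse a landed CSV file into raw rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict]:
    """Transformation: drop rows without a count and cast counts to int."""
    return [
        {**row, "daily_count": int(row["daily_count"])}
        for row in rows
        if row["daily_count"].strip()
    ]


def prepare_reporting(rows: list[dict]) -> dict[tuple[str, str], int]:
    """Reporting data: total count per (country, indicator)."""
    totals: defaultdict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        totals[(row["country"], row["indicator"])] += row["daily_count"]
    return dict(totals)


def run_pipeline(text: str):
    """Orchestration and monitoring: run the steps in order, record each
    step's status, and stop with an alert message at the first failure."""
    steps = [("ingest", ingest), ("transform", transform),
             ("prepare_reporting", prepare_reporting)]
    statuses: dict[str, str] = {}
    alerts: list[str] = []
    data = text
    for name, step in steps:
        try:
            data = step(data)
        except Exception as exc:  # a real pipeline would send an alert here
            statuses[name] = "Failed"
            alerts.append(f"{name} failed: {exc}")
            return None, statuses, alerts
        statuses[name] = "Succeeded"
    return data, statuses, alerts


report, statuses, alerts = run_pipeline(SAMPLE_CSV)
print(report)    # {('Austria', 'cases'): 25, ('Austria', 'deaths'): 1}
print(statuses)  # {'ingest': 'Succeeded', 'transform': 'Succeeded', 'prepare_reporting': 'Succeeded'}
```

Keeping each phase as a separate step means a failure (for example, a non-numeric count) is reported against the step that caused it, which is the same property a monitored orchestration pipeline relies on.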
The solution uses the following technologies:
- Azure SQL Database
- Azure Blob Storage
- Azure Databricks
- Azure Data Factory
- Azure Data Lake Storage Gen2
- Azure HDInsight
- Power BI
- ARM templates: Infrastructure as Code (IaC) for setting up the Azure environment.
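Infrastructure as Code means the Azure resources are declared in a template rather than created by hand. As a hedged sketch, the Python below builds a minimal ARM template that deploys a single Data Factory with a system-assigned managed identity. The default factory name `covid-reporting-adf` is hypothetical, and a complete template for this project would also declare the storage accounts, Databricks workspace, SQL database, and the other services listed above.

```python
import json


def data_factory_template(default_name: str = "covid-reporting-adf") -> dict:
    """Build a minimal ARM template that deploys one Azure Data Factory.

    The default factory name is a hypothetical placeholder.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"type": "string", "defaultValue": default_name},
            # Deploy into the same region as the target resource group.
            "location": {"type": "string", "defaultValue": "[resourceGroup().location]"},
        },
        "resources": [
            {
                "type": "Microsoft.DataFactory/factories",
                "apiVersion": "2018-06-01",
                "name": "[parameters('factoryName')]",
                "location": "[parameters('location')]",
                # Managed identity lets the factory authenticate to other Azure services.
                "identity": {"type": "SystemAssigned"},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(data_factory_template(), indent=2))
```

The printed JSON could be saved to a file and deployed with `az deployment group create --resource-group <your-rg> --template-file <file>.json`. Data Factory names must be globally unique, so the placeholder default would need to be changed.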
This project is licensed under the MIT License - see the LICENSE file for details.