This project demonstrates a complete ETL (Extract, Transform, Load) pipeline built with Python, C#, and SQL.
The objective is to simulate a real-world data engineering scenario where raw construction project data is collected, cleaned, transformed, loaded into another system, and finally analyzed.
Objectives:
- Implement a multi-language ETL workflow
- Automate data transformation and system integration
- Simulate a realistic data reporting process using SQL
Phase 1: Data Collection and Cleaning (Python)
Raw data is collected in CSV format. Python scripts analyze, clean, and prepare the data for further processing.
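Here is a minimal sketch of this cleaning step. It assumes a hypothetical CSV layout (`project_id`, `project_name`, `region`, `budget`, `status`), because the repository's actual columns and cleaning rules are not described here. The sketch uses the standard-library `csv` module so it has no dependencies. The project itself lists pandas, where the same steps map roughly onto `Series.str.strip`, `DataFrame.drop_duplicates`, and `DataFrame.dropna`. The function names `clean_rows` and `write_clean_csv` are illustrative only.

```python
import csv
import io

# Hypothetical raw extract; the project's real column names are not documented here.
RAW_CSV = """project_id,project_name,region,budget,status
P001, Bridge Repair ,North,150000,Active
P002,School Extension,south,,Planned
P001, Bridge Repair ,North,150000,Active
P003,Road Resurfacing,EAST,82000.50,completed
"""

FIELDNAMES = ["project_id", "project_name", "region", "budget", "status"]


def clean_rows(raw_text):
    """Trim whitespace, normalize casing and types, drop incomplete and duplicate rows."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Strip stray whitespace; missing trailing fields arrive as None.
        row = {key: (value or "").strip() for key, value in row.items()}
        if not row["budget"]:
            continue  # Skip rows with no budget instead of inventing a value.
        if row["project_id"] in seen_ids:
            continue  # Keep only the first record for each project_id.
        seen_ids.add(row["project_id"])
        cleaned.append({
            "project_id": row["project_id"],
            "project_name": row["project_name"],
            "region": row["region"].title(),    # "EAST" -> "East"
            "budget": float(row["budget"]),     # text -> number
            "status": row["status"].lower(),    # "Active" -> "active"
        })
    return cleaned


def write_clean_csv(rows, path):
    """Write cleaned rows to a new CSV file for the loading phase."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    for record in clean_rows(RAW_CSV):
        print(record)
```

Skipping rows with a missing budget is one possible policy. A real pipeline might instead flag those rows for review or impute a value, depending on how the downstream reports use the budget.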
Phase 2: Data Transformation and Loading (C#)
A C# program loads the cleaned data into a target system, such as a SQLite database or an XML file, to simulate integration with external systems.
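The project performs this step with a C# loader. To illustrate the idea in a language-neutral way, the sketch below does the same job with Python's built-in `sqlite3` and `xml.etree.ElementTree` modules. It is not a port of the C# program. The `projects` table schema, the XML element names, and the sample rows are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical cleaned records, shaped like the output of the cleaning phase.
CLEAN_ROWS = [
    {"project_id": "P001", "project_name": "Bridge Repair", "region": "North",
     "budget": 150000.0, "status": "active"},
    {"project_id": "P003", "project_name": "Road Resurfacing", "region": "East",
     "budget": 82000.5, "status": "completed"},
]


def load_into_sqlite(rows, conn):
    """Create the (assumed) target table if needed and upsert rows in one transaction."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS projects (
               project_id   TEXT PRIMARY KEY,
               project_name TEXT NOT NULL,
               region       TEXT NOT NULL,
               budget       REAL NOT NULL,
               status       TEXT NOT NULL
           )"""
    )
    with conn:  # Commits on success, rolls back if any insert fails.
        conn.executemany(
            "INSERT OR REPLACE INTO projects VALUES "
            "(:project_id, :project_name, :region, :budget, :status)",
            rows,
        )


def export_to_xml(rows):
    """Serialize rows as <projects><project id="..."><field>...</field></project></projects>."""
    root = ET.Element("projects")
    for row in rows:
        project = ET.SubElement(root, "project", id=row["project_id"])
        for field in ("project_name", "region", "budget", "status"):
            ET.SubElement(project, field).text = str(row[field])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(CLEAN_ROWS, conn)
    print(conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0])
    print(export_to_xml(CLEAN_ROWS))
```

Because the upsert is keyed on `project_id`, `INSERT OR REPLACE` makes reruns idempotent. This suits a simulated integration. A stricter loader might reject or log conflicting rows instead of overwriting them.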
Phase 3: Querying and Analysis (SQL)
SQL queries extract insights and summarize key metrics from the loaded data.
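The following example shows the kind of summary query this phase might run. It executes against an in-memory SQLite database seeded with made-up rows. The `projects` schema and the chosen metrics (count, total, and average budget per region) are assumptions, not the project's shipped SQL scripts.

```python
import sqlite3

# Hypothetical sample data; in the project these rows would come from the loading phase.
SAMPLE = [
    ("P001", "Bridge Repair", "North", 150000.0, "active"),
    ("P002", "School Extension", "South", 95000.0, "planned"),
    ("P003", "Road Resurfacing", "East", 82000.5, "completed"),
    ("P004", "Drainage Upgrade", "North", 40000.0, "completed"),
]

# Example summary: project count, total budget, and average budget per region.
BUDGET_BY_REGION = """
SELECT region,
       COUNT(*)              AS project_count,
       SUM(budget)           AS total_budget,
       ROUND(AVG(budget), 2) AS avg_budget
FROM projects
GROUP BY region
ORDER BY total_budget DESC
"""


def build_db():
    """Create an in-memory database with the assumed schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (project_id TEXT PRIMARY KEY, project_name TEXT, "
        "region TEXT, budget REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO projects VALUES (?, ?, ?, ?, ?)", SAMPLE)
    return conn


def budget_by_region(conn):
    """Run the summary query and return (region, count, total, average) tuples."""
    return conn.execute(BUDGET_BY_REGION).fetchall()


if __name__ == "__main__":
    for row in budget_by_region(build_db()):
        print(row)
```

Keeping the SQL in a named string makes it easy to move into a standalone `.sql` file later, which matches the project's approach of shipping separate SQL scripts.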
Phase 4: Reporting and Presentation
Final outputs, including summaries and reports, are generated and stored for review or decision-making.
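Here is a minimal reporting sketch. It assumes the analysis step returns `(region, project_count, total_budget)` tuples. The function `write_report`, the file names, and the report fields are hypothetical. The function writes a dated CSV table and a short text summary into a given directory. The project keeps its reports in `reports/`, but the demo below writes to a temporary directory instead.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical summary rows (region, project_count, total_budget).
SUMMARY = [("North", 2, 190000.0), ("South", 1, 95000.0), ("East", 1, 82000.5)]


def write_report(summary, out_dir, report_date):
    """Write a CSV table and a plain-text summary into out_dir; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = report_date.isoformat()

    # Tabular output for spreadsheets or further processing.
    csv_path = out_dir / f"budget_by_region_{stamp}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "project_count", "total_budget"])
        writer.writerows(summary)

    # Short human-readable summary for decision-makers.
    total_projects = sum(count for _, count, _ in summary)
    total_budget = sum(budget for _, _, budget in summary)
    top_region = max(summary, key=lambda row: row[2])[0]
    txt_path = out_dir / f"summary_{stamp}.txt"
    txt_path.write_text(
        f"Report date: {stamp}\n"
        f"Projects: {total_projects}\n"
        f"Total budget: {total_budget:,.2f}\n"
        f"Largest region by budget: {top_region}\n",
        encoding="utf-8",
    )
    return csv_path, txt_path


if __name__ == "__main__":
    # Use a throwaway directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        _, txt_path = write_report(SUMMARY, tmp, date.today())
        print(txt_path.read_text(encoding="utf-8"))
```

Putting the date in each file name keeps earlier reports intact when the pipeline is rerun.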
Technologies:
- Python – pandas, CSV processing
- C# – .NET Core for data transfer simulation
- SQLite – lightweight database simulation
- SQL – data querying and reporting
- Git – version control
How to run:
- Clone the repository:
  git clone https://github.com/defimaleji/capstone-project.git
  cd capstone-project
- Run the Python ETL scripts (data cleaning).
- Execute the C# loader program to transfer the data into SQLite or XML.
- Use the provided SQL scripts to query and analyze the data.
- Review the generated reports in the reports/ folder.
This project is licensed under the MIT License – see the LICENSE file for details.