This project walks through the full process of cleaning, preparing, and structuring raw data for analysis.
The goal is to turn messy data into clean, consistently structured datasets, produced by an automated workflow and ready for visualization and reporting.
Real-world datasets often contain:
- Duplicate records
- Inconsistent formats
- Missing values
- Structures that are hard to analyze
This project simulates cleaning a business dataset to prepare it for dashboards and decision-making.
Dataset
- Source: Example business dataset (Excel/CSV)
- Key columns: Customer ID, Sales, Product, Region, Date
- Issues addressed: duplicates, nulls, formatting inconsistencies
Cleaning Steps
- Remove duplicates
- Handle missing values
- Standardize formats
- Validate ranges and data types
- Prepare dataset for analysis and dashboards
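
The cleaning steps above can be sketched as a single SQL statement. This is a minimal illustration, not the project's actual `data_cleaning.sql`: it assumes SQLite (run through Python's built-in `sqlite3` module), and the table names `raw_sales` and `cleaned_sales`, the column names, the sample rows, and the choice to replace missing sales with 0 are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows: (customer_id, sales, product, region, sale_date)
RAW_ROWS = [
    ("C001", "120.50", "Widget", " north ", "2024-01-05"),
    ("C001", "120.50", "Widget", " north ", "2024-01-05"),  # exact duplicate
    ("C002", None, "Gadget", "SOUTH", "2024-01-06"),         # missing sales
    ("C003", "-40", "Widget", "East", "2024-01-07"),         # negative sales
    ("C004", "75", "gadget", "west", "2024/01/08"),          # slash-separated date
]

CLEANING_SQL = """
CREATE TABLE cleaned_sales AS
SELECT DISTINCT                                      -- remove duplicates
    TRIM(customer_id) AS customer_id,
    CAST(COALESCE(sales, '0') AS REAL) AS sales,     -- assumption: missing sales become 0
    UPPER(SUBSTR(TRIM(product), 1, 1)) || LOWER(SUBSTR(TRIM(product), 2)) AS product,
    UPPER(SUBSTR(TRIM(region), 1, 1)) || LOWER(SUBSTR(TRIM(region), 2)) AS region,
    REPLACE(sale_date, '/', '-') AS sale_date        -- standardize date separators
FROM raw_sales
WHERE CAST(COALESCE(sales, '0') AS REAL) >= 0        -- validate range
  AND DATE(REPLACE(sale_date, '/', '-')) IS NOT NULL; -- validate date format
"""


def clean(rows):
    """Load raw rows into an in-memory database and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_sales "
        "(customer_id TEXT, sales TEXT, product TEXT, region TEXT, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.executescript(CLEANING_SQL)
    result = conn.execute(
        "SELECT customer_id, sales, product, region, sale_date "
        "FROM cleaned_sales ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return result


cleaned = clean(RAW_ROWS)
for row in cleaned:
    print(row)
```

In this sketch the duplicate `C001` row collapses to one, the negative `C003` row is filtered out, and text and date formats are normalized. Whether missing values should be filled, flagged, or dropped depends on the dataset, so the `COALESCE` default is only one possible choice.
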
Automation
- SQL scripts automate cleaning and transformation
- Repeatable workflow ensures reproducibility
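
One way to make a cleaning script repeatable is to have it drop and rebuild its output table, so that running it twice gives the same result. The sketch below assumes SQLite via Python's `sqlite3` module; the script text, the `run_cleaning` helper, and the table and column names are hypothetical stand-ins, not the contents of the project's SQL file.

```python
import sqlite3

# Hypothetical cleaning script; rebuilding the output table makes reruns safe.
CLEANING_SCRIPT = """
DROP TABLE IF EXISTS cleaned_sales;
CREATE TABLE cleaned_sales AS
SELECT DISTINCT customer_id, sales, region
FROM raw_sales
WHERE sales IS NOT NULL;
"""


def run_cleaning(conn, script=CLEANING_SCRIPT):
    """Run the cleaning script and return the number of cleaned rows."""
    conn.executescript(script)
    return conn.execute("SELECT COUNT(*) FROM cleaned_sales").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (customer_id TEXT, sales REAL, region TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        ("C001", 10.0, "North"),
        ("C001", 10.0, "North"),  # duplicate row
        ("C002", None, "South"),  # missing sales
    ],
)
conn.commit()

first_run = run_cleaning(conn)
second_run = run_cleaning(conn)  # rerunning does not change the outcome
print(first_run, second_run)
```

In a real run, the script text could be read from `scripts/data_cleaning.sql` instead of being defined inline.
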
Dashboards
- Dashboards created in Power BI or Tableau
- KPI tracking, interactive filters, and charts
- Example insights: sales trends, top products, regional performance
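
The example insights listed above map onto simple aggregate queries over the cleaned data. The following sketch assumes SQLite via Python's `sqlite3` module; the `cleaned_sales` table, its column names, and the sample rows are hypothetical and are not real results from the project.

```python
import sqlite3

# Hypothetical cleaned rows: (customer_id, sales, product, region, sale_date)
CLEANED_ROWS = [
    ("C001", 120.0, "Widget", "North", "2024-01-05"),
    ("C002", 80.0, "Gadget", "South", "2024-01-20"),
    ("C003", 200.0, "Widget", "North", "2024-02-03"),
    ("C004", 50.0, "Gadget", "North", "2024-02-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cleaned_sales "
    "(customer_id TEXT, sales REAL, product TEXT, region TEXT, sale_date TEXT)"
)
conn.executemany("INSERT INTO cleaned_sales VALUES (?, ?, ?, ?, ?)", CLEANED_ROWS)

# Sales trend: total sales per month (first 7 characters of an ISO date).
monthly_sales = conn.execute(
    "SELECT SUBSTR(sale_date, 1, 7) AS month, SUM(sales) "
    "FROM cleaned_sales GROUP BY month ORDER BY month"
).fetchall()

# Top products: total sales per product, highest first.
top_products = conn.execute(
    "SELECT product, SUM(sales) AS total "
    "FROM cleaned_sales GROUP BY product ORDER BY total DESC"
).fetchall()

# Regional performance: total sales per region, highest first.
regional_sales = conn.execute(
    "SELECT region, SUM(sales) AS total "
    "FROM cleaned_sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

print(monthly_sales)
print(top_products)
print(regional_sales)
```

Queries like these could feed a Power BI or Tableau data source directly, or the dashboard tool could compute the same aggregations itself from the cleaned CSV.
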
Tools
- SQL
- Excel / CSV
- Power BI / Tableau
Project Structure
data_cleaning/
│
├── dataset/
│   └── raw_data.csv
├── scripts/
│   └── data_cleaning.sql
├── dashboards/
│   └── sales_dashboard.pbix
├── results/
│   └── cleaned_dataset.csv
└── README.md
Results
- Cleaned and structured dataset
- Automated SQL queries for repeatable cleaning
- Ready for dashboard visualization and business insights
Future Improvements
- Connect multiple data sources (CSV, Excel, databases)
- Full ETL pipeline automation
- Advanced dashboards with KPIs and alerts
Author
David Gol
Junior Data Analyst & BI Enthusiast