Data-Analysis-Project-MySQL-Python-PowerBi

A comprehensive end-to-end data analysis project that demonstrates the complete data pipeline: from database creation and data cleaning to machine learning and visualization. This project analyzes customer transaction data using MySQL, Python, and Power BI.

🎯 Objectives

Create and populate a MySQL database with transaction data
Clean and preprocess messy data using Python and Pandas
Perform exploratory data analysis
Build predictive models with scikit-learn
Visualize insights using Power BI dashboards

1) Create Database (MySQL)

Data Quality Issues (Intentional for Cleaning Practice)

Duplicate records
Special characters and stray digits in names (?, ., /, _, 11)
Inconsistent capitalization
Missing values in critical fields
Typos (e.g., "expence" instead of "expense")
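
Before cleaning, it can help to measure how widespread each issue is. The following is a minimal sketch in pandas, assuming hypothetical column names (first_name, category, transaction_type) and invented sample rows; the project's actual schema may differ.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the project's actual schema.
raw = pd.DataFrame({
    "first_name": ["Anna", "anna?", "Anna", "Mark_", None],
    "category": ["rent", "Rent", "rent", "utilities", "bonus"],
    "transaction_type": ["expense", "expence", "expense", "expense", "income"],
})

# Count fully duplicated rows (every column identical to an earlier row).
n_duplicates = int(raw.duplicated().sum())

# Count missing values per column.
missing_per_column = raw.isna().sum()

# Flag names containing anything other than letters, spaces, hyphens, or apostrophes.
has_bad_chars = raw["first_name"].str.contains(r"[^A-Za-z '\-]", na=False)

# Values that differ only by capitalization collapse together once lowercased.
n_raw_categories = raw["category"].nunique()
n_normalized_categories = raw["category"].str.lower().nunique()

print(n_duplicates, int(has_bad_chars.sum()), n_raw_categories, n_normalized_categories)
```

Comparing the unique-value count before and after lowercasing is a quick way to spot inconsistent capitalization without inspecting every row.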

2) Data Cleaning and Classification (Python)

Step 1: Import the Database into Python
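
One common way to do this step is to run a query through pandas and get a DataFrame back. The sketch below uses an in-memory SQLite database as a stand-in so it runs without a server; the table name transactions and its columns are hypothetical. Against the real MySQL database, the connection would come from a MySQL driver instead, while the pandas call stays the same.

```python
import sqlite3

import pandas as pd

# Stand-in database so the sketch runs anywhere: an in-memory SQLite connection.
# The table name and columns below are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER, first_name TEXT, amount REAL, currency TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, "Anna", 120.0, "EUR"), (2, "Mark", 80.5, "USD")],
)
conn.commit()

# Load the whole table into a DataFrame for cleaning.
df = pd.read_sql_query("SELECT * FROM transactions", conn)
conn.close()

print(df)
```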

Step 2: Data cleaning

✅ Remove Duplicates

✅ Clean the first and last name columns using strip

✅ Fix the transaction_type and category columns using "replace"

✅ Fill in the blank/null values

✅ Drop the rows without a currency

✅ Spelling correction
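
The cleaning steps listed above can be sketched in pandas roughly as follows. The column names, sample rows, and fill strategies (a placeholder for text, the median for amounts) are assumptions for illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical messy rows; column names and values are illustrative assumptions.
raw = pd.DataFrame({
    "first_name": [" anna?", " anna?", "Mark_", "lena11", "Tom"],
    "transaction_type": ["expence", "expence", "Expense", "income", "income"],
    "category": ["Rent", "Rent", "utilities", None, "bonus"],
    "amount": [500.0, 500.0, None, 300.0, 250.0],
    "currency": ["EUR", "EUR", "EUR", "USD", None],
})

# Remove exact duplicate rows.
df = raw.drop_duplicates().copy()

# Strip whitespace and stray characters from both ends of the names,
# then normalize capitalization.
df["first_name"] = df["first_name"].str.strip(" ?./_0123456789").str.title()

# Normalize case, then correct known misspellings with replace.
df["transaction_type"] = (
    df["transaction_type"].str.lower().replace({"expence": "expense"})
)
df["category"] = df["category"].str.lower()

# Fill blanks: a placeholder for text, the median for amounts (one possible choice).
df["category"] = df["category"].fillna("unknown")
df["amount"] = df["amount"].fillna(df["amount"].median())

# Drop rows that have no currency.
df = df.dropna(subset=["currency"]).reset_index(drop=True)

print(df)
```

Note that str.strip only removes characters from the ends of a string; unwanted characters in the middle of a name would need str.replace with a pattern instead.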

🧾 Raw Dataset (Before Cleaning)

📋 Dataset after Cleaning

Step 3: Random Forest (Machine Learning)

Classification Model

Algorithm: Random Forest Classifier
Target Variable: Category (bonus, rent, utilities)
Features: Amount, transaction_type, currency
Train/Test Split: 80/20
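
A setup matching this description might look like the sketch below. The data is invented, and the one-hot encoding via get_dummies, the random_state values, and n_estimators=100 are assumptions; the source only states the algorithm, target, features, and split ratio.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical cleaned data; values are invented for illustration only.
df = pd.DataFrame({
    "amount": [250, 900, 120, 300, 950, 110, 280, 870, 130, 260],
    "transaction_type": ["income", "expense", "expense", "income", "expense",
                         "expense", "income", "expense", "expense", "income"],
    "currency": ["EUR", "EUR", "USD", "EUR", "EUR",
                 "EUR", "USD", "EUR", "EUR", "EUR"],
    "category": ["bonus", "rent", "utilities", "bonus", "rent",
                 "utilities", "bonus", "rent", "utilities", "bonus"],
})

# One-hot encode the categorical features so the forest receives numeric input.
X = pd.get_dummies(df[["amount", "transaction_type", "currency"]])
y = df["category"]

# 80/20 split; random_state is fixed only to make the sketch reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(model.predict(X_test))
```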

Predictions

1) Predicted category for the next 2 transactions - utilities

2) Probability of this prediction - 78% (utilities)

3) Probabilities of the other categories:

bonus - 3%

rent - 14%

4) Accuracy: 0.5
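
For context, per-class probabilities and accuracy like these are typically read from a fitted scikit-learn classifier as sketched below. The feature values and labels are invented and will not reproduce the figures above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical numeric feature (e.g. amount) and labels, invented for illustration.
X_train = np.array([[250], [900], [120], [300], [950], [110], [280], [870]])
y_train = np.array(["bonus", "rent", "utilities", "bonus",
                    "rent", "utilities", "bonus", "rent"])
X_test = np.array([[130], [260]])
y_test = np.array(["utilities", "bonus"])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# predict_proba returns one row per sample and one column per class,
# ordered as in model.classes_; each row sums to 1.
proba = model.predict_proba(X_test)
per_class = dict(zip(model.classes_, proba[0]))

# The predicted label is the class with the highest probability.
predicted = model.predict(X_test)

# Accuracy is the fraction of test rows whose prediction matches the true label.
accuracy = accuracy_score(y_test, predicted)

print(per_class, predicted, accuracy)
```

On very small test sets, accuracy can only take a few distinct values; with two test rows, for example, it is 0, 0.5, or 1.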

3) 📈 Dashboard (Power BI)

Key Metrics

Total Amount: €5,020
Year: 2024
Standard Deviation: 397
Total Customers: 9
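
These card values can be cross-checked in Python before building the dashboard. A minimal sketch, assuming hypothetical customer_id and amount columns and invented figures; the source does not state whether the dashboard's standard deviation is the sample or the population variant, so both are shown.

```python
import pandas as pd

# Hypothetical transactions; the figures are invented and do not reproduce the dashboard.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "amount": [100.0, 200.0, 300.0, 400.0, 500.0],
})

total_amount = df["amount"].sum()

# pandas uses the sample standard deviation (ddof=1) by default;
# pass ddof=0 for the population version.
std_sample = df["amount"].std()
std_population = df["amount"].std(ddof=0)

# Count distinct customers, not rows.
total_customers = df["customer_id"].nunique()

print(total_amount, round(std_sample, 2), round(std_population, 2), total_customers)
```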

Visualizations

Amount by Quarter - Bar chart showing transaction trends

Category Distribution - Pie chart breakdown

Customer Details - Interactive table with filters

Transaction Type Filter - Income vs Expense analysis

📊 Key Insights

Transaction Distribution

Bonuses account for nearly half of all transactions (44.58%)

Utilities represent a third of transactions (33.33%)

Rent payments are the smallest category (22.08%)
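
Percentages like these can be computed either as a share of total amount or as a share of transaction count; the source does not say which one the pie chart uses. The sketch below shows both on invented data with hypothetical column names.

```python
import pandas as pd

# Hypothetical cleaned transactions, invented for illustration.
df = pd.DataFrame({
    "category": ["bonus", "bonus", "rent", "utilities", "utilities"],
    "amount": [300.0, 200.0, 250.0, 150.0, 100.0],
})

# Share of total amount per category, in percent.
share_by_amount = (
    df.groupby("category")["amount"].sum() / df["amount"].sum() * 100
).round(2)

# Share of transaction count per category, in percent.
share_by_count = (df["category"].value_counts(normalize=True) * 100).round(2)

print(share_by_amount)
print(share_by_count)
```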

Quarterly Trends

Q1 and Q2 show highest transaction volumes

Q3 has significantly lower activity

Q4 shows moderate recovery
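
A quarterly breakdown of this kind can be derived from the transaction dates. The sketch below assumes a hypothetical transaction_date column and invented amounts.

```python
import pandas as pd

# Hypothetical dated transactions, invented for illustration.
df = pd.DataFrame({
    "transaction_date": ["2024-01-15", "2024-02-20", "2024-05-03",
                         "2024-08-11", "2024-11-30"],
    "amount": [400.0, 300.0, 600.0, 100.0, 250.0],
})

# Parse the dates, then derive the calendar quarter (1 to 4) from each one.
df["transaction_date"] = pd.to_datetime(df["transaction_date"])
df["quarter"] = df["transaction_date"].dt.quarter

# Total amount per quarter, the quantity a bar chart by quarter would plot.
amount_by_quarter = df.groupby("quarter")["amount"].sum()

print(amount_by_quarter)
```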

Currency Usage

EUR is the dominant currency (7 out of 9 customers)

USD is used by 2 customers

🛠️ Technologies Used

Database

MySQL 8.0 - Relational database management

Data Processing & Analysis

Python 3.11

pandas - Data manipulation and cleaning

numpy - Numerical computations

scikit-learn - Machine learning models

Business Intelligence

Power BI Desktop - Interactive dashboards
