Stathis-grk/Data-Analysis-Project-MySQL-Python-PowerBi

Data-Analysis-Project-MySQL-Python-PowerBi

A comprehensive end-to-end data analysis project that demonstrates the complete data pipeline: from database creation and data cleaning to machine learning and visualization. This project analyzes customer transaction data using MySQL, Python, and Power BI.

🎯 Objectives

  1. Create and populate a MySQL database with transaction data
  2. Clean and preprocess messy data using Python and Pandas
  3. Perform exploratory data analysis
  4. Build predictive models with scikit-learn
  5. Visualize insights using Power BI dashboards

1) Create Database (MySQL)

Screenshot 2025-11-02 194017 Screenshot 2025-11-02 194033

Data Quality Issues (Intentional for Cleaning Practice)

  1. Duplicate records
  2. Special characters in names (?, ., /, _, 11)
  3. Inconsistent capitalization
  4. Missing values in critical fields
  5. Typos (e.g., "expence" instead of "expense")

2) Data Cleaning and Regression (Python)

Step 1: Import the Database into Python

image image
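The import code itself is only shown as a screenshot, so the following is a minimal sketch of the idea rather than the project's actual script. It uses an in-memory SQLite database as a stand-in so that it runs without a MySQL server; the table name `transactions`, the column names, and the helper `load_transactions` are all hypothetical. With MySQL, a SQLAlchemy engine would typically be passed to `pd.read_sql` in place of the SQLite connection.

```python
import sqlite3

import pandas as pd


def load_transactions(connection, table="transactions"):
    """Read a whole table into a DataFrame (table name is hypothetical)."""
    return pd.read_sql(f"SELECT * FROM {table}", connection)


# Stand-in database (assumption): SQLite instead of MySQL, made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (first_name TEXT, last_name TEXT, amount REAL, "
    "transaction_type TEXT, category TEXT, currency TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Maria?", "papas", 500.0, "income", "bonus", "EUR"),
        ("Maria?", "papas", 500.0, "income", "bonus", "EUR"),
        ("nikos_", "Ioannou", 300.0, "expence", "rent", None),
    ],
)
conn.commit()

raw_df = load_transactions(conn)
print(raw_df.shape)
```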

Step 2: Data cleaning

✅ Remove Duplicates

image
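As a rough illustration of this step (the screenshot above holds the real code), exact duplicate rows can be dropped with `DataFrame.drop_duplicates`. The column names and values below are made up for the example.

```python
import pandas as pd

# Made-up rows; the first two are exact duplicates.
df = pd.DataFrame({
    "first_name": ["Maria", "Maria", "Nikos"],
    "amount": [500.0, 500.0, 300.0],
    "category": ["bonus", "bonus", "rent"],
})

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(len(df), "->", len(deduped))
```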

✅ Clean the first and last name columns using strip

image
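A possible version of this step is sketched below, assuming the stray characters sit at the start or end of each name. `Series.str.strip` with a character set removes them, and `str.title` evens out the capitalization. The sample names and the helper `clean_name` are hypothetical.

```python
import pandas as pd

# Made-up names showing the kinds of noise listed earlier (?, ., /, _, 11).
names = pd.DataFrame({
    "first_name": [" maria?", "NIKOS_", "eleni11"],
    "last_name": ["papas.", "/ioannou", " Georgiou "],
})


def clean_name(series):
    """Strip leading/trailing noise characters, then title-case the name."""
    return series.str.strip(" ?./_1").str.title()


names_clean = names.apply(clean_name)
print(names_clean)
```

Note that `strip` only touches the ends of each string; noise in the middle of a name would need `str.replace` instead.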

✅ Fix the transaction_type and category columns using "replace"

image
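One way this could look, assuming the problems in these two columns are stray symbols and mixed case (the exact replacements used in the project are only visible in the screenshot):

```python
import pandas as pd

# Made-up values with stray symbols and inconsistent capitalization.
df = pd.DataFrame({
    "transaction_type": ["Income", "EXPENSE ", "income?"],
    "category": ["Bonus", "rent_", "UTILITIES"],
})

for col in ["transaction_type", "category"]:
    df[col] = (
        df[col]
        .str.replace(r"[?./_]", "", regex=True)  # drop stray symbols
        .str.strip()                             # trim whitespace
        .str.lower()                             # one consistent case
    )

print(df)
```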

✅ Fill in the blanks/Null values

image
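The fill strategy is not stated in the text, so this sketch makes its own assumptions: numeric gaps are filled with the column median and text gaps with a placeholder label. Both choices, and the column names, are illustrative only.

```python
import pandas as pd

# Made-up rows with missing values.
df = pd.DataFrame({
    "amount": [500.0, None, 300.0],
    "category": ["bonus", None, "rent"],
})

# Assumed strategy: median for numbers, a placeholder label for text.
df["amount"] = df["amount"].fillna(df["amount"].median())
df["category"] = df["category"].fillna("unknown")

print(df)
```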

✅ Drop rows without a currency

image
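A short sketch of this step, assuming the column is called `currency`: `dropna` with `subset` removes only the rows where that specific field is missing and leaves gaps in other columns alone.

```python
import pandas as pd

# Made-up rows; the second one has no currency.
df = pd.DataFrame({
    "amount": [500.0, 300.0, 120.0],
    "currency": ["EUR", None, "USD"],
})

with_currency = df.dropna(subset=["currency"]).reset_index(drop=True)
print(with_currency)
```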

✅ Spelling correction

image
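For the "expence" typo mentioned earlier, a mapping passed to `Series.replace` is one simple option. The mapping below contains only that documented typo; any further entries would be assumptions.

```python
import pandas as pd

types = pd.Series(["income", "expence", "expense"], name="transaction_type")

# Map each known misspelling to its corrected form.
corrections = {"expence": "expense"}
fixed = types.replace(corrections)

print(fixed.tolist())
```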

🧾 Raw Dataset (Before Cleaning)

image

📋 Dataset after Cleaning

image

Step 3: Random Forest (Machine Learning)

Classification Model

  1. Algorithm: Random Forest Classifier
  2. Target Variable: Category (bonus, rent, utilities)
  3. Features: Amount, transaction_type, currency
  4. Train/Test Split: 80/20

Screenshot 2025-11-02 202100 Screenshot 2025-11-02 202120
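The setup listed above can be sketched roughly as follows. The rows are synthetic and exist only to make the example runnable, so the printed accuracy says nothing about the project's result. One-hot encoding with `pd.get_dummies`, `n_estimators=100`, and `random_state=42` are assumptions, not settings confirmed by the source.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic rows for illustration only; not the project's data.
data = pd.DataFrame({
    "amount": [500, 520, 480, 300, 310, 290, 120, 110, 130, 125],
    "transaction_type": ["income"] * 3 + ["expense"] * 7,
    "currency": ["EUR", "EUR", "USD", "EUR", "EUR", "USD", "EUR", "EUR", "USD", "EUR"],
    "category": ["bonus"] * 3 + ["rent"] * 3 + ["utilities"] * 4,
})

# One-hot encode the categorical features (assumed encoding).
X = pd.get_dummies(data[["amount", "transaction_type", "currency"]], dtype=float)
y = data["category"]

# 80/20 train/test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```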

Predictions

1) Predicted category for the next 2 transactions: utilities

2) Probability of this prediction: 78% (utilities)

3) Probabilities of the other categories:

bonus - 3%

rent - 14%

4) Accuracy: 0.5
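Per-class probabilities like the ones above are what scikit-learn's `predict_proba` returns. The sketch below shows the mechanics on synthetic data. The training rows, the new transaction, and the encoding are all assumptions, so the numbers it prints will not match the figures reported here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic training rows for illustration only.
train = pd.DataFrame({
    "amount": [500, 520, 480, 300, 310, 290, 120, 110, 130, 125],
    "transaction_type": ["income"] * 3 + ["expense"] * 7,
    "currency": ["EUR", "EUR", "USD", "EUR", "EUR", "USD", "EUR", "EUR", "USD", "EUR"],
    "category": ["bonus"] * 3 + ["rent"] * 3 + ["utilities"] * 4,
})

X = pd.get_dummies(train[["amount", "transaction_type", "currency"]], dtype=float)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, train["category"])

# Hypothetical new transaction to classify.
new_row = pd.DataFrame({
    "amount": [115],
    "transaction_type": ["expense"],
    "currency": ["EUR"],
})

# Align the encoded columns with the training matrix; absent dummies become 0.
new_X = pd.get_dummies(new_row, dtype=float).reindex(columns=X.columns, fill_value=0)

# One probability per class, in the order given by model.classes_.
probabilities = dict(zip(model.classes_, model.predict_proba(new_X)[0]))
for label, p in probabilities.items():
    print(f"{label}: {p:.0%}")
```

The probabilities returned by `predict_proba` sum to 1 across the classes.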

3) 📈 Dashboard (Power BI)

Screenshot 2025-11-02 184810

Key Metrics

  1. Total Amount: €5,020
  2. Year: 2024
  3. Standard Deviation: 397
  4. Total Customers: 9
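These card values come from Power BI, but the same kind of metrics can be cross-checked in pandas. The sketch below uses made-up rows and hypothetical column names (`customer_id`, `amount`). Note that pandas' `std` defaults to the sample standard deviation (`ddof=1`), which may or may not match the measure used in the dashboard.

```python
import pandas as pd

# Made-up rows; not the project's data.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [500.0, 120.0, 300.0, 80.0],
})

total_amount = df["amount"].sum()
std_amount = df["amount"].std()          # sample standard deviation (ddof=1)
total_customers = df["customer_id"].nunique()

print(total_amount, round(std_amount, 2), total_customers)
```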

Visualizations

Amount by Quarter - Bar chart showing transaction trends

Category Distribution - Pie chart breakdown

Customer Details - Interactive table with filters

Transaction Type Filter - Income vs Expense analysis

📊 Key Insights

Transaction Distribution

Bonuses account for nearly half of all transactions (44.58%)

Utilities represent a third of transactions (33.33%)

Rent payments are the smallest category (22.08%)
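The text does not say whether these shares are based on the number of transactions or on the summed amounts, so the sketch below computes both on made-up data. Column names are assumptions.

```python
import pandas as pd

# Made-up rows; not the project's data.
df = pd.DataFrame({
    "category": ["bonus", "bonus", "rent", "utilities"],
    "amount": [400.0, 100.0, 300.0, 200.0],
})

# Share of transactions per category, in percent.
share_by_count = df["category"].value_counts(normalize=True) * 100

# Share of total amount per category, in percent.
share_by_amount = df.groupby("category")["amount"].sum() / df["amount"].sum() * 100

print(share_by_count.round(2))
print(share_by_amount.round(2))
```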

Quarterly Trends

Q1 and Q2 show the highest transaction volumes

Q3 has significantly lower activity

Q4 shows moderate recovery
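A quarterly breakdown like this can be reproduced in pandas by grouping on the quarter of the transaction date. The column name `transaction_date` and the rows below are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; not the project's data.
df = pd.DataFrame({
    "transaction_date": pd.to_datetime([
        "2024-01-15", "2024-02-10", "2024-05-03", "2024-08-20", "2024-11-30",
    ]),
    "amount": [500.0, 300.0, 400.0, 100.0, 250.0],
})

# Sum amounts per calendar quarter (1 to 4).
amount_by_quarter = df.groupby(df["transaction_date"].dt.quarter)["amount"].sum()
print(amount_by_quarter)
```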

Currency Usage

EUR is the dominant currency (7 out of 9 customers)

USD is used by 2 customers

🛠️ Technologies Used

Database

MySQL 8.0 - Relational database management

Data Processing & Analysis

Python 3.11

pandas - Data manipulation and cleaning

numpy - Numerical computations

scikit-learn - Machine learning models

Business Intelligence

Power BI Desktop - Interactive dashboards
