🧮 Sales Analytics Project using SQL and Python

This project demonstrates how to perform data analysis on the Online Retail II Dataset using MySQL and Python. It walks through the full data pipeline: importing data, setting up SQL schema, connecting SQL to Python, performing exploratory data analysis (EDA), and customer segmentation.

📁 Project Structure

Sales_Analytics/
│
├── data/
│   └── online_retail_II.csv           # Raw dataset
├── scripts/
│   ├── sql_schema.sql                 # SQL schema and insert queries
│   └── eda_analysis.py                # Python code for EDA and clustering
├── README.md                          # Project documentation
└── customer_segments.csv              # Output from clustering analysis

⚙️ Prerequisites

Python 3.x
MySQL Server
MySQL Workbench or CLI
Python Libraries:
- pandas
- sqlalchemy
- pymysql
- matplotlib
- seaborn
- scikit-learn

Install them using:

pip install pandas sqlalchemy pymysql matplotlib seaborn scikit-learn

🧵 Step-by-Step Guide

1️⃣ Load Dataset into Python

import pandas as pd

df = pd.read_csv("online_retail_II.csv")
df.dropna(subset=["Customer ID"], inplace=True)
df['Customer ID'] = df['Customer ID'].astype(int)
df['Total'] = df['Quantity'] * df['Price']
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])

2️⃣ Create SQL Tables in MySQL

Run the following in MySQL Workbench:

CREATE DATABASE Sales_Analytics;
USE Sales_Analytics;

CREATE TABLE RawRetail (
    Invoice VARCHAR(20),
    StockCode VARCHAR(20),
    Description TEXT,
    Quantity INT,
    InvoiceDate DATETIME,
    Price DECIMAL(10, 2),
    CustomerID INT,
    Country VARCHAR(100)
);

3️⃣ Connect Python to MySQL

🔐 Format:

from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://username:password@localhost:3306/Sales_Analytics")

Example:

engine = create_engine(f'mysql+pymysql://{username}:{password}@{host}:{port}/{database}')

4️⃣ Upload DataFrame to SQL

df.to_sql('RawRetail', con=engine, if_exists='replace', index=False)

5️⃣ Analyze the Data from SQL in Python

query = """
    SELECT InvoiceDate, StockCode, Description, Quantity, Price, CustomerID, Country, 
           (Quantity * Price) AS Total 
    FROM RawRetail;
"""
df = pd.read_sql(query, con=engine)

# Monthly Sales Trend
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
monthly_sales = df.groupby(df['InvoiceDate'].dt.to_period('M'))['Total'].sum()
monthly_sales.plot(kind='line', title='Monthly Sales Trend')

6️⃣ RFM Segmentation (Customer Clustering)

rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (df['InvoiceDate'].max() - x.max()).days,
    'Invoice': 'nunique',
    'Total': 'sum'
})
rfm.columns = ['Recency', 'Frequency', 'Monetary']

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import seaborn as sns

scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm)

kmeans = KMeans(n_clusters=3, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)

sns.scatterplot(x='Recency', y='Monetary', hue='Cluster', data=rfm)

📊 Key Outputs

Monthly sales trends
Top-selling products
Country-wise sales breakdown
Customer clusters based on RFM segmentation

💡 Highlights

🔗 SQL-to-Python pipeline using SQLAlchemy
📦 Clean, scalable project structure
📈 Visual analytics using Matplotlib & Seaborn
🔍 Insightful segmentation using machine learning

🤝 Credits

Dataset: Kaggle - Online Retail II
Author: Durga Rani

🛠️ Future Improvements

Add interactive dashboard using Streamlit or Dash
Schedule regular database updates
Deploy model predictions for marketing use

📌 License

This project is under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
Untitled.ipynb		Untitled.ipynb
online_retail_II.csv.zip		online_retail_II.csv.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧮 Sales Analytics Project using SQL and Python

📁 Project Structure

⚙️ Prerequisites

🧵 Step-by-Step Guide

1️⃣ Load Dataset into Python

2️⃣ Create SQL Tables in MySQL

3️⃣ Connect Python to MySQL

🔐 Format:

4️⃣ Upload DataFrame to SQL

5️⃣ Analyze the Data from SQL in Python

6️⃣ RFM Segmentation (Customer Clustering)

📊 Key Outputs

💡 Highlights

🤝 Credits

🛠️ Future Improvements

📌 License

About

Uh oh!

Releases

Packages

Languages

License

Djdurga/Sales-Analytics-Project-using-SQL-and-Python

Folders and files

Latest commit

History

Repository files navigation

🧮 Sales Analytics Project using SQL and Python

📁 Project Structure

⚙️ Prerequisites

🧵 Step-by-Step Guide

1️⃣ Load Dataset into Python

2️⃣ Create SQL Tables in MySQL

3️⃣ Connect Python to MySQL

🔐 Format:

4️⃣ Upload DataFrame to SQL

5️⃣ Analyze the Data from SQL in Python

6️⃣ RFM Segmentation (Customer Clustering)

📊 Key Outputs

💡 Highlights

🤝 Credits

🛠️ Future Improvements

📌 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages