Analyze Walmart's retail sales data end-to-end: from cleaning raw data using Python to answering business-critical questions using MySQL. This project provides actionable insights into product performance, sales volume trends, customer ratings, payment behavior, and branch efficiency.
This is a hands-on SQL + Python analytics project using real-world Walmart data.
- Data is cleaned and transformed using Python (pandas)
- Key fields like sales volume, customer behavior, and profitability margins are standardized
- Cleaned data is stored as walmart_clean.csv
- SQL queries (MySQL) explore retail KPIs, customer trends, and branch performance
- Python (pandas): Data cleaning and preprocessing
- MySQL: SQL-based analytics and business intelligence
- Jupyter Notebook: For step-by-step Python execution
- Kaggle Dataset: Walmart Sales Dataset
git clone https://github.com/your-username/walmart-sales-analysis.git
cd walmart-sales-analysis
pip install pandas
Run the provided Jupyter Notebook:
walmart_data_cleaning.ipynb
This notebook performs the data cleaning steps (handling missing values, correcting types, removing duplicates) and exports a clean version of the dataset as walmart_clean.csv.
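The notebook itself is not reproduced here, so the following is only a minimal pandas sketch of the three steps named above. It assumes the column names from the MySQL schema in this README. The sample rows are made up, and dropping rows with missing values (rather than filling them) is an assumption about how the notebook handles them.

```python
import pandas as pd

# Made-up raw rows for illustration; column names follow the MySQL schema
# in this README, not necessarily the raw Kaggle file.
raw = pd.DataFrame({
    "invoice_id": ["1", "2", "2", "3"],
    "branch": ["A", "B", "B", "C"],
    "unit_price": ["74.69", "15.28", "15.28", None],
    "quantity": [7, 5, 5, 2],
    "date": ["2019-01-05", "2019-03-08", "2019-03-08", "2019-03-03"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, drop incomplete rows, and fix column types."""
    out = df.drop_duplicates()
    # Assumption: rows missing a price or quantity are dropped, not imputed.
    out = out.dropna(subset=["unit_price", "quantity"])
    out = out.assign(
        unit_price=pd.to_numeric(out["unit_price"]),
        quantity=out["quantity"].astype(int),
        date=pd.to_datetime(out["date"], format="%Y-%m-%d"),
    )
    return out.reset_index(drop=True)

cleaned = clean(raw)
# To export, as the notebook does:
# cleaned.to_csv("walmart_clean.csv", index=False)
```

Passing index=False to to_csv keeps the pandas row index out of the file, so the CSV columns line up with the table columns on import.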
Create the table schema in MySQL:
CREATE TABLE walmart_clean (
invoice_id VARCHAR(20),
branch VARCHAR(10),
city VARCHAR(50),
category VARCHAR(50),
unit_price FLOAT,
quantity INT,
date DATE,
time TIME,
payment_method VARCHAR(20),
rating FLOAT,
profit_margin FLOAT
);
Then import the walmart_clean.csv file using MySQL Workbench or your preferred import tool.
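Before importing, it can help to confirm that the CSV header matches the table definition, since a missing or reordered column is a common cause of failed or misaligned imports. The check below is a hypothetical helper (it is not part of the repository) that uses only the Python standard library and the column list from the CREATE TABLE statement above.

```python
import csv
import io

# Column order taken from the CREATE TABLE statement in this README.
SCHEMA_COLUMNS = [
    "invoice_id", "branch", "city", "category", "unit_price", "quantity",
    "date", "time", "payment_method", "rating", "profit_margin",
]

def header_mismatches(csv_file) -> list:
    """Return human-readable problems found in the CSV header (empty if none)."""
    header = next(csv.reader(csv_file), [])
    problems = []
    missing = [c for c in SCHEMA_COLUMNS if c not in header]
    extra = [c for c in header if c not in SCHEMA_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not problems and header != SCHEMA_COLUMNS:
        problems.append("columns are present but in a different order")
    return problems

# In-memory stand-in for walmart_clean.csv (header only, for illustration).
good_sample = io.StringIO(",".join(SCHEMA_COLUMNS) + "\n")
bad_sample = io.StringIO(",".join(SCHEMA_COLUMNS[:-1]) + "\n")

good_result = header_mismatches(good_sample)
bad_result = header_mismatches(bad_sample)
```

For the real file, the same function can be called as `header_mismatches(open("walmart_clean.csv", newline=""))`.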
This project answers real-world analytics questions relevant to Walmart and retail operations:
- What are the top-selling categories by units sold?
- Which categories perform best at each branch?
- What are the busiest times of day and days of the week?
- Do higher-rated categories correlate with higher sales?
- Which branches are most efficient in terms of sales volume and profit margin?
- What are the most popular payment methods and average basket size?
- Which categories are underperforming in both units and customer ratings?
- What are the monthly sales trends (seasonality)?
- What is the average basket size per branch?
- How are unit sales changing year-over-year by branch?
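The actual queries live in queries.sql and are not shown here. As a rough sketch of what two of these questions look like in SQL, the example below runs plain GROUP BY aggregations against an in-memory SQLite database. SQLite is used only as a runnable stand-in for MySQL (an assumption for this example); the two SELECT statements use standard syntax that should also work in MySQL. The sample rows are invented and do not reproduce the project's results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE walmart_clean ("
    "branch TEXT, category TEXT, quantity INTEGER, payment_method TEXT)"
)

# Invented sample rows, for illustration only.
rows = [
    ("A", "Category X", 7, "Ewallet"),
    ("B", "Category X", 5, "Cash"),
    ("A", "Category Y", 2, "Ewallet"),
    ("C", "Category Z", 6, "Credit card"),
]
conn.executemany("INSERT INTO walmart_clean VALUES (?, ?, ?, ?)", rows)

# Top-selling categories by units sold.
TOP_CATEGORIES = """
SELECT category, SUM(quantity) AS units_sold
FROM walmart_clean
GROUP BY category
ORDER BY units_sold DESC
"""

# Payment methods ranked by number of transactions, with average basket size.
PAYMENT_METHODS = """
SELECT payment_method,
       COUNT(*) AS transactions,
       AVG(quantity) AS avg_basket_size
FROM walmart_clean
GROUP BY payment_method
ORDER BY transactions DESC, payment_method
"""

top_categories = conn.execute(TOP_CATEGORIES).fetchall()
payment_methods = conn.execute(PAYMENT_METHODS).fetchall()
```

Here "basket size" is taken to mean units per transaction, which is one possible reading of the term; the project may define it differently.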
- Food & Beverages consistently leads in total units sold.
- Sales peak during evenings and weekends, especially in Branch B.
- Higher-rated categories tend to sell more units, showing a positive link between customer satisfaction and demand.
- Ewallet is the most preferred payment method across branches.
- Fashion Accessories shows both low units and low ratings, making it a clear candidate for product review or promotions.
- Branch C experienced a year-over-year decline in unit sales, highlighting areas needing operational focus.
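A finding such as the year-over-year change per branch can be cross-checked in pandas against the cleaned CSV. The sketch below shows one way to compute it; the years and quantities are invented placeholders rather than the project's data, and the approach is an assumption, not the query used in queries.sql.

```python
import pandas as pd

# Invented placeholder data; in practice this would come from
# pd.read_csv("walmart_clean.csv", parse_dates=["date"]).
sales = pd.DataFrame({
    "branch": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime([
        "2022-03-01", "2023-03-01",
        "2022-06-15", "2023-06-15",
        "2022-09-10", "2023-09-10",
    ]),
    "quantity": [10, 15, 20, 20, 8, 6],
})

# Total units per branch and year, with one column per year.
units = (
    sales.assign(year=sales["date"].dt.year.astype(int))
    .groupby(["branch", "year"])["quantity"]
    .sum()
    .unstack("year")
)

# Percentage change from the earlier year to the later one.
yoy_pct = (units[2023] - units[2022]) / units[2022] * 100
```

A negative value in yoy_pct marks a branch whose unit sales fell between the two years.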
walmart-sales-analysis/
├── walmart_data_cleaning.ipynb   # Python notebook for cleaning and exporting data
├── walmart_clean.csv             # Cleaned dataset exported from notebook
├── queries.sql                   # MySQL queries answering business questions
└── README.md                     # This documentation
Contributions are welcome! Fork the repo, make your changes, and open a pull request.