End-to-end business analytics project using the Brazilian Olist E-Commerce dataset.
Answers real business questions using advanced SQL and visualizes KPIs in Power BI.
An e-commerce company wants to understand:
- Which product categories drive 80% of revenue? (Pareto Analysis)
- Which customer segments are most valuable? (RFM Segmentation)
- Where are orders getting delayed in the supply chain?
- What causes revenue decline in certain months?
- Which sellers are underperforming?
ecommerce-sql-analytics/
│
├── sql/
│ ├── 01_schema_setup.sql # Database schema & table creation
│ ├── 02_data_exploration.sql # EDA queries
│ ├── 03_revenue_analysis.sql # Revenue & Pareto analysis
│ ├── 04_customer_rfm.sql # RFM segmentation (Window Functions)
│ ├── 05_delivery_analysis.sql # Supply chain & delay analysis
│ ├── 06_seller_performance.sql # Seller KPI ranking
│ ├── 07_cohort_analysis.sql # Customer cohort retention
│ └── 08_stored_procedures.sql # Reusable stored procedures
│
├── data/
│ └── README.md # How to download Olist dataset
│
├── powerbi/
│ └── README.md # Power BI dashboard guide
│
└── README.md
Source: Olist Brazilian E-Commerce — Kaggle
Size: 100,000+ orders | 9 tables | 2016–2018
Tables Used:
| Table | Description |
|---|---|
| olist_orders | Order status, timestamps |
| olist_order_items | Products, prices, freight |
| olist_customers | Customer location |
| olist_products | Product categories |
| olist_sellers | Seller location |
| olist_order_payments | Payment methods, values |
| olist_order_reviews | Customer ratings |
| Technique | Where Used |
|---|---|
| CTEs (WITH clause) | Revenue analysis, RFM, Cohort |
| Window Functions (RANK, DENSE_RANK, LAG, LEAD, ROW_NUMBER) | RFM, Seller ranking, MoM growth |
| Multi-table JOINs (5+ tables) | All business queries |
| Subqueries & Correlated Subqueries | Pareto analysis |
| CASE WHEN | Customer segmentation |
| GROUP BY + HAVING | Category performance |
| DATE functions | Cohort, delivery analysis |
| Stored Procedures | Reusable KPI reports |
| Indexes | Query optimization |
- Top 5 categories contribute 62% of total revenue — Pareto rule confirmed
- Average delivery delay: 12 days — São Paulo orders 3x faster than northern states
- Champions segment (RFM Score 555) — 8% of customers, 34% of revenue
- Month-over-month revenue dropped 18% in Sep 2017 — traced to seller stockout
- Bottom 20% sellers have 4.2x higher cancellation rate than top performers
- SQL: MySQL / PostgreSQL
- Visualization: Power BI (Star Schema, DAX, Drill-Through)
- Python: Pandas (data loading helper script)
- Tools: MySQL Workbench / DBeaver
# 1. Download dataset from Kaggle (see data/README.md)
# 2. Run schema setup
mysql -u root -p ecommerce < sql/01_schema_setup.sql
# 3. Load data (use Python helper or MySQL LOAD DATA)
# 4. Run analysis queries in order (02 → 08)Suhas Dhamapurkar — Data Analyst
LinkedIn | GitHub | Portfolio