# End-to-End Data Analytics Project with Python & SQL
This project demonstrates an end-to-end data analytics workflow, from data ingestion and cleaning to database integration and SQL-based business insights.
The goal of this project is to simulate a real-world analytics pipeline where raw data is obtained via an API, processed in Python, stored in a relational database, and queried to answer business-driven questions.
## Project Overview
**Tech Stack:**
- Python (data processing & cleaning)
- Pandas (data wrangling)
- PostgreSQL (database)
- Supabase (cloud-hosted Postgres instance)
- Kaggle API (dataset retrieval)
- SQL (business insights & reporting)
## Workflow Summary
1. **Data Collection:** Retrieved the dataset using the Kaggle API.
2. **Data Processing & Cleaning (Python):** Used Pandas for data wrangling and transformation; handled missing values, corrected data types, and standardized column names (a sketch of these two steps follows this list).
3. **Data Loading:** Loaded the cleaned dataset into a PostgreSQL database hosted on Supabase; created and managed database tables using psycopg2 and SQL scripts (see the loading sketch below).
4. **Data Analysis (SQL):** Wrote SQL queries to extract insights and answer key business questions.
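As a hedged illustration of the collection and cleaning steps, the sketch below downloads a dataset with the official Kaggle API client and applies the kind of Pandas cleaning described above. The dataset slug, file names, and column names (`order_date`, `order_id`) are placeholders for illustration, not taken from the project.

```python
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate with the Kaggle API (expects credentials in ~/.kaggle/kaggle.json).
api = KaggleApi()
api.authenticate()

# Placeholder dataset slug and file name -- replace with the ones used in the project.
api.dataset_download_files("owner/orders-dataset", path="data", unzip=True)

# Treat common placeholder strings as missing values while reading.
df = pd.read_csv("data/orders.csv", na_values=["Not Available", "unknown"])

# Standardize column names: lowercase, underscores instead of spaces.
df.columns = df.columns.str.lower().str.replace(" ", "_")

# Correct data types (assumed 'order_date' column) and drop rows missing the key.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df = df.dropna(subset=["order_id"])

df.to_csv("data/orders_clean.csv", index=False)
```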
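The loading step could look roughly like the sketch below, assuming a cleaned CSV and the environment variables from the `.env` file described in the setup section. The `orders` table schema here is an assumption for illustration, not the project's actual schema.

```python
import os
import pandas as pd
import psycopg2
from psycopg2.extras import execute_values

# Connection parameters come from the .env variables described in the setup section.
conn = psycopg2.connect(
    user=os.getenv("database_user"),
    password=os.getenv("password"),
    host=os.getenv("host"),
    port=os.getenv("port"),
    dbname=os.getenv("dbname"),
)

# Assumed schema for illustration only.
create_sql = """
CREATE TABLE IF NOT EXISTS orders (
    order_id     INT PRIMARY KEY,
    order_date   DATE,
    region       VARCHAR(50),
    category     VARCHAR(50),
    sub_category VARCHAR(50),
    product_id   VARCHAR(50),
    quantity     INT,
    sale_price   NUMERIC(10, 2),
    profit       NUMERIC(10, 2)
);
"""

df = pd.read_csv("data/orders_clean.csv")

# Convert NumPy scalars to plain Python objects and NaN to None so psycopg2 can adapt them.
rows = list(df.astype(object).where(df.notna(), None).itertuples(index=False, name=None))
insert_sql = f"INSERT INTO orders ({', '.join(df.columns)}) VALUES %s"

with conn, conn.cursor() as cur:
    cur.execute(create_sql)            # create the table if it does not exist
    execute_values(cur, insert_sql, rows)  # bulk-insert all rows in one round trip
conn.close()
```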
## Business Questions Answered
- Top 10 highest revenue-generating products (an example query is sketched below)
- Top 5 highest-selling products in each region, by quantity and then by revenue
- Month-over-month growth comparison of 2022 vs. 2023 sales (e.g., Jan 2022 vs. Jan 2023)
- For each category, which month had the highest sales by revenue?
- Which sub-category had the highest profit growth in 2023 compared to 2022?
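As a hedged example of how such a question can be answered, the snippet below computes the top 10 revenue-generating products. The table and column names (`orders`, `product_id`, `sale_price`, `quantity`) are assumptions; the project's actual queries live in `orders_analysis.sql`.

```python
import os
import pandas as pd
from dotenv import load_dotenv        # pip install python-dotenv
from sqlalchemy import create_engine  # pip install sqlalchemy

load_dotenv()  # pull credentials from the .env file

# Build a connection URL from the same variables defined in .env.
url = (
    f"postgresql://{os.getenv('database_user')}:{os.getenv('password')}"
    f"@{os.getenv('host')}:{os.getenv('port')}/{os.getenv('dbname')}"
)
engine = create_engine(url)

# Top 10 highest revenue-generating products (assumed table/column names).
query = """
SELECT product_id,
       SUM(sale_price * quantity) AS total_revenue
FROM orders
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 10;
"""

top_products = pd.read_sql(query, engine)
print(top_products)
```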
## How to Run the Project
1. **Clone the repository**
   ```bash
   git clone https://github.com/Margaret-Mwangi/SQL-Python-Projects.git
   cd SQL-Python-Projects
   ```
2. **Set up environment variables**
   Create a `.env` file based on `.env.example` and fill in your credentials (a minimal connection check using these variables is sketched after this list):
   ```
   database_user=<your_postgres_db_username>
   password=<your_postgres_db_password>
   host=<your_supabase_host>
   port=<database_port>
   dbname=<your_database_name>
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Run the pipeline**
   Open `orders data analysis.ipynb` in Jupyter (or another notebook environment) and run all cells.
5. **Execute SQL queries**
   Connect to your Supabase PostgreSQL instance using any SQL client and run the queries from `orders_analysis.sql`.
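To verify the credentials in the `.env` file before running the notebook or the SQL scripts, a quick sanity check along these lines can help. This is a minimal sketch, assuming python-dotenv and psycopg2 are installed; it only confirms the connection works.

```python
import os
import psycopg2
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file in the project root

# Connect using the same variable names defined in .env / .env.example.
conn = psycopg2.connect(
    user=os.getenv("database_user"),
    password=os.getenv("password"),
    host=os.getenv("host"),
    port=os.getenv("port"),
    dbname=os.getenv("dbname"),
)

with conn.cursor() as cur:
    cur.execute("SELECT version();")  # simple connectivity check
    print(cur.fetchone()[0])
conn.close()
```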
## Insights & Results
The SQL queries surface key business insights, including:
- Product performance by sales and revenue
- Regional demand patterns
- Year-over-year and month-over-month sales growth trends
- Category and sub-category profitability analysis
## Key Learnings
- How to use the Kaggle API for automated data retrieval
- Data cleaning and transformation with Pandas
- Setting up and interacting with PostgreSQL on Supabase
- Writing analytical SQL queries for real-world business use cases
- Using `.env` files for secure credential management
## Next Steps
Potential improvements:
- Automate the pipeline with Airflow or Prefect
- Build dashboards in Power BI or Tableau
- Add dbt for modular SQL transformations
## Author
**Margaret Mwangi**
Aspiring Data Engineer | Passionate about Analytics, SQL, and Cloud Data Solutions
LinkedIn | GitHub