End-to-End Data Analytics Project
- This project illustrates the full journey of working with large datasets — from acquisition and preparation to analysis and interpretation.
- Data Extraction: Datasets were accessed programmatically through the Kaggle API, ensuring reproducibility and efficiency.
- Data Cleaning & Preprocessing: Leveraged Python with Pandas to handle incomplete records, standardize formats, normalize data, and remove inconsistencies.
- Database Integration: Transformed and loaded the refined dataset into SQL Server to enable structured querying and scalable analytics.
- Data Analysis: Applied advanced SQL queries to explore the data, conduct aggregations, and derive insights that drive decision-making.
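The extraction step can be scripted along these lines. This is a minimal sketch. It assumes the official `kaggle` command-line tool is installed (`pip install kaggle`) and that an API token is stored in `~/.kaggle/kaggle.json`. The dataset slug is not named here, so `owner/orders-dataset` is a placeholder. `build_download_command` and `download_dataset` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_download_command(dataset: str, dest: str = "data") -> list[str]:
    """Return the Kaggle CLI command that downloads and unzips `dataset` into `dest`.

    `dataset` is a Kaggle slug of the form "<owner>/<dataset-name>"; the slug
    used by the project is not stated, so any value passed here is a placeholder.
    """
    if dataset.count("/") != 1:
        raise ValueError("expected a slug like 'owner/dataset-name'")
    return ["kaggle", "datasets", "download", dataset, "-p", dest, "--unzip"]


def download_dataset(dataset: str, dest: str = "data") -> Path:
    """Run the Kaggle CLI (needs network access and valid Kaggle credentials)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_download_command(dataset, dest), check=True)
    return Path(dest)
```

Calling `download_dataset("owner/orders-dataset")` with the real slug would fetch the files and unzip them into `data/`. Building the command separately from running it lets you inspect or log the exact CLI call before anything touches the network.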
- Kaggle API: Automated dataset download for streamlined access.
- Python & Pandas: Executed cleaning operations, including:
  - Addressing missing or inconsistent values
  - Transforming column formats for uniformity
  - Detecting and eliminating duplicate entries
- SQL: Stored and queried the cleaned dataset to conduct detailed analysis.
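The cleaning operations listed above could look roughly like this in Pandas. This is a sketch under stated assumptions. The column names (`Order Date`, `Ship Mode`) and the placeholder strings treated as missing (`Not Available`, `unknown`) are illustrative and are not taken from the project's notebook. `clean_orders` is a hypothetical function name.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder strings that should count as missing values.
MISSING_MARKERS = ["Not Available", "unknown", ""]


def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of `raw`: uniform column names, missing markers
    converted to NaN, dates parsed, and exact duplicate rows removed."""
    df = raw.copy()

    # Standardize column names, e.g. "Order Date" -> "order_date".
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Replace placeholder strings with real missing values.
    df = df.replace(MISSING_MARKERS, np.nan)

    # Parse dates into a datetime dtype; unparseable values become NaT.
    if "order_date" in df.columns:
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

    # Drop exact duplicate rows (NaN values compare equal here) and renumber.
    return df.drop_duplicates().reset_index(drop=True)
```

Converting the markers to NaN before de-duplicating means two rows that differ only in how "missing" was spelled are still caught as duplicates.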
- SQL-Based Analysis: Designed queries to:
  - Aggregate and summarize sales data
  - Detect customer- and product-level trends
  - Generate actionable insights for strategy and planning
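The aggregation queries can be sketched with Python's built-in `sqlite3` module standing in for SQL Server, so the example runs without a database server. The table `df_orders`, its columns, and the function names are hypothetical. On SQL Server, `SELECT TOP 10 ...` would replace the trailing `LIMIT` clause.

```python
import sqlite3

# Hypothetical schema; the project's actual table and column names may differ.
SCHEMA = """
CREATE TABLE df_orders (
    order_id   INTEGER,
    product_id TEXT,
    category   TEXT,
    sale_price REAL
);
"""

# Revenue per product plus its share of total sales, highest first.
TOP_PRODUCTS_SQL = """
SELECT product_id,
       SUM(sale_price) AS revenue,
       ROUND(100.0 * SUM(sale_price) / (SELECT SUM(sale_price) FROM df_orders), 2)
           AS pct_of_total
FROM df_orders
GROUP BY product_id
ORDER BY revenue DESC
LIMIT ?;
"""

# Distinct orders, revenue, and average line value per category.
CATEGORY_SUMMARY_SQL = """
SELECT category,
       COUNT(DISTINCT order_id) AS orders,
       SUM(sale_price)          AS revenue,
       AVG(sale_price)          AS avg_line_value
FROM df_orders
GROUP BY category
ORDER BY revenue DESC;
"""


def top_products(conn: sqlite3.Connection, n: int = 10) -> list[tuple]:
    """Return (product_id, revenue, pct_of_total) for the n highest-revenue products."""
    return conn.execute(TOP_PRODUCTS_SQL, (n,)).fetchall()


def category_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Return (category, orders, revenue, avg_line_value), highest revenue first."""
    return conn.execute(CATEGORY_SUMMARY_SQL).fetchall()
```

The share-of-total column uses a scalar subquery for the grand total, which keeps the query to a single pass that both dialects understand.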
- Python Expertise: Practical application of Pandas and related libraries for data transformation and wrangling.
- SQL Proficiency: Advanced use of SQL for querying, grouping, and analyzing datasets.
- ETL Workflow Design: Built an efficient end-to-end Extract–Transform–Load pipeline.
- Analytical Thinking: Tackled data quality issues and ensured analysis accuracy to support reliable outcomes.
- Install Required Libraries: pip install -r requirements.txt
- Download the Dataset: Use the Kaggle API (instructions are provided in the notebook).
- Preprocess the Data:
  - Order Data Analysis.ipynb (interactive notebook with detailed steps)
  - orders data analysis.py (script version for automation)
- Load Data into SQL Server: Follow the included setup guide.
- Execute SQL Queries: Run SQLQuery3.sql to replicate the analysis.
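The load step can be done with Pandas' `DataFrame.to_sql`. In this sketch an in-memory SQLite database stands in for SQL Server. Against SQL Server, `conn` would instead be a SQLAlchemy engine, for example one using the `mssql+pyodbc` dialect, which also needs the `pyodbc` package and an ODBC driver; that setup is an assumption and is not shown. `load_orders`, `load_demo`, and the table name `df_orders` are hypothetical.

```python
import sqlite3

import pandas as pd


def load_orders(df: pd.DataFrame, conn, table: str = "df_orders") -> None:
    """Append `df` to `table`, creating the table if it does not exist.

    `conn` may be a sqlite3.Connection (as in load_demo) or a SQLAlchemy
    engine, e.g. one created for SQL Server.
    """
    # "append" reuses an existing table, so a table created beforehand with
    # explicit column types keeps them instead of pandas' inferred types.
    # index=False avoids writing the DataFrame index as an extra column.
    df.to_sql(table, conn, if_exists="append", index=False)


def load_demo() -> sqlite3.Connection:
    """Load two illustrative rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    sample = pd.DataFrame({"order_id": [1, 2], "sale_price": [120.5, 80.0]})
    load_orders(sample, conn)
    return conn
```

Creating the target table with your own `CREATE TABLE` first and then appending is a common way to control column types, since `if_exists="replace"` would drop that table and let pandas choose the types.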
- Identified highest-revenue products and their share of total sales.
- Analyzed customer buying behavior to guide marketing initiatives.
- Determined seasonal and peak demand periods for inventory optimization.
- Segmented customers by order frequency and value, enabling targeted promotions.
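The seasonal-demand and segmentation findings could be approached in Pandas roughly as follows. The column names (`order_date`, `sale_price`, `customer_id`, `order_id`), the function names, and the segment thresholds are all hypothetical. In practice the thresholds would more likely be derived from the data, for example from quantiles of per-customer spend.

```python
import numpy as np
import pandas as pd


def monthly_sales(orders: pd.DataFrame) -> pd.Series:
    """Total sale_price per calendar month (1-12), summed across all years."""
    return orders.groupby(orders["order_date"].dt.month)["sale_price"].sum()


def peak_month(orders: pd.DataFrame) -> int:
    """Calendar month with the highest total sales."""
    return int(monthly_sales(orders).idxmax())


def segment_customers(
    orders: pd.DataFrame,
    freq_threshold: int = 3,
    value_threshold: float = 1000.0,
) -> pd.DataFrame:
    """Label customers by order frequency and total spend (hypothetical thresholds)."""
    per_customer = orders.groupby("customer_id").agg(
        order_count=("order_id", "nunique"),
        total_value=("sale_price", "sum"),
    )
    frequent = per_customer["order_count"] >= freq_threshold
    high_value = per_customer["total_value"] >= value_threshold
    # np.select takes the first matching condition, so the combined case goes first.
    per_customer["segment"] = np.select(
        [frequent & high_value, frequent, high_value],
        ["loyal high-value", "frequent", "big-ticket"],
        default="occasional",
    )
    return per_customer
```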
✅ Python (Pandas, Matplotlib, Seaborn): Analyzed 50K+ orders and identified an average sale per transaction of ~₹350, a median order quantity of 2, and profit-distribution trends across categories.
✅ SQL (MySQL): Extracted the top 10 revenue-generating products and the top 5 regional bestsellers, and delivered a 2022–23 year-over-year (YoY) growth analysis showing a >15% increase in sales for key sub-categories.
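The regional ranking and the 2022–23 year-over-year comparison can be sketched as follows, again using `sqlite3` as a runnable stand-in. `ROW_NUMBER() OVER (PARTITION BY ...)` is standard SQL that MySQL 8, SQL Server, and SQLite 3.25+ all support. Year extraction is dialect-specific: SQLite's `strftime('%Y', order_date)` would be `YEAR(order_date)` on MySQL or SQL Server. The schema and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema; dates stored as ISO-8601 text ('YYYY-MM-DD').
SCHEMA = """
CREATE TABLE df_orders (
    order_id     INTEGER,
    order_date   TEXT,
    region       TEXT,
    product_id   TEXT,
    sub_category TEXT,
    sale_price   REAL
);
"""

# Top-n products by revenue within each region.
TOP_PER_REGION_SQL = """
WITH product_sales AS (
    SELECT region, product_id, SUM(sale_price) AS revenue
    FROM df_orders
    GROUP BY region, product_id
),
ranked AS (
    SELECT region, product_id, revenue,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
    FROM product_sales
)
SELECT region, product_id, revenue
FROM ranked
WHERE rn <= ?
ORDER BY region, rn;
"""

# 2022 vs 2023 sales per sub-category and percentage growth (SQLite date syntax).
YOY_SQL = """
WITH yearly AS (
    SELECT sub_category,
           SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN sale_price ELSE 0 END) AS sales_2022,
           SUM(CASE WHEN strftime('%Y', order_date) = '2023' THEN sale_price ELSE 0 END) AS sales_2023
    FROM df_orders
    GROUP BY sub_category
)
SELECT sub_category, sales_2022, sales_2023,
       ROUND(100.0 * (sales_2023 - sales_2022) / sales_2022, 2) AS growth_pct
FROM yearly
WHERE sales_2022 > 0  -- growth is undefined without a 2022 baseline
ORDER BY growth_pct DESC;
"""


def top_per_region(conn: sqlite3.Connection, n: int = 5) -> list[tuple]:
    """Return (region, product_id, revenue) for the top-n products in each region."""
    return conn.execute(TOP_PER_REGION_SQL, (n,)).fetchall()


def yoy_growth(conn: sqlite3.Connection) -> list[tuple]:
    """Return (sub_category, sales_2022, sales_2023, growth_pct), fastest growth first."""
    return conn.execute(YOY_SQL).fetchall()
```

Conditional aggregation (`SUM(CASE ...)`) pivots both years into one row per sub-category, so growth is a simple column expression rather than a self-join.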
This project provides a holistic view of the data analytics lifecycle, covering every step from raw input to strategic recommendations. It demonstrates technical fluency, attention to data quality, and the ability to turn raw information into meaningful business insights, all critical capabilities for a data analyst.