Welcome to my end-to-end data analytics project! This repository showcases a complete data analysis workflow using Python and SQL, focused on retail order data. It highlights my ability to handle real-world datasets, clean and preprocess data, and derive actionable insights, making it a perfect fit for data analyst positions.
This project demonstrates how to work with large datasets, from extraction and cleaning to analysis and visualization.
1. Data Extraction: Leveraged the Kaggle API to download datasets programmatically.
2. Data Cleaning and Preprocessing: Used Python and Pandas to handle missing values, normalize data, and prepare it for analysis.
3. Database Integration: Loaded the cleaned data into an SQL Server database for querying and analysis.
4. Data Analysis: Conducted exploratory data analysis (EDA) and derived insights using SQL queries.
Workflow Breakdown:
Kaggle API: Accessed datasets efficiently without manual downloads.
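For reference, the download step can be as small as the sketch below. It assumes Kaggle API credentials are already configured in ~/.kaggle/kaggle.json, and the dataset slug shown is a placeholder (the notebook names the actual dataset).

```python
# Minimal sketch of the programmatic download, assuming Kaggle credentials are set up.
import kaggle

kaggle.api.authenticate()
kaggle.api.dataset_download_files(
    "owner/retail-orders",   # placeholder slug; see the notebook for the real dataset
    path=".",
    unzip=True,              # extracts orders.csv into the working directory
)
```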
Python + Pandas: Performed data cleaning (a minimal sketch follows this list), including:
1. Handling missing data
2. Formatting and transforming columns
3. Removing duplicates
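A sketch of these three steps with Pandas is below; the column names and null markers are illustrative assumptions, and the notebook documents the exact values used.

```python
import pandas as pd

# Read the raw file, treating common placeholder strings as missing values
# (illustrative; the actual markers in orders.csv are handled in the notebook).
df = pd.read_csv("orders.csv", na_values=["Not Available", "unknown"])

# 1. Handle missing data: drop rows with no order date (one possible choice)
df = df.dropna(subset=["Order Date"])

# 2. Format and transform columns: normalize headers and parse dates
df.columns = df.columns.str.lower().str.replace(" ", "_")
df["order_date"] = pd.to_datetime(df["order_date"])

# 3. Remove duplicates
df = df.drop_duplicates()
```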
SQL Server: Loaded the cleaned dataset into SQL Server and conducted in-depth analysis using SQL queries.
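The load can be done with SQLAlchemy and the pyodbc driver, roughly as sketched below; the server, database, and table names are placeholders rather than the project's actual configuration.

```python
# Sketch of the load step, assuming SQLAlchemy and pyodbc are installed and a local
# SQL Server instance is reachable. Connection details here are placeholders.
import sqlalchemy as sa

engine = sa.create_engine(
    "mssql+pyodbc://localhost/OrdersDB"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)

# df is the cleaned DataFrame from the preprocessing sketch above; appending into a
# pre-created table keeps column types under your control.
df.to_sql("df_orders", con=engine, index=False, if_exists="append")
```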
Data Analysis: Used SQL (an illustrative query follows this list) to:
1. Aggregate data
2. Identify trends
3. Generate insights for decision-making
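For example, an aggregation of this shape surfaces monthly sales trends; the table and column names are placeholders, and the full query set lives in SQLQuery_Analysis.sql.

```python
import pandas as pd

# Illustrative trend query, reusing the SQLAlchemy engine from the loading sketch.
# Table and column names are placeholders; the real queries are in SQLQuery_Analysis.sql.
monthly_sales = pd.read_sql(
    """
    SELECT YEAR(order_date)  AS order_year,
           MONTH(order_date) AS order_month,
           SUM(sale_price)   AS total_sales
    FROM df_orders
    GROUP BY YEAR(order_date), MONTH(order_date)
    ORDER BY order_year, order_month;
    """,
    con=engine,
)
print(monthly_sales.head())
```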
Skills Demonstrated
Python: Proficient use of libraries like Pandas for data manipulation and analysis.
SQL: Strong command over SQL queries for data aggregation, filtering, and exploration.
ETL Workflow: Implemented a seamless Extract-Transform-Load process.
Problem-Solving: Identified and resolved data quality issues to ensure reliable analysis.
How to Run
1. Install the required Python libraries:
   pip install -r requirements.txt
2. Use the Kaggle API to download the dataset (instructions included in the notebook).
3. Run the Python scripts for data cleaning and preprocessing:
   - Order Data Analysis.ipynb (Jupyter Notebook for detailed cleaning steps)
   - orders data analysis.py (Python script version for automation)
4. Load the cleaned data into an SQL Server database (setup instructions provided).
5. Execute the SQL queries in SQLQuery_Analysis.sql to analyze the data.
Project Files
Order Data Analysis.ipynb: Jupyter notebook for data cleaning and preprocessing.
orders data analysis.py: Python script to clean and prepare the data.
SQLQuery_Analysis.sql: Collection of SQL queries for data analysis.
orders.csv: Raw dataset containing retail order information.
project architecture.png: Visual representation of the project workflow.
README.md: Project documentation.
Key Insights
1. Identified top-selling products and their revenue contributions (an example query follows this list).
2. Analyzed customer purchasing patterns to inform marketing strategies.
3. Determined peak sales periods for inventory management optimization.
4. Segmented customers based on order frequency and value for targeted promotions.
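As a hedged illustration of how the first insight could be produced, a query of this shape ranks products by revenue; the product and revenue column names are assumptions, and `engine` is the connection from the loading sketch above.

```python
import pandas as pd

# Illustrative top-products query; column names are placeholders and `engine`
# is the SQLAlchemy connection created in the loading sketch above.
top_products = pd.read_sql(
    """
    SELECT TOP 10 product_id,
           SUM(sale_price) AS revenue
    FROM df_orders
    GROUP BY product_id
    ORDER BY revenue DESC;
    """,
    con=engine,
)
print(top_products)
```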
Why This Project Matters
This project demonstrates a solid understanding of the data analytics lifecycle, from raw data to actionable insights. It showcases my technical skills, attention to detail, and ability to work with multiple tools and technologies—all essential for a career in data analytics.
Let's Connect
Feel free to explore the project and reach out with any questions or feedback. I'm excited to connect with like-minded professionals and recruiters in the data analytics field.
LinkedIn: https://www.linkedin.com/in/devisuhitha/
Email: sahilranga03@gmail.com