- Project Overview
- Objectives
- Dataset
- Tools & Libraries
- Data Cleaning
- Data Exploration & Analysis
- Methods & Techniques Used
- Project Structure
- How to Run the Project
- Future Improvements
- Author & Contact
## Project Overview

This project is an Exploratory Data Analysis (EDA) of 12 months' worth of sales data from an electronics store.
The dataset contains hundreds of thousands of purchases, with details such as month, product type, cost, and purchase address.
The goal is to clean the dataset and then analyze it to answer key business questions using Python, Pandas, and Matplotlib.
## Objectives

- Clean and prepare the sales dataset.
- Perform exploratory data analysis.
- Answer important business questions, such as:
  - What was the best month for sales? How much was earned that month?
  - What city sold the most products?
  - What time should advertisements be displayed to maximize purchases?
  - What product sold the most, and why?
## Dataset

- Source: 12 separate monthly CSV files combined into one dataset.
- Location: multiple CSV files in the `/data/` folder.
- Features include (see the loading sketch below):
  - `Order ID`
  - `Product`
  - `Quantity Ordered`
  - `Price Each`
  - `Order Date`
  - `Purchase Address`
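For quick orientation, here is a minimal sketch of loading one monthly file and inspecting these columns (the file name is taken from the `data/` listing below; the exact notebook code may differ):

```python
import pandas as pd

# Load a single month of sales data (path assumed from the data/ folder listing)
df = pd.read_csv("data/Sales_April_2019.csv")

# Inspect the columns and a few sample rows; headers should match the feature list above
print(df.columns.tolist())
print(df.head())
```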
## Tools & Libraries

- Python 🐍
- Pandas – data manipulation & analysis
- Matplotlib – data visualization
- Jupyter Notebook – development & exploration
- GitHub – version control & hosting
## Data Cleaning

Before analysis, the dataset was cleaned with the following steps (a minimal code sketch follows the list):

- Dropped NaN values from the DataFrame.
- Removed rows based on conditions (e.g., invalid entries).
- Converted column types using `pd.to_numeric`, `pd.to_datetime`, and `.astype()`.
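A minimal sketch of these steps, continuing from the loading example above (the specific "invalid entries" filter shown here, stray repeated header rows inside the CSVs, is an assumption rather than the notebook's exact condition):

```python
import pandas as pd

# Drop rows containing NaN values
df = df.dropna(how="any")

# Remove rows based on a condition, e.g. repeated header rows where the
# 'Order Date' column literally contains the text "Order Date" (assumed filter)
df = df[df["Order Date"] != "Order Date"]

# Convert column types
df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
df["Price Each"] = df["Price Each"].astype(float)
df["Order Date"] = pd.to_datetime(df["Order Date"])
```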
## Data Exploration & Analysis

After cleaning, the data was analyzed to answer the business questions (a short code sketch follows the list):

- **Best Month for Sales**
  - Calculated total sales per month.
  - Identified the month with the highest revenue.
- **City with Most Sales**
  - Extracted city information from purchase addresses.
  - Grouped sales by city and visualized the results.
- **Optimal Advertisement Time**
  - Converted `Order Date` to datetime.
  - Analyzed purchase frequency by hour.
  - Suggested peak hours for displaying ads.
- **Most Sold Product & Reason**
  - Counted product quantities sold.
  - Compared with pricing trends.
  - Derived insights into popularity vs. affordability.
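A rough sketch of the first and third analyses, continuing from the cleaning sketch above (the derived `Sales`, `Month`, and `Hour` columns are introduced here for illustration and may be named differently in the notebook):

```python
import matplotlib.pyplot as plt

# Revenue per row, then total sales per month
df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
df["Month"] = df["Order Date"].dt.month
monthly_sales = df.groupby("Month")["Sales"].sum()
print("Best month:", monthly_sales.idxmax(), "| Revenue:", monthly_sales.max())

# Purchase frequency by hour of day, to suggest when to display ads
df["Hour"] = df["Order Date"].dt.hour
hourly_orders = df.groupby("Hour")["Order ID"].count()

hourly_orders.plot(kind="line")
plt.xlabel("Hour of Day")
plt.ylabel("Number of Orders")
plt.title("Orders by Hour")
plt.show()
```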
## Methods & Techniques Used

- Concatenating CSVs → `pd.concat()` (see the sketch below)
- Feature Engineering → creating new columns from existing ones
- String Parsing → `.str` methods
- Apply Function → `.apply()` for transformations
- Grouping & Aggregation → `.groupby()`
- Visualizations → bar charts & line graphs
- Graph Labeling → for better interpretation
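A hedged sketch combining several of these techniques end to end (file paths, the address format, and the `City` column are assumptions; the notebook may differ):

```python
import glob
import pandas as pd
import matplotlib.pyplot as plt

# Concatenate the 12 monthly CSVs into a single DataFrame
files = sorted(glob.glob("data/Sales_*_2019.csv"))
all_sales = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Basic cleaning before parsing (as in the Data Cleaning section)
all_sales = all_sales.dropna(how="any")
all_sales = all_sales[all_sales["Order Date"] != "Order Date"]

# Feature engineering via string parsing: extract the city from the purchase address
# (assumes addresses look like "917 1st St, Dallas, TX 75001")
all_sales["City"] = all_sales["Purchase Address"].str.split(",").str[1].str.strip()

# The same transformation written with .apply() and a row-wise function
all_sales["City"] = all_sales["Purchase Address"].apply(lambda addr: addr.split(",")[1].strip())

# Grouping & aggregation, visualized as a labeled bar chart
city_counts = all_sales.groupby("City")["Order ID"].count()
city_counts.plot(kind="bar")
plt.xlabel("City")
plt.ylabel("Number of Orders")
plt.title("Orders by City")
plt.show()
```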
## Project Structure

```
📦 Insurance-Cost-Analysis-EDA-Regression-Python
│
├── README.md
├── .gitignore
├── notebooks/                          # Jupyter notebooks
│   └── exploratory_data_analysis.ipynb
└── data/
    ├── Sales_January_2019.csv
    ├── Sales_February_2019.csv
    ├── Sales_March_2019.csv
    ├── Sales_April_2019.csv
    ├── Sales_May_2019.csv
    ├── Sales_June_2019.csv
    ├── Sales_July_2019.csv
    ├── Sales_August_2019.csv
    ├── Sales_September_2019.csv
    ├── Sales_October_2019.csv
    ├── Sales_November_2019.csv
    └── Sales_December_2019.csv
```
## How to Run the Project

- Clone the repository:
  ```bash
  git clone https://github.com/codewithchirag18/Insurance-Cost-Analysis-EDA-Regression-Python.git
  ```
- Navigate to the folder:
  ```bash
  cd Insurance-Cost-Analysis-EDA-Regression-Python
  ```
- Install the required libraries:
  ```bash
  pip install -r requirements.txt
  ```
- Open the Jupyter notebook:
  ```bash
  jupyter notebook notebooks/exploratory_data_analysis.ipynb
  ```
## Future Improvements

- Automate the data cleaning & analysis pipeline.
- Add interactive dashboards using Streamlit or Tableau.
- Use machine learning to forecast future sales.
## Author & Contact

**Chirag Tomar** – Data Analyst

📧 Email: tomarchirag431@gmail.com
🔗 LinkedIn
🔗 LeetCode