## E-COMMERCE STORE DATA ANALYSIS
## Project Scope
The goal of this project is to analyze the E-Commerce Store dataset to uncover patterns, trends, and insights that can guide business decisions and improve performance.

## Dataset Summary
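The dataset contains 541,909 rows and 8 columns:
- 5 categorical (identifier or text) columns: `InvoiceNo`, `StockCode`, `Description`, `CustomerID`, `Country`.
- 2 numerical columns: `Quantity`, `UnitPrice`.
- 1 date/time column: `InvoiceDate`, stored as text in the CSV and converted to datetime during cleaning.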
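When the CSV is loaded with default settings, pandas infers each column's type from its values. `CustomerID` comes back as a float because it contains missing values, and `InvoiceDate` stays as text until it is converted. The sketch below requests the intended types explicitly. It uses a two-row in-memory sample with the same column names. The rows are illustrative only, and the month/day/year date format is an assumption based on those sample rows.

```python
import io
import pandas as pd

# Illustrative sample with the same 8 columns; the second row has no CustomerID.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850,United Kingdom
536366,22633,HAND WARMER UNION JACK,6,12/1/2010 8:28,1.85,,United Kingdom
"""

# Default inference: CustomerID becomes float64 (because of the missing value)
# and InvoiceDate is left as text.
naive = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(naive.dtypes)

# Explicit types: identifiers as strings so 17850 does not become 17850.0.
typed = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    dtype={"InvoiceNo": str, "StockCode": str, "CustomerID": "string"},
)
# Parse dates with an explicit format (assumed month/day/year hour:minute).
typed["InvoiceDate"] = pd.to_datetime(typed["InvoiceDate"], format="%m/%d/%Y %H:%M")
print(typed.dtypes)
```

Reading identifiers as strings keeps them out of numerical summaries such as `df.describe()`.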

## Key Findings
### 1. Data Cleaning
- **Missing Values**:
  - Dropped the 135,080 rows with a missing `CustomerID`.
- **Data Types**:
  - Converted `InvoiceDate` to datetime for time-based analysis.

### 2. Exploratory Data Analysis (EDA)
- **Sales Trends**:
  - Peak sales occurred during the holiday season.
  - Weekdays had higher sales volumes than weekends.
- **Top Products**:
  - Identified the top 10 products contributing to overall sales.
- **Regional Analysis**:
  - The UK had the highest number of transactions.
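The weekday versus weekend comparison is not reproduced in the notebook cells below. The following is a minimal sketch of one way to compute it. It assumes `InvoiceDate` has already been converted to datetime and that `TotalSales` exists, as in the cells below. The rows are invented.

```python
import pandas as pd

# Invented transactions with the two columns the comparison needs.
sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-12-05 10:00",  # Monday
        "2011-12-06 14:30",  # Tuesday
        "2011-12-10 11:15",  # Saturday
        "2011-12-11 16:45",  # Sunday
    ]),
    "TotalSales": [120.0, 80.0, 30.0, 50.0],
})

# dt.dayofweek runs Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
sales["IsWeekend"] = sales["InvoiceDate"].dt.dayofweek >= 5
by_day_type = sales.groupby("IsWeekend")["TotalSales"].sum()

# Per-day totals, labelled by weekday name.
by_weekday = sales.groupby(sales["InvoiceDate"].dt.day_name())["TotalSales"].sum()
print(by_day_type)
print(by_weekday)
```

There are five weekdays and only two weekend days, so comparing average sales per day is fairer than comparing raw totals.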

### 3. Visualization
- Created bar charts, line plots, and heatmaps to visualize trends and patterns.
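No heatmap appears in the notebook cells below. One common layout is total sales by weekday and hour of day. This is an assumption for illustration, not necessarily the chart the author produced. The following sketch builds that grid with `pivot_table` on invented rows. The resulting DataFrame is the kind of 2-D input that seaborn's `sns.heatmap` accepts.

```python
import pandas as pd

# Invented transactions; only the columns needed for the grid.
sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-12-05 09:10", "2011-12-05 09:40", "2011-12-05 15:00",
        "2011-12-06 09:05", "2011-12-06 12:30",
    ]),
    "TotalSales": [10.0, 5.0, 20.0, 7.5, 12.5],
})

# Derive the two grid dimensions as ordinary columns.
sales["Weekday"] = sales["InvoiceDate"].dt.day_name()
sales["Hour"] = sales["InvoiceDate"].dt.hour

# Rows = weekday, columns = hour, cells = summed sales (0 where no sales).
grid = sales.pivot_table(
    index="Weekday", columns="Hour", values="TotalSales",
    aggfunc="sum", fill_value=0,
)
print(grid)
```

The weekday rows come out in alphabetical order. You can use `grid.reindex([...])` with a Monday-to-Sunday list to show them in calendar order before plotting.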


## Recommendations
- Focus marketing efforts on peak sales months.
- Improve inventory management for top-selling products.
- Explore expanding operations in high-potential countries.

## Future Directions
- Build a machine learning model for sales prediction.
- Integrate real-time dashboards for monitoring sales performance.
- Perform customer segmentation for targeted marketing campaigns.
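For the customer-segmentation direction, one widely used starting point is RFM analysis. RFM scores each customer on recency, frequency and monetary value. It is named here as an example technique and has not been implemented in this project. The minimal sketch below runs on invented rows and assumes the cleaned columns `CustomerID`, `InvoiceNo`, `InvoiceDate` and `TotalSales`. The snapshot date, one day after the last invoice, is a hypothetical choice.

```python
import pandas as pd

# Invented cleaned transactions (one row per invoice line).
tx = pd.DataFrame({
    "CustomerID": ["A", "A", "B", "C", "C", "C"],
    "InvoiceNo": ["1001", "1002", "1003", "1004", "1004", "1005"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-01", "2011-06-15",
        "2011-12-05", "2011-12-05", "2011-12-08",
    ]),
    "TotalSales": [50.0, 70.0, 20.0, 100.0, 40.0, 60.0],
})

# Reference date for recency: one day after the last invoice (an assumption).
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),  # number of distinct invoices
    Monetary=("TotalSales", "sum"),      # total spend
)
print(rfm)
```

The three columns are typically binned into scores, for example with `pd.qcut`, before customers are grouped into segments.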


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The CSV is not UTF-8 encoded; Latin-1 (ISO-8859-1) decodes every byte
df = pd.read_csv(r'D:\E-Commerce data\data.csv', encoding='ISO-8859-1')
df.head(10)

In [None]:
# Row count, column dtypes and non-null counts
df.info()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
#checking for missing values
print(df.isnull().sum())

In [None]:
# Drop rows with a missing CustomerID; .copy() makes the result an
# independent DataFrame for the column assignments that follow
df = df.dropna(subset=['CustomerID']).copy()

In [None]:
#checking null values
print(df.isnull().sum())

In [None]:
#converting Invoice date into datetime
df['InvoiceDate']=pd.to_datetime(df['InvoiceDate'])

In [None]:
#creating new column total sales
df['TotalSales'] = df['Quantity']*df['UnitPrice']
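`TotalSales` is computed for every remaining row, including returns. This dataset appears to be the UCI Online Retail dataset, since its row count and missing-`CustomerID` count match. In that dataset, invoice numbers starting with "C" mark cancellations and carry negative quantities, so they pull revenue totals down. The minimal sketch below separates cancellations from sales on invented rows. Treating zero-price lines as non-sales is an additional assumption.

```python
import pandas as pd

# Invented invoice lines; "C"-prefixed invoices are treated as cancellations
# (assumption based on the UCI Online Retail convention).
lines = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536379", "536380"],
    "Quantity": [6, 2, -1, 3],
    "UnitPrice": [2.55, 1.85, 27.50, 0.0],
})
lines["TotalSales"] = lines["Quantity"] * lines["UnitPrice"]

is_cancel = lines["InvoiceNo"].astype(str).str.startswith("C")
cancellations = lines[is_cancel]
# Keep only actual sales: not cancelled, positive quantity and positive price.
sales = lines[~is_cancel & (lines["Quantity"] > 0) & (lines["UnitPrice"] > 0)]

print("gross sales:", round(sales["TotalSales"].sum(), 2))
print("cancelled value:", cancellations["TotalSales"].sum())
```

Keeping cancellations in their own DataFrame, rather than discarding them, preserves them for a separate returns analysis.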

In [None]:
df.info()

In [None]:
# Exploratory data analysis (EDA)
# Top 10 products by total sales
top_product = df.groupby('Description')['TotalSales'].sum().sort_values(ascending=False).head(10)
print(top_product)

In [None]:
# Monthly sales trend
df['month-year'] = df['InvoiceDate'].dt.to_period('M')
SaleTrend = df.groupby('month-year')['TotalSales'].sum()
print(SaleTrend)
SaleTrend.plot(kind='line', color='red', title='Monthly sales trend')
plt.show()
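A statement such as "sales peak in December" can be backed with a number read directly from the monthly series, rather than from the plot. The sketch below uses an invented series shaped like `SaleTrend`, a `Series` indexed by monthly `Period`s.

```python
import pandas as pd

# Invented monthly totals indexed by monthly Periods, like SaleTrend.
trend = pd.Series(
    [500.0, 650.0, 900.0, 1200.0, 400.0],
    index=pd.period_range("2011-01", periods=5, freq="M"),
)

peak_month = trend.idxmax()       # Period label of the highest monthly total
peak_value = trend.max()
share = peak_value / trend.sum()  # peak month's share of the whole period
print(f"Peak month: {peak_month} ({peak_value:.0f}, {share:.1%} of total)")
```

On the real data, also print `df['InvoiceDate'].min()` and `.max()`. A month that the data covers only partly will look artificially low next to complete months.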


In [None]:
# Top 3 customers by revenue (pie shares are relative to these three customers only)
top_consumers =df.groupby('CustomerID')['TotalSales'].sum().sort_values(ascending = False).head(3)
print(top_consumers)
top_consumers.plot(kind = 'pie', autopct = '%1.1f%%')
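The pie percentages are computed over the three customers shown. They describe how the top three split their combined revenue, not how much of overall revenue they account for. The minimal sketch below adds an "other customers" slice so the shares are relative to the total. The per-customer series and its IDs are invented stand-ins for `df.groupby('CustomerID')['TotalSales'].sum()`.

```python
import pandas as pd

# Invented revenue per customer (hypothetical IDs).
revenue = pd.Series({"12346": 300.0, "12347": 200.0, "12348": 100.0,
                     "12349": 250.0, "12350": 150.0})

top3 = revenue.sort_values(ascending=False).head(3)
# Add all remaining customers as a single slice.
rest = pd.Series({"Other customers": revenue.sum() - top3.sum()})
with_rest = pd.concat([top3, rest])

share_of_total = with_rest / revenue.sum()
print(share_of_total.round(3))
# with_rest.plot(kind="pie", autopct="%1.1f%%") would show shares of total revenue.
```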


In [None]:
# Top 3 countries by total sales
top_countries = df.groupby('Country')['TotalSales'].sum().sort_values(ascending=False).head(3)
print(top_countries)
top_countries.plot(kind='bar', color='skyblue', xlabel='Country', ylabel='TotalSales', title='Country-wise Sales')
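The regional finding is phrased in terms of the number of transactions, while this cell ranks countries by revenue. The two can be shown side by side by counting distinct invoices per country. The following sketch uses invented rows and placeholder country names. It shows that the two rankings need not agree.

```python
import pandas as pd

# Invented invoice lines; several lines can share one InvoiceNo.
lines = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "5", "3", "4", "4"],
    "Country": ["Country A"] * 4 + ["Country B"] * 3,
    "TotalSales": [10.0, 5.0, 20.0, 15.0, 400.0, 100.0, 50.0],
})

by_country = lines.groupby("Country").agg(
    Transactions=("InvoiceNo", "nunique"),  # distinct invoices, not invoice lines
    Revenue=("TotalSales", "sum"),
).sort_values("Revenue", ascending=False)
print(by_country)
```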

In [None]:
#Data visualization
sns.barplot(x = top_product.values, y= top_product.index)
plt.title('Top 10 Products by Sales')
plt.show()

In [None]:
# Insights
# Top-selling products account for 30% of total revenue
# Sales peak in December due to holiday shopping
# Recommendations
# Focus on promoting the top 10 products
# Offer discounts during peak seasons
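The 30% figure is not computed in any of the cells above. The sketch below shows one way to check such a share, with `n` as a parameter so the same check works for the top 3 or top 10. The helper name `top_n_share` is hypothetical, and the product totals are invented stand-ins for `df.groupby('Description')['TotalSales'].sum()`.

```python
import pandas as pd

def top_n_share(product_sales: pd.Series, n: int) -> float:
    """Fraction of total revenue contributed by the n highest-revenue products."""
    top = product_sales.sort_values(ascending=False).head(n)
    return top.sum() / product_sales.sum()

# Invented revenue per product description.
product_sales = pd.Series({
    "PRODUCT A": 500.0, "PRODUCT B": 300.0, "PRODUCT C": 100.0,
    "PRODUCT D": 60.0, "PRODUCT E": 40.0,
})
print(f"Top 2 share: {top_n_share(product_sales, 2):.1%}")
```

If cancellations are still included, they lower the denominator, so the share depends on which cleaning steps ran first.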