# Superstore Sales Data Analysis Project
---
### Project Overview
This project analyzes the sales data of a fictional retail store (Superstore) to extract meaningful business insights.

**Dataset:** `Sample - Superstore.csv`

**Objectives:**
- Explore sales patterns by category, sub-category, and geography
- Analyze profit by category and the relationship between discount and profit
- Study sales trends over time
- Perform customer segmentation using RFM analysis

**Tools:** Python, Pandas, Matplotlib, Seaborn, Jupyter Notebook


## 1. Import Libraries & Load Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')
sns.set_palette('muted')

# Load data
df = pd.read_csv('Sample - Superstore.csv', encoding='latin1')

# Preview data
df.head()

## 2. Data Cleaning & Preprocessing

In [None]:
# Check for missing values
print(df.isnull().sum())

# Drop duplicates if any
df = df.drop_duplicates()

# Convert 'Order Date' to datetime; values that cannot be parsed become NaT
df['Order Date'] = pd.to_datetime(df['Order Date'], errors='coerce')

# Verify changes
df.info()
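
Because `errors='coerce'` replaces unparseable dates with `NaT` without raising, it can be worth counting how many values were lost. The following is a minimal sketch on a toy series; the sample values and the `%m/%d/%Y` format are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the 'Order Date' column (values are made up for illustration)
raw_dates = pd.Series(['11/8/2016', '6/12/2016', 'not a date', None], name='Order Date')

# Assumed month/day/year format; adjust if the CSV uses a different one
parsed = pd.to_datetime(raw_dates, format='%m/%d/%Y', errors='coerce')

# Values that were present in the raw data but could not be parsed
failed = parsed.isna() & raw_dates.notna()
n_failed = int(failed.sum())

print(f"{n_failed} value(s) could not be parsed:")
print(raw_dates[failed])
```

The same check could be applied to the real column by keeping a copy of the raw strings before conversion.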

## 3. Exploratory Data Analysis (EDA)

### 3.1 Sales by Category

In [None]:
sales_by_category = df.groupby('Category')['Sales'].sum().sort_values(ascending=False)
plt.figure(figsize=(8,5))
sns.barplot(x=sales_by_category.index, y=sales_by_category.values)
plt.title('Total Sales by Category')
plt.xlabel('Category')
plt.ylabel('Sales')
plt.show()

### 3.2 Top 10 Sub-Categories by Sales

In [None]:
top_subcategories = df.groupby('Sub-Category')['Sales'].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(10,6))
sns.barplot(x=top_subcategories.index, y=top_subcategories.values)
plt.title('Top 10 Sub-Categories by Sales')
plt.xticks(rotation=45)
plt.show()

### 3.3 Sales Over Time

In [None]:
df['Year'] = df['Order Date'].dt.year
df['Month'] = df['Order Date'].dt.month
sales_by_month = df.groupby(['Year', 'Month'])['Sales'].sum().reset_index()
# Zero-pad the month so the labels sort chronologically ('2014-02' before '2014-10')
sales_by_month['Year-Month'] = sales_by_month['Year'].astype(str) + '-' + sales_by_month['Month'].astype(str).str.zfill(2)

plt.figure(figsize=(12,6))
sns.lineplot(data=sales_by_month, x='Year-Month', y='Sales', marker='o')
plt.title('Monthly Sales Over Time')
plt.xticks(rotation=45)
plt.xlabel('Year-Month')
plt.ylabel('Sales')
plt.show()
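
As an alternative to building string labels, the monthly totals can be computed on a real datetime index, which keeps the axis in chronological order and makes months without orders visible. This is a rough sketch on toy data; the `orders` frame and its values are made up for illustration.

```python
import pandas as pd

# Toy orders (values are made up for illustration)
orders = pd.DataFrame({
    'Order Date': pd.to_datetime(['2016-01-05', '2016-01-20', '2016-02-03', '2016-10-11']),
    'Sales': [100.0, 50.0, 75.0, 20.0],
})

# Resample to month-start frequency ('MS') and sum sales per month;
# months without any orders appear with a total of 0
monthly = (
    orders.set_index('Order Date')['Sales']
    .resample('MS')
    .sum()
)

print(monthly)
```

The resulting series can be passed to `sns.lineplot` or plotted with `monthly.plot()` without any label formatting.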

### 3.4 Profit and Discount Analysis

In [None]:
profit_by_category = df.groupby('Category')['Profit'].sum().sort_values(ascending=False)
plt.figure(figsize=(8,5))
sns.barplot(x=profit_by_category.index, y=profit_by_category.values)
plt.title('Total Profit by Category')
plt.xlabel('Category')
plt.ylabel('Profit')
plt.show()

In [None]:
plt.figure(figsize=(8,6))
sns.scatterplot(x='Discount', y='Profit', data=df)
plt.title('Discount vs Profit')
plt.show()
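
A scatter plot of individual orders can be hard to read when many points overlap. One possible complement is to group discounts into bands and compare the average profit per band. The sketch below uses toy data; the band edges, band labels, and sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy order lines (values are made up for illustration)
toy = pd.DataFrame({
    'Discount': [0.0, 0.0, 0.2, 0.2, 0.5, 0.8],
    'Profit':   [40.0, 60.0, 10.0, 30.0, -20.0, -90.0],
})

# Hypothetical discount bands; each interval is right-inclusive, e.g. (0.0, 0.2]
bins = [-0.01, 0.0, 0.2, 0.4, 1.0]
labels = ['none', 'up to 20%', '20-40%', 'over 40%']
toy['Discount Band'] = pd.cut(toy['Discount'], bins=bins, labels=labels)

# Number of order lines and mean profit per band (empty bands are kept)
summary = toy.groupby('Discount Band', observed=False)['Profit'].agg(['count', 'mean'])

print(summary)
```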

## 4. Geographic Analysis

In [None]:
sales_by_region = df.groupby('Region')['Sales'].sum().sort_values(ascending=False)
plt.figure(figsize=(8,5))
sns.barplot(x=sales_by_region.index, y=sales_by_region.values)
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Sales')
plt.show()

In [None]:
top_states = df.groupby('State')['Sales'].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(12,6))
sns.barplot(x=top_states.index, y=top_states.values)
plt.title('Top 10 States by Sales')
plt.xticks(rotation=45)
plt.xlabel('State')
plt.ylabel('Sales')
plt.show()

## 5. Customer Segmentation Using RFM Analysis

In [None]:
# Recency, Frequency, Monetary (RFM) analysis

# Reference date (day after last order)
snapshot_date = df['Order Date'].max() + pd.Timedelta(days=1)

# Calculate R, F, M
rfm = df.groupby('Customer ID').agg({
    'Order Date': lambda x: (snapshot_date - x.max()).days,
    'Order ID': 'nunique',
    'Sales': 'sum'
})
rfm.rename(columns={'Order Date': 'Recency', 'Order ID': 'Frequency', 'Sales': 'Monetary'}, inplace=True)

# Display head
rfm.head()

### 5.1 RFM Score and Segmentation

In [None]:
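# Assign scores 1-5 for each of R, F, M (lower recency is better, so its labels are reversed)
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=range(5,0,-1))
# Frequency has many tied values, which can make the quantile edges non-unique;
# ranking with method='first' breaks ties so qcut always gets distinct edges
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=range(1,6))
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=range(1,6))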

# Calculate RFM Segment
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)

# Calculate RFM Score
rfm['RFM_Score'] = rfm[['R_Score', 'F_Score', 'M_Score']].astype(int).sum(axis=1)

rfm.head()
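
To turn the numeric scores into segments that marketing can act on, the total score can be mapped to named groups. The following is only a sketch: the thresholds and segment names in the hypothetical `label_segment` helper are illustrative assumptions, not part of the original analysis, and the toy scores are made up.

```python
import pandas as pd

# Toy RFM totals (values are made up for illustration)
toy_rfm = pd.DataFrame(
    {'RFM_Score': [15, 12, 9, 6, 3]},
    index=pd.Index(['C1', 'C2', 'C3', 'C4', 'C5'], name='Customer ID'),
)

def label_segment(score: int) -> str:
    """Map a total RFM score (3-15) to a hypothetical segment name."""
    if score >= 13:
        return 'Champions'
    if score >= 10:
        return 'Loyal'
    if score >= 7:
        return 'Potential'
    if score >= 5:
        return 'At Risk'
    return 'Lost'

toy_rfm['Segment'] = toy_rfm['RFM_Score'].apply(label_segment)

print(toy_rfm)
```

Applied to the real table, the same function could be used as `rfm['RFM_Score'].apply(label_segment)`, with thresholds tuned to the observed score distribution.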

### 5.2 Visualize RFM Score Distribution

In [None]:
plt.figure(figsize=(10,5))
sns.countplot(x='RFM_Score', data=rfm)
plt.title('Distribution of RFM Scores')
plt.show()

## 6. Conclusion
This analysis examined the Superstore's sales performance, profitability, and customer segments.

Key points:
- Technology and Furniture are the top sales categories
- Monthly sales show seasonality
- Customer segmentation via RFM can guide marketing efforts

Next steps could include predictive modeling for sales forecasting or deeper customer behavior analysis.