# 📊 EDA on Global Superstore Sales Dataset

This notebook performs an exploratory data analysis (EDA) on the Global Superstore dataset. We'll analyze sales, profit, discount, product categories, and customer segments to extract actionable insights.

```python
# Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
```

```python
# Step 2: Load the Dataset
# Reading .xlsx files requires the openpyxl package to be installed
df = pd.read_excel("Global Superstore Dataset.xlsx")
df.head()
```

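```python
# Step 3: Data Overview
print(df.shape)   # (rows, columns); as a bare expression mid-cell it would not be displayed
df.info()         # column dtypes and non-null counts (prints directly)
df.describe()     # summary statistics for numeric columns (last expression, so it is displayed)
```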
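
The three calls above print overlapping summaries. A compact alternative, sketched here on a small hypothetical DataFrame (the helper name `overview` is our own, not a pandas function), collects dtype, non-null count, null count, and distinct-value count per column in one table:

```python
import numpy as np
import pandas as pd


def overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, non-null count, null count, distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "nulls": frame.isna().sum(),
        "unique": frame.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    "Category": ["Technology", "Furniture", "Technology", None],
    "Sales": [250.0, 120.5, np.nan, 80.0],
    "Quantity": [2, 1, 3, 1],
})
print(toy.shape)  # (4, 3)
print(overview(toy))
```

With the real data, `overview(df)` gives the same per-column view in a single DataFrame that can be sorted, for example by `nulls`.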

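```python
# Step 4: Data Cleaning
# Drop 'Postal Code'; errors='ignore' lets the cell be re-run after the column is gone
df = df.drop(columns=['Postal Code'], errors='ignore')
df = df.drop_duplicates()
df.isnull().sum()   # remaining missing values per column
```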
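
`drop_duplicates()` compares all columns by default, so a column that is unique per row (for example a row identifier such as `Row ID`, if the file has one; treat that name as an assumption) stops it from finding anything. A minimal sketch on hypothetical toy rows shows the effect and the `subset` workaround:

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 describe the same order line
# but carry different row identifiers.
toy = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Order ID": ["A-1", "A-1", "B-7"],
    "Product Name": ["Stapler", "Stapler", "Desk"],
    "Sales": [12.0, 12.0, 300.0],
})

# Comparing every column: the unique Row ID makes each row distinct.
full = toy.drop_duplicates()

# Comparing everything except the identifier exposes the repeat.
content_cols = list(toy.columns.drop("Row ID"))
dedup = toy.drop_duplicates(subset=content_cols)

print(len(full), len(dedup))  # 3 2
```

Whether repeated order lines are data errors or legitimate repeat purchases depends on the dataset, so it is worth inspecting them (for example with `toy[toy.duplicated(subset=content_cols, keep=False)]`) before dropping.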

```python
# Step 5: Univariate Analysis
sns.histplot(df['Sales'], bins=30, kde=True)
plt.title("Distribution of Sales")
plt.show()

sns.countplot(data=df, x='Category')
plt.title("Order Count by Category")
plt.show()
```
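
Order-level sales are often right-skewed, which squeezes most of the histogram into its first few bins. A sketch, using simulated log-normal values as a hypothetical stand-in for `df['Sales']`, compares skewness before and after a `np.log1p` transform, and shows how `value_counts()` yields a frequency order for the count plot:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed stand-in for the Sales column (log-normal draws).
rng = np.random.default_rng(0)
sales = pd.Series(rng.lognormal(mean=4.0, sigma=1.2, size=5_000), name="Sales")

raw_skew = sales.skew()            # large positive value: long right tail
log_skew = np.log1p(sales).skew()  # close to 0 after the log transform
print(f"skew raw={raw_skew:.2f}, log1p={log_skew:.2f}")

# Hypothetical category labels; value_counts() sorts by frequency, descending.
categories = pd.Series(["Office Supplies"] * 6 + ["Furniture"] * 2 + ["Technology"] * 3)
print(categories.value_counts())
```

With the real data, `sns.histplot(df['Sales'], log_scale=True)` plots the same distribution on a log axis, and passing `order=df['Category'].value_counts().index` to `sns.countplot` sorts the bars by frequency.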

```python
# Step 6: Bivariate Analysis
sns.scatterplot(x='Sales', y='Profit', data=df)
plt.title("Sales vs Profit")
plt.show()

sns.boxplot(x='Category', y='Profit', data=df)
plt.title("Profit by Category")
plt.show()
```
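
The box plot shows how profit is spread within each category but not how much each category earns in total. A tabular complement, sketched on hypothetical toy rows, reports total profit, median profit, and the share of loss-making orders per category:

```python
import pandas as pd

# Hypothetical toy rows; with the real data, use df instead.
toy = pd.DataFrame({
    "Category": ["Technology", "Technology", "Furniture", "Furniture", "Office Supplies"],
    "Sales": [900.0, 400.0, 700.0, 300.0, 50.0],
    "Profit": [180.0, -20.0, 35.0, -60.0, 10.0],
})

summary = (
    toy.groupby("Category")["Profit"]
    .agg(
        total="sum",                              # overall profit contribution
        median="median",                          # typical order, robust to outliers
        loss_share=lambda p: (p < 0).mean(),      # fraction of orders with negative profit
    )
    .sort_values("total", ascending=False)
)
print(summary)
```

For the scatter plot, heavy overplotting can hide where most orders sit; passing `alpha=0.3` to `sns.scatterplot` is one common mitigation.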

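```python
# Step 7: Correlation Matrix
corr = df[['Sales', 'Quantity', 'Discount', 'Profit']].corr()
# Pin the color scale to [-1, 1] so zero correlation sits at the colormap's neutral midpoint
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()
```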
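
Pearson correlation (the `.corr()` default) measures linear association and can be dominated by a few very large orders. A sketch on hypothetical toy data compares it with Spearman rank correlation via `method='spearman'`:

```python
import pandas as pd

# Hypothetical toy data: profit falls as discount rises, with one very large sale.
toy = pd.DataFrame({
    "Sales": [10.0, 20.0, 30.0, 40.0, 5000.0],
    "Discount": [0.0, 0.1, 0.2, 0.4, 0.5],
    "Profit": [5.0, 6.0, 4.0, -3.0, -10.0],
})

pearson = toy.corr()                    # default method="pearson": linear association
spearman = toy.corr(method="spearman")  # rank-based: monotonic association, outlier-resistant
print(pearson.round(2))
print(spearman.round(2))
```

Here Sales and Discount rise together in perfect rank order, so Spearman gives 1.0 while the single 5000 outlier pulls Pearson lower. Rank-based and linear coefficients can disagree in the same way on skewed sales data.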

```python
# Step 8: Feature Engineering
# Per-order profit margin in percent; zero sales become NaN instead of ±inf
df['ProfitMargin'] = df['Profit'] / df['Sales'].replace(0, np.nan) * 100
df.groupby('Segment')['ProfitMargin'].mean().plot(kind='bar', title='Avg Profit Margin by Segment')
plt.ylabel("Profit Margin (%)")
plt.show()
```
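
Averaging per-order margins weights a 10-unit order the same as a 1000-unit order, so it can rank segments differently from total profit divided by total sales. A minimal sketch on hypothetical toy orders computes both:

```python
import numpy as np
import pandas as pd

# Hypothetical toy orders for two segments.
toy = pd.DataFrame({
    "Segment": ["Consumer", "Consumer", "Corporate", "Corporate"],
    "Sales": [1000.0, 10.0, 200.0, 200.0],
    "Profit": [-100.0, 5.0, 20.0, 20.0],
})
# Per-row margin, guarding against division by zero.
toy["ProfitMargin"] = toy["Profit"] / toy["Sales"].replace(0, np.nan) * 100

# (a) Unweighted: every order counts equally, regardless of its size.
row_mean = toy.groupby("Segment")["ProfitMargin"].mean()

# (b) Sales-weighted: total profit divided by total sales per segment.
totals = toy.groupby("Segment")[["Profit", "Sales"]].sum()
aggregate = totals["Profit"] / totals["Sales"] * 100

print(pd.DataFrame({"mean_of_rows": row_mean, "aggregate": aggregate}).round(2))
```

In this toy data the unweighted mean puts Consumer ahead (20% vs 10%), while the aggregate margin puts Corporate ahead (10% vs about -9.4%). Which figure to report depends on whether the question is about typical orders or overall profitability.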

### ✅ Key Insights
- Technology is the most profitable category.
- High discounts often result in losses.
- The Corporate segment shows higher profit margins on average.
- The West region has the highest total profit.
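
Several of these insights (total profit by region, the most profitable category) are not computed directly by the cells above. A sketch of the supporting aggregations follows, written as a helper function (`insight_tables` is our own hypothetical name) and run here on hypothetical toy rows so it can later be applied to `df`; the discount band edges are an arbitrary choice:

```python
import pandas as pd

# Hypothetical toy rows; with the real data, pass df instead.
toy = pd.DataFrame({
    "Category": ["Technology", "Furniture", "Office Supplies", "Technology", "Furniture"],
    "Region": ["West", "East", "West", "Central", "Central"],
    "Discount": [0.0, 0.5, 0.1, 0.2, 0.7],
    "Profit": [120.0, -40.0, 15.0, 60.0, -80.0],
})


def insight_tables(frame: pd.DataFrame) -> dict:
    """Aggregate tables behind each headline claim."""
    # Right-inclusive bins: (-0.01, 0] captures zero discount exactly.
    discount_band = pd.cut(
        frame["Discount"],
        bins=[-0.01, 0.0, 0.2, 0.4, 1.0],
        labels=["none", "0-20%", "20-40%", ">40%"],
    )
    return {
        "profit_by_category": frame.groupby("Category")["Profit"].sum().sort_values(ascending=False),
        "profit_by_region": frame.groupby("Region")["Profit"].sum().sort_values(ascending=False),
        # observed=False keeps empty bands (shown as NaN) so all bands stay visible
        "mean_profit_by_discount": frame.groupby(discount_band, observed=False)["Profit"].mean(),
    }


for name, table in insight_tables(toy).items():
    print(name, table, sep="\n", end="\n\n")
```

Applied to `df`, these tables give the numbers behind each bullet, which makes the claims easy to recheck if the data changes.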