# Lesson 1 (Demo): Selecting a Chart Type with the SuperStore Dataset


We will practice selecting appropriate chart types using a real-world dataset: **SuperStore Sales**.

Focus: **Categorical data** for comparisons, rankings, and proportions.

Charts covered: Vertical bar, Horizontal bar, Grouped/Clustered bar, Stacked bar, 100% Stacked bar, Pie, Donut, Stacked Area, 100% Stacked Area, Treemap.

**Dataset:** `superstore_dataset.csv`


## Step 1: Load the Dataset

In [None]:

import pandas as pd

# Load dataset (upload to Colab or mount from Drive)
df = pd.read_csv("superstore_dataset.csv")
df.head()
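
The later steps group and sum on the `Category`, `Region`, `Sub-Category`, and `Sales` columns, so it can help to confirm they exist right after loading. The sketch below is one way to do that; the helper name `missing_columns` and the tiny sample frame are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Columns the later steps in this lesson group or sum on.
REQUIRED_COLUMNS = ["Category", "Region", "Sub-Category", "Sales"]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Tiny made-up frame standing in for the real CSV (hypothetical values).
sample = pd.DataFrame({
    "Category": ["Furniture", "Technology"],
    "Region": ["West", "East"],
    "Sales": [120.0, 300.0],
})
print(missing_columns(sample))  # ['Sub-Category']
```

On the real data, `missing_columns(df)` should return an empty list if the file matches what the lesson expects.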


## Step 2: Explore the Data

In [None]:

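# Column dtypes and non-null counts
df.info()
# Summary statistics, one row per column (first 10 columns only)
df.describe(include="all").T.head(10)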


## Step 3: Preprocess Data

In [None]:

# Sales by Category
category_sales = df.groupby("Category")["Sales"].sum().reset_index().sort_values("Sales", ascending=False)

# Sales by Category & Region
cat_region_sales = df.groupby(["Category", "Region"])["Sales"].sum().reset_index()

# Top 8 Subcategories by Sales
subcat_sales = df.groupby("Sub-Category")["Sales"].sum().reset_index().sort_values("Sales", ascending=False).head(8)

category_sales, cat_region_sales.head(), subcat_sales
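
Since this lesson is about comparisons, rankings, and proportions, it may be useful to see how a rank and a share of total can be derived from the same kind of grouped table. This is a minimal sketch on made-up numbers; the frame `toy` and the `Share` and `Rank` columns are illustrative assumptions, not columns in the dataset.

```python
import pandas as pd

# Made-up rows; the real lesson uses df loaded from the CSV.
toy = pd.DataFrame({
    "Category": ["Furniture", "Technology", "Office Supplies", "Technology"],
    "Sales": [100.0, 250.0, 50.0, 100.0],
})

# Same pattern as category_sales: total per category, largest first.
toy_sales = (
    toy.groupby("Category")["Sales"].sum()
    .reset_index()
    .sort_values("Sales", ascending=False)
)

# Proportion of the grand total, and position in the ranking.
toy_sales["Share"] = toy_sales["Sales"] / toy_sales["Sales"].sum()
toy_sales["Rank"] = range(1, len(toy_sales) + 1)
print(toy_sales)
```

The sorted `Sales` column feeds the bar charts, while a column like `Share` is what pie, donut, and 100% stacked charts encode.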


## Step 4: Visualizations

### 4.1 Vertical & Horizontal Bar Charts

In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(6,4))
sns.barplot(data=category_sales, x="Category", y="Sales")
plt.title("Sales by Category (Vertical Bar)")
plt.show()

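# Swapping x and y draws the same data as horizontal bars,
# which leaves more room for long category labels
plt.figure(figsize=(6,4))
sns.barplot(data=category_sales, y="Category", x="Sales")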
plt.title("Sales by Category (Horizontal Bar)")
plt.show()
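
When a horizontal bar chart is meant to show a ranking, the row order matters. The sketch below uses plain matplotlib and made-up numbers (the frame `toy_rank` is hypothetical) to show one way to put the largest bar at the top.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
toy_rank = pd.DataFrame({
    "Sub-Category": ["Phones", "Chairs", "Storage", "Tables"],
    "Sales": [330.0, 328.0, 224.0, 207.0],
})

# barh draws the first row at the bottom of the axis, so sort ascending
# to end up with the largest bar at the top.
ordered = toy_rank.sort_values("Sales", ascending=True)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.barh(ordered["Sub-Category"], ordered["Sales"])
ax.set_xlabel("Sales")
ax.set_title("Made-up sub-category ranking (horizontal bar)")
plt.show()
```

Note that seaborn's `barplot` with string categories keeps the order of the data it is given, which is why sorting `category_sales` in Step 3 already controls the bar order in the cells above.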


### 4.2 Grouped / Clustered Bar Chart

In [None]:

plt.figure(figsize=(8,5))
sns.barplot(data=cat_region_sales, x="Category", y="Sales", hue="Region")
plt.title("Sales by Category & Region (Grouped Bar)")
plt.show()


### 4.3 Stacked & 100% Stacked Bar Charts

In [None]:

# Reshape to wide form: one row per Category, one column per Region
pivot_data = cat_region_sales.pivot(index="Category", columns="Region", values="Sales")

pivot_data.plot(kind="bar", stacked=True, figsize=(8,5), title="Sales by Category & Region (Stacked Bar)")
plt.show()

# Divide each row by its total so every bar sums to 1 (i.e. 100%)
(pivot_data.div(pivot_data.sum(axis=1), axis=0)).plot(kind="bar", stacked=True, figsize=(8,5), title="Sales by Category & Region (100% Stacked Bar)")
plt.show()
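
The 100% stacked chart relies on row-wise normalization: each row of the pivot table is divided by its own total. Here is a small worked example on made-up numbers (the table `toy_pivot` is hypothetical) that also scales the result to percent, so the axis would read 0 to 100 rather than 0 to 1.

```python
import pandas as pd

# Made-up wide table: one row per category, one column per region.
toy_pivot = pd.DataFrame(
    {"East": [30.0, 10.0], "West": [70.0, 40.0]},
    index=pd.Index(["Furniture", "Technology"], name="Category"),
)

# Divide every row by its own total, then scale to percent.
row_totals = toy_pivot.sum(axis=1)
toy_pct = toy_pivot.div(row_totals, axis=0) * 100
print(toy_pct.round(1))
```

Technology has much lower absolute sales than Furniture in this toy table, yet after normalization both bars have the same height. That is the trade-off of the 100% version: it makes shares comparable but hides the totals.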


### 4.4 Pie & Donut Charts

In [None]:

plt.figure(figsize=(5,5))
plt.pie(category_sales["Sales"], labels=category_sales["Category"], autopct="%1.1f%%")
plt.title("Sales by Category (Pie Chart)")
plt.show()

plt.figure(figsize=(5,5))
# A donut is a pie whose wedges are narrower than the full radius (width < 1)
plt.pie(category_sales["Sales"], labels=category_sales["Category"], wedgeprops=dict(width=0.4), autopct="%1.1f%%")
plt.title("Sales by Category (Donut Chart)")
plt.show()
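
Pie and donut charts get hard to read with many slices. One common workaround is to keep the largest few values and fold the rest into a single "Other" slice. The helper below is a sketch of that idea; the name `top_n_with_other`, its parameters, and the sample values are all assumptions made for illustration.

```python
import pandas as pd

def top_n_with_other(series, n=3, other_label="Other"):
    """Keep the n largest values and sum the remainder into one entry."""
    ranked = series.sort_values(ascending=False)
    top = ranked.iloc[:n]
    remainder = ranked.iloc[n:].sum()
    if remainder > 0:
        top = pd.concat([top, pd.Series({other_label: remainder})])
    return top

# Made-up sales per sub-category.
toy = pd.Series({
    "Phones": 40.0, "Chairs": 30.0, "Storage": 15.0,
    "Tables": 10.0, "Paper": 5.0,
})
slices = top_n_with_other(toy, n=3)
print(slices)
```

The result can be passed to the same call used above, for example `plt.pie(slices, labels=slices.index, autopct="%1.1f%%")`.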


### 4.5 Stacked & 100% Area Charts

In [None]:

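# pandas area plots are stacked by default. Area charts usually suit an ordered
# x-axis such as time; Category is used here only to demonstrate the chart type.
pivot_data.plot(kind="area", figsize=(8,5), title="Sales by Category & Region (Stacked Area)")
plt.show()

# Divide each row by its total so the stacked areas always sum to 1 (100%)
(pivot_data.div(pivot_data.sum(axis=1), axis=0)).plot(kind="area", figsize=(8,5), title="Sales by Category & Region (100% Stacked Area)")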
plt.show()
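
Area charts are most informative when the x-axis is ordered, typically by time. The sketch below shows how a month-by-category table could be built for that purpose. It runs on made-up rows, and it assumes a date column named `Order Date`; check `df.columns` to see whether the real file has one under that name.

```python
import pandas as pd

# Made-up orders; "Order Date" is an assumed column name.
toy_orders = pd.DataFrame({
    "Order Date": ["2017-01-05", "2017-01-20", "2017-02-03",
                   "2017-02-14", "2017-03-09"],
    "Category": ["Furniture", "Technology", "Furniture",
                 "Technology", "Technology"],
    "Sales": [100.0, 200.0, 150.0, 50.0, 300.0],
})
toy_orders["Order Date"] = pd.to_datetime(toy_orders["Order Date"])

# One row per month, one column per category;
# month/category pairs with no orders become 0.
monthly = (
    toy_orders
    .groupby([toy_orders["Order Date"].dt.to_period("M"), "Category"])["Sales"]
    .sum()
    .unstack(fill_value=0)
)
print(monthly)
```

A table shaped like `monthly` can then be drawn with `monthly.plot(kind="area")`, and row-normalizing it first would give the 100% version.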


### 4.6 Treemap

In [None]:

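# squarify is a third-party package; the leading "!" runs a shell command
# from a notebook cell (Colab or Jupyter)
!pip install squarify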
import squarify

plt.figure(figsize=(8,6))
squarify.plot(sizes=subcat_sales["Sales"], label=subcat_sales["Sub-Category"], alpha=0.8)
plt.title("Top 8 Subcategories by Sales (Treemap)")
plt.axis("off")
plt.show()
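
Treemap tiles are easier to compare when each label also carries a number. The helper below is a small sketch that builds two-line labels with each tile's share; the name `treemap_labels` and the sample values are hypothetical. Note that the share is relative to the tiles being drawn (here that would be the top 8 sub-categories), not to total sales.

```python
def treemap_labels(names, values):
    """Build two-line labels: name on top, share of the plotted total below."""
    total = sum(values)
    return [
        f"{name}\n{value / total:.0%}"
        for name, value in zip(names, values)
    ]

# Made-up values for illustration.
labels = treemap_labels(["Phones", "Chairs", "Storage"], [50.0, 30.0, 20.0])
print(labels)  # ['Phones\n50%', 'Chairs\n30%', 'Storage\n20%']
```

With the lesson's data, the list returned by `treemap_labels(subcat_sales["Sub-Category"], subcat_sales["Sales"])` could be passed as `label=` in the `squarify.plot` call above.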


## Step 5: Wrap-Up & Discussion


- **Bar Charts:** Best for comparisons & rankings  
- **Stacked Charts:** Show part-to-whole within categories  
- **Pie/Donut:** Simple proportions (avoid too many slices)  
- **Area Charts:** Show composition trends  
- **Treemap:** Compact hierarchical proportions  
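
The bullets above can be condensed into a small lookup, shown here as a rule-of-thumb sketch rather than a definitive guide. The goal names, the function `suggest_chart`, and the slice cutoff `MAX_PIE_SLICES` are all illustrative assumptions.

```python
# Rule-of-thumb lookup distilled from the wrap-up bullets.
# Goal names and the slice limit are illustrative assumptions.
CHART_FOR_GOAL = {
    "comparison": "bar",
    "ranking": "bar",
    "part_to_whole_within_categories": "stacked bar",
    "simple_proportion": "pie or donut",
    "composition_trend": "stacked area",
    "hierarchical_proportion": "treemap",
}
MAX_PIE_SLICES = 5  # assumed cutoff, not a fixed rule

def suggest_chart(goal, n_categories):
    """Return a suggested chart type for a goal and a number of categories."""
    chart = CHART_FOR_GOAL[goal]
    # Too many slices make a pie hard to read; fall back to bars.
    if goal == "simple_proportion" and n_categories > MAX_PIE_SLICES:
        return "bar"
    return chart

print(suggest_chart("simple_proportion", 3))   # pie or donut
print(suggest_chart("simple_proportion", 12))  # bar
```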

👉 Which chart best communicates the insight with the least cognitive load?
