
### **Legend**

**Dataset: `ecommerce_orders_2024.csv`**  
This dataset contains synthetic e-commerce order data for the year 2024. It is designed for sales analysis, including ABC/XYZ classification and sales dynamics studies.

---

**Column Descriptions**

| Column Name     | Type         | Description |
|-----------------|--------------|-------------|
| `order_id`      | `str`        | Unique identifier for each order. |
| `order_date`    | `datetime`   | Date when the order was placed (format: YYYY-MM-DD). |
| `customer_id`   | `str`        | Unique identifier of the customer who placed the order. |
| `customer_name` | `str`        | Full name of the customer. |
| `customer_city` | `str`        | City where the customer is located. |
| `product`       | `str`        | Full product name (including brand and model). |
| `category`      | `str`        | Product category (e.g., Electronics, Apparel, Home Goods). |
| `brand`         | `str`        | Brand or manufacturer of the product. |
| `sku`           | `str`        | Stock Keeping Unit — internal product code. |
| `quantity`      | `int`        | Number of units purchased in the order. |
| `price`         | `float`      | Price per unit (before discount). |
| `discount`      | `float`      | Discount applied to the order line, as a fraction of the line value (0 to 0.2 in this file). |
| `total`         | `float`      | Final total for the order line (quantity × price × (1 − discount)). |
| `payment_method`| `str`        | Payment type used (e.g., Credit Card, PayPal, Bank Transfer). |
| `shipping_cost` | `float`      | Cost of shipping this order. |
| `shipment_id`   | `str`        | Unique identifier for the shipment associated with the order. |
| `shipment_date` | `datetime`   | Date the order was shipped. |
| `is_returned`   | `bool`       | Whether the product was returned (`True` or `False`). |
| `sales_channel` | `str`        | How the product was sold (e.g., Website, Mobile App, Marketplace). |
| `region`        | `str`        | Regional classification (e.g., East, West, Central). |

---

### **Data Reading**

In [2]:
# Import Required Libraries
import pandas as pd
import plotly.express as px
from google.colab import files

In [3]:
uploaded = files.upload()

Saving ecommerce_orders_2024.csv to ecommerce_orders_2024.csv


In [4]:
df = pd.read_csv("ecommerce_orders_2024.csv")

In [5]:
df.head()

|   | order_date | order_id | product | category | brand | quantity | price | discount | total | customer_id | country |
|---|------------|----------|---------|----------|-------|----------|-------|----------|-------|-------------|---------|
| 0 | 2024-09-24 | 10682d3c-2370-4cb6-8671-0264e33002db | Acer Swift 5 |  |  | 3 | 1052.85 |  | 2526.84 | e4c7a35c-ec48-4aeb-8952-0f2c4a7e6180 | Saint Barthelemy |
| 1 | 2024-11-13 | cf48ccf7-d627-45c6-bda0-0a271bfe4fb3 | Bose QC45 |  |  | 2 | 187.89 |  | 300.62 | 917e33e2-ce20-4beb-8d02-c7de45dbfd5a | Belize |
| 2 | 2024-05-08 | ac46015e-906e-435a-894e-b0682265785a | Sennheiser Momentum 4 |  |  | 3 | 434.37 |  | 1303.11 | fc89338c-1df9-48f7-9037-3e7bec498864 | Israel |
| 3 | 2024-03-28 | 36facbfd-a75e-4506-ae15-3c5559153000 | Garmin Forerunner 265 |  |  | 4 | 952.62 |  | 3429.43 | ab5c9f13-e1cd-4c57-884c-bc6f4b379aac | Lithuania |
| 4 | 2024-01-31 | cf5b98ef-4969-4157-9214-67df55f2a360 | MacBook Pro 16 |  |  | 4 | 1917.9 |  | 7288.02 | 05285afa-0f5e-4b90-8f7c-31e549a03788 | Uzbekistan |


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50500 entries, 0 to 50499
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   order_date   50500 non-null  object 
 1   order_id     50500 non-null  object 
 2   product      50500 non-null  object 
 3   category     49979 non-null  object 
 4   brand        49979 non-null  object 
 5   quantity     50500 non-null  object 
 6   price        50195 non-null  float64
 7   discount     49979 non-null  float64
 8   total        50500 non-null  float64
 9   customer_id  50500 non-null  object 
 10  country      50500 non-null  object 
dtypes: float64(3), object(8)
memory usage: 4.2+ MB


In [7]:
df.describe()

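|       | price       | discount | total       |
|-------|-------------|----------|-------------|
| count | 50195.0     | 49979.0  | 50500.0     |
| mean  | 1275.067036 | 0.083579 | 2929.28369  |
| std   | 706.186833  | 0.074445 | 2325.638137 |
| min   | 50.0        | 0.0      | 40.41       |
| 25%   | 662.385     | 0.0      | 1164.6175   |
| 50%   | 1280.22     | 0.1      | 2285.56     |
| 75%   | 1881.58     | 0.15     | 4150.995    |
| max   | 2499.92     | 0.2      | 21663.81    |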
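
The `describe()` output shows `discount` ranging from 0 to 0.2, which looks like a fraction rather than an absolute amount. A minimal sketch of how one might check which formula reproduces `total`: quantity, price and total are copied from the first three rows of the `df.head()` preview, while the discount values are assumed, because the preview shows that column as empty. The frame `check` and the columns `as_fraction` and `as_absolute` are illustrative names only.

```python
import numpy as np
import pandas as pd

# quantity, price and total come from the df.head() preview.
# The discount values are ASSUMED (the preview shows the column as empty).
check = pd.DataFrame({
    "quantity": [3, 2, 3],
    "price": [1052.85, 187.89, 434.37],
    "discount": [0.2, 0.2, 0.0],
    "total": [2526.84, 300.62, 1303.11],
})

gross = check["quantity"] * check["price"]
# Candidate 1: discount is a fraction of the line value.
check["as_fraction"] = (gross * (1 - check["discount"])).round(2)
# Candidate 2: discount is an absolute amount.
check["as_absolute"] = (gross - check["discount"]).round(2)

fraction_ok = np.isclose(check["as_fraction"], check["total"], atol=0.01)
absolute_ok = np.isclose(check["as_absolute"], check["total"], atol=0.01)
print(check)
print("fraction formula matches all rows:", fraction_ok.all())
print("absolute formula matches all rows:", absolute_ok.all())
```

On the real data, the same comparison could be run on `df_clean` directly, which avoids assuming any discount values.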


### **Data Cleaning**

**Convert columns to proper numeric types (handle errors gracefully)**

In [8]:
df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["discount"] = pd.to_numeric(df["discount"], errors="coerce")
df["total"] = pd.to_numeric(df["total"], errors="coerce")
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
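
`errors="coerce"` silently turns values that cannot be parsed into `NaN`, so it can help to count how many values were lost in the conversion. A minimal sketch on a made-up series `raw` (the values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical raw column mixing valid numbers with junk strings.
raw = pd.Series(["3", "2", "three", None, "4.0", "n/a"])

converted = pd.to_numeric(raw, errors="coerce")

# Values that were present before conversion but are NaN afterwards
# are the ones that failed to parse.
failed = raw.notna() & converted.isna()
print(converted.tolist())
print("unparseable values:", raw[failed].tolist())
```

Applied to `df["quantity"]`, which `df.info()` reports as `object`, the same mask would show which raw strings caused the column to be read as text.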

**Drop rows with missing key values**

In [9]:
df_clean = df.dropna(subset=["product", "quantity", "price", "discount", "total", "order_date"])
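
In the `df.info()` output, `price` and `discount` are the numeric columns with missing values (50195 and 49979 non-null out of 50500), so `dropna` removes some rows. A sketch of how the loss could be reported before and after dropping; the frame `demo` is a hypothetical stand-in for `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for `df` after type conversion.
demo = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "quantity": [1.0, np.nan, 2.0, 3.0],
    "price": [10.0, 20.0, np.nan, 30.0],
    "total": [10.0, 20.0, 30.0, 90.0],
})
key_cols = ["product", "quantity", "price", "total"]

# Missing values per key column, before any row is removed.
missing_per_column = demo[key_cols].isna().sum()

# .copy() makes the cleaned frame independent of the original.
demo_clean = demo.dropna(subset=key_cols).copy()

print(missing_per_column)
print(f"dropped {len(demo) - len(demo_clean)} of {len(demo)} rows")
```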

### **ABC ANALYSIS**

**ABC Analysis** is an inventory categorization method that helps you identify the most important products based on their **contribution to overall sales or revenue**. It's based on the **Pareto Principle (80/20 rule)** — where a small percentage of products usually contribute to a large percentage of revenue.

#### ABC Classification Logic

- **A-class items**  
  🔹 ~10-20% of products  
  🔹 Contribute ~70-80% of total sales  
  🔹 High priority: closely monitored and optimized

- **B-class items**  
  🔹 ~30% of products  
  🔹 Contribute ~15-25% of total sales  
  🔹 Moderate priority: managed regularly

- **C-class items**  
  🔹 ~50% of products  
  🔹 Contribute ~5% of total sales  
  🔹 Low priority: reviewed occasionally

**Why Use ABC Analysis?**
- Focus on the **highest-revenue products**
- Improve **inventory planning** and **cash flow**
- Set smarter **pricing**, **discount**, and **restocking** strategies

#### Steps

In [10]:
# Group sales by product
product_sales = df_clean.groupby("product")["total"].sum().reset_index()
product_sales = product_sales.sort_values(by="total", ascending=False).reset_index(drop=True)

In [11]:
# Calculate cumulative totals and percentages
product_sales["cum_total"] = product_sales["total"].cumsum()
product_sales["cum_percent"] = product_sales["cum_total"] / product_sales["total"].sum()

In [12]:
# Define ABC classification based on cumulative percentage
def classify_abc(p):
    if p <= 0.8:
        return "A"
    elif p <= 0.95:
        return "B"
    else:
        return "C"

product_sales["ABC_class"] = product_sales["cum_percent"].apply(classify_abc)
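
The same thresholds can also be applied without a Python-level function. A sketch using `pd.cut`, assuming cumulative shares greater than 0; the values in `shares` are illustrative:

```python
import pandas as pd

# Illustrative cumulative revenue shares, sorted ascending.
shares = pd.Series([0.30, 0.55, 0.80, 0.90, 0.95, 1.00])

# Right-closed bins (0, 0.8], (0.8, 0.95], (0.95, inf) mirror the
# `<=` comparisons in classify_abc. The open upper edge tolerates a
# cumulative share that ends slightly above 1.0 due to float rounding.
abc = pd.cut(
    shares,
    bins=[0, 0.8, 0.95, float("inf")],
    labels=["A", "B", "C"],
)
print(abc.tolist())
```

Either way, the product whose cumulative share first exceeds 0.8 is assigned to class B, not A.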

In [13]:
# Summary - count of each class
abc_counts = product_sales["ABC_class"].value_counts()
print("ABC Category Counts:")
print(abc_counts)

ABC Category Counts:
ABC_class
A    33
B     7
C     3
Name: count, dtype: int64


In [14]:
# Show the 10 top products and their class
display(product_sales.head(10))

|   | product                | total      | cum_total   | cum_percent | ABC_class |
|---|------------------------|------------|-------------|-------------|-----------|
| 0 | Asus ROG Zephyrus      | 3734471.24 | 3734471.24  | 0.025801    | A         |
| 1 | iPhone 15 Pro          | 3638403.46 | 7372874.7   | 0.050939    | A         |
| 2 | Nintendo Switch OLED   | 3601261.29 | 10974135.99 | 0.07582     | A         |
| 3 | Steam Deck             | 3559929.46 | 14534065.45 | 0.100415    | A         |
| 4 | MacBook Air M2         | 3537246.62 | 18071312.07 | 0.124854    | A         |
| 5 | MacBook Pro 16         | 3517634.57 | 21588946.64 | 0.149157    | A         |
| 6 | KitchenAid Stand Mixer | 3511806.57 | 25100753.21 | 0.17342     | A         |
| 7 | PlayStation 5          | 3493720.76 | 28594473.97 | 0.197557    | A         |
| 8 | Lenovo Tab P12         | 3489488.15 | 32083962.12 | 0.221666    | A         |
| 9 | Acer Swift 5           | 3474675.57 | 35558637.69 | 0.245672    | A         |


In [15]:
# Optional: Save results to CSV
product_sales.to_csv("abc_classified_products.csv", index=False)
print("ABC classification saved to 'abc_classified_products.csv'")
files.download("abc_classified_products.csv")

ABC classification saved to 'abc_classified_products.csv'



#### ABC Analysis Insight

After performing the ABC analysis on the 2024 e-commerce sales data, the following distribution was observed:

| Class | Number of Products | Share of Products |
|-------|--------------------|-------------------|
| **A** | 33 products        | ~77%              |
| **B** | 7 products         | ~16%              |
| **C** | 3 products         | ~7%               |
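
The percentages in this table correspond to product counts (33 of 43 is about 77%), not to revenue. A sketch of how both shares could be computed side by side; the frame `ps` is a hypothetical stand-in for `product_sales`, and its numbers are made up:

```python
import pandas as pd

# Hypothetical stand-in for `product_sales` (product, total, ABC_class).
ps = pd.DataFrame({
    "product": ["p1", "p2", "p3", "p4", "p5"],
    "total": [400.0, 300.0, 150.0, 100.0, 50.0],
    "ABC_class": ["A", "A", "B", "B", "C"],
})

summary = ps.groupby("ABC_class").agg(
    n_products=("product", "count"),
    revenue=("total", "sum"),
)
# Share of the catalog (by product count) and share of revenue per class.
summary["product_share"] = summary["n_products"] / summary["n_products"].sum()
summary["revenue_share"] = summary["revenue"] / summary["revenue"].sum()
print(summary)
```

Reporting the two columns together makes it clear whether a class is large because it holds many products or because it holds most of the revenue.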

---

🔍 **Key Insight:**

> **A large majority of products (33 out of 43) fall into Class A.** Revenue is spread almost evenly across the catalog: each of the ten best-selling products contributes only about 2.4-2.6% of total revenue, so the first 33 products together stay within the 80% cumulative cutoff.  
> This is a notable deviation from the typical Pareto pattern, where Class A usually comprises a smaller portion of the product range.

---

🧠 **Interpretation:**

- **Class A products** are not only high-value but also widespread across the assortment. This suggests that the business currently relies on a broad portfolio of strong-selling products, rather than a few key SKUs.
- **Class B products** form a small middle tier, possibly under-leveraged or under-promoted.
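- **Class C products** are few (3 of 43), so only a small part of the catalog sits in the lowest revenue tier. With revenue spread this evenly, a C label mainly reflects a product's position relative to the 95% cutoff rather than clearly weak sales.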
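
To judge how much of this split comes from the thresholds themselves, one can compare it with a hypothetical catalog in which all 43 products earn exactly the same revenue. A sketch (uniform revenue is an assumption made for illustration, not a property of the dataset):

```python
import pandas as pd

n_products = 43  # catalog size from the ABC counts above
uniform = pd.DataFrame({
    "product": [f"product_{i}" for i in range(n_products)],
    "total": 1.0,  # identical revenue for every product (hypothetical)
})

cum_share = uniform["total"].cumsum() / uniform["total"].sum()
# Same 80% / 95% cutoffs as classify_abc.
uniform["ABC_class"] = pd.cut(
    cum_share, bins=[0, 0.8, 0.95, float("inf")], labels=["A", "B", "C"]
)
baseline_counts = uniform["ABC_class"].value_counts().sort_index()
print(baseline_counts)
```

With these cutoffs, a perfectly uniform catalog yields 34 A, 6 B and 3 C products, which is very close to the observed 33/7/3. That suggests the class sizes here say more about the even revenue distribution than about a few standout products.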

---

✅ **Strategic Recommendations:**

- **Double down on A-class products**: Since many products are performing well, focus on maintaining high availability, dynamic pricing, and cross-sell strategies.
- **Review B-class potential**: Explore ways to move B-class items into A (via bundling, targeting, or promotions).
- **Minimal attention needed for C-class**: With only 3 items in this class, consider whether they are niche essentials or candidates for removal.

---

### **XYZ ANALYSIS**
**XYZ Analysis** is a product segmentation method that classifies items based on the **consistency and variability of their sales over time**. Unlike ABC analysis (which is based on revenue), XYZ focuses on **demand stability** and **predictability**, helping businesses improve **forecasting and inventory control**.

#### XYZ Classification Logic

- **X-class items**  
  🔹 Stable, regular sales over time  
  🔹 Low demand variability (Coefficient of Variation, CV ≤ 0.5)  
  🔹 Easy to forecast → high priority for automation & planning  

- **Y-class items**  
  🔹 Moderate variability in sales  
  🔹 Some seasonality or trends (CV between 0.5 and 1.0)  
  🔹 Requires manual review & adjusted forecasting  

- **Z-class items**  
  🔹 Highly irregular sales  
  🔹 High demand variability (CV > 1.0)  
  🔹 Difficult to forecast → avoid overstocking
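
The coefficient of variation is the standard deviation divided by the mean. A small sketch comparing two hypothetical products with the same average monthly sales but very different stability (both series are made up):

```python
import pandas as pd

# Two hypothetical products with the same average monthly sales.
steady = pd.Series([100, 110, 90, 105, 95, 100])
erratic = pd.Series([0, 300, 10, 0, 280, 10])

def coefficient_of_variation(series: pd.Series) -> float:
    """Sample standard deviation divided by the mean."""
    return series.std() / series.mean()

cv_steady = coefficient_of_variation(steady)
cv_erratic = coefficient_of_variation(erratic)
print(f"steady:  CV = {cv_steady:.2f}")
print(f"erratic: CV = {cv_erratic:.2f}")
```

Under the thresholds listed above, the steady series would land in class X and the erratic one in class Z.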

**Why Use XYZ Analysis?**

- Improve **forecast accuracy**
- Reduce **stockouts** and **excess inventory**
- Identify **volatile vs. reliable** products
- Optimize **reordering and safety stock levels**

#### Steps

In [16]:
# Monthly sales per product
monthly_sales = df_clean.copy()
monthly_sales["month"] = monthly_sales["order_date"].dt.to_period("M")
product_monthly = monthly_sales.groupby(["product", "month"])["total"].sum().reset_index()

In [17]:
# Calculate coefficient of variation (CoV) for each product
xyz_stats = product_monthly.groupby("product")["total"].agg(["mean", "std"]).reset_index()
xyz_stats["cov"] = xyz_stats["std"] / xyz_stats["mean"]
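
Grouping by product and month only produces rows for months in which a product had at least one order, so months without sales are left out of the mean and standard deviation. A sketch of one way to fill such months with zero before computing the CoV; the frame `orders` is hypothetical, and months are plain strings here for simplicity:

```python
import pandas as pd

# Hypothetical monthly revenue; product "B" has no rows for February.
orders = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "month": ["2024-01", "2024-02", "2024-03", "2024-01", "2024-03"],
    "total": [100.0, 110.0, 90.0, 200.0, 200.0],
})

# One row per product, one column per month; missing months become 0.
wide = orders.pivot_table(
    index="product", columns="month", values="total",
    aggfunc="sum", fill_value=0,
)

stats = pd.DataFrame({
    "mean": wide.mean(axis=1),
    "std": wide.std(axis=1),  # sample std (ddof=1), like agg("std")
})
stats["cov"] = stats["std"] / stats["mean"]
print(stats)
```

Without the zero fill, product "B" would show no variation at all across its two recorded months. Note that this only covers months in which at least one product was sold; a month with no orders at all would still be absent.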

In [18]:
# Classify XYZ based on CoV
def classify_xyz(cov):
    if cov <= 0.5:
        return "X"
    elif cov <= 1.0:
        return "Y"
    else:
        return "Z"
xyz_stats["XYZ_class"] = xyz_stats["cov"].apply(classify_xyz)
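
A product with sales in only one month has an undefined standard deviation, so its `cov` is `NaN`. Every comparison with `NaN` is false, which means `classify_xyz` would label such a product "Z". A sketch of a variant that keeps these cases separate; the function name and the `"unknown"` label are arbitrary choices for illustration:

```python
import math

def classify_xyz_safe(cov: float) -> str:
    """Like classify_xyz, but reports an undefined CoV explicitly."""
    if cov is None or math.isnan(cov):
        return "unknown"  # hypothetical label for undefined variability
    if cov <= 0.5:
        return "X"
    if cov <= 1.0:
        return "Y"
    return "Z"

for value in (0.2, 0.5, 0.75, 1.3, float("nan")):
    print(value, "->", classify_xyz_safe(value))
```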

In [19]:
# Merge ABC and XYZ classifications
abc_xyz = pd.merge(product_sales[["product", "ABC_class"]], xyz_stats[["product", "XYZ_class"]], on="product", how="inner")
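
An inner merge silently drops products that appear in only one of the two tables. A sketch of how the merge could be checked with the `validate` and `indicator` parameters of `pd.merge`; the frames `abc` and `xyz` are hypothetical:

```python
import pandas as pd

abc = pd.DataFrame({"product": ["p1", "p2", "p3"], "ABC_class": ["A", "B", "C"]})
xyz = pd.DataFrame({"product": ["p1", "p2"], "XYZ_class": ["X", "Y"]})

merged = pd.merge(
    abc, xyz, on="product",
    how="outer",            # keep unmatched products visible
    validate="one_to_one",  # raise if a product appears twice on either side
    indicator=True,         # adds a `_merge` column
)

# Rows that exist in only one of the two inputs.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

If `unmatched` is empty, switching back to `how="inner"` loses nothing.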

In [20]:
# Output combined ABC-XYZ classification
print("Combined ABC-XYZ Classification (sample):")
display(abc_xyz.sample(10))

Combined ABC-XYZ Classification (sample):


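|    | product                | ABC_class | XYZ_class |
|----|------------------------|-----------|-----------|
| 0  | Asus ROG Zephyrus      | A         | X         |
| 42 | Instant Pot Duo Evo    | C         | X         |
| 9  | Acer Swift 5           | A         | X         |
| 36 | Apple Watch Series 9   | B         | X         |
| 6  | KitchenAid Stand Mixer | A         | X         |
| 35 | Dell XPS 13            | B         | X         |
| 17 | iPad Pro 12.9          | A         | X         |
| 40 | Fitbit Charge 6        | C         | X         |
| 22 | Sony Alpha a6400       | A         | X         |
| 39 | Galaxy S23             | B         | X         |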


In [21]:
# Count of each ABC-XYZ segment
abc_xyz['Segment'] = abc_xyz['ABC_class'] + abc_xyz['XYZ_class']
segment_counts = abc_xyz['Segment'].value_counts().sort_index()

print("Product count per ABC-XYZ segment:")
display(segment_counts)

Product count per ABC-XYZ segment:


| Segment | count |
|---------|-------|
| AX      | 33    |
| BX      | 7     |
| CX      | 3     |


In [22]:
# Matrix view (pivot table)
segment_matrix = abc_xyz.pivot_table(index='ABC_class', columns='XYZ_class', aggfunc='size', fill_value=0)
print("ABC-XYZ Segment Matrix:")
display(segment_matrix)

ABC-XYZ Segment Matrix:


| ABC_class | XYZ_class = X |
|-----------|---------------|
| A         | 33            |
| B         | 7             |
| C         | 3             |
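
The matrix only has a column for classes that actually occur, so Y and Z are missing here. A sketch of how the full 3×3 grid could be shown with explicit zeros, using `reindex`; the frame `demo` is hypothetical:

```python
import pandas as pd

# Hypothetical classification in which only the X class occurs.
demo = pd.DataFrame({
    "ABC_class": ["A", "A", "B", "C"],
    "XYZ_class": ["X", "X", "X", "X"],
})

matrix = pd.crosstab(demo["ABC_class"], demo["XYZ_class"])

# Force all nine segments to appear, filling absent ones with 0.
full_matrix = matrix.reindex(
    index=["A", "B", "C"], columns=["X", "Y", "Z"], fill_value=0
)
print(full_matrix)
```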


In [25]:
# Optional: Save to CSV
abc_xyz.to_csv("abc_xyz_classification.csv", index=False)
print("ABC-XYZ classification saved to 'abc_xyz_classification.csv'")
files.download("abc_xyz_classification.csv")

ABC-XYZ classification saved to 'abc_xyz_classification.csv'



#### XYZ Analysis Insight

After applying XYZ analysis to the 2024 e-commerce data, we observed the following:

🔎 XYZ Classification Results:  

| XYZ Class | Number of Products |
|-----------|--------------------|
| **X**     | 43 products         |
| **Y**     | 0                   |
| **Z**     | 0                   |

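✅ **All products were classified as X-class**, meaning the coefficient of variation of their monthly revenue is at most 0.5, i.e. they have **stable, consistent sales patterns throughout the year**. This is uncommon and would indicate a **well-performing, mature product portfolio**; since the dataset is synthetic, it may also simply reflect how the data was generated.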

### **SALES DYNAMICS ANALYSIS**

**Sales Dynamics Analysis** involves tracking and visualizing how sales change over time — typically month by month. This helps:

- Detect **growth or decline trends**
- Identify **seasonal patterns**
- Analyze **marketing campaign impact**
- Improve **demand planning**

#### Steps

In [26]:
# Aggregate monthly total sales
monthly_dynamics = df_clean.copy()
monthly_dynamics["month"] = monthly_dynamics["order_date"].dt.to_period("M")
sales_by_month = monthly_dynamics.groupby("month")["total"].sum().reset_index()
sales_by_month["month"] = sales_by_month["month"].astype(str)

In [27]:
# Display monthly sales dynamics
print("Monthly Sales Dynamics:")
display(sales_by_month)

Monthly Sales Dynamics:


|   | month   | total       |
|---|---------|-------------|
| 0 | 2024-01 | 12522301.27 |
| 1 | 2024-02 | 11556846.55 |
| 2 | 2024-03 | 11975867.0  |
| 3 | 2024-04 | 12300847.84 |
| 4 | 2024-05 | 12222858.84 |
| 5 | 2024-06 | 11599428.65 |
| 6 | 2024-07 | 12542102.85 |
| 7 | 2024-08 | 11841024.13 |
| 8 | 2024-09 | 11920618.71 |
| 9 | 2024-10 | 12176330.57 |


In [28]:
fig = px.line(
    sales_by_month,
    x="month",
    y="total",
    markers=True,
    title="Monthly Sales Dynamics",
    labels={"month": "Month", "total": "Total Sales"},
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Total Sales",
    xaxis_tickangle=45,
    template="plotly_white",
    width=900,
    height=500
)

fig.show()

#### Sales Dynamics Analysis Insight

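- The **line chart** of monthly total sales shows a **relatively smooth and stable trend**, without major spikes or drops: in the months listed above, totals stay between about 11.6 million and 12.5 million, and the largest month-to-month change is about 8%.
- There are **no extreme seasonal peaks**, suggesting steady customer demand and strong operational consistency.
- Such stability supports **long-term forecasting** and **automated inventory management**, since past months are a good guide to the following ones.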
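
The stability described above can be quantified. A sketch using the monthly totals copied from the monthly output earlier in this section (January to October 2024); the variable names are illustrative:

```python
import pandas as pd

# Monthly totals copied from the monthly output (January to October 2024).
sales = pd.Series(
    [12522301.27, 11556846.55, 11975867.00, 12300847.84, 12222858.84,
     11599428.65, 12542102.85, 11841024.13, 11920618.71, 12176330.57],
    index=pd.period_range("2024-01", periods=10, freq="M"),
    name="total",
)

mom_change = sales.pct_change()   # month-over-month relative change
cv = sales.std() / sales.mean()   # coefficient of variation of monthly totals
spread = (sales.max() - sales.min()) / sales.mean()

print(mom_change.mul(100).round(1))
print(f"CV = {cv:.3f}, (max - min) / mean = {spread:.3f}")
```

A CV of monthly totals well below the 0.5 threshold used for class X is consistent with the flat line in the chart.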



### Recommendations by ABC-XYZ Segment

In [29]:
recommendations = {
    "AX": "✅ Maintain high stock, automate replenishment. Core product line.",
    "AY": "📊 Monitor seasonal trends, adjust stock dynamically.",
    "AZ": "⚠️ High value but unstable – keep low stock, analyze instability causes.",
    "BX": "✅ Stable and mid-value – maintain availability, support with marketing.",
    "BY": "🌀 Moderate risk – watch trends, avoid overstocking.",
    "BZ": "❗ Volatile and mid-value – consider reducing SKUs or targeting promos.",
    "CX": "✅ Reliable fillers – keep small stock, ensure availability.",
    "CY": "🧊 Irregular demand, low value – stock very limited.",
    "CZ": "🚫 Consider delisting or heavy discounting – low value & unstable.",
}

# Display recommendations
print("Recommendations for ABC-XYZ Segments:")
for segment, advice in recommendations.items():
    print(f"{segment}: {advice}")

Recommendations for ABC-XYZ Segments:
AX: ✅ Maintain high stock, automate replenishment. Core product line.
AY: 📊 Monitor seasonal trends, adjust stock dynamically.
AZ: ⚠️ High value but unstable – keep low stock, analyze instability causes.
BX: ✅ Stable and mid-value – maintain availability, support with marketing.
BY: 🌀 Moderate risk – watch trends, avoid overstocking.
BZ: ❗ Volatile and mid-value – consider reducing SKUs or targeting promos.
CX: ✅ Reliable fillers – keep small stock, ensure availability.
CY: 🧊 Irregular demand, low value – stock very limited.
CZ: 🚫 Consider delisting or heavy discounting – low value & unstable.
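
The recommendations can also be attached to each product by mapping the segment code through the dictionary. A sketch with a hypothetical excerpt of `abc_xyz` and a shortened dictionary (`segments` and `advice_by_segment` are illustrative names):

```python
import pandas as pd

# Hypothetical excerpt of `abc_xyz` and of the `recommendations` dict.
segments = pd.DataFrame({
    "product": ["p1", "p2", "p3"],
    "Segment": ["AX", "BX", "CZ"],
})
advice_by_segment = {
    "AX": "Maintain high stock, automate replenishment.",
    "BX": "Maintain availability, support with marketing.",
}

# Segments missing from the dict get an explicit placeholder instead of NaN.
segments["recommendation"] = (
    segments["Segment"].map(advice_by_segment).fillna("no recommendation defined")
)
print(segments)
```

With the full nine-entry dictionary, every segment has an entry and the placeholder is never used.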
