<a href="https://colab.research.google.com/github/cpython-projects/da_vn/blob/main/lesson_08_part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Legend**
You are given an e-commerce dataset (`ecommerce_data.csv`). Its columns are described below.


| Column Name         | Description |
|---------------------|-------------|
| `order_id`          | Unique identifier for each order |
| `customer_id`       | Unique identifier for the customer |
| `order_date`        | Date when the order was placed |
| `product_id`        | Unique identifier for the product |
| `product_name`      | Name of the purchased product |
| `category`          | Product category (e.g., Electronics, Fashion) |
| `price`             | Unit price of the product (in USD) |
| `quantity`          | Quantity of the product ordered |
| `weight`            | Weight of the product (e.g., "0.5kg") |
| `discount`          | Discount applied to the product, as a decimal fraction (e.g., 0.15 = 15%) |
| `shipping_cost`     | Cost to ship the product |
| `payment_method`    | Method used for payment (e.g., Credit Card, PayPal, Debit) |
| `delivery_status`   | Status of delivery (e.g., Delivered, Shipped, Processing) |
| `customer_city`     | Customer's city |
| `customer_state`    | Customer's state |
| `customer_country`  | Customer's country |
| `return_requested`  | 1 if a return was requested, 0 otherwise |
| `review_score`      | Customer review rating (1 to 5) |
| `days_to_deliver`   | Number of days it took to deliver the product |

---

### **Data Reading**

In [1]:
# Import Required Libraries
import pandas as pd
import plotly.express as px
from google.colab import files

In [2]:
uploaded = files.upload()

Saving ecommerce_data.csv to ecommerce_data.csv


In [3]:
df = pd.read_csv("ecommerce_data.csv")

In [4]:
df.head()

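|   | order_id | customer_id | order_date | product_id | product_name | category | price | quantity | weight | discount | shipping_cost | payment_method | delivery_status | customer_city | customer_state | customer_country | return_requested | review_score | days_to_deliver |
|---|----------|-------------|------------|------------|--------------|----------|-------|----------|--------|----------|---------------|----------------|-----------------|---------------|----------------|------------------|------------------|--------------|-----------------|
| 0 | 1001 | C101 | 2023-01-15 | P001 | Smartphone X | Electronics | 599.99 | 1 | 0.5kg | 0.1 | 5.99 | Credit Card | Delivered | New York | NY | USA | 0 | 5.0 | 3.0 |
| 1 | 1002 | C102 | 2023-01-16 | P002 | Laptop Pro | Electronics | 1299.99 | 1 | 2.2kg | 0.15 | 12.99 | paypal | Delivered | los angeles | CA | USA | 1 | 4.0 | 5.0 |
| 2 | 1003 | C103 | 2023-01-17 | P003 | Wireless Earbuds | Electronics | 79.99 | 2 | 0.1kg | 0.0 | NaN | Credit Card | Shipped | Chicago | IL | USA | 0 | NaN | NaN |
| 3 | 1004 | C104 | 2023-01-18 | P004 | Smart Watch | Electronics | 199.99 | 1 | 0.3kg | 0.05 | 4.99 | debit | Delivered | Houston | TX | USA | 0 | 5.0 | 4.0 |
| 4 | 1005 | C105 | 2023-01-19 | P005 | Tablet Mini | Electronics | 299.99 | 1 | 0.7kg | NaN | 6.99 | credit | Processing | PHOENIX | AZ | USA | 1 | 2.0 | NaN |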


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   order_id          40 non-null     int64  
 1   customer_id       40 non-null     object 
 2   order_date        40 non-null     object 
 3   product_id        40 non-null     object 
 4   product_name      40 non-null     object 
 5   category          40 non-null     object 
 6   price             40 non-null     float64
 7   quantity          40 non-null     int64  
 8   weight            40 non-null     object 
 9   discount          26 non-null     float64
 10  shipping_cost     30 non-null     float64
 11  payment_method    40 non-null     object 
 12  delivery_status   40 non-null     object 
 13  customer_city     40 non-null     object 
 14  customer_state    40 non-null     object 
 15  customer_country  40 non-null     object 
 16  return_requested  40 non-null     int64  
 17 

In [6]:
df.describe()

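|       | order_id | price | quantity | discount | shipping_cost | return_requested | review_score | days_to_deliver |
|-------|----------|-------|----------|----------|---------------|------------------|--------------|-----------------|
| count | 40.0 | 40.0 | 40.0 | 26.0 | 30.0 | 40.0 | 38.0 | 21.0 |
| mean  | 1040.975 | 263.615 | 1.425 | 0.078846 | 8.956667 | 0.15 | 4.026316 | 4.047619 |
| std   | 156.727837 | 273.068949 | 0.873763 | 0.07372 | 5.005055 | 0.36162 | 1.102499 | 1.023533 |
| min   | 1001.0 | 24.99 | 1.0 | 0.0 | 2.99 | 0.0 | 1.0 | 3.0 |
| 25%   | 1009.0 | 77.49 | 1.0 | 0.0 | 4.99 | 0.0 | 3.25 | 3.0 |
| 50%   | 1016.5 | 179.99 | 1.0 | 0.1 | 6.99 | 0.0 | 4.0 | 4.0 |
| 75%   | 1024.25 | 312.49 | 1.0 | 0.1 | 12.99 | 0.0 | 5.0 | 5.0 |
| max   | 2006.0 | 1299.99 | 4.0 | 0.25 | 19.99 | 1.0 | 5.0 | 6.0 |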


### **Data Cleaning**

**Convert columns to proper numeric types (handle errors gracefully)**
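One possible way to do this conversion is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into `NaN` instead of raising. The sketch below is an assumption about how the step could look, not the lesson's reference solution. It runs on a small stand-in frame named `sample` (hypothetical values, column names taken from the legend), and the derived `weight_kg` column is an illustrative name.

```python
import pandas as pd

# Stand-in data with the legend's column names; values are made up for illustration.
sample = pd.DataFrame({
    "price": ["599.99", "abc", "79.99"],
    "quantity": [1, "2", None],
    "weight": ["0.5kg", "2.2kg", "n/a"],
})

# Coerce plain numeric columns: anything that cannot be parsed becomes NaN.
for col in ["price", "quantity"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Weight is stored as text with a unit suffix, so strip "kg" before converting.
weight_text = sample["weight"].str.replace("kg", "", regex=False).str.strip()
sample["weight_kg"] = pd.to_numeric(weight_text, errors="coerce")

print(sample.dtypes)
print(sample)
```

In the notebook, the same calls would be applied to `df` rather than `sample`.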

**Drop rows with missing key values**
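A minimal sketch of this step, assuming the "key" columns are the ones needed to compute revenue per product (`order_id`, `product_id`, `price`, `quantity`). That choice is an assumption; the source does not say which columns count as key. The frame `sample` is hypothetical stand-in data.

```python
import pandas as pd

# Stand-in data; values are made up for illustration.
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "product_id": ["P001", "P002", None, "P004"],
    "price": [599.99, None, 79.99, 199.99],
    "quantity": [1, 1, 2, 1],
    "review_score": [5.0, 4.0, None, None],
})

# Assumed key columns: a row is unusable for revenue analysis if any of these is missing.
key_cols = ["order_id", "product_id", "price", "quantity"]

# Drop only rows missing a key value; gaps in other columns (e.g. review_score) are kept.
cleaned = sample.dropna(subset=key_cols).reset_index(drop=True)

print(f"Rows before: {len(sample)}, after: {len(cleaned)}")
print(cleaned)
```

Restricting `dropna` to a subset matters here because columns such as `discount`, `shipping_cost`, and `days_to_deliver` contain many missing values that should not remove an otherwise valid order.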

### **ABC ANALYSIS**

**ABC Analysis** is an inventory categorization method that ranks products by their **contribution to overall sales or revenue**. It is based on the **Pareto Principle (80/20 rule)**: a small share of products typically generates a large share of revenue.

#### ABC Classification Logic

- **A-class items**  
  🔹 ~10-20% of products  
  🔹 Contribute ~70-80% of total sales  
  🔹 High priority: closely monitored and optimized

- **B-class items**  
  🔹 ~30% of products  
  🔹 Contribute ~15-25% of total sales  
  🔹 Moderate priority: managed regularly

- **C-class items**  
  🔹 ~50% of products  
  🔹 Contribute ~5% of total sales  
  🔹 Low priority: reviewed occasionally

**Why Use ABC Analysis?**
- Focus on the products that generate the most **revenue**
- Improve **inventory planning** and **cash flow**
- Set smarter **pricing**, **discount**, and **restocking** strategies

#### Steps
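The classification described above can be sketched as follows: compute revenue per product, sort in descending order, take the cumulative share of total revenue, and assign a class by cutoff. Several things here are assumptions rather than the lesson's prescribed method: revenue is taken as `price * quantity * (1 - discount)` with missing discounts treated as 0, and the cutoffs are 80% and 95% of cumulative revenue (one common reading of the ranges listed above). The frame `sample` holds hypothetical values.

```python
import pandas as pd

# Stand-in data; values are made up for illustration.
sample = pd.DataFrame({
    "product_name": ["Laptop Pro", "Smartphone X", "Tablet Mini",
                     "Smart Watch", "Wireless Earbuds", "Phone Case"],
    "price": [1000.0, 500.0, 250.0, 120.0, 80.0, 50.0],
    "quantity": [1, 1, 1, 1, 1, 1],
    "discount": [0.0, None, 0.0, 0.0, 0.0, 0.0],
})

# Step 1: revenue per order line (assumption: a missing discount means no discount).
sample["revenue"] = (
    sample["price"] * sample["quantity"] * (1 - sample["discount"].fillna(0))
)

# Step 2: total revenue per product, largest first.
abc = (
    sample.groupby("product_name", as_index=False)["revenue"].sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

# Step 3: cumulative share of total revenue.
abc["cum_share"] = abc["revenue"].cumsum() / abc["revenue"].sum()

# Step 4: classify by cumulative share (assumed cutoffs: 80% and 95%).
abc["abc_class"] = pd.cut(
    abc["cum_share"],
    bins=[0, 0.80, 0.95, float("inf")],
    labels=["A", "B", "C"],
    include_lowest=True,
)

print(abc)
```

The open-ended last bin guards against a cumulative share that lands marginally above 1.0 because of floating-point rounding.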

**ABC Analysis Insight:**

### **XYZ ANALYSIS**
**XYZ Analysis** is a product segmentation method that classifies items based on the **consistency and variability of their sales over time**. Unlike ABC analysis (which is based on revenue), XYZ focuses on **demand stability** and **predictability**, helping businesses improve **forecasting and inventory control**.

#### XYZ Classification Logic

- **X-class items**  
  🔹 Stable, regular sales over time  
  🔹 Low demand variability (Coefficient of Variation < 0.5)  
  🔹 Easy to forecast → high priority for automation & planning  

- **Y-class items**  
  🔹 Moderate variability in sales  
  🔹 Some seasonality or trends (CV between 0.5 and 1.0)  
  🔹 Requires manual review & adjusted forecasting  

- **Z-class items**  
  🔹 Highly irregular sales  
  🔹 High demand variability (CV > 1.0)  
  🔹 Difficult to forecast → avoid overstocking

**Why Use XYZ Analysis?**

- Improve **forecast accuracy**
- Reduce **stockouts** and **excess inventory**
- Identify **volatile vs. reliable** products
- Optimize **reordering and safety stock levels**

#### Steps
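The classification described above can be sketched as follows: aggregate units sold per product per month, compute the coefficient of variation (CV = standard deviation / mean) across months, and assign a class using the 0.5 and 1.0 thresholds. The monthly granularity, the use of the population standard deviation (`ddof=0`), and counting months without orders as zero sales are all assumptions. The `orders` frame and the product names in it are hypothetical.

```python
import pandas as pd

# Stand-in order lines; values are made up for illustration.
dates = ["2023-01-10", "2023-02-10", "2023-03-10", "2023-04-10"]
orders = pd.DataFrame({
    "order_date": pd.to_datetime(dates * 2 + ["2023-03-10"]),
    "product_name": ["Steady"] * 4 + ["Seasonal"] * 4 + ["Sporadic"],
    "quantity": [10, 10, 12, 8, 2, 10, 4, 16, 20],
})

# Step 1: units sold per product per month; months with no orders count as 0.
monthly = orders.assign(month=orders["order_date"].dt.to_period("M")).pivot_table(
    index="product_name", columns="month", values="quantity",
    aggfunc="sum", fill_value=0,
)

# Step 2: coefficient of variation across months (population std, an assumption).
xyz = pd.DataFrame({
    "mean_qty": monthly.mean(axis=1),
    "std_qty": monthly.std(axis=1, ddof=0),
})
xyz["cv"] = xyz["std_qty"] / xyz["mean_qty"]


# Step 3: classify with the thresholds from the text (boundary values go to Y).
def classify_cv(cv: float) -> str:
    if cv < 0.5:
        return "X"
    if cv <= 1.0:
        return "Y"
    return "Z"


xyz["xyz_class"] = xyz["cv"].apply(classify_cv)

print(xyz.round(3))
```

With only a handful of periods the CV is a rough estimate, so results on a small dataset should be read as indicative rather than conclusive.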

**XYZ Analysis Insight:**