# Improving Delivery Efficiency and Customer Segmentation in E-Commerce

#### Domain: E-commerce

## **Problem Statement**  
In the fast-paced e-commerce industry, timely delivery and personalized customer engagement are critical for maintaining a competitive edge. However, delivery delays and cancellations are common, caused by inconsistent logistics operations, a lack of proactive planning, and unforeseen supply chain disruptions. These inefficiencies are often accepted as part of operations, yet they frustrate customers, increase returns and cancellations, and damage satisfaction, loyalty, and brand perception.

A lack of proper customer segmentation compounds the problem. Most e-commerce platforms rely on broad, one-size-fits-all marketing strategies and poorly targeted loyalty programs. Without understanding customer behavior, businesses cannot distinguish high-value customers from those driven mainly by discounts, and they miss opportunities to improve retention and profitability.

Customers exhibit varied behaviors, such as high-frequency purchasing, discount sensitivity, or region-specific needs, and these often go unaddressed. Together, delivery problems and untargeted engagement leave room for improvement in customer satisfaction, operational efficiency, and long-term growth.

### **How AI Helps**  
AI presents a transformative opportunity to address these challenges by leveraging advanced data analytics and machine learning techniques. Predictive models can analyze historical order and shipping data to identify patterns that lead to delivery delays or cancellations. By proactively flagging high-risk orders, businesses can optimize their logistics, reallocate resources, and notify customers in advance, significantly improving delivery reliability and customer satisfaction.  

Similarly, AI-driven clustering techniques can segment customers based on their purchasing behavior, order frequency, and profitability. This allows businesses to uncover hidden patterns, design personalized marketing campaigns, and prioritize high-value customers. By turning data into actionable insights, AI enables e-commerce platforms to move beyond generic strategies, creating tailored solutions that enhance customer loyalty and operational efficiency.  

## **Objective**  
1. **Classification Task:** Predict delivery delays and cancellations using machine learning to proactively address potential disruptions, improve logistics, and enhance delivery reliability, ultimately reducing customer dissatisfaction and operational costs.  
2. **Clustering Task:** Segment customers based on spending habits, order frequency, and profitability to enable personalized marketing, loyalty programs, and region-specific strategies, improving customer retention and resource optimization.  

In [1]:
# importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Overview

In [2]:
# importing the data
data = pd.read_csv(r"Ecommerce_data.csv")
data.head()

|   | customer_id | customer_first_name | customer_last_name | category_name | product_name | customer_segment | customer_city | customer_state | customer_country | customer_region | ... | order_date | order_id | ship_date | shipping_type | days_for_shipment_scheduled | days_for_shipment_real | order_item_discount | sales_per_order | order_quantity | profit_per_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C_ID_45866 | Mary | Fuller | Office Supplies | Xerox 1913 | Corporate | New Rochelle | New York | United States | East | ... | 11/5/2022 | O_ID_3001072 | 11/7/2022 | Second Class | 2 | 2 | 35.0 | 500.0 | 5 | 223.199997 |
| 1 | C_ID_44932 | Alan | Edelman | Office Supplies | #6 3/4 Gummed Flap White Envelopes | Corporate | Houston | Texas | United States | Central | ... | 20-06-2022 | O_ID_3009170 | 23-06-2022 | Second Class | 2 | 3 | 85.0 | 500.0 | 5 | 199.199997 |
| 2 | C_ID_70880 | Mary | Gayman | Office Supplies | Belkin 8 Outlet Surge Protector | Consumer | Louisville | Kentucky | United States | South | ... | 25-06-2022 | O_ID_3047567 | 30-06-2022 | Standard Class | 4 | 5 | 75.0 | 44.0 | 5 | 195.5 |
| 3 | C_ID_33157 | Raymond | Eason | Office Supplies | GBC VeloBinder Manual Binding System | Corporate | Chicago | Illinois | United States | Central | ... | 10/6/2022 | O_ID_3060575 | 10/10/2022 | Second Class | 2 | 4 | 60.0 | 254.0 | 1 | 220.0 |
| 4 | C_ID_58303 | Mary | Gonzalez | Furniture | Eldon Pizzaz Desk Accessories | Home Office | Philadelphia | Pennsylvania | United States | East | ... | 2/5/2022 | O_ID_3064311 | 8/1/2022 | First Class | 1 | 2 | 125.0 | 500.0 | 1 | 97.5 |


Here's a brief description of each column in the dataset:

1. **customer_id**: Unique identifier for each customer, used for tracking individual purchases and behaviors.
2. **customer_first_name**: First name of the customer for personal identification.
3. **customer_last_name**: Last name of the customer for personal identification.
4. **category_name**: Category of the product purchased, such as "Office Supplies" or "Furniture."
5. **product_name**: The name of the specific product ordered, e.g., "Xerox 1913" or "Belkin 8 Outlet Surge Protector."
6. **customer_segment**: Classification of customers into segments like "Corporate," "Home Office," or "Consumer" based on their business relationship or purchasing behavior.
7. **customer_city**: The city where the customer resides.
8. **customer_state**: The state or region where the customer resides.
9. **customer_country**: The country where the customer is located.
10. **customer_region**: The broader geographical region (e.g., East, Central, South) where the customer is based.
11. **delivery_status**: The delivery status of the order, which could be "Shipping on time," "Late delivery," or "Shipping canceled."
12. **order_date**: The date when the customer placed the order.
13. **order_id**: Unique identifier for each order, helping to link various order details together.
14. **ship_date**: The date when the order was actually shipped out.
15. **shipping_type**: The shipping method used, such as "Standard Class" or "Second Class."
16. **days_for_shipment_scheduled**: The number of days the order was scheduled to take for delivery.
17. **days_for_shipment_real**: The actual number of days it took for the order to be delivered.
18. **order_item_discount**: The discount applied to the items in the order.
19. **sales_per_order**: The total sales value for the order.
20. **order_quantity**: The quantity of items ordered in the specific order.
21. **profit_per_order**: The profit earned on the order after all expenses.

These columns capture a blend of customer demographics, product details, and logistical aspects, making the dataset useful for analyzing operational performance and customer behaviors in e-commerce.
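The sample rows show `order_date` and `ship_date` in two layouts (`11/5/2022` and `20-06-2022`), so the dates need normalizing before any time-based feature is built. Below is a minimal sketch, assuming slash-separated values are month-first and dash-separated values are day-first. That reading is inferred from the first four sample rows, where the resulting gap matches `days_for_shipment_real`. `parse_mixed_dates` and `days_order_to_ship` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Date strings copied from the first four rows of the head() output.
sample = pd.DataFrame({
    "order_date": ["11/5/2022", "20-06-2022", "25-06-2022", "10/6/2022"],
    "ship_date": ["11/7/2022", "23-06-2022", "30-06-2022", "10/10/2022"],
})


def parse_mixed_dates(col: pd.Series) -> pd.Series:
    """Parse a column that mixes 'M/D/YYYY' and 'DD-MM-YYYY' strings.

    Assumption: slash-separated values are month-first and dash-separated
    values are day-first. Anything matching neither layout becomes NaT.
    """
    text = col.astype(str).str.strip()
    slash = pd.to_datetime(text, format="%m/%d/%Y", errors="coerce")
    dash = pd.to_datetime(text, format="%d-%m-%Y", errors="coerce")
    # Each value matches at most one layout, so fill one result from the other.
    return slash.fillna(dash)


order = parse_mixed_dates(sample["order_date"])
ship = parse_mixed_dates(sample["ship_date"])

# Days between order and shipment, comparable to days_for_shipment_real.
sample["days_order_to_ship"] = (ship - order).dt.days
print(sample)
```

The fifth sample row (`2/5/2022` to `8/1/2022`, with `days_for_shipment_real` of 2) does not reconcile under this reading, so the assumption should be validated on the full dataset before it is relied on.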

In [17]:
print("No of rows and columns in the data: ", data.shape)

No of rows and columns in the data:  (113270, 21)


* The dataset mirrors the type of data used by e-commerce companies to track key customer behaviors, order details, and logistics, making it highly relevant for tackling real-world operational challenges. It offers insights into issues such as **delivery delays** and **cancellations**, as well as **customer segmentation** for personalized marketing strategies. With features like shipping details, order information, and customer demographics, the dataset can be used to predict delivery outcomes (classification) and segment customers based on purchasing patterns and profitability (clustering). 


* The size and complexity of the dataset further enhance its value. It includes a variety of features such as numerical data (e.g., sales, profit), categorical data (e.g., customer segments, delivery status), and time-related attributes (e.g., order and shipping dates). This diversity makes the dataset suitable for both classification and clustering tasks, offering a comprehensive view of e-commerce operations.



* By analyzing this data, businesses can address critical challenges that directly impact **customer satisfaction**, **loyalty**, and **operational efficiency**. It enables companies to enhance **logistics operations**, reduce delays, and offer better customer service. Additionally, it aids in **targeting the right customer segments** with tailored marketing strategies, ultimately improving **customer retention** and boosting **revenue**. This makes the dataset a valuable resource for driving improvements in e-commerce business performance and decision-making.
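As a rough illustration of how the two tasks could be framed from these columns, the sketch below tabulates the `delivery_status` classes (the classification target) and aggregates orders per customer into frequency, spend, profit, and discount features (possible inputs for clustering). The rows are made-up toy values in the dataset's schema, and the feature names (`order_count`, `total_sales`, `total_profit`, `mean_discount`) are assumptions for illustration only.

```python
import pandas as pd

# Toy rows in the dataset's schema; the values are illustrative, not real records.
orders = pd.DataFrame({
    "customer_id": ["C_1", "C_1", "C_2", "C_3", "C_3", "C_3"],
    "order_id": ["O_1", "O_2", "O_3", "O_4", "O_5", "O_6"],
    "delivery_status": [
        "Late delivery", "Shipping on time", "Shipping canceled",
        "Late delivery", "Late delivery", "Shipping on time",
    ],
    "sales_per_order": [500.0, 44.0, 254.0, 120.0, 80.0, 300.0],
    "order_item_discount": [35.0, 75.0, 60.0, 10.0, 5.0, 40.0],
    "profit_per_order": [223.2, 195.5, 220.0, 30.0, 12.0, 97.5],
})

# Classification view: share of each delivery outcome (shows class imbalance).
status_share = orders["delivery_status"].value_counts(normalize=True)
print(status_share)

# Clustering view: one row per customer with frequency, spend, and profit.
customer_features = orders.groupby("customer_id").agg(
    order_count=("order_id", "nunique"),
    total_sales=("sales_per_order", "sum"),
    total_profit=("profit_per_order", "sum"),
    mean_discount=("order_item_discount", "mean"),
)
print(customer_features)
```

Aggregating to one row per customer keeps the clustering input at the level the segments are meant to describe, while the classifier works at the order level.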

**Let's take a deeper look at the data**

We'll examine the data's size and null values, review the numerical and categorical columns for inconsistencies, check for duplicated records, and look for outliers in the numerical columns.

### Data Exploration

In this section, we will perform a detailed exploration of the dataset to understand its structure and quality. We will examine the following aspects:

- **Size of the Data**: Check the dimensions of the dataset to understand its scale.
- **Missing Values**: Identify any null or missing values in the dataset.
- **Numerical and Categorical Columns**: Categorize and analyze the numerical and categorical features to ensure correct data representation.
- **Inconsistencies**: Look for any inconsistencies or errors within the data.
- **Duplicated Records**: Detect and handle any duplicate entries in the dataset.
- **Outliers**: Investigate if any numerical columns contain outliers that could impact model performance. 

This step will help ensure the dataset is clean and ready for further analysis and model development.
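The checks listed above can be sketched as follows. This is a minimal example on a toy frame, not the notebook's actual exploration code. The 1.5 × IQR rule is one common outlier heuristic chosen here as an assumption, and `iqr_outlier_mask` is a hypothetical helper.

```python
import pandas as pd

# Toy frame with one missing value, one duplicated row, and one extreme value.
df = pd.DataFrame({
    "order_id": ["O_1", "O_2", "O_3", "O_4", "O_5", "O_6", "O_1"],
    "shipping_type": ["Second Class", "Standard Class", "First Class", None,
                      "Second Class", "Standard Class", "Second Class"],
    "sales_per_order": [100.0, 110.0, 120.0, 130.0, 140.0, 5000.0, 100.0],
})

# Size of the data.
print("Shape:", df.shape)

# Missing values per column.
missing = df.isna().sum()

# Split columns into numerical and categorical by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Inconsistencies: list the distinct labels of each categorical column
# so that misspellings or stray variants stand out.
category_levels = {c: df[c].dropna().unique().tolist() for c in categorical_cols}

# Fully duplicated records.
n_duplicates = int(df.duplicated().sum())


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


outlier_counts = {c: int(iqr_outlier_mask(df[c]).sum()) for c in numeric_cols}

print(missing, category_levels, n_duplicates, outlier_counts, sep="\n")
```

On the real data, the same calls would run on `data` instead of the toy `df`.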