# Capstone Project: Customer Order Insights and Delivery Tracker

Week 2 – Data Processing with Python
Tools: Python (Pandas, NumPy, Requests)

Capstone Tasks:
- Use Python to load customer order data from an API or CSV
- Clean missing fields and convert timestamps
- Use NumPy to calculate delivery delays
- Show top delayed customers and most common delivery issues

Deliverables:
- Cleaned and processed order dataset
- Python script that prints delay summary by customer

## STEP 1: Upload CSVs to Colab

In [1]:
from google.colab import files
uploaded = files.upload()

Saving customers.csv to customers.csv
Saving delivery_status.csv to delivery_status.csv
Saving orders.csv to orders.csv


## STEP 2: Load the Data

In [2]:
import pandas as pd
import numpy as np

# Load the CSVs
customers_df = pd.read_csv("customers.csv")
orders_df = pd.read_csv("orders.csv")
delivery_status_df = pd.read_csv("delivery_status.csv")
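
The task list also allows loading orders from an API. The notebook only reads CSVs, so what follows is a minimal sketch of the API path, not part of the original run. The helper names `orders_from_records` and `fetch_orders`, the `url` argument, and the assumption that the API returns a JSON list of records with the same columns as `orders.csv` are all hypothetical.

```python
import pandas as pd

def orders_from_records(records):
    """Build an orders DataFrame from a list of dicts (one dict per order)."""
    df = pd.DataFrame.from_records(records)
    # Parse the date columns if present, mirroring Step 3 of the notebook
    for col in ("order_date", "delivery_date"):
        if col in df.columns:
            df[col] = pd.to_datetime(df[col])
    return df

def fetch_orders(url, timeout=10):
    """Download orders as JSON from a hypothetical endpoint and return a DataFrame."""
    import requests  # imported lazily so orders_from_records works without requests
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return orders_from_records(response.json())

# Offline example: two records shaped like the first two rows of orders.csv
sample_records = [
    {"order_id": 1, "customer_id": 1, "product_id": 101,
     "order_date": "2025-07-01", "delivery_date": "2025-07-04", "status": "Delivered"},
    {"order_id": 2, "customer_id": 2, "product_id": 102,
     "order_date": "2025-07-02", "delivery_date": "2025-07-05", "status": "Delivered"},
]
api_orders_df = orders_from_records(sample_records)
print(api_orders_df.dtypes)
```

Keeping the JSON-to-DataFrame step separate from the HTTP call means the parsing logic can be checked without network access.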

## STEP 3: Clean Missing Fields and Convert Timestamps

In [3]:
# Check for missing values
print(customers_df.isnull().sum())
print(orders_df.isnull().sum())
print(delivery_status_df.isnull().sum())

# Convert date columns
orders_df['order_date'] = pd.to_datetime(orders_df['order_date'])
orders_df['delivery_date'] = pd.to_datetime(orders_df['delivery_date'])
delivery_status_df['updated_at'] = pd.to_datetime(delivery_status_df['updated_at'])

customer_id     0
name            0
contact_info    0
address         0
dtype: int64
order_id         0
customer_id      0
product_id       0
order_date       0
delivery_date    0
status           0
dtype: int64
status_id         0
order_id          0
current_status    0
updated_at        0
dtype: int64
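
All three files report zero missing values, so this dataset needed no cleaning beyond the timestamp conversion. For data that does have gaps, one possible cleaning policy is sketched below. The `clean_orders` helper and the choices it makes (drop rows without keys, fill a missing `status` with "Unknown", coerce unparseable dates to `NaT`) are assumptions for illustration, not rules from the project brief. The sample rows are made up.

```python
import pandas as pd

def clean_orders(df):
    """Return a cleaned copy of an orders DataFrame (hypothetical policy)."""
    # Rows without an order or customer key cannot be joined later, so drop them
    df = df.dropna(subset=["order_id", "customer_id"]).copy()
    # Keep rows with a missing status, but make the gap explicit
    df["status"] = df["status"].fillna("Unknown")
    # Unparseable dates become NaT instead of raising an error
    for col in ("order_date", "delivery_date"):
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return df.reset_index(drop=True)

# Made-up rows with one missing customer, one bad date, and one missing status
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [1, None, 3],
    "order_date": ["2025-07-01", "2025-07-02", "not a date"],
    "delivery_date": ["2025-07-04", "2025-07-05", "2025-07-06"],
    "status": ["Delivered", "Delivered", None],
})
cleaned = clean_orders(raw)
print(cleaned)
```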


## STEP 4: Calculate Delivery Delays

In [5]:
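# Merge orders with delivery status on the shared order_id key
merged_df = pd.merge(orders_df, delivery_status_df, on='order_id', how='inner')

# Delay in whole days: last status timestamp minus expected delivery date.
# updated_at records when current_status last changed, so it is the actual
# delivery time only for orders marked "Delivered". .dt.days also drops
# partial days, so a same-day update counts as 0 days late.
merged_df['delay_days'] = (merged_df['updated_at'] - merged_df['delivery_date']).dt.days

# Early updates give negative values; use NumPy to floor them at 0
merged_df['delay_days'] = np.maximum(merged_df['delay_days'], 0)

# Flag delayed orders (1 = delayed, 0 = on time)
merged_df['delayed'] = np.where(merged_df['delay_days'] > 0, 1, 0)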
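
Every order in this dataset ends up with `delay_days` equal to 0, partly because `.dt.days` keeps only the whole-day component of the difference. The sketch below repeats the calculation on three rows to make that rounding visible. The first and third rows mirror orders 1 and 5 from this dataset; the second row is a made-up late delivery, and the `delay_hours` column is an addition that is not part of the original notebook.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "order_id": [1, 2, 3],
    "delivery_date": pd.to_datetime(["2025-07-04", "2025-07-05", "2025-07-09"]),
    "updated_at": pd.to_datetime([
        "2025-07-04 18:00:00",  # same day, 18 hours after midnight
        "2025-07-07 09:00:00",  # made-up row: more than two days late
        "2025-07-08 09:00:00",  # updated before the expected date
    ]),
})

raw_delay = sample["updated_at"] - sample["delivery_date"]

# Whole days only, floored at zero (same logic as the notebook cell)
sample["delay_days"] = np.maximum(raw_delay.dt.days, 0)

# Fractional view in hours, also floored at zero
sample["delay_hours"] = np.maximum(raw_delay.dt.total_seconds() / 3600, 0.0)

print(sample[["order_id", "delay_days", "delay_hours"]])
```

Whether an 18-hour gap after midnight of the expected date should count as a delay depends on how `delivery_date` is defined, which the source data does not state.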


## STEP 5: Top Delayed Customers

In [6]:
# Merge with customers to get names
final_df = pd.merge(merged_df, customers_df, on='customer_id', how='inner')

# Group by customer and count delays
delay_summary = final_df.groupby(['customer_id', 'name'])['delayed'].sum().reset_index()
delay_summary = delay_summary.sort_values(by='delayed', ascending=False)

print("Top Delayed Customers:")
print(delay_summary)


Top Delayed Customers:
   customer_id          name  delayed
0            1  Anjali Mehta        0
1            2  Rohit Sharma        0
2            3   Priya Reddy        0
3            4   Karan Patel        0
4            5    Neha Singh        0
5            6    Amit Verma        0
6            7    Sneha Iyer        0
7            8  Rajiv Kapoor        0
8            9    Divya Nair        0
9           10   Manish Jain        0
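
The table above lists every customer because no order was flagged as delayed. For the "delay summary by customer" deliverable, a sketch that reports only customers with at least one delay, together with their total delay days, might look like this. The `summarize_delays` helper, its `top_n` parameter, and the sample rows are hypothetical; the column names match `final_df`.

```python
import pandas as pd

def summarize_delays(df, top_n=5):
    """Return the top_n customers ranked by delayed orders, then by total delay days."""
    summary = (
        df.groupby(["customer_id", "name"], as_index=False)
          .agg(
              orders=("order_id", "count"),
              delayed_orders=("delayed", "sum"),
              total_delay_days=("delay_days", "sum"),
          )
    )
    # Keep only customers who actually experienced a delay
    summary = summary[summary["delayed_orders"] > 0]
    return summary.sort_values(
        ["delayed_orders", "total_delay_days"], ascending=False
    ).head(top_n)

# Made-up rows with the same columns as final_df
example_df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 2, 3],
    "name": ["Customer A", "Customer A", "Customer B", "Customer C"],
    "delay_days": [2, 1, 0, 5],
    "delayed": [1, 1, 0, 1],
})
top_delayed = summarize_delays(example_df, top_n=2)
print(top_delayed)
```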


## STEP 6: Most Common Delivery Issues

In [7]:
# Count most common current_status updates
issue_summary = merged_df['current_status'].value_counts()

print("Most Common Delivery Statuses:")
print(issue_summary)


Most Common Delivery Statuses:
current_status
Delivered     5
Shipped       2
Processing    2
Cancelled     1
Name: count, dtype: int64
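
These counts cover every status, including successful deliveries. If "delivery issues" is taken to mean orders that have not been delivered, the count can be narrowed as sketched below. Treating every status other than `Delivered` as an open issue is an assumption; the brief does not define which statuses count. The status values reproduce the counts printed above.

```python
import pandas as pd

# Rebuild the status column from the counts printed by the notebook
statuses = pd.Series(
    ["Delivered"] * 5 + ["Shipped"] * 2 + ["Processing"] * 2 + ["Cancelled"],
    name="current_status",
)

# Assumption: anything not yet delivered is treated as a potential issue
not_delivered = statuses[statuses != "Delivered"]
issue_counts = not_delivered.value_counts()

# Share of all orders that each non-delivered status represents
issue_share = (issue_counts / len(statuses)).round(2)

print(issue_counts)
print(issue_share)
```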


## STEP 7: Save Cleaned Data

In [8]:
final_df.to_csv("cleaned_orders.csv", index=False)
files.download("cleaned_orders.csv")


In [9]:
# Read and display the saved CSV
df1 = pd.read_csv("cleaned_orders.csv")
display(df1)

|   | order_id | customer_id | product_id | order_date | delivery_date | status | status_id | current_status | updated_at | delay_days | delayed | name | contact_info | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 101 | 2025-07-01 | 2025-07-04 | Delivered | 1 | Delivered | 2025-07-04 18:00:00 | 0 | 0 | Anjali Mehta | anjali@example.com | Mumbai, India |
| 1 | 2 | 2 | 102 | 2025-07-02 | 2025-07-05 | Delivered | 2 | Delivered | 2025-07-05 17:30:00 | 0 | 0 | Rohit Sharma | rohit@example.com | Delhi, India |
| 2 | 3 | 3 | 103 | 2025-07-03 | 2025-07-06 | Shipped | 3 | Shipped | 2025-07-06 12:45:00 | 0 | 0 | Priya Reddy | priya@example.com | Hyderabad, India |
| 3 | 4 | 4 | 104 | 2025-07-04 | 2025-07-07 | Processing | 4 | Processing | 2025-07-07 10:15:00 | 0 | 0 | Karan Patel | karan@example.com | Ahmedabad, India |
| 4 | 5 | 5 | 105 | 2025-07-05 | 2025-07-09 | Cancelled | 5 | Cancelled | 2025-07-08 09:00:00 | 0 | 0 | Neha Singh | neha@example.com | Pune, India |
| 5 | 6 | 6 | 106 | 2025-07-06 | 2025-07-10 | Delivered | 6 | Delivered | 2025-07-10 19:10:00 | 0 | 0 | Amit Verma | amit@example.com | Chennai, India |
| 6 | 7 | 7 | 107 | 2025-07-07 | 2025-07-11 | Shipped | 7 | Shipped | 2025-07-11 13:20:00 | 0 | 0 | Sneha Iyer | sneha@example.com | Bangalore, India |
| 7 | 8 | 8 | 108 | 2025-07-08 | 2025-07-12 | Delivered | 8 | Delivered | 2025-07-12 20:30:00 | 0 | 0 | Rajiv Kapoor | rajiv@example.com | Kolkata, India |
| 8 | 9 | 9 | 109 | 2025-07-09 | 2025-07-13 | Processing | 9 | Processing | 2025-07-13 11:50:00 | 0 | 0 | Divya Nair | divya@example.com | Kochi, India |
| 9 | 10 | 10 | 110 | 2025-07-10 | 2025-07-14 | Delivered | 10 | Delivered | 2025-07-14 16:40:00 | 0 | 0 | Manish Jain | manish@example.com | Jaipur, India |
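
CSV has no date type, so the timestamps converted in Step 3 come back as plain strings when the file is read again with a bare `pd.read_csv`. A sketch of re-reading with `parse_dates` follows. It uses an in-memory copy of a subset of columns from the first two saved rows rather than the real `cleaned_orders.csv`, so that it runs without the file.

```python
import io
import pandas as pd

# Subset of columns from the first two rows of the saved file
csv_text = (
    "order_id,order_date,delivery_date,updated_at,delay_days\n"
    "1,2025-07-01,2025-07-04,2025-07-04 18:00:00,0\n"
    "2,2025-07-02,2025-07-05,2025-07-05 17:30:00,0\n"
)

date_cols = ["order_date", "delivery_date", "updated_at"]

# Without parse_dates the date columns load as text
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates they are restored to datetime64 columns
typed = pd.read_csv(io.StringIO(csv_text), parse_dates=date_cols)

print(plain.dtypes)
print(typed.dtypes)
```

With the real file, the same `parse_dates` argument can be passed to `pd.read_csv("cleaned_orders.csv", ...)`.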
