
# Day 12 - Walmart
You are a Data Analyst on the **Walmart**.com Insights team investigating customer return patterns. The team aims to develop a predictive approach to understanding customer return behaviors across different time periods. Your goal is to leverage transaction data to create a comprehensive view of customer return likelihood.

In [2]:
# Import required libraries
import pandas as pd
import numpy as np

# Import data file
url = "https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day_12.txt"
customer_returns = pd.read_csv(url)
customer_returns.head()

Unnamed: 0,order_id,order_date,customer_id,return_flag,order_amount
0,ORD0001,2024-07-05,CUST001,True,120.5
1,ORD0002,2024-07-10,CUST002,False,75.0
2,ORD0003,2024-08-15,CUST001,True,90.0
3,ORD0004,2024/09/01,CUST003,False,45.0
4,ORD0005,2024-10-20,CUST004,True,200.0


## Question 1
Identify and list all unique customer IDs who have made returns between July 1st 2024 and June 30th 2025. This will help us understand the base set of customers involved in returns during the specified period.

## Solution

In [3]:
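# Convert order_date to datetime; values that cannot be parsed become NaT
customer_returns['order_date'] = pd.to_datetime(customer_returns['order_date'], errors="coerce")

# Drop rows with an unparseable date. Calling .dropna() on the Series before
# assigning it back has no effect, because the assignment realigns on the index.
customer_returns = customer_returns.dropna(subset=['order_date'])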

# Filter for returns within the period (both endpoints inclusive)
start_date = pd.Timestamp('2024-07-01')
end_date = pd.Timestamp('2025-06-30')
customer_returns_filtered = customer_returns[
    customer_returns['order_date'].between(start_date, end_date)
    & customer_returns['return_flag']
]

# Find the unique customers
unique_customers = customer_returns_filtered['customer_id'].unique()
print("Unique customers with returns:\n", unique_customers)

Unique customers with returns:
 ['CUST001' 'CUST004' 'CUST002' 'CUST006' 'CUST009' 'CUST005' 'CUST003'
 'CUST007']
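
The data preview earlier mixes two date spellings (`2024-07-05` and `2024/09/01`). Depending on the pandas version, `pd.to_datetime` with `errors="coerce"` may turn the less common spelling into `NaT`, which would silently exclude those orders from the date filter. Below is a minimal sketch of one way to guard against that, assuming the separator is the only thing that varies. The toy frame `sample` and the helper `parse_order_dates` are hypothetical names used for illustration, not part of the original notebook.

```python
import pandas as pd

def parse_order_dates(dates: pd.Series) -> pd.Series:
    """Parse dates written as YYYY-MM-DD or YYYY/MM/DD (assumed to be the only variants)."""
    # Normalise the separator so one explicit format covers every row
    normalised = dates.astype(str).str.strip().str.replace("/", "-", regex=False)
    # Anything that still fails to parse becomes NaT instead of raising
    return pd.to_datetime(normalised, format="%Y-%m-%d", errors="coerce")

# Toy data mirroring the mixed formats in the preview (illustrative values only)
sample = pd.DataFrame({
    "customer_id": ["CUST001", "CUST003", "CUST004"],
    "order_date": ["2024-07-05", "2024/09/01", "not a date"],
    "return_flag": [True, True, False],
})
sample["order_date"] = parse_order_dates(sample["order_date"])

# Drop rows on the DataFrame itself so unparseable dates are actually removed
sample = sample.dropna(subset=["order_date"])
print(sample)
```

Comparing the row count before and after the `dropna` call is a quick way to see how many orders were lost to parsing problems.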


## Question 2
Convert the 'order_date' column to a datetime format and create a MultiIndex with 'customer_id' and 'order_date'. Then, calculate the total number of returns per customer for each month. This will provide insights into monthly return patterns for each customer.

## Solution

In [4]:
# Create MultiIndex with customer_id and order_date
customer_returns.set_index(['customer_id', 'order_date'], inplace=True)

# Filter only returns (where return_flag is True)
returns_only = customer_returns[customer_returns['return_flag']]

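# Group by customer and calendar month, then count returns per customer
# ('ME' is the month-end alias in pandas 2.2+, where 'M' is deprecated; use 'M' on older versions)
monthly_returns = (returns_only.groupby(['customer_id', pd.Grouper(level='order_date', freq='ME')])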
    .size()
    .unstack(fill_value=0)
)

print("Monthly returns per customer:")
print(monthly_returns)

Monthly returns per customer:
order_date   2024-07-31  2024-08-31  2024-09-30  2024-10-31  2024-11-30  \
customer_id                                                               
CUST001               2           1           0           0           0   
CUST002               0           0           0           0           1   
CUST003               0           0           1           0           0   
CUST004               0           0           0           1           0   
CUST005               0           1           0           0           1   
CUST006               0           0           0           0           0   
CUST007               0           1           0           1           0   
CUST009               0           0           0           1           0   

order_date   2024-12-31  2025-01-31  2025-02-28  2025-03-31  2025-04-30  \
customer_id                                                               
CUST001               0           2           0           1          


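
As a variation on the monthly count above, grouping by a monthly `Period` labels each column by month (for example `2024-07`) rather than by month-end timestamp, and does not rely on a frequency alias passed to `pd.Grouper`. This is a minimal sketch on made-up data; the frame `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up orders (not taken from the challenge data set)
toy = pd.DataFrame({
    "customer_id": ["CUST001", "CUST001", "CUST002", "CUST001"],
    "order_date": pd.to_datetime(
        ["2024-07-05", "2024-07-20", "2024-08-02", "2024-08-15"]
    ),
    "return_flag": [True, True, True, False],
})

# Keep only returned orders
returns_only = toy[toy["return_flag"]]

# Map each order date to its calendar month (a pandas Period such as 2024-07)
month = returns_only["order_date"].dt.to_period("M")

# Count returns per customer and month; missing combinations become 0
monthly_returns = (
    returns_only.groupby(["customer_id", month])
    .size()
    .unstack(fill_value=0)
)
print(monthly_returns)
```

The result has one row per customer and one column per month, so a single cell can be read with, for instance, `monthly_returns.loc["CUST001", pd.Period("2024-07")]`.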