# <font size=8 color=steelblue >**Phase 3: Problem Statement — RetailSmart Advanced Analytics: Unleashing the Power of Data!**
------------------
-----------------

# <font size=5 color=lightseagreen >Business Context
---

RetailSmart is an omni-channel e-commerce company that has already built strong capabilities in data cleaning, exploratory analysis, and churn prediction.
The Phase 2 predictive model already identifies which customers are likely to churn. Senior management now wants to extend the analytics into three further areas:

- **Customer Segmentation** — to personalize campaigns and retention offers
- **Demand Forecasting** — to plan production, procurement, and logistics
- **Cross-Sell Recommendations** — to increase average order value through bundled sales

The goal of this phase is to use unsupervised learning, time-series analysis, and association rule mining to discover patterns and trends that cannot be easily captured through supervised modeling.

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from statsmodels.tsa.arima.model import ARIMA
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings('ignore')  # Shhh, ignore those pesky warnings

# Set up plotting style for pretty visuals
sns.set_style("darkgrid")
plt.style.use("seaborn-v0_8")

In [2]:
# Load cleaned data
customers = pd.read_csv('/content/customers_cleaned.csv')
sales = pd.read_csv('/content/sales_cleaned.csv')
marketing = pd.read_csv('/content/marketing_cleaned.csv')
products = pd.read_csv('/content/products_cleaned.csv')

In [4]:
# Print column names, dtypes, and non-null counts for each dataset
datasets = {
    "Customers": customers,
    "Marketing": marketing,
    "Products": products,
    "Sales": sales
}

for name, df in datasets.items():
    print(f"\n--- {name} Dataset Info ---")
    df.info()



--- Customers Dataset Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99441 entries, 0 to 99440
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   customer_id               99441 non-null  object 
 1   customer_unique_id        99441 non-null  object 
 2   customer_zip_code_prefix  99441 non-null  int64  
 3   city                      99441 non-null  object 
 4   state                     99441 non-null  object 
 5   total_orders              99441 non-null  float64
 6   total_spent               99441 non-null  float64
 7   last_order                99441 non-null  object 
 8   days_since_last_order     99441 non-null  float64
 9   churn_flag                99441 non-null  int64  
dtypes: float64(3), int64(2), object(5)
memory usage: 7.6+ MB

--- Marketing Dataset Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 8 columns):
 

1. <font color=skyblue >Customer Segmentation (Unsupervised Learning)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Group customers into meaningful segments using behavioral and marketing features
(e.g., Recency, Frequency, Monetary, Avg Spend, Response Rate).**

------

- **Identify cluster profiles such as High-Value Loyalists, Price-Sensitive Frequent Buyers,
or At-Risk Customers.**

---------

- **Visualize segment separation using PCA and summarize behavioral differences
between clusters.**

---

- **Generate and save a summary of customer clusters as
data_outputs/cluster_summary.csv.**

---
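
The segmentation steps above (scale the features, cluster, project with PCA, summarize per cluster) could be sketched roughly as follows. This is only an illustration under stated assumptions: `demo_customers` is synthetic data standing in for the real customers table, the three feature columns are taken from the `df.info()` output as Recency, Frequency, and Monetary proxies, and `n_clusters=3` is an arbitrary choice rather than a tuned value.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customers table (values are made up for illustration).
rng = np.random.default_rng(42)
n = 300
demo_customers = pd.DataFrame({
    "days_since_last_order": rng.integers(1, 400, size=n).astype(float),  # recency
    "total_orders": rng.integers(1, 20, size=n).astype(float),            # frequency
    "total_spent": rng.gamma(shape=2.0, scale=150.0, size=n),             # monetary
})
rfm_cols = ["days_since_last_order", "total_orders", "total_spent"]

# K-means is distance-based, so put all features on a comparable scale first.
X = StandardScaler().fit_transform(demo_customers[rfm_cols])

# Assumption: 3 clusters. In practice, compare several k (elbow, silhouette).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
demo_customers["cluster"] = kmeans.fit_predict(X)

# Project to 2 principal components for a scatter plot of segment separation.
coords = PCA(n_components=2).fit_transform(X)
demo_customers["pc1"] = coords[:, 0]
demo_customers["pc2"] = coords[:, 1]

# Per-cluster feature means plus cluster sizes.
cluster_summary = (
    demo_customers.groupby("cluster")[rfm_cols]
    .mean()
    .assign(n_customers=demo_customers["cluster"].value_counts().sort_index())
    .reset_index()
)
print(cluster_summary)

# To persist the summary (assumes the data_outputs/ directory exists):
# cluster_summary.to_csv("data_outputs/cluster_summary.csv", index=False)
```

Labels such as High-Value Loyalists would then be assigned by reading the per-cluster means, for example low recency combined with high spend.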

2. <font color=skyblue >Demand Forecasting (Time Series Analysis)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Analyze historical order and revenue trends.**

----

- **Build a forecasting model to predict order volumes and total revenue for the next 6
months.**

---

- **Provide insights on seasonality and growth trends for operational planning.**

---

- **Save actual vs. forecasted monthly order volumes as
data_outputs/forecast_results.csv.**

-----

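3. <font color=skyblue >Cross-Sell Recommendations (Association Rule Mining)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Save the top product-pair associations (support, confidence, lift) as
data_outputs/association_rules.csv.**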

---
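
To make the three product-pair metrics concrete, here is a small sketch that computes support, confidence, and lift for item pairs by direct counting. It deliberately uses plain Python and pandas instead of the `mlxtend` functions imported earlier, so the arithmetic stays visible; `demo_baskets`, `pair_rules`, and the `min_support` threshold are hypothetical names and values introduced for illustration.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy baskets, one set of product names per order (made up for illustration).
demo_baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "bag"},
    {"case", "charger"},
]


def pair_rules(baskets, min_support=0.3):
    """Return pairwise rules (antecedent -> consequent) with support, confidence, lift."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rows = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each pair yields two directed rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            lift = confidence / (item_counts[consequent] / n)
            rows.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    rules = pd.DataFrame(rows)
    return rules.sort_values("lift", ascending=False).reset_index(drop=True)


rules = pair_rules(demo_baskets)
print(rules)

# To persist the rules (assumes the data_outputs/ directory exists):
# rules.to_csv("data_outputs/association_rules.csv", index=False)
```

A lift above 1 means the two products appear together more often than independence would predict, which is what makes a pair a cross-sell candidate.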