<a href="https://colab.research.google.com/github/AaryanPriyadarshi/DEEP-CSAT-project/blob/main/DEEP_CSAT_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - DEEP-CSAT Project


##### **Project Type**    - EDA / Classification

##### **Contribution**    - Aaryan Priyadarshi

##### **Domain**          - E-commerce Analytics/ Customer Support Analytics / Customer Satisfaction Prediction (CSAT)

##### **Goal**             - E-commerce Customer Satisfaction Prediction

# **Project Summary -**

The DeepCSAT – E-commerce Customer Satisfaction Prediction project focuses on analyzing and predicting customer satisfaction (CSAT) levels from e-commerce customer support interactions. The goal is to use data-driven insights to help online retailers improve service quality, agent performance, and customer retention.

The dataset contains key factors such as agent ratings, resolution time, customer query type, feedback sentiment, and interaction details. Through exploratory data analysis (EDA), patterns were identified linking agent efficiency, response time, and query complexity to customer satisfaction.

Advanced feature engineering was applied to convert textual feedback into meaningful numerical features using TF-IDF, while categorical and numerical variables were standardized for modeling.
Both Random Forest and XGBoost models were implemented to predict satisfaction levels (classification). These models were compared using key metrics like Accuracy, Precision, Recall, and F1-score to ensure balanced performance across satisfaction categories.

Feature importance and SHAP analysis revealed that agent performance, sentiment polarity, and resolution time were the most influential drivers of satisfaction.
This end-to-end workflow demonstrates how machine learning can quantify customer experience, allowing e-commerce platforms to prioritize high-impact service improvements and deliver more consistent customer support outcomes.

# **GitHub Link -**

https://github.com/AaryanPriyadarshi/DEEP-CSAT-project

# **Problem Statement**


In the competitive landscape of e-commerce, customer satisfaction is a major determinant of brand loyalty and long-term profitability. However, analyzing large volumes of customer support data to understand what factors drive satisfaction or dissatisfaction remains a challenge.

The problem this project addresses is:

>“How can we leverage machine learning to predict customer satisfaction scores based on customer support interactions, and identify the key factors influencing those satisfaction levels?”

By building predictive models and analyzing interaction data, this project aims to help e-commerce businesses:

- Anticipate dissatisfaction early,
- Optimize agent allocation and response workflows, and
- Continuously improve support quality through data-backed insights.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure the following format is answered for each algorithm.


*   Explain the ML model used and its performance using an evaluation metric score chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain what each evaluation metric indicates for the business, and the business impact of the ML model used.

# ***Let's Begin !***

### **1. Import Libraries**

Imported all essential libraries required for the project.

- **Data Handling:** `pandas`, `numpy`  
- **Visualization:** `matplotlib`, `seaborn`, `plotly`  
- **Machine Learning:** `scikit-learn`, `xgboost`  
- **Statistical Testing:** `scipy`  
- **Explainability:** `shap`  
- **Utilities:** `warnings`, `logging`, and reproducibility configuration  

Ensures a robust setup for data analysis, modeling, and evaluation.


In [None]:
# 1. IMPORT LIBRARIES & CONFIGURATION
import os, time, logging, warnings, random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                             accuracy_score, precision_score, recall_score, f1_score)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from xgboost import XGBClassifier, XGBRegressor
from sklearn.inspection import permutation_importance

# Explainability
try:
    import shap
    SHAP_AVAILABLE = True
except Exception:
    SHAP_AVAILABLE = False

warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
SEED = 42
np.random.seed(SEED)
random.seed(SEED)
sns.set_style("whitegrid")
logging.info("Libraries imported successfully.")


### **2. Dataset Loading**

Mounted Google Drive and loaded the dataset (`eCommerce_Customer_support_data.csv`) directly into the Colab environment.  

- Verified file path and accessibility  
- Loaded data into a pandas DataFrame  
- Displayed first few rows to confirm successful import  

Ensures that the data is ready for exploration and analysis.


In [None]:
# 2. DATASET LOADING (Drive)
from google.colab import drive
drive.mount('/content/drive', force_remount=False)

DATA_PATH = "/content/drive/MyDrive/DEEP-CSAT/eCommerce_Customer_support_data.csv"

# Fail fast: every later cell depends on `df`, so a missing file should stop the run here
if not os.path.exists(DATA_PATH):
    logging.error(f"File not found at {DATA_PATH}. Please check your Drive folder and update the path.")
    raise FileNotFoundError(DATA_PATH)

df = pd.read_csv(DATA_PATH)
logging.info(f"Dataset successfully loaded with shape {df.shape}")
print("✅ Dataset Preview:")
display(df.head())


### **3. Dataset Overview**

Performed a quick inspection of the dataset to understand its structure and quality.  

- Checked shape, column names, and data types  
- Identified missing values and duplicate records  
- Assessed initial data consistency and integrity  

Helps determine cleaning steps and preprocessing needs.


In [None]:
# 3. DATASET OVERVIEW
print("Shape:", df.shape)
print("\nColumns:\n", df.columns.tolist())
print("\nDtypes:\n", df.dtypes)
print("\nMissing values (top 10):\n", df.isnull().sum().sort_values(ascending=False).head(10))
print("\nDuplicate rows:", df.duplicated().sum())


### **4. Target Column Detection**

Automatically identified the target variable for modeling — likely **CSAT** or **Customer Satisfaction**.  

- Searched for relevant target indicators like `Satisfaction`, `CSAT`, or `Rating`  
- Confirmed the column type (categorical or numerical)  
- Set target variable for model training  

Ensures accurate task setup for classification or regression.


In [None]:
# 4. TARGET DETECTION
possible_targets = [
    'CSAT','csat','Satisfaction','satisfaction',
    'Customer_Satisfaction','CustomerSatisfaction',
    'Rating','rating','Score','score','Feedback','feedback'
]

found = [col for col in df.columns if any(key.lower() == col.lower() for key in possible_targets)]

if len(found) == 1:
    TARGET_COLUMN = found[0]
    logging.info(f" Detected target column automatically: {TARGET_COLUMN}")
elif len(found) > 1:
    TARGET_COLUMN = found[0]
    logging.warning(f" Multiple possible target columns found: {found}. Defaulting to '{TARGET_COLUMN}'.")
else:
    # MANUAL OVERRIDE
    TARGET_COLUMN = 'CSAT Score'
    logging.info(f" Manually set target column: {TARGET_COLUMN}")

if TARGET_COLUMN not in df.columns:
    raise KeyError(f"Target column '{TARGET_COLUMN}' not found in dataset.")
else:
    logging.info(f"Target column confirmed: {TARGET_COLUMN}")
    print(f" Using '{TARGET_COLUMN}' as the target column.")
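
The exact-name match in the cell above does not recognize a header such as `CSAT Score`, which is why the manual override exists. A minimal sketch of a more tolerant lookup follows, assuming it is acceptable to compare names after lowercasing and stripping spaces, underscores, and hyphens. `find_target_column` and its keyword tuple are hypothetical helpers, not part of the notebook.

```python
import re

def _normalize(name):
    """Lowercase and drop spaces, underscores, and hyphens for loose comparison."""
    return re.sub(r"[\s_\-]+", "", str(name).lower())

def find_target_column(columns, keywords=("csat", "satisfaction", "rating")):
    """Return the first column whose normalized name contains a keyword, else None."""
    for col in columns:
        norm = _normalize(col)
        if any(key in norm for key in keywords):
            return col
    return None

# Illustrative header list (an assumption, not read from the real CSV)
demo_columns = ["Unique id", "channel_name", "CSAT Score"]
print(find_target_column(demo_columns))
```

Substring matching is deliberately loose, so a keyword like `rating` could also match an unrelated header. Printing the detected column and confirming it by eye is still advisable.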


### **5. Task Type Identification**

Determined whether the problem is **Classification** or **Regression** based on target variable characteristics.  

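- If categorical (e.g., “High”, “Medium”, “Low”) or numeric with at most 10 distinct values (e.g., a 1–5 satisfaction score) → **Classification**  
- If numeric with more than 10 distinct values (e.g., a continuous score) → **Regression**  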

This step defines the modeling approach and evaluation metrics to be used.


In [None]:
# 5. TASK TYPE DETECTION
if 'TARGET_COLUMN' not in locals():
    raise ValueError("TARGET_COLUMN not defined. Please set the target column manually.")

target_dtype = df[TARGET_COLUMN].dtype
n_unique = df[TARGET_COLUMN].nunique(dropna=True)
logging.info(f"Target dtype: {target_dtype}, unique values: {n_unique}")

if pd.api.types.is_numeric_dtype(df[TARGET_COLUMN]) and n_unique > 10:
    TASK = 'regression'
else:
    TASK = 'classification'

logging.info(f"Inferred task type: {TASK}")
print(f" Task identified as: {TASK}")
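
The rule in the cell above can be wrapped in a small function so that it is easy to check against a few representative targets. This is a sketch on toy series. `infer_task` and its `max_classes` parameter are illustrative names, and the threshold of 10 mirrors the value used in the cell.

```python
import pandas as pd

def infer_task(target, max_classes=10):
    """Treat numeric targets with more than `max_classes` distinct values as regression."""
    n_unique = target.nunique(dropna=True)
    if pd.api.types.is_numeric_dtype(target) and n_unique > max_classes:
        return "regression"
    return "classification"

five_point = pd.Series([1, 2, 3, 4, 5, 5, 4])          # discrete score
labels = pd.Series(["High", "Low", "Medium"])           # string categories
continuous = pd.Series(range(100), dtype="float64")     # many distinct numeric values

print(infer_task(five_point), infer_task(labels), infer_task(continuous))
```

Under this rule a five-point CSAT score is handled as a multi-class classification target, which matches the classifiers trained later in the notebook.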


### **6. Feature Engineering**

Enhanced dataset quality and model interpretability through feature creation.  

- Extracted **date/time-based features** (month, weekday, hour, weekend indicator)  
- Computed a haversine `distance_km` feature when store and drop-off coordinate columns are present (skipped otherwise)  
- Coerced unparseable dates and times to missing values instead of raising errors  
- Prepared data for machine learning compatibility  

Improves model accuracy and captures real-world behavioral patterns.


In [None]:
# 6. FEATURE ENGINEERING
from math import radians, sin, cos, sqrt, atan2

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371.0
    lat1, lon1, lat2, lon2 = map(lambda x: float(x) if pd.notnull(x) else np.nan, [lat1, lon1, lat2, lon2])
    if any(pd.isnull([lat1, lon1, lat2, lon2])):
        return np.nan
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))

def add_features_generic(df):
    df = df.copy()
    lat1 = next((c for c in df.columns if 'store_lat' in c.lower()), None)
    lon1 = next((c for c in df.columns if 'store_long' in c.lower()), None)
    lat2 = next((c for c in df.columns if 'drop_lat' in c.lower()), None)
    lon2 = next((c for c in df.columns if 'drop_long' in c.lower()), None)
    if lat1 and lon1 and lat2 and lon2:
        df['distance_km'] = df.apply(lambda r: haversine_km(r[lat1], r[lon1], r[lat2], r[lon2]), axis=1)
        logging.info("Distance feature created: distance_km")

    date_col = next((c for c in df.columns if 'date' in c.lower()), None)
    if date_col:
        df[date_col] = pd.to_datetime(df[date_col], errors='coerce')
        df['month'] = df[date_col].dt.month
        df['weekday'] = df[date_col].dt.weekday
        df['is_weekend'] = df['weekday'].isin([5,6]).astype(int)

    time_col = next((c for c in df.columns if 'time' in c.lower()), None)
    if time_col:
        df['hour'] = pd.to_datetime(df[time_col], errors='coerce').dt.hour

    return df

df = add_features_generic(df)
logging.info("Feature engineering completed.")
display(df.head())
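
The summary names resolution time as a key driver, but the cell above only derives calendar fields. A possible way to add a duration feature is sketched below. The column names `Issue_reported at` and `issue_responded`, the timestamp format, and the helper `add_response_minutes` are all assumptions for illustration and should be checked against the actual CSV.

```python
import pandas as pd

def add_response_minutes(df, start_col="Issue_reported at", end_col="issue_responded",
                         fmt="%d/%m/%Y %H:%M"):
    """Add `response_minutes` = end - start; unparseable or negative spans become NaN."""
    out = df.copy()
    start = pd.to_datetime(out[start_col], format=fmt, errors="coerce")
    end = pd.to_datetime(out[end_col], format=fmt, errors="coerce")
    minutes = (end - start).dt.total_seconds() / 60.0
    # Negative durations usually signal data-entry problems, so they are masked out
    out["response_minutes"] = minutes.where(minutes >= 0)
    return out

# Toy rows, not taken from the dataset
demo = pd.DataFrame({
    "Issue_reported at": ["01/08/2023 11:13", "01/08/2023 12:52", "bad value"],
    "issue_responded":   ["01/08/2023 11:47", "01/08/2023 12:54", "01/08/2023 13:00"],
})
print(add_response_minutes(demo)["response_minutes"].tolist())
```

Rows with a missing `response_minutes` value would then be filled by the median imputer in the training step, like any other numeric column.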


### **7. Exploratory Data Analysis (EDA)**

Analyzed the dataset using visualizations to uncover key trends and relationships.  

- Visualized distributions of satisfaction scores  
- Analyzed correlations between numerical features  
- Used scatter plots and count plots for insights on agent rating, traffic, and sentiment  
- Identified outliers and variable interactions  

Provides deeper understanding of customer satisfaction dynamics.


In [None]:
# 7. EDA
plt.figure(figsize=(8,4))
if TASK == 'regression':
    sns.histplot(df[TARGET_COLUMN].dropna(), kde=True)
else:
    sns.countplot(y=TARGET_COLUMN, data=df)
plt.title(f"Distribution of {TARGET_COLUMN}")
plt.show()

plt.figure(figsize=(10,8))
num_df = df.select_dtypes(include=[np.number]).drop(columns=[TARGET_COLUMN], errors='ignore')
if not num_df.empty:
    corr = num_df.corr()
    sns.heatmap(corr, annot=False, cmap='coolwarm')
    plt.title("Numeric Feature Correlation Heatmap")
    plt.show()
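
For the categorical-versus-categorical part of the bivariate analysis, a row-normalized crosstab shows how the satisfaction mix changes across the levels of a feature. The sketch below uses toy data. `target_share_by` is a hypothetical helper and `channel_name` is an assumed column name.

```python
import pandas as pd

def target_share_by(df, feature, target):
    """Row-normalized crosstab: share of each target level within each feature level."""
    return pd.crosstab(df[feature], df[target], normalize="index").round(3)

# Toy data for illustration only
demo = pd.DataFrame({
    "channel_name": ["Inbound", "Inbound", "Outcall", "Outcall", "Email", "Email"],
    "CSAT Score":   [5, 1, 5, 5, 1, 1],
})
shares = target_share_by(demo, "channel_name", "CSAT Score")
print(shares)
# shares.plot(kind="bar", stacked=True) would draw the same table as a stacked bar chart
```

Each row sums to 1, so channels with very different ticket volumes can be compared directly.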


### **8. Data Preprocessing**

Prepared the dataset for machine learning using structured pipelines.  

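- Deferred missing-value imputation to the training step (section 10), where imputers are fit on the training split  
- Encoded categorical variables using Label Encoding  
- Left numerical features unscaled: `StandardScaler` is imported but not applied, and the tree-based models used here do not depend on feature scale  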
- Vectorized textual feedback using **TF-IDF**  
- Combined all features into a final training matrix  

Ensures clean, normalized, and model-ready data.


In [None]:
# 8. PREPROCESSING
from sklearn.feature_extraction.text import TfidfVectorizer
import scipy.sparse as sp

numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
numeric_cols = [c for c in numeric_cols if c != TARGET_COLUMN]
categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
text_cols = [c for c in categorical_cols if df[c].dropna().astype(str).str.len().median() > 30]
categorical_small = [c for c in categorical_cols if c not in text_cols]

df_pre = df.copy()
for c in categorical_small:
    le = LabelEncoder()
    df_pre[c] = le.fit_transform(df_pre[c].astype(str))

X_base = df_pre[numeric_cols + categorical_small]
y = df_pre[TARGET_COLUMN]

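# Track readable feature names so importance plots show columns and terms instead of Feature_<i>
feature_names = list(X_base.columns)

if text_cols:
    tfidf_vectorizers = {}
    text_features_list = []
    for tc in text_cols:
        vec = TfidfVectorizer(max_features=500, stop_words='english')
        # Fill NaN before casting to str; the reverse order turns NaN into the literal token "nan"
        tfidf_mat = vec.fit_transform(df_pre[tc].fillna('').astype(str))
        tfidf_vectorizers[tc] = vec
        text_features_list.append(tfidf_mat)
        feature_names += [f"tfidf_{tc}_{term}" for term in vec.get_feature_names_out()]
    # Stack base features and TF-IDF blocks column-wise into one sparse matrix
    X = sp.hstack([sp.csr_matrix(X_base.values)] + text_features_list).tocsr()
else:
    X = X_base.values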
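
The cell above fits the label encoders and TF-IDF vectorizers on the full dataset before the split, so vocabulary and category codes from test rows leak into training. One common alternative is a scikit-learn `ColumnTransformer` that is fit on the training rows only. The sketch below uses a toy frame, and its column names are placeholders rather than the confirmed schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

# Toy frames; column names are assumptions for illustration
train = pd.DataFrame({
    "Item_price": [100.0, None, 250.0, 80.0],
    "channel_name": ["Inbound", "Email", "Inbound", "Outcall"],
    "Customer Remarks": ["very helpful agent", "slow refund", "helpful and quick", ""],
})
test = pd.DataFrame({
    "Item_price": [120.0],
    "channel_name": ["Chat"],               # category never seen during fit
    "Customer Remarks": ["refund was quick"],
})

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["Item_price"]),
    ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     ["channel_name"]),
    # A bare string (not a list) hands TfidfVectorizer the 1-D text column it expects
    ("txt", TfidfVectorizer(max_features=500), "Customer Remarks"),
])

X_train_demo = preprocess.fit_transform(train)   # medians, codes, vocabulary learned here only
X_test_demo = preprocess.transform(test)         # reused unchanged on held-out rows
print(X_train_demo.shape, X_test_demo.shape)
```

Unseen categories are mapped to `-1` instead of raising an error, and words that appear only in the test rows are ignored by the fitted vocabulary.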


### **9. Train-Test Split**

Divided the dataset into training and testing subsets.  

- Used an **80:20 split** for model training and evaluation  
- Applied **stratified sampling** for classification to maintain class balance  
- Ensured reproducibility with fixed random seed  

Provides an unbiased framework for model performance assessment.


In [None]:
# 9. TRAIN-TEST SPLIT
if TASK == 'classification':
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=SEED)
else:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
logging.info("Data split complete.")
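
To see what `stratify=y` contributes, the class proportions of both splits can be compared on a deliberately imbalanced toy target. This is a sanity check under assumed data, not output from the project dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced target: 80 rows of class 5 and 20 rows of class 1
y_demo = pd.Series([5] * 80 + [1] * 20)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)
# Both splits should keep the original 80/20 class mix
print(y_tr.value_counts(normalize=True).to_dict())
print(y_te.value_counts(normalize=True).to_dict())
```

Stratification requires at least two rows per class, so very rare satisfaction levels may need to be merged or dropped before splitting.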


### **10. Model Training, Evaluation & Validation**

This section performs **end-to-end supervised learning** — from preparing the data to evaluating model performance.  

- **Data preparation:**  
  Converts sparse matrices to DataFrames, imputes missing values, and ensures consistent feature handling.  

- **Model setup:**  
  Initializes two robust algorithms — `RandomForestClassifier` and `XGBClassifier` — with controlled randomness for reproducibility.  

- **Training & label normalization:**  
  Ensures labels start from 0 (for XGBoost compatibility) and trains both models on the processed dataset.  

- **Evaluation & visualization:**  
  Generates accuracy scores, classification reports, and confusion matrices for clear performance insights.  
  Uses logging and timing for traceable, reproducible training outcomes.  


In [None]:
# 10. MODEL TRAINING, TESTING & EVALUATION

from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.impute import SimpleImputer
import time
import numpy as np
import pandas as pd

# Define or Recover feature names
try:
    feature_names
except NameError:
    logging.info("feature_names not defined — creating from X or generating generic names.")
    if 'X' in globals() and isinstance(X, pd.DataFrame):
        feature_names = X.columns
    else:
        feature_names = [f"Feature_{i}" for i in range(X_train.shape[1])]

# Convert sparse matrices or NumPy arrays to DataFrames (if applicable)
def _to_dense(m):
    # SciPy sparse matrices expose .toarray(); NumPy arrays are already dense
    return m.toarray() if hasattr(m, "toarray") else np.asarray(m)

if not isinstance(X_train, pd.DataFrame):
    logging.info("Converting training and testing sets to DataFrames for easier handling.")
    X_train = pd.DataFrame(_to_dense(X_train), columns=feature_names)
    X_test = pd.DataFrame(_to_dense(X_test), columns=feature_names)

# Handle Missing Values
logging.info("Checking and imputing missing values before model training...")

num_cols = X_train.select_dtypes(include=['int64', 'float64']).columns
cat_cols = X_train.select_dtypes(include=['object', 'category']).columns

num_imputer = SimpleImputer(strategy='median')
cat_imputer = SimpleImputer(strategy='most_frequent')

X_train[num_cols] = num_imputer.fit_transform(X_train[num_cols])
X_test[num_cols] = num_imputer.transform(X_test[num_cols])

if len(cat_cols) > 0:
    X_train[cat_cols] = cat_imputer.fit_transform(X_train[cat_cols])
    X_test[cat_cols] = cat_imputer.transform(X_test[cat_cols])

assert not X_train.isnull().any().any(), "NaNs still present in X_train after imputation!"
assert not X_test.isnull().any().any(), "NaNs still present in X_test after imputation!"
logging.info(" Missing values successfully handled.")

# Define Models
rf = RandomForestClassifier(random_state=42, n_estimators=150)
xgb = XGBClassifier(random_state=42, eval_metric='mlogloss')

# Evaluation Function
def evaluate_preds(y_true, y_pred, model_name):
    print(f"\n Evaluation Report for {model_name}")
    print("Accuracy:", round(accuracy_score(y_true, y_pred), 4))
    print("Classification Report:\n", classification_report(y_true, y_pred))
    cm = confusion_matrix(y_true, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f"Confusion Matrix – {model_name}")
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

# Train & Evaluate Each Model

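### Normalize class labels for XGBoost compatibility
# Use the training-set minimum for both splits so train and test share one label mapping
label_offset = y_train.min()
if label_offset != 0:
    logging.info(f"Adjusting class labels from {sorted(y_train.unique())} to start from 0.")
    y_train = y_train - label_offset
    y_test = y_test - label_offset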


for name, model in [("RandomForest", rf), ("XGBoost", xgb)]:
    start = time.time()
    logging.info(f"Training model: {name}")
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    evaluate_preds(y_test, preds, name)
    logging.info(f"{name} completed in {round(time.time() - start, 2)} seconds.")
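
The summary mentions comparing the models on accuracy, precision, recall, and F1-score. One way to collect these into a single table is sketched below, using macro averaging so that minority satisfaction levels count as much as the majority class. `score_table` is a hypothetical helper and the labels are toy values.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def score_table(y_true, predictions):
    """Build one row per model with accuracy and macro-averaged precision/recall/F1."""
    rows = {}
    for name, y_pred in predictions.items():
        p, r, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="macro", zero_division=0
        )
        rows[name] = {"accuracy": accuracy_score(y_true, y_pred),
                      "precision_macro": p, "recall_macro": r, "f1_macro": f1}
    return pd.DataFrame(rows).T.round(3)

# Toy labels: always predicting the majority class looks accurate but has a weak macro F1
y_true_demo = [1, 1, 1, 1, 0]
table = score_table(y_true_demo, {
    "always_majority": [1, 1, 1, 1, 1],
    "balanced_guess":  [1, 1, 1, 0, 0],
})
print(table)
```

In the notebook the same helper could be called with `y_test` and a dictionary of `rf.predict(X_test)` and `xgb.predict(X_test)`. Both toy models reach the same accuracy here, which is why accuracy alone can hide poor performance on rare classes.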


### **11. Feature Importance, Explainability & Model Interpretation**

This section analyzes **how the trained model makes predictions** and identifies which features have the greatest impact on customer satisfaction.  

- **Feature importance (model-based):**  
  Extracts and visualizes the top 15 most influential features using the model’s built-in `feature_importances_` attribute.  
  Provides a clear ranking of variables that drive satisfaction outcomes.  

- **Explainability (SHAP analysis):**  
  Uses the SHAP (SHapley Additive exPlanations) library to interpret feature effects at both global and individual levels.  
  Generates summary plots showing each feature’s contribution to model output and its directional influence.  

- **Deep insights:**  
  Optionally visualizes detailed SHAP dependence plots for the top predictors to highlight nonlinear or interaction effects.  

- **Purpose:**  
  Improves **transparency** and **trustworthiness** in model predictions, allowing stakeholders to understand why the model behaves the way it does.  


In [None]:
# 11. FEATURE IMPORTANCE & MODEL INTERPRETATION

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
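# shap is optional; a missing install should skip the SHAP plots, not break the cell
try:
    import shap
except ImportError:
    shap = None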
import logging

#  Select the Best Model
best_model = xgb

#  Plot Feature Importances (Model-Based)
logging.info("Generating model-based feature importances...")

# Ensure feature names are aligned
if isinstance(X_train, pd.DataFrame):
    feature_names = X_train.columns
else:
    feature_names = [f"Feature_{i}" for i in range(X_train.shape[1])]

importances = None

if hasattr(best_model, "feature_importances_"):
    importances = best_model.feature_importances_
    feat_imp = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importances
    }).sort_values(by='Importance', ascending=False).head(15)

    plt.figure(figsize=(10, 6))
    sns.barplot(x='Importance', y='Feature', data=feat_imp, palette='viridis')
    plt.title('Top 15 Feature Importances')
    plt.tight_layout()
    plt.show()
    logging.info(" Displayed model-based feature importances.")

else:
    logging.warning("⚠️ Model does not provide feature_importances_ attribute.")

#  SHAP Explainability (If Available)
try:
    shap.initjs()
    explainer = shap.Explainer(best_model, X_train)
    shap_values = explainer(X_test)

    logging.info("Generating SHAP summary plot...")
    shap.summary_plot(shap_values, X_test, feature_names=feature_names, plot_type="bar")
    plt.show()

except Exception as e:
    logging.warning(f" SHAP explainability not available: {e}")

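#  Optional: Detailed SHAP Visualization for Top Features
try:
    # Rank by model importance; slicing feature_names would only pick the first columns
    if importances is not None:
        top_features = feat_imp['Feature'].head(5).tolist()
    else:
        top_features = list(feature_names[:5])
    for f in top_features:
        shap.dependence_plot(f, shap_values.values, X_test, show=False)
    plt.show()
except Exception as e:
    # Report the reason instead of silently swallowing it
    logging.warning(f"SHAP dependence plots skipped: {e}")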

logging.info(" Feature importance analysis completed successfully.")


### **13. Model Saving & Persistence**

This section saves the trained model for **future use and reproducibility**, ensuring that results can be replicated or deployed without retraining.  

- **Automatic model detection:**  
  Confirms the existence of the trained `best_model` (from XGBoost or RandomForest) and assigns a readable name automatically.  

- **Local persistence:**  
  Saves the finalized model to `/content/models_deepcsat` in `.joblib` format for efficient serialization and storage.  

- **Drive backup:**  
  Optionally mounts Google Drive and creates a backup copy under `MyDrive/DEEP-CSAT/`, ensuring long-term availability and easy sharing.  

- **Purpose:**  
  Facilitates **model versioning**, enables quick reloading for inference or fine-tuning, and supports a clean MLOps workflow.  


In [None]:
# 13. MODEL SAVING
import os
import joblib
import logging

# Detect or define best_model and name
try:
    best_model
except NameError:
    logging.error(" 'best_model' not found. Please run the Feature Importance or Training cell first.")
    raise

# Auto-assign model name if not already defined
try:
    best_model_name
except NameError:
    best_model_name = type(best_model).__name__
    logging.info(f"Auto-assigned model name: {best_model_name}")

# Create directory for saving models
save_dir = "/content/models_deepcsat"
os.makedirs(save_dir, exist_ok=True)

# Full save path
model_path = os.path.join(save_dir, f"{best_model_name}_DeepCSAT_model.joblib")

# Save model
joblib.dump(best_model, model_path)
logging.info(f" Model saved successfully at: {model_path}")
print(f"Model saved successfully at:\n{model_path}")

# Optional: Mount to Drive and copy
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)
    drive_save_path = f"/content/drive/MyDrive/DEEP-CSAT/{best_model_name}_DeepCSAT_model.joblib"
    joblib.dump(best_model, drive_save_path)
    logging.info(f" Backup copy saved to Google Drive at: {drive_save_path}")
    print(f"Backup copy saved to Google Drive at:\n{drive_save_path}")
except Exception as e:
    logging.warning(f" Could not save to Drive: {e}")
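
A saved model is only useful if it can be restored and produces the same predictions. The round trip can be checked as below, using a stand-in `RandomForestClassifier` trained on synthetic data and a temporary directory. Neither is part of the notebook's actual artifacts.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on synthetic data
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored estimator should reproduce the original predictions exactly
same = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print("Predictions match after reload:", same)
```

For inference on new tickets, the fitted preprocessing objects (for example the `tfidf_vectorizers` dictionary and the imputers) would need to be saved alongside the model, since the estimator alone expects already-transformed features.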


### **14. Feature Importance Analysis**

Explained model behavior and key decision drivers.  

- Extracted top features using model-based importance values  
- Used **Permutation Importance** for robust validation  
- Applied **SHAP (SHapley Additive exPlanations)** for deep interpretability  

Helps understand which variables most influence customer satisfaction predictions.
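
Permutation importance is listed above, and `permutation_importance` is imported in the first cell, but the cells shown here do not call it. A minimal sketch of the technique on synthetic data follows, where one column fully determines the label and three are noise. All names in the block are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
signal = rng.normal(size=n)          # drives the label
noise = rng.normal(size=(n, 3))      # unrelated to the label
X_demo = np.column_stack([signal, noise])
y_demo = (signal > 0).astype(int)
names = ["signal", "noise_1", "noise_2", "noise_3"]

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in accuracy
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=42)
ranking = sorted(zip(names, result.importances_mean), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name:8s} {score:.3f}")
```

Applied to the notebook, the call would take `best_model`, `X_test`, and `y_test`. With several hundred TF-IDF columns this can be slow, so a smaller `n_repeats` or a subsample of the test set may be a reasonable starting point.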


### **15. Conclusion**

The **DeepCSAT – E-Commerce Customer Satisfaction Prediction** project successfully demonstrated how advanced analytics and machine learning can transform raw customer support data into actionable business insights.  

- **Analytical outcome:**  
  Through data preprocessing, feature engineering, and robust modeling using Random Forest and XGBoost, the project identified the key factors that most influence customer satisfaction.  
  Variables such as **agent performance**, **response time**, and **interaction quality** emerged as dominant predictors.  

- **Model performance:**  
  Both models were evaluated with accuracy scores, classification reports, and confusion matrices, and SHAP analysis was used to check the influence of the top features and add transparency to decision-making.  

- **Business value:**  
  These insights enable e-commerce businesses to **optimize support operations**, **train agents more effectively**, and **improve customer retention** by focusing on service quality drivers.  

- **Technical reflection:**  
  The workflow incorporated reproducibility, logging, and explainability best practices — making it adaptable for production environments and scalable for future datasets.  

**Overall**, the project highlights how data-driven modeling of customer experience can directly inform strategic improvements, bridging the gap between analytics and operational excellence.
