# **Customer Spending Score System**


| Column Name              | Description                                                                                                 |
| ------------------------ | ----------------------------------------------------------------------------------------------------------- |
| `CustomerID`             | Unique identifier for each customer (we are not using it for prediction)                                    |
| `Gender`                 | Gender of the customer — categorical (Male / Female)                                                        |
| `Age`                    | Age of the customer (in years)                                                                              |
| `Annual Income (k$)`     | Annual income of the customer (in thousands of dollars)                                                     |
| `Spending Score (1-100)` | Score assigned to the customer based on their spending behavior (1 = lowest spender, 100 = highest spender) |

* **Target variable (what we want to predict):** `Spending Score (1-100)`
* **Input variables (used to make predictions):** `Gender`, `Age`, `Annual Income (k$)`

### **What is this code intended for?**

> This code builds a simple **regression model** that predicts a customer's **Spending Score** based on their **Gender, Age, and Annual Income**.
> It also saves the trained model and scaler so they can be reused in a web app (like our Flask app).


## Step 1: Imports and data loading

```python
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
import joblib
import os
```

```python
# Load customer data from a CSV file
df = pd.read_csv('Mall Customer Segmentation Data.csv')
df.head()
```

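|   | CustomerID | Gender | Age | Annual Income (k$) | Spending Score (1-100) |
| - | ---------- | ------ | --- | ------------------ | ---------------------- |
| 0 | 1          | Male   | 19  | 15                 | 39                     |
| 1 | 2          | Male   | 21  | 15                 | 81                     |
| 2 | 3          | Female | 20  | 16                 | 6                      |
| 3 | 4          | Female | 23  | 16                 | 77                     |
| 4 | 5          | Female | 31  | 17                 | 40                     |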


## Step 2: Preprocess the data


```python
# Encode 'Gender' as numerical values: Male = 0, Female = 1
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})
```
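`Series.map` with a dictionary returns `NaN` for any label that is not a key (for example `'male'` or `'F'`), and `LinearRegression` refuses to fit on `NaN` values. Here is a minimal sketch of a check for this. It uses a small hypothetical DataFrame with one unexpected label, not the real CSV:

```python
import pandas as pd

# Hypothetical sample with one unexpected label ('F') to show the failure mode
df = pd.DataFrame({'Gender': ['Male', 'Female', 'F', 'Male']})

# Same mapping as the notebook: Male = 0, Female = 1
encoded = df['Gender'].map({'Male': 0, 'Female': 1})

# Any label missing from the dictionary becomes NaN after .map()
unmapped = df.loc[encoded.isna(), 'Gender'].unique()
print('Unmapped labels:', list(unmapped))  # Unmapped labels: ['F']
```

If `unmapped` is non-empty, you could raise an error or clean the labels before fitting.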

#### Select input features (X) and target variable (y)


```python
# Features: Gender, Age, Income
X = df[['Gender', 'Age', 'Annual Income (k$)']]

# Target: Spending Score
y = df['Spending Score (1-100)']
```

## Step 3: Split the data


```python
# Split data into training set (80%) and testing set (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
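With `test_size=0.2`, scikit-learn puts 20% of the rows (rounded up) into the test set. The 40 predictions shown in Step 6 are consistent with 200 rows split 160/40. `random_state=42` fixes the shuffle so every run produces the same split. A quick check on a synthetic 200-row DataFrame (hypothetical values with the notebook's column names, not the mall data):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data: 200 rows, same column names
rng = np.random.default_rng(0)
X = pd.DataFrame({
    'Gender': rng.integers(0, 2, size=200),
    'Age': rng.integers(18, 70, size=200),
    'Annual Income (k$)': rng.integers(15, 140, size=200),
})
y = pd.Series(rng.integers(1, 101, size=200), name='Spending Score (1-100)')

# Same call as the notebook: 80/20 split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 160 40

# Rows keep their original index labels, so X and y stay aligned after shuffling
print((X_train.index == y_train.index).all())  # True
```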

## Step 4: Standardize features


```python
# Initialize a scaler to standardize Age and Income (Gender is not scaled)
scaler = StandardScaler()
```

```python
# Fit scaler on training data and transform both training and test sets
X_train_scaled_features = scaler.fit_transform(X_train[['Age', 'Annual Income (k$)']])
X_test_scaled_features = scaler.transform(X_test[['Age', 'Annual Income (k$)']])
```

```python
# Convert scaled features back into DataFrames
X_train_scaled = pd.DataFrame(X_train_scaled_features, columns=['Age', 'Annual Income (k$)'])
X_test_scaled = pd.DataFrame(X_test_scaled_features, columns=['Age', 'Annual Income (k$)'])
```

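```python
# Add back the 'Gender' column (not scaled).
# .values copies by position: the new DataFrames have a fresh 0..n-1 index,
# so aligning on X_train's original index labels would produce NaNs.
X_train_scaled['Gender'] = X_train['Gender'].values
X_test_scaled['Gender'] = X_test['Gender'].values
```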

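**Note:** We add the `Gender` column back unscaled. Gender is a categorical (nominal) feature encoded as 0/1, so standardizing it would not make it more meaningful. The resulting column order is `Age`, `Annual Income (k$)`, `Gender`. This differs from `X`, and new data must use the same order at prediction time.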
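The manual split-scale-reassemble steps can also be written with scikit-learn's `ColumnTransformer`. It scales selected columns and passes the rest through unchanged (`remainder='passthrough'`). Wrapping it in a `Pipeline` with the regression means the web app would need to load only one saved object. This is an alternative sketch, not what the notebook does, and the training data below is synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with the notebook's column names
rng = np.random.default_rng(1)
X_train = pd.DataFrame({
    'Gender': rng.integers(0, 2, size=50),
    'Age': rng.integers(18, 70, size=50),
    'Annual Income (k$)': rng.integers(15, 140, size=50),
})
y_train = rng.integers(1, 101, size=50)

# Scale Age and Income; 'passthrough' leaves Gender untouched.
# Output column order: scaled columns first, then the remainder (Gender).
preprocess = ColumnTransformer(
    transformers=[('scale', StandardScaler(), ['Age', 'Annual Income (k$)'])],
    remainder='passthrough',
)

# Scaler and model are fitted together, on training data only
pipe = Pipeline([('preprocess', preprocess), ('model', LinearRegression())])
pipe.fit(X_train, y_train)

# Raw (unscaled) input: the pipeline applies the scaling itself
new_customer = pd.DataFrame({'Gender': [1], 'Age': [30], 'Annual Income (k$)': [60]})
print(pipe.predict(new_customer))
```

The resulting column order (Age, Income, Gender) matches the notebook's manual version.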

## Step 5: Train the model


```python
# Initialize and train a linear regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
```

## Step 6: Evaluate the model


```python
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)
y_pred
```

```
array([59.82812944, 58.84945709, 37.0469889 , 54.91842692, 39.84329514,
       64.23552817, 55.67685009, 51.87565203, 46.07289703, 59.89547225,
       49.20969012, 48.94060312, 55.3180862 , 61.29134074, 45.59726863,
       59.57252226, 44.29084909, 50.23765619, 34.22423523, 62.12139172,
       62.15314859, 52.45877845, 56.11689254, 51.61981691, 62.70931536,
       32.02956041, 62.46718801, 34.51159927, 57.44975951, 61.0850273 ,
       55.29067064, 41.37333704, 45.39021502, 58.32910423, 49.62711422,
       45.23776457, 58.78719575, 55.82495924, 45.88868971, 48.35702079])
```

```python
# Calculate Mean Absolute Error (MAE): a simple evaluation metric
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae:.2f}')
```

```
Mean Absolute Error: 18.15
```
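MAE is the average absolute gap between actual and predicted scores. An MAE of 18.15 therefore means the predictions miss by about 18 points on the 1 to 100 scale, on average. Whether that is good depends on a baseline. A common reference is a model that always predicts the training-set mean (scikit-learn's `DummyRegressor`). Here is a sketch with hypothetical numbers, computing MAE by hand and comparing it with that baseline:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted spending scores
y_test = np.array([40, 80, 10, 60])
y_pred = np.array([50.0, 60.0, 30.0, 55.0])

# MAE by hand: mean of |actual - predicted|
mae_manual = np.mean(np.abs(y_test - y_pred))
print(mae_manual, mean_absolute_error(y_test, y_pred))  # 13.75 13.75

# Baseline: always predict the mean of the (hypothetical) training targets (50 here)
y_train = np.array([30, 70, 50, 50, 45, 55])
baseline = DummyRegressor(strategy='mean')
baseline.fit(np.zeros((len(y_train), 1)), y_train)  # features are ignored
baseline_pred = baseline.predict(np.zeros((len(y_test), 1)))
print(mean_absolute_error(y_test, baseline_pred))  # 22.5
```

A model is only useful if its MAE is clearly below the baseline's on the same test set.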


## Step 7: Save the model and scaler


```python
# Create the '../models' directory (one level up) if it doesn't exist
os.makedirs('../models', exist_ok=True)
```

```python
# Save the trained model
model_filename = '../models/spending_score_model.joblib'
joblib.dump(model, model_filename)
```

```
['../models/spending_score_model.joblib']
```

```python
# Save the scaler (to apply the same scaling during prediction)
scaler_filename = '../models/scaler.joblib'
joblib.dump(scaler, scaler_filename)
```

```
['../models/scaler.joblib']
```
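In the web app, the saved files are loaded back with `joblib.load`. New inputs must go through the same preprocessing as training:

1. Encode Gender as Male = 0 / Female = 1.
2. Scale only Age and Income with the saved scaler.
3. Pass the columns in the order `Age`, `Annual Income (k$)`, `Gender`.

The helper `predict_spending_score` below is hypothetical and not taken from the original Flask app. To keep the sketch self-contained, it fits a toy model on synthetic data and saves it to a temporary directory instead of `../models`:

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

FEATURES_TO_SCALE = ['Age', 'Annual Income (k$)']

# --- Toy stand-in for Steps 4 to 7 (synthetic data, hypothetical values) ---
rng = np.random.default_rng(3)
train = pd.DataFrame({
    'Age': rng.integers(18, 70, size=80),
    'Annual Income (k$)': rng.integers(15, 140, size=80),
    'Gender': rng.integers(0, 2, size=80),
})
y = rng.integers(1, 101, size=80)
scaler = StandardScaler().fit(train[FEATURES_TO_SCALE])
X_scaled = pd.DataFrame(scaler.transform(train[FEATURES_TO_SCALE]), columns=FEATURES_TO_SCALE)
X_scaled['Gender'] = train['Gender'].values
model = LinearRegression().fit(X_scaled, y)

out_dir = tempfile.mkdtemp()
joblib.dump(model, os.path.join(out_dir, 'spending_score_model.joblib'))
joblib.dump(scaler, os.path.join(out_dir, 'scaler.joblib'))

# --- What a prediction route might do (hypothetical helper) ---
loaded_model = joblib.load(os.path.join(out_dir, 'spending_score_model.joblib'))
loaded_scaler = joblib.load(os.path.join(out_dir, 'scaler.joblib'))

def predict_spending_score(gender: str, age: float, income_k: float) -> float:
    """Apply the training-time preprocessing to one customer and predict."""
    raw = pd.DataFrame({'Age': [age], 'Annual Income (k$)': [income_k]})
    features = pd.DataFrame(loaded_scaler.transform(raw), columns=FEATURES_TO_SCALE)
    features['Gender'] = {'Male': 0, 'Female': 1}[gender]  # same encoding as Step 2
    return float(loaded_model.predict(features)[0])

print(predict_spending_score('Female', 30, 60))
```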

## **Summary (to tell your students)**

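* **Goal:** Predict a customer's spending score from gender, age, and annual income using linear regression
* **Why scale Age/Income?** Age and income have different numeric ranges. Standardizing gives both a mean of 0 and a standard deviation of 1. For plain `LinearRegression` this does not change the predictions (or the MAE), but it makes the coefficients comparable, and it matters for regularized or gradient-based models
* **Why save model + scaler?** So we can reuse them in a web app (like our Flask app) to make predictions on new data
* **Metric used:** Mean Absolute Error, the average absolute difference between predicted and actual scores (lower MAE = better; 18.15 here means predictions are off by about 18 points on average)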
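The point about scaling can be checked directly. For ordinary least squares with an intercept (`LinearRegression`), standardizing features changes the coefficients but not the predictions. Each scaled coefficient equals the raw coefficient times that feature's standard deviation. Here is a sketch on hypothetical data comparing the two fits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different ranges: age (years) and income (k$)
rng = np.random.default_rng(4)
X = np.column_stack([
    rng.integers(18, 70, size=60),
    rng.integers(15, 140, size=60),
]).astype(float)
y = rng.integers(1, 101, size=60).astype(float)

raw_model = LinearRegression().fit(X, y)

scaler = StandardScaler().fit(X)
scaled_model = LinearRegression().fit(scaler.transform(X), y)

# Predictions agree up to floating-point error...
print(np.allclose(raw_model.predict(X), scaled_model.predict(scaler.transform(X))))  # True
# ...but the coefficients differ: scaled ones equal raw ones times each feature's std
print(np.allclose(scaled_model.coef_, raw_model.coef_ * scaler.scale_))  # True
```

Scaling matters more for models like `Ridge` or gradient-descent solvers, whose results depend on feature magnitudes.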