# IS4487 Week 9 - Practice Code

This notebook is designed to help you follow along with the **Week 9 Lecture and Reading**, which introduce segmentation.

The practice code demos give you a chance to see working code and can serve as a source for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Use K-Means for Segmentation
- Plot the results

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_09_segmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Context: Customer Segmentation
We will use a simple dataset with the following retail shopper characteristics:

| Feature              | Description                                           | Type        |
| -------------------- | ----------------------------------------------------- | ----------- |
| `Age`                | Age of the customer                                   | Numeric     |
| `Gender`             | Gender (male/female)                                  | Categorical |
| `Annual Income (k$)` | Annual income in thousands of dollars                 | Numeric     |
| `CustomerID`         | Customer ID number (identifier, not used to cluster)  | Numeric     |

There is no target variable! You will be grouping the customers into segments based on these characteristics.

### K-Means Segmentation

K-Means is an unsupervised learning algorithm that groups data into k clusters by minimizing the distance between points and their cluster’s center. It iteratively assigns points to the nearest centroid and updates centroids until the solution stabilizes.
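
The assign-and-update loop described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not scikit-learn's implementation; the function name `kmeans_sketch` and the toy points are made up for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (made-up numbers for illustration)
toy_points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                       [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans_sketch(toy_points, k=2)
print(labels)     # cluster index assigned to each point
print(centroids)  # final cluster centers
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a label.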

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Sample customer data
data = {
    'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female'],
    'Age': [23, 45, 31, 22, 41, 36, 29, 48, 50, 27],
    'Annual Income (k$)': [15, 16, 17, 18, 100, 110, 120, 130, 140, 150]
}

# Create DataFrame
df = pd.DataFrame(data)

### Prepare Data

In [None]:
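# Convert gender to numeric (Male = 0, Female = 1)
# .map avoids the downcasting FutureWarning that .replace raises in recent pandas versions
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})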

# Features for clustering
X = df[['Annual Income (k$)', 'Age', 'Gender']]

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
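
Scaling matters because K-Means relies on distances: income ranges from 15 to 150, while gender is only 0 or 1, so unscaled income would dominate the result. The following sketch, which rebuilds the same sample values so it can run on its own, checks that each scaled column ends up with a mean near 0 and a standard deviation near 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample values as the demo, with Gender already encoded as 0/1
X = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 17, 18, 100, 110, 120, 130, 140, 150],
    'Age': [23, 45, 31, 22, 41, 36, 29, 48, 50, 27],
    'Gender': [0, 1, 1, 0, 1, 0, 1, 0, 0, 1],
})

X_scaled = StandardScaler().fit_transform(X)

# Spread of the raw columns: income varies far more than age or gender
print(X.std(ddof=0).round(2))

# After scaling, every column has mean ~0 and standard deviation ~1
print(np.round(X_scaled.mean(axis=0), 2))
print(np.round(X_scaled.std(axis=0), 2))
```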

### Create Model

In [None]:
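# Apply K-Means with 3 clusters
# n_init=10 reruns the algorithm from 10 different starting centroids and keeps the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)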
df['Cluster'] = kmeans.fit_predict(X_scaled)
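
Once each customer has a cluster label, a common next step is to profile the segments. The sketch below is one possible way to do that, under the assumption that the same sample data and settings are used; it rebuilds the data so it can run on its own. It counts the customers per cluster, averages the features within each cluster, and converts the fitted centroids back to the original units with `scaler.inverse_transform`.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same sample values as the demo, with Gender already encoded as 0/1
df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Gender': [0, 1, 1, 0, 1, 0, 1, 0, 0, 1],
    'Age': [23, 45, 31, 22, 41, 36, 29, 48, 50, 27],
    'Annual Income (k$)': [15, 16, 17, 18, 100, 110, 120, 130, 140, 150],
})
features = ['Annual Income (k$)', 'Age', 'Gender']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Number of customers in each cluster
cluster_sizes = df['Cluster'].value_counts().sort_index()
print(cluster_sizes)

# Average feature values per cluster, in the original units
cluster_profile = df.groupby('Cluster')[features].mean().round(1)
print(cluster_profile)

# Centroids are stored in scaled units; convert them back to original units
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=features,
)
print(centers.round(1))
```

Reading the profile table row by row gives a plain-language description of each segment, for example whether a cluster is mostly lower-income or higher-income customers.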

### Visualize Model

In [None]:
# Plot clusters using Age and Income
plt.figure(figsize=(8, 5))
for cluster in df['Cluster'].unique():
    cluster_data = df[df['Cluster'] == cluster]
    plt.scatter(cluster_data['Annual Income (k$)'], cluster_data['Age'], label=f'Cluster {cluster}')

plt.xlabel('Annual Income (k$)')
plt.ylabel('Age')
plt.title('Customer Segmentation (by Age, Income, Gender)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()