# Segmenting Customer Data

One of the world's biggest banks launched a machine learning competition on [Kaggle](https://www.kaggle.com/), an online community of data scientists and machine learning practitioners. The bank wants to improve its marketing campaigns by identifying the optimal number of customer segments among its credit card clients. The $5,000 reward caught your interest, so you decided to put your unsupervised learning skills into practice by entering the competition.

The bank provided a dataset of customer records with ten features. To protect customers' privacy, the columns were anonymized with generic names, and the values were already normalized.

Use the starter code to accomplish the following tasks:

1. Load the raw data into a Pandas DataFrame.

2.1 Use the Elbow Method to determine the optimal number of clusters.

2.2 Segment the data with K-means using the optimal number of clusters.

3. Cluster the data using AgglomerativeClustering and Birch

    * Using your optimal number of clusters found above, additionally estimate clusters by using both `AgglomerativeClustering` and `Birch`. Save each of these models and their results for comparison.
    
4. Compare the cluster results from K-means, AgglomerativeClustering, and Birch.

    * Create a DataFrame that is a copy of the original `customers_df` data.

    * Add all of the predicted labels (`kmeans_predictions`, `agglo_predictions`, and `birch_predictions`) as columns to this dataframe. 

    * For each algorithm, plot the clusters using the "feature_1" and "feature_2" columns.    

**Optional Challenge**: Loop through each clustering algorithm, using an alternative metric to determine the optimal number of clusters.

1. Create three lists (or a dictionary, or dataframe) to contain the metrics to measure optimal clusters.
2. Using a for loop, cycle through a list of cluster counts, fitting each of the three clustering algorithms.
3. When fitting the clustering algorithms in the loop, estimate the [`Variance Ratio Criterion (Calinski-Harabasz Index)`](https://scikit-learn.org/stable/modules/clustering.html#calinski-harabasz-index) and save that metric to your metrics lists in (1).
    Hint: Code samples for these and other metrics can be found in scikit-learn's documentation on [clustering performance evaluation](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation).
4. Output each of the three lists. If larger metric values indicate a better number of clusters, what cluster count is best? Does it vary by the algorithm selected?

In [28]:
# Import the modules
import pandas as pd
import hvplot.pandas
from pathlib import Path

## Part 1: Load the raw data into a Pandas DataFrame

In [29]:
# Set the file path
file_path = Path("../Resources/customers.csv")

# Read the csv file into a pandas DataFrame
customers_df = pd.read_csv(file_path)

# Review the DataFrame
customers_df.head()

|   | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.148534 | 4.606077 | 2.699069 | -2.661824 | 1.526433 | 1.236671 | 0.211421 | 1.482896 | -4.445627 | -1.936831 |
| 1 | -1.14941 | -1.650549 | 2.530167 | -3.227088 | 0.572138 | 4.1626 | -0.291679 | -1.237575 | 3.604765 | -1.635689 |
| 2 | 0.332427 | -0.887985 | -0.309216 | 0.399891 | 0.828492 | 3.641945 | -0.916946 | -1.978024 | 1.056772 | -1.882747 |
| 3 | 2.245599 | 3.826309 | 0.264039 | 0.095471 | 1.98438 | 0.373991 | -0.280279 | 1.602786 | -5.993331 | -2.258925 |
| 4 | 0.705503 | -1.312329 | 0.895406 | -0.405408 | 1.116187 | 3.699562 | -1.427985 | -1.494409 | 1.156908 | -1.434964 |


In [30]:
# Use the "info()" Pandas function to validate data types and null values
customers_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   feature_1   1000 non-null   float64
 1   feature_2   1000 non-null   float64
 2   feature_3   1000 non-null   float64
 3   feature_4   1000 non-null   float64
 4   feature_5   1000 non-null   float64
 5   feature_6   1000 non-null   float64
 6   feature_7   1000 non-null   float64
 7   feature_8   1000 non-null   float64
 8   feature_9   1000 non-null   float64
 9   feature_10  1000 non-null   float64
dtypes: float64(10)
memory usage: 78.2 KB


In [31]:
# Use the Pandas "describe()" function to compute summary statistics
customers_df.describe()

|       | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | -0.022428 | 0.805748 | 1.942896 | -2.36403 | 0.85498 | 1.232422 | 0.146269 | 0.833486 | -0.53432 | -1.219393 |
| std | 2.382021 | 2.335796 | 1.411307 | 1.716566 | 1.742986 | 3.250231 | 1.635576 | 2.039563 | 4.211831 | 1.979172 |
| min | -6.259471 | -4.649286 | -2.894995 | -8.735778 | -4.641509 | -9.11147 | -4.260013 | -4.911903 | -9.522425 | -6.083462 |
| 25% | -2.091657 | -1.214774 | 1.026128 | -3.438149 | -0.23531 | -0.333722 | -0.967569 | -0.894817 | -4.129561 | -2.505366 |
| 50% | 0.16167 | 1.096439 | 1.905107 | -2.437602 | 1.084556 | 1.367371 | -0.222299 | 1.519069 | -0.536849 | -1.706372 |
| 75% | 2.030005 | 2.513648 | 2.851613 | -1.22973 | 2.287268 | 3.637304 | 1.061269 | 2.298862 | 2.626514 | -0.553571 |
| max | 6.275723 | 7.955158 | 5.897102 | 4.296552 | 4.74135 | 8.705423 | 7.123969 | 5.789222 | 10.047819 | 5.413623 |


In [32]:
# Import the KMeans, Birch, and AgglomerativeClustering classes from scikit-learn
from sklearn.cluster import KMeans, AgglomerativeClustering, Birch

## Part 2.1: Use the Elbow Method to determine the optimal number of clusters for KMeans

In [33]:
# Create a list to store the inertia value computed for each k
inertia_values = []

# Create a list to set the range of k values to test
k = list(range(1,11))

In [34]:
# Create a for-loop where each value of k is evaluated using the K-means algorithm
# Fit the model using the "customers_df" DataFrame
# Append the value of the computed inertia from the `inertia_` attribute of the KMeans model instance

for i in k: 
    k_model = KMeans(n_clusters=i, random_state=1)
    k_model.fit(customers_df)
    inertia_values.append(k_model.inertia_)


In [35]:
# Define a DataFrame to hold the values for k and the corresponding inertia
elbow_data = {'k' : k, 'inertia' : inertia_values}
elbow_df = pd.DataFrame(elbow_data)

# Review the DataFrame
elbow_df.head()

|   | k | inertia |
|---|---|---|
| 0 | 1 | 58103.759171 |
| 1 | 2 | 32183.537923 |
| 2 | 3 | 17080.936423 |
| 3 | 4 | 14894.368711 |
| 4 | 5 | 12816.540877 |


In [36]:
# Plot the DataFrame to identify the optimal value for k
elbow_df.hvplot(x = 'k', y = 'inertia')
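
Reading the elbow off a plot is a judgment call. As a rough numeric cross-check (one heuristic among several, not part of the original assignment), the sketch below takes the five inertia values printed by `elbow_df.head()` above and computes the percentage drop in inertia plus the centered second difference I(k+1) - 2·I(k) + I(k-1); the largest second difference marks the sharpest bend in the curve. Only k = 1 to 5 are used because the remaining rows are not shown, and the copy is stored in a new, hypothetical `elbow_check_df` so the notebook's `elbow_df` is not overwritten.

```python
import pandas as pd

# Inertia values copied from the elbow_df.head() output above (k = 1..5 only)
elbow_check_df = pd.DataFrame({
    "k": [1, 2, 3, 4, 5],
    "inertia": [58103.759171, 32183.537923, 17080.936423, 14894.368711, 12816.540877],
})

inertia = elbow_check_df["inertia"]

# Percentage drop in inertia gained by moving from k-1 to k clusters
elbow_check_df["pct_drop"] = -inertia.diff() / inertia.shift(1) * 100

# Centered second difference I(k+1) - 2*I(k) + I(k-1): largest where the curve bends most
elbow_check_df["second_diff"] = inertia.shift(-1) - 2 * inertia + inertia.shift(1)

# idxmax skips the NaN values at both ends of the column
elbow_k = int(elbow_check_df.loc[elbow_check_df["second_diff"].idxmax(), "k"])

print(elbow_check_df.round(2))
print("Sharpest bend at k =", elbow_k)
```

For these values, going from 2 to 3 clusters still cuts inertia by about 47%, while going from 3 to 4 cuts it by only about 13%, so the heuristic agrees with the k = 3 used in Part 2.2. Running the same lines on the full `elbow_df` would include k = 6 to 10 as well.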

## Part 2.2: Segment the data with K-means using the optimal number of clusters

In [37]:
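# Define the model with the optimal number of clusters (k = 3, from the elbow curve)
model = KMeans(n_clusters=3, random_state=1)

# Fit the model
model.fit(customers_df)

# Predict a cluster label (0, 1, or 2) for each customer
k_optimal = model.predict(customers_df)

# Store the labels alongside the original features in a copy of the data
kmeans_predictions = customers_df.copy()
kmeans_predictions['clusters'] = k_optimal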

kmeans_predictions.head()

|   | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 | clusters |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.148534 | 4.606077 | 2.699069 | -2.661824 | 1.526433 | 1.236671 | 0.211421 | 1.482896 | -4.445627 | -1.936831 | 1 |
| 1 | -1.14941 | -1.650549 | 2.530167 | -3.227088 | 0.572138 | 4.1626 | -0.291679 | -1.237575 | 3.604765 | -1.635689 | 0 |
| 2 | 0.332427 | -0.887985 | -0.309216 | 0.399891 | 0.828492 | 3.641945 | -0.916946 | -1.978024 | 1.056772 | -1.882747 | 0 |
| 3 | 2.245599 | 3.826309 | 0.264039 | 0.095471 | 1.98438 | 0.373991 | -0.280279 | 1.602786 | -5.993331 | -2.258925 | 1 |
| 4 | 0.705503 | -1.312329 | 0.895406 | -0.405408 | 1.116187 | 3.699562 | -1.427985 | -1.494409 | 1.156908 | -1.434964 | 0 |
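
The elbow curve plots `inertia_`, which scikit-learn defines as the sum of squared distances of samples to their closest cluster center. The sketch below recomputes that quantity by hand from `cluster_centers_` and the predicted labels, to make the elbow metric concrete. Because `customers.csv` is not bundled here, it assumes a synthetic stand-in from `make_blobs` (1,000 rows, 10 features, 3 centers); `n_init=10` is set explicitly so the result does not depend on the installed scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for customers_df (assumption): 1,000 rows, 10 features, 3 blobs
X, _ = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=1)

model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
labels = model.predict(X)

# Look up the centroid each sample was assigned to, then sum the squared
# Euclidean distances from every sample to its own centroid
assigned_centers = model.cluster_centers_[labels]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(f"inertia_ = {model.inertia_:.4f}, manual = {manual_inertia:.4f}")
```

Because inertia can only shrink as k grows (more centroids means shorter distances), it cannot pick k on its own; that is why the Elbow Method looks for the point where the decrease levels off rather than for the minimum.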


## Part 3: Cluster the data using AgglomerativeClustering and Birch

Using your optimal number of clusters found above, additionally estimate clusters by using both `AgglomerativeClustering` and `Birch`. Save each of these models and their results for comparison.

In [38]:
# AgglomerativeClustering Model
agglo_model = AgglomerativeClustering(n_clusters=3)
agglo_predictions = agglo_model.fit_predict(customers_df)
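
`AgglomerativeClustering` builds a merge tree bottom-up and then cuts it into `n_clusters` groups; with its default `linkage='ward'`, each step merges the pair of clusters that least increases total within-cluster variance. As an illustrative cross-check on synthetic `make_blobs` data (an assumption standing in for `customers_df`), the sketch builds a Ward tree with SciPy's `linkage` and cuts it into three flat clusters with `fcluster`. The adjusted Rand index (ARI) compares the two partitions while ignoring how the clusters happen to be numbered.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for customers_df (assumption): 1,000 rows, 10 features
X, _ = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=1)

# scikit-learn: Ward linkage (the default), tree cut into 3 clusters
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# SciPy: build the full Ward merge tree, then cut it into at most 3 flat clusters
ward_tree = linkage(X, method="ward")
scipy_labels = fcluster(ward_tree, t=3, criterion="maxclust")

# ARI of 1.0 means identical partitions, even though SciPy numbers clusters from 1
agreement = adjusted_rand_score(agglo_labels, scipy_labels)
print(f"ARI between scikit-learn and SciPy Ward clusterings: {agreement:.3f}")
```

The SciPy tree (`ward_tree`) can also be passed to `scipy.cluster.hierarchy.dendrogram` if you want to see where the three-cluster cut falls.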

In [39]:
# Birch model 
birch_model = Birch(n_clusters=3)
birch_model.fit(customers_df)
birch_predictions = birch_model.predict(customers_df)

In [40]:
# Preview the K-means, agglomerative, and Birch labels side by side
# (.copy() keeps kmeans_predictions itself unchanged)
predictions_df = kmeans_predictions.copy()
predictions_df['agglo-labels'] = agglo_predictions
predictions_df['birch-labels'] = birch_predictions
predictions_df.head()

|   | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 | clusters | agglo-labels | birch-labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.148534 | 4.606077 | 2.699069 | -2.661824 | 1.526433 | 1.236671 | 0.211421 | 1.482896 | -4.445627 | -1.936831 | 1 | 1 | 0 |
| 1 | -1.14941 | -1.650549 | 2.530167 | -3.227088 | 0.572138 | 4.1626 | -0.291679 | -1.237575 | 3.604765 | -1.635689 | 0 | 0 | 1 |
| 2 | 0.332427 | -0.887985 | -0.309216 | 0.399891 | 0.828492 | 3.641945 | -0.916946 | -1.978024 | 1.056772 | -1.882747 | 0 | 0 | 1 |
| 3 | 2.245599 | 3.826309 | 0.264039 | 0.095471 | 1.98438 | 0.373991 | -0.280279 | 1.602786 | -5.993331 | -2.258925 | 1 | 1 | 0 |
| 4 | 0.705503 | -1.312329 | 0.895406 | -0.405408 | 1.116187 | 3.699562 | -1.427985 | -1.494409 | 1.156908 | -1.434964 | 0 | 0 | 1 |


## Part 4: Compare the cluster results from K-means, AgglomerativeClustering, and Birch

1) Create a DataFrame that is a copy of the original `customers_df` data.

2) Add all of the predicted labels (`kmeans_predictions`, `agglo_predictions`, and `birch_predictions`) as columns to this dataframe. 

3) For each algorithm, plot the clusters using the "feature_1" and "feature_2" columns.

In [42]:
# Create a copy of the customers_df DataFrame
customers_predictions_df = customers_df.copy()
# Add one column of cluster labels per algorithm to the new DataFrame
customers_predictions_df['KMeans-clusters'] = k_optimal
customers_predictions_df['agglo-labels'] = agglo_predictions
customers_predictions_df['birch-labels'] = birch_predictions
customers_predictions_df[['KMeans-clusters','agglo-labels', 'birch-labels']].head(3)

|   | KMeans-clusters | agglo-labels | birch-labels |
|---|---|---|---|
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 |
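
Cluster numbers are arbitrary: in the rows above, K-means labels rows 0, 1, 2 as `1, 0, 0` while Birch labels them `0, 1, 1`, which is the same grouping with the numbers swapped. Comparing label columns value by value therefore understates agreement. A minimal sketch, assuming synthetic `make_blobs` data in place of `customers_df`, uses `pd.crosstab` and scikit-learn's `adjusted_rand_score`, which equals 1.0 for identical partitions regardless of numbering.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# The three rows shown above: same grouping, swapped label numbers
toy_ari = adjusted_rand_score([1, 0, 0], [0, 1, 1])
print(f"ARI for the swapped labels: {toy_ari:.1f}")

# Synthetic stand-in for customers_df (assumption): 1,000 rows, 10 features
X, _ = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=1)
labels = {
    "KMeans-clusters": KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X),
    "agglo-labels": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "birch-labels": Birch(n_clusters=3).fit_predict(X),
}

# One dominant cell per row means both algorithms found the same groups
print(pd.crosstab(labels["KMeans-clusters"], labels["birch-labels"],
                  rownames=["KMeans"], colnames=["Birch"]))

# Agreement of each other algorithm with K-means, ignoring label numbering
aris = {
    name: adjusted_rand_score(labels["KMeans-clusters"], values)
    for name, values in labels.items()
    if name != "KMeans-clusters"
}
print(aris)
```

On the notebook's own data, the same calls can be applied to the `KMeans-clusters`, `agglo-labels`, and `birch-labels` columns of `customers_predictions_df`.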


In [45]:
# Plot the kmeans clusters using the "feature_1" and "feature_2" columns
customers_predictions_df.hvplot.scatter(
    x = 'feature_1',
    y = 'feature_2', 
    by = 'KMeans-clusters'
)

In [46]:
# Plot the agglomerative clusters using the "feature_1" and "feature_2" columns
customers_predictions_df.hvplot.scatter(
    x = 'feature_1',
    y = 'feature_2', 
    by = 'agglo-labels'
)

In [47]:
# Plot the birch clusters using the "feature_1" and "feature_2" columns
customers_predictions_df.hvplot.scatter(
    x = 'feature_1',
    y = 'feature_2', 
    by = 'birch-labels'
)

#### Optional Challenge: Loop through each clustering algorithm, using an alternative metric to determine the optimal number of clusters.

1. Create three lists (or a dictionary, or dataframe) to contain the metrics to measure optimal clusters.
2. Using a for loop, cycle through a list of cluster counts, fitting each of the three clustering algorithms.
3. When fitting the clustering algorithms in the loop, estimate the [`Variance Ratio Criterion (Calinski-Harabasz Index)`](https://scikit-learn.org/stable/modules/clustering.html#calinski-harabasz-index) and save that metric to your metrics lists in (1).
    Hint: Code samples for these and other metrics can be found in scikit-learn's documentation on [clustering performance evaluation](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation).
4. Output each of the three lists. If larger metric values indicate a better number of clusters, what cluster count is best? Does it vary by the algorithm selected?
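
Per scikit-learn's documentation, the Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, scaled by (n - k)/(k - 1) for n samples and k clusters, so higher values mean denser, better-separated clusters. The `k - 1` term is also why the score is undefined for a single cluster. The sketch below implements that definition directly and checks it against `sklearn.metrics.calinski_harabasz_score`. The helper `calinski_harabasz_manual` is hypothetical (written here for illustration), and the `make_blobs` data is an assumed stand-in for `customers_df`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic stand-in for customers_df (assumption): 1,000 rows, 10 features
X, _ = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)


def calinski_harabasz_manual(X, labels):
    """Hypothetical helper: between/within dispersion ratio times (n - k) / (k - 1)."""
    n_samples = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)
    between = 0.0  # trace of the between-cluster dispersion matrix
    within = 0.0   # trace of the within-cluster dispersion matrix
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * ((centroid - overall_mean) ** 2).sum()
        within += ((members - centroid) ** 2).sum()
    return (between / within) * ((n_samples - k) / (k - 1))


ch_manual = calinski_harabasz_manual(X, labels)
ch_library = calinski_harabasz_score(X, labels)
print(f"manual = {ch_manual:.4f}, scikit-learn = {ch_library:.4f}")
```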

In [None]:
# Preview the predictions for one of the algorithms
birch_predictions[0:10]

In [None]:
# Equivalently, preview the labels_ attribute for one of the algorithms
birch_model.labels_[0:10]
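
The two preview cells above should print the same labels: Birch stores its training labels in `labels_` and can also label rows with `predict()`. `AgglomerativeClustering` differs: it exposes only `fit` and `fit_predict` (its labels also live in `labels_`) and has no `predict` method for unseen rows, which is why Part 3 calls `fit_predict` for it. A minimal check on synthetic `make_blobs` data (an assumption standing in for `customers_df`):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Synthetic stand-in for customers_df (assumption): 1,000 rows, 10 features
X, _ = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=1)

birch_model = Birch(n_clusters=3).fit(X)

# labels_ (set during fit) matches predict() on the same rows
labels_match = np.array_equal(birch_model.labels_, birch_model.predict(X))

# Birch first compresses the data into CF-tree subclusters, then groups
# those subclusters into n_clusters final clusters
n_subclusters = birch_model.subcluster_centers_.shape[0]

# AgglomerativeClustering has no predict(); use fit_predict() or labels_
agglo_has_predict = hasattr(AgglomerativeClustering(n_clusters=3), "predict")

print(labels_match, n_subclusters, agglo_has_predict)
```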

In [None]:
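# Create a list for each algorithm to store its Calinski-Harabasz scores
score_kmeans = []
score_agglomerative = []
score_birch = []

# Create a list to set the range of k values to test
# (the Calinski-Harabasz index needs at least 2 clusters, so start at k=2)
k = list(range(2, 11))

In [None]:
from sklearn import metrics

for i in k:
    # Fit each algorithm with i clusters, then score the resulting labels
    kmeans_labels = KMeans(n_clusters=i, random_state=1).fit_predict(customers_df)
    score_kmeans.append(metrics.calinski_harabasz_score(customers_df, kmeans_labels))
    agglo_labels = AgglomerativeClustering(n_clusters=i).fit_predict(customers_df)
    score_agglomerative.append(metrics.calinski_harabasz_score(customers_df, agglo_labels))
    birch_labels = Birch(n_clusters=i).fit_predict(customers_df)
    score_birch.append(metrics.calinski_harabasz_score(customers_df, birch_labels))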

In [None]:
display(score_kmeans)

In [None]:
score_agglomerative

In [None]:
score_birch

**Optional Challenge Question:** If larger metric values indicate a better number of clusters, what cluster count is best? Does it vary by the algorithm selected?