# Segmenting Customers

You have been asked to review how the customer ratings data looks when modeled with 3 and 4 clusters.

Using the information in this notebook, apply the K-means algorithm to the `service_ratings` data with both 3 and 4 clusters to segment the customers.

In [1]:
# Import the modules
import pandas as pd
from pathlib import Path
import hvplot.pandas

# Import the K-means algorithm
from sklearn.cluster import KMeans

import warnings
# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [2]:
# Read in the CSV file as a Pandas DataFrame
service_ratings_df = pd.read_csv(
    Path("../Resources/service_ratings.csv")
)

# Review the DataFrame
service_ratings_df.head()

Unnamed: 0,mobile_app_rating,personal_banker_rating
0,3.5,2.4
1,3.65,3.14
2,2.9,2.75
3,2.93,3.36
4,2.89,2.62


In [3]:
# Visualize a scatter plot of the data
service_ratings_df.hvplot.scatter(
    x="mobile_app_rating", 
    y="personal_banker_rating"
)

## Run the K-means model with 3 clusters

In [4]:
# Create and initialize the K-means model instance for 3 clusters
# Set the random_state parameter to 1 so the results are reproducible
km = KMeans(n_clusters=3, random_state=1)

# Print the model
km

In [5]:
# Fit the data to the instance of the model
km.fit(service_ratings_df)

In [6]:
# Make predictions about the data clusters using the trained model
predict_df = km.predict(service_ratings_df)

# Print the predictions
predict_df

array([0, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 0, 2, 2, 0, 2, 1, 2, 1, 1, 2, 2,
       1, 2, 1, 0, 1, 2, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1, 2, 1, 1, 2, 2, 2,
       2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 0, 1, 2, 1, 2, 1, 2, 1, 1,
       0, 0, 1, 2, 0, 1, 1, 2, 1, 2, 0, 1, 2, 2, 2, 0, 1, 2, 2, 2, 1, 2,
       0, 1, 2, 1, 2, 2, 0, 2, 0, 2, 1, 2, 0, 1, 2, 1, 2, 2, 1, 1, 2, 2,
       1, 2, 2, 2, 0, 2, 1, 1, 1, 2, 0, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2,
       2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 0, 2, 1, 1, 1, 0, 2,
       2, 2, 0, 2, 2, 2, 1, 2, 0, 1, 2, 1, 1, 1, 1, 0, 2, 1, 0, 2, 2, 1,
       2, 2, 2, 1, 2, 0, 2])
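
The raw label array is hard to scan. As a minimal sketch, the segment sizes can be tallied with NumPy; `sample_labels` below holds only the first ten labels from the output above, and the variable names are illustrative rather than part of the notebook.

```python
import numpy as np

# First ten predicted labels copied from the output above (not the full array)
sample_labels = np.array([0, 2, 1, 1, 1, 1, 2, 2, 1, 1])

# Count how many customers fall into each segment
segments, counts = np.unique(sample_labels, return_counts=True)
segment_sizes = dict(zip(segments.tolist(), counts.tolist()))

print(segment_sizes)  # {0: 1, 1: 6, 2: 3}
```

In the notebook itself, the same two lines could presumably be applied to `predict_df` directly.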

In [10]:
# Copy the array of predicted cluster labels
predict_df_copy = predict_df.copy()

# Add a customer_segment column to the DataFrame that holds the predicted labels
service_ratings_df['customer_segment'] = predict_df_copy

# Review the DataFrame
service_ratings_df

Unnamed: 0,mobile_app_rating,personal_banker_rating,customer_segment
0,3.50,2.40,0
1,3.65,3.14,2
2,2.90,2.75,1
3,2.93,3.36,1
4,2.89,2.62,1
...,...,...,...
178,3.44,3.00,2
179,2.40,2.80,1
180,3.25,2.88,2
181,3.50,2.40,0


In [16]:
# Plot the data points, colored by customer segment
service_ratings_df.hvplot.scatter(
    x='mobile_app_rating',
    y='personal_banker_rating',
    c='customer_segment')
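
A colored scatter plot shows where the segments sit, and the fitted centroids summarize each one numerically. The sketch below assumes a small made-up ratings table (`toy_ratings_df`) in place of the CSV; `cluster_centers_` is the scikit-learn attribute that holds one centroid per cluster, and the other names are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up ratings in three obvious groups (an assumption, not the notebook's data)
toy_ratings_df = pd.DataFrame({
    "mobile_app_rating": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9, 5.0, 5.1, 4.9],
    "personal_banker_rating": [1.0, 0.9, 1.1, 3.0, 2.9, 3.1, 5.0, 4.9, 5.1],
})

# Fit a 3-cluster model; n_init is set explicitly so behavior does not depend on the library default
toy_model = KMeans(n_clusters=3, random_state=1, n_init=10).fit(toy_ratings_df)

# One row per cluster, one column per feature
centers_df = pd.DataFrame(toy_model.cluster_centers_, columns=toy_ratings_df.columns)
print(centers_df)
```

Reading the centroid rows next to the plot makes it easier to describe each segment, for example "low app rating, low banker rating".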

## Run the K-means model with 4 clusters

In [19]:
# Create and initialize the K-means model instance for 4 clusters
km_four = KMeans(n_clusters=4, random_state=1)

# Print the model
km_four

In [20]:
# Fit the data to the instance of the model
km_four.fit(service_ratings_df)
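
By this point `service_ratings_df` also carries the `customer_segment` column added in the 3-cluster section, so the fit above treats those earlier labels as a feature. A minimal sketch, assuming the intent is to cluster on the two rating columns only, selects them explicitly before fitting. The data below are the five preview rows with their labels from the 3-cluster output, and `feature_columns` and `km_four_sketch` are illustrative names.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Five preview rows from the notebook, including the labels from the 3-cluster run
preview_df = pd.DataFrame({
    "mobile_app_rating": [3.50, 3.65, 2.90, 2.93, 2.89],
    "personal_banker_rating": [2.40, 3.14, 2.75, 3.36, 2.62],
    "customer_segment": [0, 2, 1, 1, 1],
})

# Keep only the rating columns so earlier labels are not used as a feature
feature_columns = ["mobile_app_rating", "personal_banker_rating"]
features_df = preview_df[feature_columns]

# Fit and predict on the same two-column feature set
km_four_sketch = KMeans(n_clusters=4, random_state=1, n_init=10)
km_four_sketch.fit(features_df)
labels_four = km_four_sketch.predict(features_df)

print(labels_four)
```

Fitting and predicting on the same explicit column list also keeps the model input stable when new columns are added to the DataFrame later.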

In [22]:
# Make predictions about the data clusters using the trained model
predict_df_four = km_four.predict(service_ratings_df)

# Print the predictions
predict_df_four

array([1, 0, 2, 2, 2, 2, 0, 0, 2, 3, 0, 1, 0, 0, 1, 0, 2, 0, 2, 2, 0, 0,
       2, 0, 2, 1, 3, 0, 2, 1, 0, 2, 0, 1, 0, 2, 2, 2, 0, 2, 2, 0, 0, 0,
       0, 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 1, 2, 0, 2, 0, 2, 0, 2, 2,
       1, 1, 2, 0, 1, 3, 2, 0, 2, 0, 1, 2, 0, 0, 0, 1, 2, 0, 0, 0, 3, 0,
       1, 3, 0, 2, 0, 0, 1, 0, 1, 0, 2, 0, 1, 2, 0, 2, 0, 0, 2, 3, 0, 0,
       2, 0, 0, 0, 1, 0, 3, 3, 2, 0, 1, 0, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0,
       0, 0, 2, 1, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 2, 1, 0, 2, 3, 2, 1, 0,
       0, 0, 1, 0, 0, 0, 2, 0, 1, 2, 0, 2, 2, 2, 2, 1, 0, 2, 1, 0, 0, 3,
       0, 0, 0, 2, 0, 1, 0])

In [23]:
# Overwrite the customer_segment column in service_ratings_df with the 4-cluster predictions
service_ratings_df['customer_segment'] = predict_df_four

# Review the DataFrame
service_ratings_df

Unnamed: 0,mobile_app_rating,personal_banker_rating,customer_segment
0,3.50,2.40,1
1,3.65,3.14,0
2,2.90,2.75,2
3,2.93,3.36,2
4,2.89,2.62,2
...,...,...,...
178,3.44,3.00,0
179,2.40,2.80,2
180,3.25,2.88,0
181,3.50,2.40,1
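
The assignment above overwrites the 3-cluster labels, which makes the two runs hard to compare. One possible approach, sketched here with the first ten labels from each prediction array, keeps both label sets in separate columns and cross-tabulates them. The column names `segment_k3` and `segment_k4` are hypothetical.

```python
import pandas as pd

# First ten labels from the 3-cluster and 4-cluster outputs shown in this notebook
comparison_df = pd.DataFrame({
    "segment_k3": [0, 2, 1, 1, 1, 1, 2, 2, 1, 1],
    "segment_k4": [1, 0, 2, 2, 2, 2, 0, 0, 2, 3],
})

# Rows are 3-cluster segments, columns are 4-cluster segments, cells are customer counts
segment_crosstab = pd.crosstab(comparison_df["segment_k3"], comparison_df["segment_k4"])
print(segment_crosstab)
```

In these ten rows, each 3-cluster segment maps onto a single 4-cluster segment except segment 1, which splits between segments 2 and 3.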


In [24]:
# Plot the data points, colored by customer segment
service_ratings_df.hvplot.scatter(
    x='mobile_app_rating',
    y='personal_banker_rating',
    c='customer_segment')

## Answer the following question

**Question:** Can any additional information be gleaned from the customer segmentation data when it is modeled with 3 and 4 clusters?

**Answer:** No.
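
One way to support an answer like this with numbers is to compare a clustering quality metric across the two values of k. The sketch below uses scikit-learn's `silhouette_score` on synthetic data from `make_blobs`, since the ratings CSV is not reproduced here; it illustrates the technique and is not the notebook's own method or result.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic two-feature data standing in for the ratings (an assumption)
X, _ = make_blobs(n_samples=180, centers=3, n_features=2, random_state=1)

# Fit one model per candidate k and record its silhouette score
silhouette_by_k = {}
for k in (3, 4):
    model = KMeans(n_clusters=k, random_state=1, n_init=10)
    labels = model.fit_predict(X)
    silhouette_by_k[k] = silhouette_score(X, labels)

for k, score in silhouette_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
```

A higher silhouette score indicates tighter, better-separated clusters, so a drop from k=3 to k=4 would suggest that the extra cluster adds little.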