# Segmenting Customers

You have been asked to review how the customer ratings data looks when modeled with 3 and 4 clusters.

Using the information contained in this notebook, apply the K-means algorithm to the `service_ratings` data twice, once with 3 clusters and once with 4, to segment the customers.
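For intuition about what the K-means algorithm does before handing the work to scikit-learn, here is a minimal pure-Python sketch of the usual assign-then-update loop. The function name `kmeans`, the `toy_ratings` points, and the fixed iteration count are illustrative assumptions. They are not part of the notebook data or of scikit-learn's implementation.

```python
import random


def kmeans(points, k, n_iter=20, seed=1):
    """Toy K-means on a list of (x, y) tuples (illustrative only)."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as initial centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical ratings: two obvious groups, low scores and high scores
toy_ratings = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (4.0, 4.2), (4.1, 3.9), (3.9, 4.0)]
toy_labels, toy_centroids = kmeans(toy_ratings, k=2)
print(toy_labels)
```

The label numbers themselves are arbitrary. What matters is which points end up sharing a label.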

In [22]:
# Import the modules
import pandas as pd
from pathlib import Path
import hvplot.pandas

# Import the K-means algorithm
from sklearn.cluster import KMeans

In [23]:
# Read in the CSV file as a Pandas DataFrame
service_ratings_df = pd.read_csv(
    Path("../Resources/service_ratings.csv")
)

# Review the DataFrame
service_ratings_df.head()

Unnamed: 0,mobile_app_rating,personal_banker_rating
0,3.5,2.4
1,3.65,3.14
2,2.9,2.75
3,2.93,3.36
4,2.89,2.62


In [24]:
# Visualize a scatter plot of the data
service_ratings_df.hvplot.scatter(
    x="mobile_app_rating", 
    y="personal_banker_rating"
)

## Run the K-means model with 3 clusters

In [25]:
# Create and initialize the K-means model instance for 3 clusters
# Set the random_state variable to 1
model = KMeans(n_clusters=3, random_state=1)

# Print the model
model

KMeans(n_clusters=3, random_state=1)

In [26]:
# Fit the data to the instance of the model
model.fit(service_ratings_df)

KMeans(n_clusters=3, random_state=1)

In [27]:
# Make predictions about the data clusters by using the trained model
customer_ratings = model.predict(service_ratings_df)

# Print the predictions
customer_ratings

array([0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 0, 1, 1, 0, 1, 2, 1, 2, 2, 1, 1,
       2, 1, 2, 0, 2, 1, 2, 0, 1, 2, 1, 0, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 0, 2, 1, 2, 1, 2, 1, 2, 2,
       0, 0, 2, 1, 0, 2, 2, 1, 2, 1, 0, 2, 1, 1, 1, 0, 2, 1, 1, 1, 2, 1,
       0, 2, 1, 2, 1, 1, 0, 1, 0, 1, 2, 1, 0, 2, 1, 2, 1, 1, 2, 2, 1, 1,
       2, 1, 1, 1, 0, 1, 2, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1,
       1, 1, 2, 0, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 0, 1, 2, 2, 2, 0, 1,
       1, 1, 0, 1, 1, 1, 2, 1, 0, 2, 1, 2, 2, 2, 2, 0, 1, 2, 0, 1, 1, 2,
       1, 1, 1, 2, 1, 0, 1], dtype=int32)

In [28]:
# Create a copy of the DataFrame and name it service_ratings_predictions_df
service_ratings_predictions_df = service_ratings_df.copy()

# Add a column to the DataFrame that contains the customer_segment information
service_ratings_predictions_df["customer_segment_3"] = customer_ratings

# Review the DataFrame
service_ratings_predictions_df.head()

Unnamed: 0,mobile_app_rating,personal_banker_rating,customer_segment_3
0,3.5,2.4,0
1,3.65,3.14,1
2,2.9,2.75,2
3,2.93,3.36,2
4,2.89,2.62,2


In [31]:
# Plot the data points, colored by customer segment
service_ratings_predictions_df.hvplot.scatter(
    x="mobile_app_rating",
    y="personal_banker_rating",
    by="customer_segment_3"
)

## Run the K-means model with 4 clusters

In [None]:
# Create and initialize the K-means model instance for 4 clusters
# YOUR CODE HERE

# Print the model
# YOUR CODE HERE

KMeans(n_clusters=4, random_state=1)

In [None]:
# Fit the data to the instance of the model
# YOUR CODE HERE

KMeans(n_clusters=4, random_state=1)

In [None]:
# Make predictions about the data clusters by using the trained model
# YOUR CODE HERE

# Print the predictions
# YOUR CODE HERE

[3 1 0 0 0 0 0 1 0 2 0 0 1 0 3 1 0 1 0 0 1 1 0 1 0 3 2 1 2 3 1 0 1 3 1 0 0
 0 1 0 0 1 1 0 1 0 0 1 1 1 0 1 1 1 0 1 0 3 0 0 0 1 0 0 0 0 3 3 0 0 3 2 0 1
 2 1 3 0 1 1 1 3 2 1 1 0 2 1 3 2 1 0 0 1 3 1 3 1 0 0 3 0 1 0 1 1 2 2 1 1 0
 0 1 1 3 1 2 2 0 1 3 0 0 0 0 0 0 1 0 0 0 1 1 1 0 3 0 1 1 0 1 0 0 1 1 0 0 0
 1 0 2 0 3 1 1 1 3 1 1 1 0 1 3 0 1 0 0 0 2 3 1 0 3 0 0 2 1 3 1 0 0 3 1]


In [None]:
# Add a column to the service_ratings_predictions_df DataFrame that contains the customer_segment information
# YOUR CODE HERE

# Review the DataFrame
# YOUR CODE HERE

Unnamed: 0,mobile_app_rating,personal_banker_rating,customer_segment_3,customer_segment_4
0,3.5,2.4,0,3
1,3.65,3.14,1,1
2,2.9,2.75,2,0
3,2.93,3.36,2,0
4,2.89,2.62,2,0


In [None]:
# Plot the data points based on the customer rating
# YOUR CODE HERE
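The 4-cluster steps mirror the 3-cluster steps. The sketch below shows one way they could fit together. It runs on synthetic stand-in data rather than `service_ratings.csv`, so `demo_df`, `demo_predictions_df`, `model_4`, and `segments_4` are illustrative names. Passing `n_init=10` explicitly is also an assumption, made so the behavior does not depend on the installed scikit-learn version's default.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the ratings data (assumption: NOT the real CSV)
rng = np.random.default_rng(1)
centers = [(1.5, 1.5), (1.5, 4.0), (4.0, 1.5), (4.0, 4.0)]
demo_df = pd.DataFrame(
    np.vstack([rng.normal(loc=c, scale=0.2, size=(25, 2)) for c in centers]),
    columns=["mobile_app_rating", "personal_banker_rating"],
)

# Create and initialize the K-means model instance for 4 clusters
model_4 = KMeans(n_clusters=4, random_state=1, n_init=10)

# Fit the data to the instance of the model
model_4.fit(demo_df)

# Make predictions about the data clusters by using the trained model
segments_4 = model_4.predict(demo_df)

# Store the labels in a copy so the feature columns stay untouched
demo_predictions_df = demo_df.copy()
demo_predictions_df["customer_segment_4"] = segments_4
print(demo_predictions_df.head())
```

Keeping the labels in a separate copy means the original feature DataFrame can be refit with a different cluster count without an earlier label column leaking in as a feature.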

## Answer the following question:

**Question:** Can any additional information be gleaned from the customer segmentation data when it is modeled with 3 clusters and with 4 clusters?

**Answer:** # YOUR ANSWER HERE 
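
One possible numeric aid when answering is to compare each fitted model's `inertia_` attribute, the within-cluster sum of squared distances that scikit-learn's `KMeans` exposes after fitting. The sketch below uses synthetic data with four well-separated groups, so the array `X` and the dictionary `inertia_by_k` are illustrative assumptions and say nothing about the real ratings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic two-feature data with four groups (assumption: NOT the real CSV)
rng = np.random.default_rng(1)
centers = [(1.5, 1.5), (1.5, 4.0), (4.0, 1.5), (4.0, 4.0)]
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(25, 2)) for c in centers])

# Fit one model per cluster count and record its inertia
inertia_by_k = {}
for k in (3, 4):
    fitted = KMeans(n_clusters=k, random_state=1, n_init=10).fit(X)
    inertia_by_k[k] = fitted.inertia_
    print(f"k={k}: inertia={fitted.inertia_:.2f}")
```

A large drop in inertia from 3 to 4 clusters suggests the fourth cluster captures real structure. A small drop suggests the extra segment adds little. The scatter plots remain the primary evidence either way.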