# __Using k-means for customer segmentation__

### Customer Segmentation with K-Means:

- __Pre-processing__
- __Modeling__
- __Insights__

In [None]:
import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs 
%matplotlib inline

### Customer Segmentation with K-Means

In this lab, you apply customer segmentation to a customer dataset. Customer segmentation is the practice of dividing a customer base into groups with similar characteristics. It helps businesses target specific customer groups and allocate marketing resources effectively.

### Load Data From CSV File  
Before working with the data, use pandas to read the dataset from IBM Object Storage.


In [None]:
import pandas as pd
cust_df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%204/data/Cust_Segmentation.csv")
cust_df.head()

## __Pre-processing__

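___Address___ is a categorical variable in this dataset. K-means relies on Euclidean distance, which is not meaningful for categorical values, so drop this feature before running the clustering.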


In [None]:
df = cust_df.drop('Address', axis=1)
df.head()
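
When a dataset has several non-numeric columns, dropping them one by one becomes tedious. The sketch below shows one possible alternative: keeping only the numeric columns with `select_dtypes`. The toy frame and its values are made up for illustration; only the column names mirror the lab dataset.

```python
import pandas as pd

# Toy frame with made-up values; only the column names mirror the lab dataset.
toy = pd.DataFrame({
    "Customer Id": [1, 2, 3],
    "Age": [41, 47, 33],
    "Address": ["NBA001", "NBA021", "NBA013"],
    "Income": [19, 100, 57],
})

# Keep only numeric columns; string-valued columns such as Address are left out.
numeric = toy.select_dtypes(include="number")

# List the columns that were left out, to confirm nothing unexpected was lost.
dropped = [c for c in toy.columns if c not in numeric.columns]
print(dropped)
```

Listing the dropped columns is a cheap check that a numeric column stored as text was not discarded by accident.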

## __Normalizing over the standard deviation__
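Now let's normalize the dataset with __StandardScaler()__, which rescales each feature to zero mean and unit variance so that features measured on large scales do not dominate the distance computation.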


In [None]:
from sklearn.preprocessing import StandardScaler
# Keep all rows and every column except the first one (Customer Id)
X = df.values[:,1:]

# Replacing NaN values with zero
X = np.nan_to_num(X)
X

In [None]:
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet
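
To make the transformation concrete, here is a minimal NumPy sketch of the same z-score computation done by hand. The matrix `X_toy` is made-up data, not the customer dataset.

```python
import numpy as np

# Made-up feature matrix: rows are customers, columns are (age, income).
X_toy = np.array([[25.0, 20.0],
                  [35.0, 60.0],
                  [45.0, 100.0]])

# z-score each column: subtract the column mean, then divide by the column
# standard deviation (population form, ddof=0, as StandardScaler does).
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0)
Z = (X_toy - mean) / std

print(Z.mean(axis=0))  # close to 0 for every column
print(Z.std(axis=0))   # close to 1 for every column
```

After scaling, age and income contribute on a comparable scale to any distance computed between two rows.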

## __Modeling__

Let's apply k-means on our dataset, and take a look at cluster labels.

In [None]:
clusterNum = 3

# init="k-means++" picks well-spread initial centroids; n_init=12 reruns the
# algorithm with 12 different initializations and keeps the best result
k_means = KMeans(init = "k-means++", n_clusters = clusterNum, n_init = 12)

# Fit on the standardized features so that no single feature dominates the distances
k_means.fit(Clus_dataSet)
labels = k_means.labels_

labels
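
For intuition about what the fit does, the following is a rough sketch of the basic k-means loop (Lloyd's algorithm) in plain NumPy. It is not scikit-learn's implementation: it uses random initial centroids rather than k-means++, runs a single initialization, and the function name `lloyd_kmeans` and the toy data are hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids (not k-means++)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated, made-up groups of points.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_labels, toy_centroids = lloyd_kmeans(X_toy, k=2)
print(toy_labels)
print(toy_centroids)
```

The label numbers themselves are arbitrary; only the grouping of points matters.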

## __Insights__

We assign the labels to each row in the dataframe.


In [None]:
df["km_Clus"] = labels
df.head(5)

___We can easily check the centroid values by averaging the features in each cluster.___


In [None]:
df.groupby('km_Clus').mean()
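
The claim that per-cluster averages reproduce the centroids can be checked on synthetic data. This sketch assumes made-up points from `make_blobs` and hypothetical column names `f0` and `f1`. The two sets of values should agree up to the solver's convergence tolerance.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the customer features (values are made up).
X_toy, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

km = KMeans(init="k-means++", n_clusters=3, n_init=12, random_state=0).fit(X_toy)

toy = pd.DataFrame(X_toy, columns=["f0", "f1"])
toy["km_Clus"] = km.labels_

# Per-cluster feature means, ordered by cluster label 0, 1, 2.
means = toy.groupby("km_Clus").mean().to_numpy()

# Largest gap between the group means and the fitted centroids.
print(np.abs(means - km.cluster_centers_).max())
```

Note that the group means are expressed in the units of the dataframe, which makes them easier to interpret than centroids of standardized features.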

___Now, let's look at the distribution of customers based on their age and income:___


In [None]:
# Marker area grows with the square of the education level stored in X[:, 1]
area = np.pi * ( X[:, 1])**2  
plt.scatter(X[:, 0], X[:, 3], s=area, c=labels.astype(np.float64), alpha=0.5)
plt.xlabel('Age', fontsize=18)
plt.ylabel('Income', fontsize=18)

plt.show()


In [None]:
# Importing Axes3D registers the 3D projection on older Matplotlib versions
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(1, figsize=(8, 6))
plt.clf()
ax = fig.add_subplot(111, projection='3d', elev=48, azim=134)

ax.set_xlabel('Education')
ax.set_ylabel('Age')
ax.set_zlabel('Income')

ax.scatter(X[:, 1], X[:, 0], X[:, 3], c=labels.astype(np.float64))

plt.show()
