# Customer Segmentation
Customer segmentation is the process of dividing customers into groups based on common characteristics so companies can market to each group effectively and appropriately.

In business-to-business marketing, a company might segment customers according to a wide range of factors, including:

1.    Industry
1.    Number of employees
1.    Products previously purchased from the company
1.    Location

In business-to-consumer marketing, companies often segment customers according to demographics that include:

1.    Age
1.    Gender
1.    Marital status
1.    Location (urban, suburban, rural)
1.    Life stage (single, married, divorced, empty-nester, retired, etc.)


![Customer Segmentation](https://www.ebcg.com/wp-content/uploads/2017/09/Market-Segmentation-1-600x442.jpg)


# How to Segment Customers
Customer segmentation requires a company to gather specific information – data – about customers and analyze it to identify patterns that can be used to create segments.

Some of that can be gathered from purchasing information – job title, geography, products purchased, for example. Some of it might be gleaned from how the customer entered your system. An online marketer working from an opt-in email list might segment marketing messages according to the opt-in offer that attracted the customer, for example. Other information, however, including consumer demographics such as age and marital status, will need to be acquired in other ways.

Typical information-gathering methods include:

1.    Face-to-face or telephone interviews
1.    Surveys
1.    General research using published information about market categories
1.    Focus groups


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
data = pd.read_csv('../input/customer-segmentation-tutorial-in-python/Mall_Customers.csv')
data.head()

In [4]:
data.shape

In [5]:
data.describe()

In [6]:
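plt.figure(figsize=(15,6))
plt.subplots_adjust(hspace = 0.5,wspace = 0.5)
# Plot the distribution of each numeric feature side by side
for n, x in enumerate(['Age','Annual Income (k$)','Spending Score (1-100)'], start=1):
  plt.subplot(1,3,n)
  sns.histplot(data[x],bins = 20,kde = True)  # histplot replaces the deprecated distplot
  plt.title('Distribution of {}'.format(x))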



The histograms show that most customers are between 35 and 40 years old and earn around $60,000 a year.

In [7]:
sns.countplot(data = data,y ='Gender')

The count plot shows that most of the customers are women.

Next, we want to know which age range contains the most customers.

In [8]:
Age_18_25 = data.Age[(data.Age >= 18)& (data.Age<=25)]
Age_26_35 = data.Age[(data.Age >= 26)& (data.Age<=35)]
Age_36_45 = data.Age[(data.Age >= 36)& (data.Age<=45)]
Age_46_55 = data.Age[(data.Age >= 46)& (data.Age<=55)]
Age_above_55 = data.Age[data.Age >= 56]

In [9]:
agex = ['Age_18_25','Age_26_35','Age_36_45','Age_46_55','Age_above_55']
agey = [len(Age_18_25.values),len(Age_26_35.values),len(Age_36_45.values),len(Age_46_55.values),len(Age_above_55.values)]


In [10]:
plt.figure(figsize=(15,6))
sns.barplot(x = agex,y = agey , palette='mako')
# Call the labeling functions; assigning to them (plt.title = ...) would overwrite them
plt.title("Customers by age range")
plt.xlabel("Range of age")
plt.ylabel("No of Customers")
plt.show()
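
The same age-range counts can also be produced in one step with `pandas.cut`, which assigns each value to a labeled bin. The sketch below uses a small made-up `ages` series rather than the mall dataset; the bin edges mirror the ranges used above, and the upper edge of 120 is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical ages for illustration only; not taken from Mall_Customers.csv
ages = pd.Series([18, 22, 25, 26, 31, 36, 45, 46, 55, 56, 70])

# Bin edges are right-inclusive: (17, 25], (25, 35], (35, 45], (45, 55], (55, 120]
bins = [17, 25, 35, 45, 55, 120]  # 120 is an assumed upper bound on age
names = ['18-25', '26-35', '36-45', '46-55', '56+']

age_groups = pd.cut(ages, bins=bins, labels=names)

# Count customers per range and order the result by age range
counts = age_groups.value_counts().sort_index()
print(counts)
```

Applied to the notebook's data, `pd.cut(data.Age, bins=bins, labels=names)` should give the same counts as the five filtered series, as long as every age falls inside the bin edges.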

# Classify the people based on their annual income

In [12]:
Annual_income_30 = data['Annual Income (k$)'][(data['Annual Income (k$)']>=0)&(data['Annual Income (k$)']<=30)]
Annual_income_60 = data['Annual Income (k$)'][(data['Annual Income (k$)']>=31)&(data['Annual Income (k$)']<=60)]
Annual_income_90 = data['Annual Income (k$)'][(data['Annual Income (k$)']>=61)&(data['Annual Income (k$)']<=90)]
Annual_income_120 = data['Annual Income (k$)'][(data['Annual Income (k$)']>=91)&(data['Annual Income (k$)']<=120)]
Annual_income_150 = data['Annual Income (k$)'][(data['Annual Income (k$)']>=121)&(data['Annual Income (k$)']<=150)]

In [13]:
AiX = ['0-30','31-60','61-90','91-120','121-150']  # labels match the filter bounds above
AiY =[len(Annual_income_30.values),len(Annual_income_60.values),len(Annual_income_90.values),len(Annual_income_120.values),len(Annual_income_150.values)]

In [14]:
plt.figure(figsize = (14,6))
sns.barplot(x = AiX,y = AiY,palette = 'Dark2_r')

Most customers have an annual income between 60k and 90k.

# Cluster the data based on Age and Annual Income

In [15]:
X1 = data.loc[:,['Age','Annual Income (k$)']].values

In [16]:
from sklearn.cluster import KMeans

**We need to decide how many clusters to use. A common approach is the elbow method, which is what we'll use here.**

In [33]:
acc = []    # inertia of the fitted model for each k from 1 to 9
for k in range(1,10):
  kmeans = KMeans(n_clusters = k)
  kmeans.fit(X1)
  # Inertia measures how tightly the data is clustered by K-Means: it is the sum of
  # squared distances between each data point and its closest centroid (lower is tighter).
  acc.append(kmeans.inertia_)

plt.figure(figsize = (12,6))
plt.plot(range(1,10),acc,linewidth = 3)
plt.xlabel("no of clusters")
plt.ylabel("inertia")
plt.show()
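
To make the inertia values concrete, here is a minimal NumPy sketch that computes the same quantity by hand: the sum of squared distances from each point to the centroid of its cluster. The four points and their cluster assignments are made up for illustration.

```python
import numpy as np

# Toy data (an assumption for illustration): two obvious groups of two points
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
assignments = np.array([0, 0, 1, 1])  # cluster index of each point

# Centroid of each cluster = mean of the points assigned to it
centroids = np.array([points[assignments == k].mean(axis=0) for k in range(2)])

# Inertia = sum over all points of the squared distance to their own centroid
diffs = points - centroids[assignments]
inertia = float((diffs ** 2).sum())

print(centroids)
print(inertia)  # 4.0
```

With a single cluster the centroid would sit at (5, 1) and the inertia would be 4 × (25 + 1) = 104. Inertia keeps falling as k grows, which is why the elbow method looks for the point where the drop levels off rather than for the minimum.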


Based on where the curve begins to flatten, 4 clusters seems like a good choice.

In [20]:
kmean = KMeans(n_clusters = 4)
labels = kmean.fit_predict(X1)
print(labels)

In [24]:
plt.figure(figsize = (12,8))
plt.scatter(X1[:,0],X1[:,1],c = kmean.labels_,cmap = 'rainbow')
plt.scatter(kmean.cluster_centers_[:,0],kmean.cluster_centers_[:,1],color = 'black')
plt.show()

# Cluster the data based on Annual Income and Spending Score

In [22]:
X2 = data.loc[:,['Annual Income (k$)','Spending Score (1-100)']].values

In [34]:
acc = []
for k in range(1,10):
  kmeans = KMeans(n_clusters = k)
  kmeans.fit(X2)
  acc.append(kmeans.inertia_)
plt.figure(figsize = (12,6))
plt.plot(range(1,10),acc,linewidth = 3)
plt.xlabel("no of clusters")
plt.ylabel("inertia")
plt.show()
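
Since the elbow loop is repeated for each feature set, it could be wrapped in a small helper. This is only a sketch: `inertia_curve` is a hypothetical name, the `random_state` and `n_init` values are assumptions added for reproducibility, and the demo array is synthetic rather than the mall data.

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_curve(X, k_values, random_state=0):
    """Fit K-Means once per k and return the inertia of each fit."""
    inertias = []
    for k in k_values:
        # n_init and random_state are assumed settings, not taken from the notebook
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias

# Synthetic demo data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [50, 50], [100, 0]])
X_demo = np.vstack([c + rng.normal(scale=2.0, size=(30, 2)) for c in centers])

curve = inertia_curve(X_demo, range(1, 10))
for k, value in zip(range(1, 10), curve):
    print(k, round(value, 1))
```

On this synthetic data the inertia should drop sharply up to k = 3 and change little afterwards, which is the elbow shape we look for. In the notebook, `inertia_curve(X2, range(1, 10))` would return the values that the loop above collects in `acc`.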

In [26]:
kmean2 = KMeans(n_clusters = 5)
labels = kmean2.fit_predict(X2)
print(labels)

In [35]:
plt.figure(figsize = (14,6))
plt.scatter(X2[:,0],X2[:,1],c = kmean2.labels_,cmap = 'Dark2')
plt.scatter(kmean2.cluster_centers_[:,0],kmean2.cluster_centers_[:,1],color = 'black')
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.show()

# Cluster the data based on Age, Annual Income, and Spending Score

In [28]:
X3 = data.loc[:,['Age','Annual Income (k$)','Spending Score (1-100)']].values
X3

In [36]:
acc = []
for k in range(1,10):
  kmeans = KMeans(n_clusters = k)
  kmeans.fit(X3)
  acc.append(kmeans.inertia_)
plt.figure(figsize = (12,6))
plt.plot(range(1,10),acc,linewidth = 3)
plt.xlabel("no of clusters")
plt.ylabel("inertia")
plt.show()

In [30]:
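kmean3 = KMeans(n_clusters = 5)
clusters = kmean3.fit_predict(X3)  # fit on the three-feature array X3, not X2
data['labels'] = clusters
print(clusters)  # print the labels from this model, not the earlier `labels` variable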
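
Once each customer carries a cluster label, a quick way to interpret the segments is to average the features within each cluster. The sketch below uses a tiny hand-made DataFrame with the same column names as the dataset; the values and labels are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration; only the column names match the mall dataset
toy = pd.DataFrame({
    'Age': [20, 24, 50, 60],
    'Annual Income (k$)': [80, 90, 30, 40],
    'Spending Score (1-100)': [85, 95, 10, 20],
    'labels': [0, 0, 1, 1],
})

# Mean of every feature per cluster, plus the number of customers in each
profile = toy.groupby('labels').mean()
profile['size'] = toy.groupby('labels').size()
print(profile)
```

In this toy example, cluster 0 is young with high income and high spending, while cluster 1 is older with low income and low spending. The same `groupby('labels')` call on the numeric columns of `data` would give a comparable profile for the real clusters.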

# Visualize The Clusters in 3D

**Because we are now clustering on three dimensions, we need a 3D scatter plot to visualize the clusters.**

In [31]:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(111,projection = '3d')
# Draw one scatter series per cluster label so each cluster gets its own color
for label in range(5):
  cluster = data[data.labels == label]
  ax.scatter(cluster['Age'],cluster['Annual Income (k$)'],cluster['Spending Score (1-100)'])
ax.view_init(30,185)
ax.set_xlabel("Age")
ax.set_ylabel("Annual Income (k$)")
ax.set_zlabel("Spending Score (1-100)")
plt.show()