### A leading bank wants to segment its customers so that it can target them with promotional offers. It has collected a sample summarizing customer activity over the past few months. The task is to identify these segments based on credit card usage.

### Data Dictionary for Market Segmentation:

#### 1. spending: Amount spent by the customer per month (in 1000s)
#### 2. advance_payments: Amount paid by the customer in advance by cash (in 100s)
#### 3. probability_of_full_payment: Probability that the customer pays the bank in full
#### 4. current_balance: Balance amount left in the account to make purchases (in 1000s)
#### 5. credit_limit: Credit limit on the credit card (in 10000s)
#### 6. min_payment_amt: Minimum amount paid by the customer when making monthly payments for purchases (in 100s)
#### 7. max_spent_in_single_shopping: Maximum amount spent in one purchase (in 1000s)

### Importing all required Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

In [2]:
import os
os.chdir('C:\\Users\\WELCOME\\Downloads\\PYTHON FILES\\4.Data Mining\\Project')

### 1.1 Read the data and do exploratory data analysis. Describe the data briefly.

In [3]:
df_clust = pd.read_csv("bank_marketing_part1_Data-1.csv")

### Checking the data

In [4]:
df_clust.head()

,spending,advance_payments,probability_of_full_payment,current_balance,credit_limit,min_payment_amt,max_spent_in_single_shopping
0,19.94,16.92,0.8752,6.675,3.763,3.252,6.55
1,15.99,14.89,0.9064,5.363,3.582,3.336,5.144
2,18.95,16.42,0.8829,6.248,3.755,3.368,6.148
3,10.83,12.96,0.8099,5.278,2.641,5.182,5.185
4,17.99,15.86,0.8992,5.89,3.694,2.068,5.837


In [5]:
df_clust.shape

(210, 7)

In [6]:
df_clust.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 210 entries, 0 to 209
Data columns (total 7 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   spending                      210 non-null    float64
 1   advance_payments              210 non-null    float64
 2   probability_of_full_payment   210 non-null    float64
 3   current_balance               210 non-null    float64
 4   credit_limit                  210 non-null    float64
 5   min_payment_amt               210 non-null    float64
 6   max_spent_in_single_shopping  210 non-null    float64
dtypes: float64(7)
memory usage: 11.6 KB


In [7]:
df_clust.isnull().sum()

spending                        0
advance_payments                0
probability_of_full_payment     0
current_balance                 0
credit_limit                    0
min_payment_amt                 0
max_spent_in_single_shopping    0
dtype: int64

### Checking Summary Statistics

In [8]:
df_clust.describe().T

,count,mean,std,min,25%,50%,75%,max
spending,210.0,14.847524,2.909699,10.59,12.27,14.355,17.305,21.18
advance_payments,210.0,14.559286,1.305959,12.41,13.45,14.32,15.715,17.25
probability_of_full_payment,210.0,0.870999,0.023629,0.8081,0.8569,0.87345,0.887775,0.9183
current_balance,210.0,5.628533,0.443063,4.899,5.26225,5.5235,5.97975,6.675
credit_limit,210.0,3.258605,0.377714,2.63,2.944,3.237,3.56175,4.033
min_payment_amt,210.0,3.700201,1.503557,0.7651,2.5615,3.599,4.76875,8.456
max_spent_in_single_shopping,210.0,5.408071,0.49148,4.519,5.045,5.223,5.877,6.55


### Checking for Duplicates

In [9]:
dups = df_clust.duplicated()
print('Number of duplicate rows = %d' % (dups.sum()))

Number of duplicate rows = 0


### Scaling the data

In [10]:
from sklearn.preprocessing import StandardScaler

In [11]:
scaler = StandardScaler()

In [12]:
scaled_clust = pd.DataFrame(scaler.fit_transform(df_clust), columns=df_clust.columns)

In [13]:
scaled_clust.head()

,spending,advance_payments,probability_of_full_payment,current_balance,credit_limit,min_payment_amt,max_spent_in_single_shopping
0,1.754355,1.811968,0.17823,2.367533,1.338579,-0.298806,2.328998
1,0.393582,0.25384,1.501773,-0.600744,0.858236,-0.242805,-0.538582
2,1.4133,1.428192,0.504874,1.401485,1.317348,-0.221471,1.509107
3,-1.384034,-1.227533,-2.591878,-0.793049,-1.639017,0.987884,-0.454961
4,1.082581,0.998364,1.19634,0.591544,1.155464,-1.088154,0.874813
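The features sit on very different scales (in the summary statistics, `spending` has a standard deviation of about 2.91 while `probability_of_full_payment` has about 0.024), so without scaling the distance-based KMeans would be dominated by the wide-range columns. `StandardScaler` replaces each value x with (x - mean) / std, using the population standard deviation (ddof=0) rather than the sample one reported by `describe()`. As a check against row 0: (19.94 - 14.8475) / 2.9028 ≈ 1.754 matches the scaled `spending` value, whereas dividing by the `describe()` std of 2.9097 would give ≈ 1.750. The sketch below reproduces the transform by hand on the five rows printed by `head()` (two columns only, for brevity); `toy`, `scaled` and `manual` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The five rows printed by df_clust.head(), restricted to two columns for brevity
toy = pd.DataFrame({
    "spending": [19.94, 15.99, 18.95, 10.83, 17.99],
    "probability_of_full_payment": [0.8752, 0.9064, 0.8829, 0.8099, 0.8992],
})

scaled = StandardScaler().fit_transform(toy)

# Manual z-score with the population std (ddof=0), which is what StandardScaler uses;
# pandas' .std() would default to the sample std (ddof=1)
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))  # True
print(scaled.mean(axis=0), scaled.var(axis=0))  # means ~0, variances ~1
```

Because the mean and std here come from only five rows, the scaled numbers will not match the full-data table; only the formula is being checked.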


### Creating Clusters using KMeans

In [14]:
k_means = KMeans(n_clusters=2, random_state=1)

In [15]:
k_means.fit(scaled_clust)

KMeans(n_clusters=2, random_state=1)

### Cluster Output for all the observations

In [16]:
k_means.labels_

array([1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1,
       0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
       1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0,
       1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1])
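The raw label array is hard to interpret on its own. A common next step is to attach the labels to the original (unscaled) data and compare per-cluster means and sizes. The sketch below uses a hypothetical helper `profile_clusters` and runs it on synthetic `make_blobs` data, since the project CSV is not available here; in the notebook the call would be `profile_clusters(df_clust, k_means.labels_)`. `n_init=10` is set explicitly because the `KMeans` default for it changed across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def profile_clusters(raw: pd.DataFrame, labels: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: per-cluster mean of each raw feature plus the cluster size."""
    out = raw.assign(cluster=labels)
    profile = out.groupby("cluster").mean()
    # value_counts is indexed by cluster label, so it aligns with the groupby index
    profile["size"] = out["cluster"].value_counts().sort_index()
    return profile


# Synthetic stand-in for df_clust: 60 points, 3 features, 2 blobs
X_toy, _ = make_blobs(n_samples=60, centers=2, n_features=3, random_state=0)
raw = pd.DataFrame(X_toy, columns=["f1", "f2", "f3"])

# Cluster on scaled data, but profile on the raw values so the means stay interpretable
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(StandardScaler().fit_transform(raw))
print(profile_clusters(raw, km.labels_))
```

Profiling on the unscaled columns keeps the cluster means in the units of the data dictionary, which is what a business reader would need to describe each segment.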

### Calculating WSS (Within-Cluster Sum of Squares) for Other Values of K: Elbow Method

In [17]:
wss = []

In [18]:
for i in range(1, 11):
    KM = KMeans(n_clusters=i, random_state=1)
    KM.fit(scaled_clust)
    wss.append(KM.inertia_)
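`KM.inertia_` is scikit-learn's name for the within-cluster sum of squares: the sum, over all points, of the squared Euclidean distance to the centroid of the cluster each point is assigned to. A small sketch on synthetic `make_blobs` data (an assumption; the project CSV is not used) recomputes it from `cluster_centers_` and `labels_` and compares it with `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 100 points, 4 features, 3 blobs
X_toy, _ = make_blobs(n_samples=100, centers=3, n_features=4, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X_toy)

# Centroid assigned to each point, shape (100, 4)
assigned = km.cluster_centers_[km.labels_]

# WSS = sum over points of the squared distance to the assigned centroid
wss_manual = float(((X_toy - assigned) ** 2).sum())

print(wss_manual, km.inertia_)
print(np.isclose(wss_manual, km.inertia_))  # True
```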

In [19]:
wss

[1469.9999999999998,
 659.171754487041,
 430.6589731513006,
 371.38509060801096,
 327.21278165661346,
 289.31599538959495,
 262.98186570162267,
 241.81894656086033,
 223.91254221002725,
 206.39612184786694]
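Two things can be read off these values. First, the k = 1 value of about 1470 is expected: after standardization each of the 7 columns has population variance 1, so the total sum of squares around the single overall centroid is n × p = 210 × 7 = 1470. Second, the elbow can be located numerically by looking at how much WSS falls with each extra cluster. The sketch below copies the list printed above; the drop shrinks from about 228.5 (k = 2 to 3) to about 59.3 (k = 3 to 4) and then tapers slowly, which is one way to justify trying k = 3 next.

```python
import numpy as np

# WSS values printed above for k = 1..10 (copied from the notebook output)
wss = [1469.9999999999998, 659.171754487041, 430.6589731513006, 371.38509060801096,
       327.21278165661346, 289.31599538959495, 262.98186570162267, 241.81894656086033,
       223.91254221002725, 206.39612184786694]

ks = np.arange(1, 11)
# drops[j] = WSS(k=j+1) - WSS(k=j+2): the reduction from adding one more cluster
drops = -np.diff(wss)

# On standardized data each of the p columns has population variance 1,
# so WSS at k=1 equals the total sum of squares n * p
n, p = 210, 7
print(np.isclose(wss[0], n * p))  # True

for k, d in zip(ks[1:], drops):
    print(f"k={k - 1} -> {k}: WSS drops by {d:.1f}")
```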

In [20]:
k_means = KMeans(n_clusters=3, random_state=1)
k_means.fit(scaled_clust)
labels = k_means.labels_

In [21]:
silhouette_score(scaled_clust, labels, random_state=1)

0.4007270552751299
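For each point i, the silhouette is s(i) = (b - a) / max(a, b), where a is the mean distance from i to the other members of its own cluster and b is the smallest mean distance from i to the members of any other cluster. `silhouette_score` averages s(i) over all points, so values near 1 indicate tight, well-separated clusters and values near 0 indicate overlapping ones. The `random_state` argument passed above only has an effect when `sample_size` is also given, so here it does not change the result. The sketch below, on synthetic `make_blobs` data and with a hypothetical `manual_silhouette` helper, recomputes the per-point values and checks them against `silhouette_samples` and `silhouette_score`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score

# Synthetic data; the generating blob ids serve as the cluster labels
X_toy, labels = make_blobs(n_samples=40, centers=3, n_features=2, random_state=0)


def manual_silhouette(X, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b) with Euclidean distances.

    Assumes every cluster has at least two members (sklearn sets s(i) = 0 for singletons).
    """
    D = pairwise_distances(X)
    s = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # a excludes the point's zero distance to itself
        a = D[i, same].mean()
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


s_manual = manual_silhouette(X_toy, labels)
print(np.allclose(s_manual, silhouette_samples(X_toy, labels)))  # True
print(np.isclose(s_manual.mean(), silhouette_score(X_toy, labels)))  # True
```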

In [22]:
k_means = KMeans(n_clusters=4, random_state=1)
k_means.fit(scaled_clust)
labels = k_means.labels_

In [23]:
silhouette_score(scaled_clust, labels, random_state=1)

0.3276547677266193
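On these two runs, k = 3 (silhouette ≈ 0.401) scores higher than k = 4 (≈ 0.328), which agrees with the elbow. Rather than refitting cell by cell, the comparison can be looped over a range of k, and the imported but so far unused `silhouette_samples` can report the share of points with a negative silhouette, i.e. points that are, on average, closer to another cluster than to their own. The sketch below runs on synthetic data (an assumption; in the notebook `X_scaled` would be `scaled_clust`), and the numbers it prints depend on that data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer data
X_toy, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X_toy)

results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X_scaled)
    overall = silhouette_score(X_scaled, labels)
    per_point = silhouette_samples(X_scaled, labels)
    # Points with a negative silhouette sit closer to another cluster than to their own
    results[k] = (overall, float((per_point < 0).mean()))
    print(f"k={k}: silhouette={overall:.3f}, share negative={results[k][1]:.1%}")

best_k = max(results, key=lambda k: results[k][0])
print("best k by silhouette:", best_k)
```

The silhouette is only one criterion; a k whose average score is slightly lower but whose clusters are easier to act on for promotional offers can still be the better business choice.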

### Cluster evaluation