<hr>
<h3>Axe Bank Credit Card Customer Segmentation</h3>
<hr>

<b>Background</b>: Axe Bank wants to focus on its credit card customer base in the next financial year. Its marketing research team has advised that market penetration can be improved. Based on this input, the Marketing team proposes to run personalised campaigns to target new customers as well as upsell to existing customers. Another insight from the market research was that customers perceive the bank's support services poorly. Based on this, the Operations team wants to upgrade the service delivery model to ensure that customers' queries are resolved faster. The Head of Marketing and the Head of Delivery both decide to reach out to the Data Science team for help.

<b>Data Description</b>: The data covers customers of a bank. For each customer it records the credit limit, the total number of credit cards held, and the contacts made with the bank for queries through each channel: visiting the bank, going online, and calling the call centre.

<b>Key Questions:</b> 
1. How many different segments of customers are there?
2. How are these segments different from each other?
3. What are your recommendations to the bank on how to better market to and service these customers?

### Importing the Libraries

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import pandas as pd

from sklearn.preprocessing import StandardScaler

import seaborn as sns 
import matplotlib.pyplot as plt

from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
import numpy as np 

from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import dendrogram, linkage,cophenet

In [None]:
#Reading the dataset 
df=pd.read_excel('CreditCardCustomerDataSet.xlsx')
df.shape

In [None]:
#Viewing few records
df

In [None]:
df['Customer Key'].drop_duplicates(keep='last').shape

In [None]:
df['Customer Key'].value_counts()

In [None]:
df[df['Customer Key']==50706]

In [None]:
# Drop duplicate customers, keeping the last record for each 'Customer Key'
df = df.drop_duplicates(subset='Customer Key', keep='last').reset_index(drop=True)
df.shape
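Dropping rows leaves gaps in the row index unless it is reset, and pandas aligns on index labels when one column is assigned from another frame. The sketch below illustrates this on made-up data; the toy frame and the `labels` series are hypothetical and only mirror the column names used in this notebook.

```python
import pandas as pd

# Toy data (made up); 'Customer Key' mirrors the column name used above
toy = pd.DataFrame({"Customer Key": [101, 102, 101, 103],
                    "Avg_Credit_Limit": [1000, 2000, 1500, 3000]})

kept = toy.drop_duplicates(subset="Customer Key", keep="last")
print(kept.index.tolist())  # original row labels survive, so the index has a gap

# Hypothetical cluster labels computed later on a frame with a fresh 0..n-1 index
labels = pd.Series([0, 1, 2], name="label")

aligned = kept.copy()
aligned["label"] = labels                # aligned on index labels, not on row order
positional = kept.copy()
positional["label"] = labels.to_numpy()  # assigned in row order

print(aligned)
print(positional)
```

Resetting the index after deduplication, or assigning labels as a plain array, avoids the misalignment.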

The columns 'Sl_No' and 'Customer Key' are identifiers. They carry no information that is relevant to forming clusters, so we exclude them.

In [None]:
cols_to_consider=['Avg_Credit_Limit','Total_Credit_Cards','Total_visits_bank','Total_visits_online','Total_calls_made']

In [None]:
subset=df[cols_to_consider].copy()  # Select only the above columns; copy so new columns can be added safely later

In [None]:
subset

### EDA 

#### Checking for Missing Values 

In [None]:
subset.isna().sum() 

No missing values were found 

#### Checking the statistical summary

In [None]:
subset.describe()

The minimum and maximum values of 'Avg_Credit_Limit' are much larger than those of the other columns.
To bring all features onto the same scale, let's standardize the data.
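As a minimal sketch of what standardization does, the example below computes z-scores by hand on made-up values and compares them with `StandardScaler`. The array `X` is illustrative only and is not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: a large-scale column (like a credit limit) and a small-scale one
X = np.array([[10000.0, 2.0],
              [50000.0, 5.0],
              [100000.0, 9.0],
              [20000.0, 4.0]])

# z = (x - mean) / std, using the population standard deviation (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, every column has mean 0 and unit standard deviation, so no single feature dominates the Euclidean distances used by the clustering algorithms.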



## Feature Correlations

In [None]:
# Use Corr function to create correlation matrix
subset.corr()

**Plot Correlation Matrix**

In [None]:
## Use Seaborn Heatmap to visualize correlation matrix
sns.heatmap(subset.corr(),annot=True);

## Visualize feature distributions

In [None]:
sns.pairplot(subset,diag_kind='kde');

## Check Skewness

In [None]:
subset.skew()

### Log Transformation (a special case of the Box-Cox transformation)

In [None]:
subset_2=subset.copy()

In [None]:
# Apply a log transformation to reduce the skew of these features
# Hint: use the np.log function
subset_2['Avg_Credit_Limit'] = np.log(subset_2['Avg_Credit_Limit']+0.1)  # log(0) is undefined, so add a small offset
subset_2['Total_visits_online'] = np.log(subset_2['Total_visits_online']+0.1)

In [None]:
subset_2.skew()
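The effect of the log transformation on skewness can be sketched on synthetic data. The series below are generated or made up for illustration and are not the notebook's columns; the `0.1` offset matches the one used above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed data (log-normal), standing in for a skewed feature
raw = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000))
logged = np.log(raw + 0.1)

raw_skew = raw.skew()
log_skew = logged.skew()
print(raw_skew, log_skew)

# Count-like data containing zeros: the offset keeps the logarithm finite
counts = pd.Series([0, 1, 1, 2, 2, 3, 15])
logged_counts = np.log(counts + 0.1)
print(logged_counts)
```

The long right tail is compressed, so the skewness of the transformed series is much closer to zero.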

## Visualize the transformed data

In [None]:
# Produce a scatter matrix for each pair of features in the data
sns.pairplot(subset_2,diag_kind='kde');

In [None]:
sns.heatmap(subset_2.corr(),annot=True);

## Feature Scaling for Standardization - Standard Scaler (Z-Score)

In [None]:
scaler=StandardScaler()
subset_scaled=scaler.fit_transform(subset_2)   

In [None]:
subset_scaled_df=pd.DataFrame(subset_scaled,columns=subset_2.columns)   #Creating a dataframe of the above results

In [None]:
subset_scaled_df

In [None]:
subset_scaled_df.skew()

## Execute K-Means Algorithm

In [None]:
## Iterate the K-Means for different values of clusters. Compute the error term and store in an object

cluster_range = range( 1, 15)
cluster_errors = []

for num_clusters in cluster_range:
    clusters = KMeans( num_clusters, n_init = 100,init='k-means++')
    clusters.fit(subset_scaled_df)
    cluster_errors.append( clusters.inertia_ )    # capture the inertia

In [None]:
# Combine cluster_range and cluster_errors into a dataframe
clusters_df = pd.DataFrame( { "num_clusters":cluster_range, "cluster_errors": cluster_errors} )
clusters_df

## Elbow Method

In [None]:
plt.figure(figsize=(12,6))
plt.plot( clusters_df.num_clusters, clusters_df.cluster_errors, marker = "o" );
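The error plotted in the elbow curve is the K-Means inertia: the sum of squared distances from each point to its nearest centroid. A minimal sketch on synthetic blobs (not the bank data) that recomputes it with `cdist`:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data for illustration only
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from every point to every centroid, then keep the nearest one
distances = cdist(X, km.cluster_centers_, metric="euclidean")
manual_inertia = (distances.min(axis=1) ** 2).sum()

print(manual_inertia, km.inertia_)
```

Inertia always decreases as clusters are added, which is why the elbow method looks for the point where the decrease flattens rather than for a minimum.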

## Execute the K-Means again with optimal cluster number 

In [None]:
kmeans = KMeans(n_clusters=3, n_init = 15, random_state=2345)
kmeans.fit(subset_scaled_df)

In [None]:
centroids = kmeans.cluster_centers_
centroids

In [None]:
centroid_df = pd.DataFrame(centroids, columns = subset_scaled_df.columns )

In [None]:
centroid_df

The above are the centroids for the different clusters 
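These centroids are expressed in standardized units. As a hedged sketch on made-up data, centroids can be mapped back to the original units with `StandardScaler.inverse_transform`. In this notebook two columns were also log-transformed before scaling, so for those columns the logarithm would additionally have to be undone (`np.exp(x) - 0.1`), and the result would not equal the arithmetic mean of the raw values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up data: two obvious groups in (credit limit, number of cards)
X = np.array([[1000.0, 1.0], [1200.0, 2.0], [50000.0, 8.0], [52000.0, 9.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# Centroids in standardized units, then converted back to the original units
centers_original = scaler.inverse_transform(km.cluster_centers_)
print(km.cluster_centers_)
print(centers_original)
```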

#### Adding Label to the dataset

In [None]:
dataset=subset_scaled_df.copy()  # creating a copy of the data

In [None]:
dataset['KmeansLabel']=kmeans.labels_

In [None]:
dataset.head(10)

In [None]:
dataset.groupby('KmeansLabel').mean()

## Customer Profiling - Visualizing the clusters

In [None]:
sns.pairplot(dataset,diag_kind='kde',hue='KmeansLabel');

In [None]:
subset['KmeansLabel']=kmeans.labels_
subset

In [None]:
subset.groupby('KmeansLabel').mean()

### Analyse the Clusters 

Let us draw boxplots to compare the clusters.
We expect each cluster to show statistical properties that differentiate it from the others.

In [None]:
dataset.boxplot(by = 'KmeansLabel',  layout=(2,4), figsize=(20, 15))
plt.show()

Looking at the boxplots, we can observe well-differentiated clusters.

## Silhouette Analysis For K-Means Clustering

In [None]:
%matplotlib inline

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

In [None]:
dataset

In [None]:
X=dataset.drop('KmeansLabel',axis=1).values
y=dataset['KmeansLabel'].values

range_n_clusters = [2, 3, 4, 5, 6,7,8,9,10]

for n_clusters in range_n_clusters:
    # Create a subplot with 1 row and 2 columns
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.set_size_inches(18, 7)

    # The 1st subplot is the silhouette plot
    # The silhouette coefficient can range from -1, 1 but in this example all
    # lie within [-0.1, 1]
    ax1.set_xlim([-0.1, 1])
    # The (n_clusters+1)*10 is for inserting blank space between silhouette
    # plots of individual clusters, to demarcate them clearly.
    ax1.set_ylim([0, len(X) + (n_clusters + 1) * 10])

    # Initialize the clusterer with n_clusters value and a random generator
    # seed of 0 for reproducibility.
    clusterer = KMeans(n_clusters=n_clusters,n_init = 100,init='k-means++',random_state=0)
    cluster_labels = clusterer.fit_predict(X)

    # The silhouette_score gives the average value for all the samples.
    # This gives a perspective into the density and separation of the formed
    # clusters
    silhouette_avg = silhouette_score(X, cluster_labels)
    print("For n_clusters =", n_clusters,
          "The average silhouette_score is :", silhouette_avg)

    # Compute the silhouette scores for each sample
    sample_silhouette_values = silhouette_samples(X, cluster_labels)

    y_lower = 10
    for i in range(n_clusters):
        # Aggregate the silhouette scores for samples belonging to
        # cluster i, and sort them
        ith_cluster_silhouette_values = \
            sample_silhouette_values[cluster_labels == i]

        ith_cluster_silhouette_values.sort()

        size_cluster_i = ith_cluster_silhouette_values.shape[0]
        y_upper = y_lower + size_cluster_i

        color = cm.Spectral(float(i) / n_clusters)
        ax1.fill_betweenx(np.arange(y_lower, y_upper),
                          0, ith_cluster_silhouette_values,
                          facecolor=color, edgecolor=color, alpha=0.7)

        # Label the silhouette plots with their cluster numbers at the middle
        ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))

        # Compute the new y_lower for next plot
        y_lower = y_upper + 10  # 10 for the 0 samples

    ax1.set_title("The silhouette plot for the various clusters.")
    ax1.set_xlabel("The silhouette coefficient values")
    ax1.set_ylabel("Cluster label")

    # The vertical line for average silhouette score of all the values
    ax1.axvline(x=silhouette_avg, color="red", linestyle="--")

    ax1.set_yticks([])  # Clear the yaxis labels / ticks
    ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])

    # 2nd Plot showing the actual clusters formed
    colors = cm.Spectral(cluster_labels.astype(float) / n_clusters)
    ax2.scatter(X[:, 0], X[:, 1], marker='.', s=30, lw=0, alpha=0.7,
                c=colors)

    # Labeling the clusters
    centers = clusterer.cluster_centers_
    # Draw white circles at cluster centers
    ax2.scatter(centers[:, 0], centers[:, 1],
                marker='o', c="white", alpha=1, s=200)

    for i, c in enumerate(centers):
        ax2.scatter(c[0], c[1], marker='$%d$' % i, alpha=1, s=50)

    ax2.set_title("The visualization of the clustered data.")
    ax2.set_xlabel("Feature space for the 1st feature")
    ax2.set_ylabel("Feature space for the 2nd feature")

    plt.suptitle(("Silhouette analysis for KMeans clustering on sample data "
                  "with n_clusters = %d" % n_clusters),
                 fontsize=14, fontweight='bold')

    plt.show()

# <center>Hierarchical Clustering</center>

Now that we have tried K-Means, let's try hierarchical clustering on the same dataset. We build dendrograms with different linkage methods and choose the best one using the cophenetic coefficient.

In [None]:
linkage_methods=['single','complete','average','ward','median']
results_cophenetic_coef=[]
for i in linkage_methods :
    plt.figure(figsize=(15, 13))
    plt.xlabel('sample index')
    plt.ylabel('Distance')
    Z = linkage(subset_scaled_df, i)
    cc,cophn_dist=cophenet(Z,pdist(subset_scaled_df))
    dendrogram(Z,leaf_rotation=90.0,p=5,leaf_font_size=10,truncate_mode='level')
    plt.tight_layout()
    plt.title("Linkage Type: "+ i +" having cophenetic coefficient : "+str(round(cc,3)) )
    plt.show()
    results_cophenetic_coef.append((i,cc))
    print (i,cc)

In [None]:
results_cophenetic_coef_df=pd.DataFrame(results_cophenetic_coef,columns=['LinkageMethod','CopheneticCoefficient'])
results_cophenetic_coef_df
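The cophenetic coefficient is the Pearson correlation between the original pairwise distances and the cophenetic distances, that is, the dendrogram heights at which each pair of points is first merged. A minimal sketch on random data (illustrative only) that reproduces the value returned by `cophenet`:

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))  # random points, for illustration only

Y = pdist(X)                  # condensed matrix of original pairwise distances
Z = linkage(X, "average")
cc, coph_dists = cophenet(Z, Y)

# Correlate the original distances with the cophenetic distances
manual_cc = np.corrcoef(Y, coph_dists)[0, 1]
print(cc, manual_cc)
```

A value close to 1 means the dendrogram preserves the original pairwise distances well.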

The highest cophenetic coefficient is obtained with 'average' linkage.

The 'ward' and 'complete' dendrograms also show clear separation between clusters.

We proceed with 'average' linkage because it has the highest cophenetic coefficient.

Let's plot a dendrogram of the last 25 merged clusters using average linkage for a clearer view, since the full dendrograms above are crowded.

In [None]:
# Use truncate_mode='lastp' to show only the last p merged clusters
plt.figure(figsize=(10,8))
Z = linkage(subset_scaled_df, 'average', metric='euclidean')

dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=25  # number of merged clusters to display
)
plt.show()

Let's cut the dendrogram at a maximum distance of 3.2, which crosses the tallest vertical lines, to form the clusters.

In [None]:
max_d=3.2
from scipy.cluster.hierarchy import fcluster
clusters = fcluster(Z, max_d, criterion='distance')

In [None]:
set(clusters)  # So there are 3 clusters which are formed 
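Besides a distance threshold, `fcluster` can be asked for a target number of clusters with `criterion='maxclust'`. The sketch below, on synthetic blobs rather than the bank data, shows both criteria producing three flat clusters; the midpoint threshold is one illustrative way to pick a cut height.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Three well-separated synthetic blobs, for illustration only
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
Z = linkage(X, "average")

# Ask directly for at most 3 flat clusters
by_count = fcluster(Z, t=3, criterion="maxclust")

# Equivalent cut height: between the merge that leaves 3 clusters (Z[-3])
# and the merge that leaves 2 clusters (Z[-2])
threshold = (Z[-3, 2] + Z[-2, 2]) / 2
by_distance = fcluster(Z, t=threshold, criterion="distance")

print(np.unique(by_count), np.unique(by_distance))
```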

### Assign the clusters label to the  data set

In [None]:
dataset2=subset_scaled_df.copy()  # Create a duplicate of the dataset

In [None]:
dataset2['HierarchicalClusteringLabel']=clusters

In [None]:
dataset2

### Analyse the clusters 

In [None]:
dataset2.boxplot(by = 'HierarchicalClusteringLabel',  layout=(2,4), figsize=(20, 15))
plt.show()

Here also we observe differentiated clusters.

### Silhouette Score

In [None]:
from sklearn.metrics import silhouette_score
silhouette_score(dataset.drop('KmeansLabel',axis=1),dataset['KmeansLabel'])

In [None]:
from sklearn.metrics import silhouette_score
silhouette_score(dataset2.drop('HierarchicalClusteringLabel',axis=1),dataset2['HierarchicalClusteringLabel'])

The silhouette score is better the closer it is to 1 and worse the closer it is to -1.

Here the K-Means score is slightly better than the hierarchical one.
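For a single sample, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster. A minimal sketch on six made-up points that checks this against `silhouette_samples`:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_samples

# Six made-up points in two obvious groups
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

i = 0                                   # sample to inspect
dists = cdist(X[[i]], X)[0]             # distances from sample i to every point

same = labels == labels[i]
same[i] = False                         # exclude the sample itself
a = dists[same].mean()                  # mean intra-cluster distance
b = min(dists[labels == k].mean()       # mean distance to the nearest other cluster
        for k in set(labels) if k != labels[i])

s_manual = (b - a) / max(a, b)
s_sklearn = silhouette_samples(X, labels)[i]
print(s_manual, s_sklearn)
```

`silhouette_score` is the average of these per-sample values.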

### Comparing Kmeans and Hierarchical Results

In [None]:
Kmeans_results=dataset.groupby('KmeansLabel').mean()
Kmeans_results

In [None]:
dataset.groupby('KmeansLabel').count()

In [None]:
Hierarchical_results=dataset2.groupby('HierarchicalClusteringLabel').mean()
Hierarchical_results

In [None]:
dataset2.groupby('HierarchicalClusteringLabel').count()

#### Carefully observing the above results we can say that : 



Cluster 0 of Kmeans appears similar to Cluster 2 of Hierarchical 


Cluster 1 of Kmeans appears similar to Cluster 3 of Hierarchical 


Cluster 2 of Kmeans appears similar to Cluster 1 of Hierarchical 



#### Let's rename 


Cluster 0 of Kmeans  and Cluster 2 of Hierarchical as G1

Cluster 1 of Kmeans  and Cluster 3 of Hierarchical as G2

Cluster 2 of Kmeans  and Cluster 1 of Hierarchical as G3
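Matching cluster labels across two methods can also be checked with a cross-tabulation instead of reading the means by eye. The label arrays below are made up to mirror the mapping described above; with the notebook's data one would pass `dataset['KmeansLabel']` and `dataset2['HierarchicalClusteringLabel']` instead.

```python
import numpy as np
import pandas as pd

# Made-up labels for eight customers, for illustration only
kmeans_labels = np.array([0, 0, 1, 1, 2, 2, 2, 0])
hier_labels = np.array([2, 2, 3, 3, 1, 1, 1, 2])

# Rows: K-Means labels, columns: hierarchical labels, cells: customer counts
table = pd.crosstab(pd.Series(kmeans_labels, name="KmeansLabel"),
                    pd.Series(hier_labels, name="HierarchicalClusteringLabel"))
print(table)

# For each K-Means cluster, the hierarchical cluster sharing the most customers
mapping = table.idxmax(axis=1).to_dict()
print(mapping)
```

Large off-diagonal counts in such a table would indicate that the two methods disagree on some customers.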



In [None]:
Kmeans_results.index=['G1','G2','G3']
Kmeans_results

In [None]:
Hierarchical_results.index=['G3','G1','G2']
Hierarchical_results.sort_index(inplace=True)
Hierarchical_results

In [None]:
Kmeans_results.plot.bar();

In [None]:
Hierarchical_results.plot.bar();

#### By both the methods of Clustering we get comparable clusters

## Cluster Profiles and Marketing Recommendation

Since both clustering algorithms give similar clusters, we can assign the labels from either one to the original (unscaled) data to analyse the cluster profiles.
Here we assign the K-Means labels; the same could be done with the hierarchical labels.

In [None]:
subset['KmeansLabel']=dataset['KmeansLabel'].values  # assign by position to avoid index misalignment


In [None]:
subset

In [None]:
subset.groupby('KmeansLabel').mean()

#### Understanding each feature's characteristics within different clusters

In [None]:
for each in cols_to_consider:
    print (each)
    print ( subset.groupby('KmeansLabel').describe().round()[each][['count','mean','min','max']])
    
    print ("\n\n")
    
    

### Analysis of clusters and questions answered :
    

#### 1. How many different segments of customers are there? 

Answer: There are 3 customer segments.
    
    
  

#### 3. What are your recommendations to the bank on how to better market to and service these customers? (Business Recommendations )