## **Credit Card Churn Analysis and Data Visualisations**

### Import data and libraries
Before I create the visualisations, I need to import the cleaned data and the necessary libraries.

In [11]:
import pandas as pd
import numpy as np
import seaborn as sns 
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import plotly.io as pio


In [12]:
df = pd.read_csv('../data/BankChurners_Clean.csv')
df.head()

Unnamed: 0,Clientnum,Attrition Flag,Customer Age,Gender,Dependent Count,Education Level,Marital Status,Income Category,Card Category,Months On Book,...,Credit Limit,Total Revolving Bal,Avg Open To Buy,Total Amt Chng Q4 Q1,Total Trans Amt,Total Trans Ct,Total Ct Chng Q4 Q1,Avg Utilization Ratio,Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 1,Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,...,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3e-05,0.99991
1,818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,...,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,5.7e-05,0.99994
2,713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,...,3418.0,0,3418.0,2.594,1887,20,2.333,0.0,2.1e-05,0.99998
3,769911858,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,34,...,3313.0,2517,796.0,1.405,1171,20,2.333,0.76,0.000134,0.99987
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,...,4716.0,0,4716.0,2.175,816,28,2.5,0.0,2.2e-05,0.99998


List all the column names so I can write them correctly when creating the visualisations.

In [13]:
print(df.columns)

Index(['Clientnum', 'Attrition Flag', 'Customer Age', 'Gender',
       'Dependent Count', 'Education Level', 'Marital Status',
       'Income Category', 'Card Category', 'Months On Book',
       'Total Relationship Count', 'Months Inactive 12 Mon',
       'Contacts Count 12 Mon', 'Credit Limit', 'Total Revolving Bal',
       'Avg Open To Buy', 'Total Amt Chng Q4 Q1', 'Total Trans Amt',
       'Total Trans Ct', 'Total Ct Chng Q4 Q1', 'Avg Utilization Ratio',
       'Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 1',
       'Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 2'],
      dtype='object')
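Since a mistyped column name only fails once a chart is built, a small check up front can help. The sketch below is illustrative only: `missing_columns` is a hypothetical helper (not part of pandas), and `toy` is a stand-in frame rather than the real `df`.

```python
import pandas as pd

def missing_columns(frame: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the names in `required` that are not columns of `frame`."""
    return [name for name in required if name not in frame.columns]

# Toy frame standing in for df; only the column names matter here.
toy = pd.DataFrame(columns=["Customer Age", "Credit Limit", "Attrition Flag"])

# "Credit limit" (lower-case l) is a deliberate typo to show the check.
print(missing_columns(toy, ["Customer Age", "Credit limit"]))
```

Calling `missing_columns(df, [...])` with the columns a chart needs should return an empty list before plotting.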


### Summary Statistics

The main aim of the analysis is to find which customers are likely to churn from their credit card services. I will use a range of summary statistics to identify them.

### Find the lower and upper bounds and any outliers in customer age

In [14]:
# Select the column
s = df['Customer Age']

# Calculate quartiles and IQR
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1

# Define bounds
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Detect outliers
outliers = s[(s < lower_bound) | (s > upper_bound)]

print("Lower bound:", lower_bound)
print("Upper bound:", upper_bound)
print("Outliers:\n", outliers)


Lower bound: 24.5
Upper bound: 68.5
Outliers:
 251    73
254    70
Name: Customer Age, dtype: int64
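The same 1.5 × IQR rule could be reused for other numeric columns such as `Credit Limit`. Below is a minimal sketch that wraps the steps above in a function; `iqr_outliers` and the sample `ages` series are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5):
    """Return (lower_bound, upper_bound, outliers) using the k * IQR rule."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the values that fall outside the two bounds.
    return lower, upper, values[(values < lower) | (values > upper)]

# Small illustrative series, not the real data.
ages = pd.Series([26, 41, 41, 46, 52, 52, 73], name="Customer Age")
lower, upper, outliers = iqr_outliers(ages)
print(lower, upper, outliers.tolist())
```

On the real data this would be called as `iqr_outliers(df['Customer Age'])`, or with any other numeric column.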


Locate the mean age

In [15]:
df.describe()

Unnamed: 0,Clientnum,Customer Age,Dependent Count,Months On Book,Total Relationship Count,Months Inactive 12 Mon,Contacts Count 12 Mon,Credit Limit,Total Revolving Bal,Avg Open To Buy,Total Amt Chng Q4 Q1,Total Trans Amt,Total Trans Ct,Total Ct Chng Q4 Q1,Avg Utilization Ratio,Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 1,Naive Bayes Classifier Attrition Flag Card Category Contacts Count 12 Mon Dependent Count Education Level Months Inactive 12 Mon 2
count,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0,10127.0
mean,739177600.0,46.32596,2.346203,35.928409,3.81258,2.341167,2.455317,8631.953698,1162.814061,7469.139637,0.759941,4404.086304,64.858695,0.712222,0.274894,0.159997,0.840003
std,36903780.0,8.016814,1.298908,7.986416,1.554408,1.010622,1.106225,9088.77665,814.987335,9090.685324,0.219207,3397.129254,23.47257,0.238086,0.275691,0.365301,0.365301
min,708082100.0,26.0,0.0,13.0,1.0,0.0,0.0,1438.3,0.0,3.0,0.0,510.0,10.0,0.0,0.0,8e-06,0.00042
25%,713036800.0,41.0,1.0,31.0,3.0,2.0,2.0,2555.0,359.0,1324.5,0.631,2155.5,45.0,0.582,0.023,9.9e-05,0.99966
50%,717926400.0,46.0,2.0,36.0,4.0,2.0,2.0,4549.0,1276.0,3474.0,0.736,3899.0,67.0,0.702,0.176,0.000181,0.99982
75%,773143500.0,52.0,3.0,40.0,5.0,3.0,3.0,11067.5,1784.0,9859.0,0.859,4741.0,81.0,0.818,0.503,0.000337,0.9999
max,828343100.0,73.0,5.0,56.0,6.0,6.0,6.0,34516.0,2517.0,34516.0,3.397,18484.0,139.0,3.714,0.999,0.99958,0.99999


The mean customer age is about 46.3 years, and the median is 46.
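Because the aim is to find customers who are likely to churn, it may be more informative to compare age between the two `Attrition Flag` groups than to look at the overall mean alone. A minimal sketch on made-up rows (the values in `toy` are invented for illustration):

```python
import pandas as pd

# Invented rows that mimic the two columns used here.
toy = pd.DataFrame({
    "Attrition Flag": ["Existing Customer", "Attrited Customer",
                       "Existing Customer", "Attrited Customer"],
    "Customer Age": [40, 50, 44, 54],
})

# Count, mean and median age for each attrition group.
age_by_status = toy.groupby("Attrition Flag")["Customer Age"].agg(
    ["count", "mean", "median"]
)
print(age_by_status)
```

Replacing `toy` with `df` gives the same table for the real data.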

# Visualisation 1: 

I want to see whether existing and attrited customers differ in age.

In [26]:
import plotly.express as px

fig = px.scatter(
    df, x= "Customer Age", y="Credit Limit", color="Attrition Flag",
    title="Customer Age vs Credit Limit colored by Attrition Flag"
)
fig.show()

This scatter plot shows whether attrited customers cluster at particular ages or credit limits compared with existing customers.
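With thousands of overlapping points, a scatter plot can make it hard to judge churn by age. One alternative is to compute the share of attrited customers per age band. The sketch below assumes ten-year bands and uses invented rows; the band edges and labels are my own choice, not something defined in the data.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Customer Age": [28, 35, 42, 47, 53, 58, 66, 71],
    "Attrition Flag": [
        "Existing Customer", "Attrited Customer", "Existing Customer",
        "Attrited Customer", "Existing Customer", "Existing Customer",
        "Attrited Customer", "Existing Customer",
    ],
})

# Ten-year bands; right=False makes each band include its lower edge, e.g. [40, 50).
bins = [20, 30, 40, 50, 60, 70, 80]
labels = ["20s", "30s", "40s", "50s", "60s", "70s"]
toy["Age Band"] = pd.cut(toy["Customer Age"], bins=bins, labels=labels, right=False)

# Percentage of attrited customers in each band.
churn_by_age = (
    toy["Attrition Flag"].eq("Attrited Customer")
    .groupby(toy["Age Band"], observed=True)
    .mean()
    .mul(100)
    .rename("Churn Rate (%)")
)
print(churn_by_age)
```

The resulting series can be plotted as a bar chart after `reset_index()`.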

# Visualisation 2:

I want to see if there is a trend in the distribution of customer age by credit limit.

In [23]:
fig = px.violin(
    df,
    x="Credit Limit",        
    y="Customer Age",          
    box=True,                  
    points="all",              
    title="Distribution of Customer Age by Credit Limit"
)
fig.show()


This visual suggests that most customers in their 40s and 50s have a higher credit limit, while those in their 20s or 60s have a lower credit limit.
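Because `Credit Limit` is continuous, using it directly on the x-axis of a violin plot can produce one very narrow violin per distinct value. A possible alternative is to group credit limits into bands first. The sketch below assumes quartile bands via `pd.qcut`; the band labels and the rows in `toy` are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Customer Age": [27, 33, 41, 45, 48, 52, 57, 64],
    "Credit Limit": [1500.0, 2500.0, 4000.0, 9000.0,
                     15000.0, 30000.0, 6000.0, 2000.0],
})

# Split credit limits into four equally sized bands (quartiles).
band_labels = ["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"]
toy["Credit Limit Band"] = pd.qcut(toy["Credit Limit"], q=4, labels=band_labels)

# Median age per band as a quick numeric check of the trend.
median_age = toy.groupby("Credit Limit Band", observed=True)["Customer Age"].median()
print(median_age)
```

With the band column added to `df`, the violin plot could use `x="Credit Limit Band"` and `y="Customer Age"` to give one violin per band.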


# Visualisation 3:

I want to see how spending behaviour varies with customer income.

In [37]:
df_grouped = df.groupby("Income Category")["Total Trans Amt"].sum().reset_index()

fig = px.bar(
    df_grouped,
    x="Income Category",
    y="Total Trans Amt",
    title="Total Transaction Amount by Income Category",
    text_auto=True
)
fig.show()


This bar chart shows that customers in the "Less than $40K" income category had the highest total transaction amount, just over double that of the second-highest income category. This suggests that these customers might be using their credit cards to manage expenses more frequently, although a summed total also grows with the number of customers in a category.
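A summed total depends on how many customers fall into each income category, so it is worth checking the average per customer as well. A minimal sketch on invented rows (the amounts in `toy` are made up; the column names match the dataset):

```python
import pandas as pd

# Invented rows: three lower-income customers and one higher-income customer.
toy = pd.DataFrame({
    "Income Category": ["Less than $40K", "Less than $40K",
                        "Less than $40K", "$80K - $120K"],
    "Total Trans Amt": [1000, 2000, 3000, 5000],
})

# Customer count, total spend and mean spend per customer for each category.
spend_by_income = (
    toy.groupby("Income Category")["Total Trans Amt"]
    .agg(customers="count", total="sum", mean_per_customer="mean")
    .reset_index()
)
print(spend_by_income)
```

In this toy example the lower-income group has the larger total but the smaller mean, which is the pattern to check for in the real data.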

# Visualisation 4: 

I want to see which card category has the least customer churn.

In [38]:
df_grouped = df.groupby(["Card Category", "Attrition Flag"]).size().reset_index(name="Count")
px.bar(df_grouped, x="Card Category", y="Count", color="Attrition Flag", barmode="group",
       title="Customer Retention by Card Category")


Blue cards have the most customers overall, especially existing customers. Silver cards also show more existing customers than attrited ones. Gold and Platinum cards have very few customers, so their retained and churned counts are both small.

# Visualisation 5: 

Building on the previous visualisation, I want to see the percentage of customers retained by card category.

In [42]:
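# Assumed definition: % of "Existing Customer" rows per card category
retention_rate = (
    df["Attrition Flag"].eq("Existing Customer")
    .groupby(df["Card Category"]).mean()
    .mul(100).round(1)
    .reset_index(name="Retention Rate")
)

fig = px.bar(
    retention_rate,
    x="Card Category",
    y="Retention Rate",
    title="Retention Rate by Card Category",
    text_auto=True,
    color="Retention Rate",
    color_continuous_scale="Blues"
)
fig.show()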


This bar chart shows the percentage of customers retained for each card type to compare performance. 
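Since Gold and Platinum cards have very few customers, a retention rate on its own can look more meaningful than it is. One option is to keep the customer count next to the rate. The sketch below uses invented rows, and the `Customers` and `Retained` names are my own labels, not columns from the dataset.

```python
import pandas as pd

# Invented rows: 6 Blue, 3 Silver and 1 Platinum customer.
toy = pd.DataFrame({
    "Card Category": ["Blue"] * 6 + ["Silver"] * 3 + ["Platinum"],
    "Attrition Flag": (
        ["Existing Customer"] * 5 + ["Attrited Customer"]      # Blue
        + ["Existing Customer"] * 2 + ["Attrited Customer"]    # Silver
        + ["Existing Customer"]                                # Platinum
    ),
})

# Flag retained customers, then take the count and mean per card category.
summary = (
    toy.assign(Retained=toy["Attrition Flag"].eq("Existing Customer"))
    .groupby("Card Category")["Retained"]
    .agg(["count", "mean"])
    .rename(columns={"count": "Customers", "mean": "Retention Rate"})
    .reset_index()
)
# Express the rate as a percentage rounded to one decimal place.
summary["Retention Rate"] = (summary["Retention Rate"] * 100).round(1)
print(summary)
```

Here the single Platinum customer gives a 100% rate, which shows why the count matters when reading the chart.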