# **Credit Card Churn - Predictive Modelling**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

**Objective**

To build a predictive model that helps a bank identify customers at risk of churn, so the business can act early to retain valuable clients. This notebook covers the technical groundwork in Python, from data exploration to model development.

**Project Structure**

1.   Introduction
2.   Dataset Overview
3.   Data Cleaning & Exploratory Data Analysis (EDA)
4.   Feature Engineering
5.   Modelling
6.   Model Insights
7.   Conclusion



## 1. **Introduction**

**Business Problem**

The bank is facing a rising concern: an increasing number of customers are closing their credit card accounts. This churn trend is causing financial and strategic challenges, and leadership wants to understand which customers are likely to leave so they can intervene with tailored services or offers.

**Project Objective**

This notebook (Part 1 of a 3-part analytical solution) tackles the problem using Python to:



*   Explore customer behavior and account data.
*   Engineer relevant features from raw variables.
*   Build predictive models that estimate churn risk.
*   Provide actionable insights that inform targeted customer retention strategies.



This groundwork leads into:



*   **Part 2**: Designing a Power BI dashboard for interactive visualizations.
*   **Part 3**: Conducting a business analysis to recommend retention strategies backed by data.




## 2. **Dataset Overview**

**Source**

This dataset is publicly available on [Kaggle: Credit Card Customers Dataset](https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers/data). It contains detailed information on 10,127 customers, including demographic attributes, account activity, and churn status.


**Key Features**

The raw file contains 23 columns (21 once two leftover classifier-output columns are dropped), covering customer profiles and credit card usage patterns. Some notable features are:

- Customer_Age: Age of the customer
- Gender: Male or Female
- Dependent_count: Number of dependents
- Education_Level: Highest education attained
- Marital_Status: Marital status of the customer
- Income_Category: Estimated annual income
- Card_Category: Type of credit card held
- Months_on_book: Tenure with the bank
- Total_Trans_Ct: Number of transactions in the last 12 months
- Credit_Limit: Assigned credit card limit
- Attrition_Flag: Indicates whether the customer has churned


**Target Variable**

- Attrition_Flag: This binary feature takes the values "Existing Customer" or "Attrited Customer". It will serve as the target for churn prediction.


**Initial Observations**

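- The dataset is clean and well-structured, with no null values. Several categorical columns (education, marital status, income) do use the placeholder "Unknown", which acts as a hidden missing value.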
- Features are a mix of categorical and numerical types, suitable for both statistical analysis and machine learning.
- The target classes appear imbalanced, which may require resampling or class weighting during modelling.
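
The imbalance mentioned above can be checked with a quick class-share calculation. The sketch below uses a small made-up frame (the 8-to-2 split is illustrative, not the real churn rate); on the actual data the same call would be applied to the `Attrition_Flag` column.

```python
import pandas as pd

# Toy stand-in for the Attrition_Flag column (values are illustrative only).
toy = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 8 + ["Attrited Customer"] * 2
})

# Share of each class; a heavily skewed split suggests imbalance.
class_share = toy["Attrition_Flag"].value_counts(normalize=True)
print(class_share)
```

A minority share well below 50% is a hint to use stratified splits and metrics other than plain accuracy.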


## 3. **Data Cleaning & Exploratory Data Analysis (EDA)**

### 3.1 **Loading and Exploring Data**

In [2]:
df = pd.read_csv('./drive/MyDrive/datasets/credit_card_churn/BankChurners.csv')

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 23 columns):
 #   Column                                                                                                                              Non-Null Count  Dtype  
---  ------                                                                                                                              --------------  -----  
 0   CLIENTNUM                                                                                                                           10127 non-null  int64  
 1   Attrition_Flag                                                                                                                      10127 non-null  object 
 2   Customer_Age                                                                                                                        10127 non-null  int64  
 3   Gender                                                                           

In [4]:
df.isnull().sum()

Unnamed: 0,0
CLIENTNUM,0
Attrition_Flag,0
Customer_Age,0
Gender,0
Dependent_count,0
Education_Level,0
Marital_Status,0
Income_Category,0
Card_Category,0
Months_on_book,0
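
A null count of zero does not rule out placeholder values. As a minimal sketch, assuming the categorical columns encode missing data as the string "Unknown" (as the income column does later in this notebook), the placeholders can be counted like this. The toy frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame mimicking a few raw columns; values are illustrative only.
toy = pd.DataFrame({
    "Education_Level": ["Graduate", "Unknown", "High School", "Graduate"],
    "Marital_Status": ["Married", "Single", "Unknown", "Unknown"],
    "Income_Category": ["$60K - $80K", "Unknown", "Less than $40K", "$40K - $60K"],
    "Customer_Age": [45, 49, 51, 40],
})

# Columns assumed to use the "Unknown" placeholder.
cat_cols = ["Education_Level", "Marital_Status", "Income_Category"]

# isnull() reports nothing, but the placeholder count tells a different story.
null_total = toy.isnull().sum().sum()
unknown_counts = toy[cat_cols].eq("Unknown").sum()
print(null_total)
print(unknown_counts)
```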


### 3.2 **Removing/Renaming Columns**

3.2.1 **We will remove the last two columns, which hold leftover Naive Bayes classifier outputs**

In [5]:
# The two leftover classifier-output columns are not customer features
nb_cols = [
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2',
]
df.drop(columns=nb_cols, inplace=True)
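
Typing out the full column names is error-prone. One possible alternative, sketched here on a toy frame with shortened hypothetical column names, is to select the columns by their shared prefix.

```python
import pandas as pd

# Toy frame; the "_demo_" column names are hypothetical short stand-ins.
toy = pd.DataFrame({
    "CLIENTNUM": [1, 2],
    "Naive_Bayes_Classifier_demo_1": [0.1, 0.2],
    "Naive_Bayes_Classifier_demo_2": [0.9, 0.8],
})

# Collect every column that starts with the classifier prefix, then drop them.
nb_cols = [c for c in toy.columns if c.startswith("Naive_Bayes_Classifier")]
toy = toy.drop(columns=nb_cols)
print(list(toy.columns))
```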

3.2.2 **Update column names to lower case**

In [6]:
df.columns = df.columns.str.lower()

3.2.3 **We will rename our columns to be more concise**

In [7]:
df.rename(columns={
    'clientnum': 'client_id',
    'attrition_flag': 'churn_flag',
    'customer_age': 'age',
    'gender': 'gender',
    'dependent_count': 'dependents',
    'education_level': 'education',
    'marital_status': 'marital',
    'income_category': 'income',
    'card_category': 'card_type',
    'months_on_book': 'tenure_months',
    'total_relationship_count': 'relationships',
    'months_inactive_12_mon': 'inactive_months',
    'contacts_count_12_mon': 'contact_count',
    'credit_limit': 'limit',
    'total_revolving_bal': 'revolving_bal',
    'avg_open_to_buy': 'open_to_buy',
    'total_amt_chng_q4_q1': 'amt_chg_q4_q1',
    'total_trans_amt': 'trans_amt',
    'total_trans_ct': 'trans_ct',
    'total_ct_chng_q4_q1': 'ct_chg_q4_q1',
    'avg_utilization_ratio': 'util_ratio'
}, inplace=True)
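
By default `rename` silently skips keys that match no column, so a typo in the mapping goes unnoticed. A minimal sketch of a pre-check, using a toy frame and a deliberately misspelled hypothetical key:

```python
import pandas as pd

toy = pd.DataFrame({"clientnum": [1], "credit_limit": [1000.0]})

rename_map = {
    "clientnum": "client_id",
    "credit_limit": "limit",
    "credit_limt": "typo_example",  # deliberately misspelled, hypothetical key
}

# Keys in the mapping that do not correspond to any existing column.
missing_keys = sorted(set(rename_map) - set(toy.columns))
print(missing_keys)
```

Passing `errors="raise"` to `rename` is another way to surface the same problem.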


3.2.4 **Confirm our changes**

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   client_id        10127 non-null  int64  
 1   churn_flag       10127 non-null  object 
 2   age              10127 non-null  int64  
 3   gender           10127 non-null  object 
 4   dependents       10127 non-null  int64  
 5   education        10127 non-null  object 
 6   marital          10127 non-null  object 
 7   income           10127 non-null  object 
 8   card_type        10127 non-null  object 
 9   tenure_months    10127 non-null  int64  
 10  relationships    10127 non-null  int64  
 11  inactive_months  10127 non-null  int64  
 12  contact_count    10127 non-null  int64  
 13  limit            10127 non-null  float64
 14  revolving_bal    10127 non-null  int64  
 15  open_to_buy      10127 non-null  float64
 16  amt_chg_q4_q1    10127 non-null  float64
 17  trans_amt   

In [9]:
df.tail()

Unnamed: 0,client_id,churn_flag,age,gender,dependents,education,marital,income,card_type,tenure_months,...,inactive_months,contact_count,limit,revolving_bal,open_to_buy,amt_chg_q4_q1,trans_amt,trans_ct,ct_chg_q4_q1,util_ratio
10122,772366833,Existing Customer,50,M,2,Graduate,Single,$40K - $60K,Blue,40,...,2,3,4003.0,1851,2152.0,0.703,15476,117,0.857,0.462
10123,710638233,Attrited Customer,41,M,2,Unknown,Divorced,$40K - $60K,Blue,25,...,2,3,4277.0,2186,2091.0,0.804,8764,69,0.683,0.511
10124,716506083,Attrited Customer,44,F,1,High School,Married,Less than $40K,Blue,36,...,3,4,5409.0,0,5409.0,0.819,10291,60,0.818,0.0
10125,717406983,Attrited Customer,30,M,2,Graduate,Unknown,$40K - $60K,Blue,36,...,3,3,5281.0,0,5281.0,0.535,8395,62,0.722,0.0
10126,714337233,Attrited Customer,43,F,2,Graduate,Married,Less than $40K,Silver,25,...,2,4,10388.0,1961,8427.0,0.703,10294,61,0.649,0.189


### 3.3 **Replacing Data Values**

3.3.1 **Replace churn_flag values with Boolean True/False**

In [10]:
df['churn_flag'] = df['churn_flag'].map({
    'Attrited Customer': True,
    'Existing Customer': False
})
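
`Series.map` returns NaN for any value missing from the dictionary, so it can be worth confirming that every label was covered. A small sketch on toy data, where the lowercase label is a made-up example of an unexpected value:

```python
import pandas as pd

# Toy labels; the third one is a hypothetical variant not in the mapping.
toy = pd.Series(["Attrited Customer", "Existing Customer", "existing customer"])
mapped = toy.map({"Attrited Customer": True, "Existing Customer": False})

# Original labels that the mapping failed to cover.
unmapped = sorted(set(toy[mapped.isna()]))
print(unmapped)
```

An empty list here means the mapping covered every label and the column can be treated as Boolean.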

In [11]:
df.tail()

Unnamed: 0,client_id,churn_flag,age,gender,dependents,education,marital,income,card_type,tenure_months,...,inactive_months,contact_count,limit,revolving_bal,open_to_buy,amt_chg_q4_q1,trans_amt,trans_ct,ct_chg_q4_q1,util_ratio
10122,772366833,False,50,M,2,Graduate,Single,$40K - $60K,Blue,40,...,2,3,4003.0,1851,2152.0,0.703,15476,117,0.857,0.462
10123,710638233,True,41,M,2,Unknown,Divorced,$40K - $60K,Blue,25,...,2,3,4277.0,2186,2091.0,0.804,8764,69,0.683,0.511
10124,716506083,True,44,F,1,High School,Married,Less than $40K,Blue,36,...,3,4,5409.0,0,5409.0,0.819,10291,60,0.818,0.0
10125,717406983,True,30,M,2,Graduate,Unknown,$40K - $60K,Blue,36,...,3,3,5281.0,0,5281.0,0.535,8395,62,0.722,0.0
10126,714337233,True,43,F,2,Graduate,Married,Less than $40K,Silver,25,...,2,4,10388.0,1961,8427.0,0.703,10294,61,0.649,0.189


3.3.2 **Ordinal Encoding and Grouping Income Levels**



In [12]:
df['income'].unique()

array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
       '$120K +', 'Unknown'], dtype=object)

Let's define an ordered income scale as a dictionary

In [13]:
income_order = {
    'Less than $40K': 1,
    '$40K - $60K': 2,
    '$60K - $80K': 3,
    '$80K - $120K': 4,
    '$120K +': 5,
    'Unknown': 0
}

Then create an income_level column by applying the scale. Note that 'Unknown' maps to 0, a placeholder rather than a true rank below 'Less than $40K'.

In [14]:
df['income_level'] = df['income'].map(income_order)
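
An alternative to a hand-written integer scale is an ordered categorical, which keeps 'Unknown' out of the ranking instead of assigning it a number. This is a sketch of that option on toy data, not the approach used in this notebook.

```python
import pandas as pd

# Income brackets from lowest to highest; 'Unknown' is deliberately excluded.
income_labels = [
    "Less than $40K", "$40K - $60K", "$60K - $80K", "$80K - $120K", "$120K +",
]
toy = pd.Series(["$60K - $80K", "Unknown", "Less than $40K", "$120K +"])

# Turn anything outside the known brackets into a missing value first.
known_only = toy.where(toy.isin(income_labels))

income_cat = pd.Categorical(known_only, categories=income_labels, ordered=True)
income_codes = pd.Series(income_cat.codes)  # -1 marks missing values
print(income_codes.tolist())
```

The codes are zero-based, so they differ by one from the 1 to 5 scale above.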

Create a function to group income levels into Low, Mid, High

In [15]:
def group_income(val):
    """Map a raw income bracket to 'Low', 'Mid', 'High' or 'Unknown'."""
    if val in ['Less than $40K', '$40K - $60K']:
        return 'Low'
    elif val in ['$60K - $80K', '$80K - $120K']:
        return 'Mid'
    elif val == '$120K +':
        return 'High'
    else:
        return 'Unknown'

Create income_group column by applying group_income function on income values

In [16]:
df['income_group'] = df['income'].apply(group_income)
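
The same grouping can also be written as a dictionary lookup, which avoids a Python-level function call per row. The sketch below is an equivalent formulation on toy data; `group_map` is a name introduced here for illustration.

```python
import pandas as pd

# Bracket-to-group lookup; anything not listed falls back to 'Unknown'.
group_map = {
    "Less than $40K": "Low",
    "$40K - $60K": "Low",
    "$60K - $80K": "Mid",
    "$80K - $120K": "Mid",
    "$120K +": "High",
}

toy = pd.Series(["$60K - $80K", "Unknown", "Less than $40K", "$120K +"])
income_group = toy.map(group_map).fillna("Unknown")
print(income_group.tolist())
```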

Confirm our changes

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 23 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   client_id        10127 non-null  int64  
 1   churn_flag       10127 non-null  bool   
 2   age              10127 non-null  int64  
 3   gender           10127 non-null  object 
 4   dependents       10127 non-null  int64  
 5   education        10127 non-null  object 
 6   marital          10127 non-null  object 
 7   income           10127 non-null  object 
 8   card_type        10127 non-null  object 
 9   tenure_months    10127 non-null  int64  
 10  relationships    10127 non-null  int64  
 11  inactive_months  10127 non-null  int64  
 12  contact_count    10127 non-null  int64  
 13  limit            10127 non-null  float64
 14  revolving_bal    10127 non-null  int64  
 15  open_to_buy      10127 non-null  float64
 16  amt_chg_q4_q1    10127 non-null  float64
 17  trans_amt   

In [18]:
df.head()

Unnamed: 0,client_id,churn_flag,age,gender,dependents,education,marital,income,card_type,tenure_months,...,limit,revolving_bal,open_to_buy,amt_chg_q4_q1,trans_amt,trans_ct,ct_chg_q4_q1,util_ratio,income_level,income_group
0,768805383,False,45,M,3,High School,Married,$60K - $80K,Blue,39,...,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,3,Mid
1,818770008,False,49,F,5,Graduate,Single,Less than $40K,Blue,44,...,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,1,Low
2,713982108,False,51,M,3,Graduate,Married,$80K - $120K,Blue,36,...,3418.0,0,3418.0,2.594,1887,20,2.333,0.0,4,Mid
3,769911858,False,40,F,4,High School,Unknown,Less than $40K,Blue,34,...,3313.0,2517,796.0,1.405,1171,20,2.333,0.76,1,Low
4,709106358,False,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,...,4716.0,0,4716.0,2.175,816,28,2.5,0.0,3,Mid


## 4. **Feature Engineering**

## 5. **Modelling**

## 6. **Model Insights**

## 7. **Conclusion**