# Content

In the present financial landscape, credit card churn has emerged as a growing concern for banks. With customers increasingly drawn to attractive introductory offers and rewards, they are more likely to switch between credit card providers, impacting banks' profitability and customer retention. The availability of information through digital channels and the ease of comparing offers have empowered consumers to make informed decisions, further contributing to the rising churn rates. To address this challenge, banks must rethink their strategies, offering more personalized and long-term value to retain customers and foster loyalty in this competitive credit card market.

In [1]:
# load library and import dataset
library(tidyverse)
library(janitor)
df <- read_csv("/kaggle/input/credit-card-customers/BankChurners.csv")

── [1mAttaching core tidyverse packages[22m ──────────────────────── tidyverse 2.0.0 ──
[32m✔[39m [34mdplyr    [39m 1.1.2     [32m✔[39m [34mreadr    [39m 2.1.4
[32m✔[39m [34mforcats  [39m 1.0.0     [32m✔[39m [34mstringr  [39m 1.5.0
[32m✔[39m [34mggplot2  [39m 3.4.2     [32m✔[39m [34mtibble   [39m 3.2.1
[32m✔[39m [34mlubridate[39m 1.9.2     [32m✔[39m [34mtidyr    [39m 1.3.0
[32m✔[39m [34mpurrr    [39m 1.0.1     
── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
[36mℹ[39m Use the conflicted package ([3m[34m<http://conflicted.r-lib.org/>[39m[23m) to force all conflicts to become errors

Attaching package: ‘janitor’


The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test


[1mRows: [22m[34m10127[39m [1mColumns: [22m[

In [2]:
# preview data
tibble(df)
colnames(df)

# clean data
df <- clean_names(df)

df %>% 
  is.na() %>% 
  colSums() %>% 
  print()

CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,⋯,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
<dbl>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,⋯,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3448e-05,0.9999100
818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,⋯,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,5.6861e-05,0.9999400
713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,⋯,3418.0,0,3418.0,2.594,1887,20,2.333,0.000,2.1081e-05,0.9999800
769911858,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,34,⋯,3313.0,2517,796.0,1.405,1171,20,2.333,0.760,1.3366e-04,0.9998700
709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,⋯,4716.0,0,4716.0,2.175,816,28,2.500,0.000,2.1676e-05,0.9999800
713061558,Existing Customer,44,M,2,Graduate,Married,$40K - $60K,Blue,36,⋯,4010.0,1247,2763.0,1.376,1088,24,0.846,0.311,5.5077e-05,0.9999400
810347208,Existing Customer,51,M,4,Unknown,Married,$120K +,Gold,46,⋯,34516.0,2264,32252.0,1.975,1330,31,0.722,0.066,1.2303e-04,0.9998800
818906208,Existing Customer,32,M,0,High School,Unknown,$60K - $80K,Silver,27,⋯,29081.0,1396,27685.0,2.204,1538,36,0.714,0.048,8.5795e-05,0.9999100
710930508,Existing Customer,37,M,3,Uneducated,Single,$60K - $80K,Blue,36,⋯,22352.0,2517,19835.0,3.355,1350,24,1.182,0.113,4.4796e-05,0.9999600
719661558,Existing Customer,48,M,2,Graduate,Single,$80K - $120K,Blue,36,⋯,11656.0,1677,9979.0,1.524,1441,32,0.882,0.144,3.0251e-04,0.9997000


                                                                                                                         clientnum 
                                                                                                                                 0 
                                                                                                                    attrition_flag 
                                                                                                                                 0 
                                                                                                                      customer_age 
                                                                                                                                 0 
                                                                                                                            gender 
                                                                            

# Data manipulation

In [3]:
# change format
df$gender <- ifelse(df$gender == "M", "Male", "Female")

# create age groups
breaks <- c(0, 41, 51, Inf)

age_labels <- c("0-40", "41-50", "over 50")
age_group <- cut(df$customer_age, breaks = breaks, labels = age_labels)

df["age_group"] <- age_group

# Data exploration

Despite a 16% of attrited customer is not high rate, but the bank is deeply concerned about customer churn and understands the potential impact on its business. To address this, the bank aims to improve its services and focus on customer retention.

In [4]:
# to find most customer amount in each category
df %>% 
  select(attrition_flag) %>% 
  group_by(attrition_flag) %>% 
  summarise(count = n())

atttrition_perc <- (1627/(1627+8500))*100

atttrition_perc

attrition_flag,count
<chr>,<int>
Attrited Customer,1627
Existing Customer,8500


![Sheet 1.png](attachment:81e75dee-5663-4aec-9a51-04723b5ce658.png)

In [5]:
df %>% 
  select(gender) %>% 
  group_by(gender) %>% 
  summarise(count = n()) %>%
  arrange(desc(count))

income_preview <- df %>% 
  select(income_category) %>% 
  group_by(income_category) %>% 
  summarise( count = n()) %>% 
  arrange(desc(income_category))

income_preview

df %>% 
  select(age_group) %>% 
  group_by(age_group) %>% 
  summarise(count = n()) %>%
  arrange(desc(count))

df %>% 
  select(card_category) %>% 
  group_by(card_category) %>% 
  summarise( count = n()) %>%
  arrange(desc(count))

df %>% 
  select(education_level) %>% 
  group_by(education_level) %>% 
  summarise( count = n()) %>%
  arrange(desc(count))

gender,count
<chr>,<int>
Female,5358
Male,4769


income_category,count
<chr>,<int>
Unknown,1112
Less than $40K,3561
$80K - $120K,1535
$60K - $80K,1402
$40K - $60K,1790
$120K +,727


age_group,count
<fct>,<int>
41-50,4671
0-40,2776
over 50,2680


card_category,count
<chr>,<int>
Blue,9436
Silver,555
Gold,116
Platinum,20


education_level,count
<chr>,<int>
Graduate,3128
High School,2013
Unknown,1519
Uneducated,1487
College,1013
Post-Graduate,516
Doctorate,451


# Data analysis

In [6]:
# Attrited Customer in each card category
card_group <- df %>% 
  select(card_category, attrition_flag) %>%
  filter(attrition_flag == 'Attrited Customer') %>% 
  group_by(card_category) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count))

# Attrited Customer in each gender 
gender_group <- df %>% 
  select(gender, attrition_flag) %>% 
  filter(attrition_flag == 'Attrited Customer') %>% 
  group_by(gender) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count))

# Attrited Customer in each education level
educate_group <- df %>% 
  select(education_level, attrition_flag) %>% 
  filter(attrition_flag == 'Attrited Customer') %>% 
  group_by(education_level) %>%
  summarise(count = n()) %>% 
  arrange(desc(count))

We have income range for our customer and top high attrited customer percent range is 120K but this group is a few customer in our bank so, we will focus on no.2 is less income than 40K which is the most our customer.

In [7]:
# Attrited Customer in each income category
income_group <- df %>% 
  select(income_category, attrition_flag) %>% 
  filter(attrition_flag == 'Attrited Customer') %>% 
  group_by(income_category) %>% 
  summarise(count = n()) %>% 
  arrange(desc(income_category))

income_cal <- (income_group$count/income_preview$count)*100

income_cal <- data.frame(income_preview, income_group, income_cal)

income_cal %>% rename(actual_income=income_category, attrition_income=income_category.1, percent_change=income_cal) %>% arrange(desc(percent_change))

actual_income,count,attrition_income,count.1,percent_change
<chr>,<int>,<chr>,<int>,<dbl>
$120K +,727,$120K +,126,17.3315
Less than $40K,3561,Less than $40K,612,17.18618
Unknown,1112,Unknown,187,16.81655
$80K - $120K,1535,$80K - $120K,242,15.76547
$40K - $60K,1790,$40K - $60K,271,15.13966
$60K - $80K,1402,$60K - $80K,189,13.48074


![Sheet 2.png](attachment:76916085-69a4-44c0-8bd0-4a1a122f507b.png)

In [8]:
# correlationship between income, age, gender and attrited customer
df$attrition_flag <-  as.numeric(ifelse(df$attrition_flag == "Attrited Customer", 1, 0))

In [9]:
df$gender <- as.numeric(ifelse(df$gender == "Female", 1 , 0))

In [10]:
df %>% 
  select(months_inactive_12_mon, credit_limit, customer_age, gender, attrition_flag) %>% 
  cor()

Unnamed: 0,months_inactive_12_mon,credit_limit,customer_age,gender,attrition_flag
months_inactive_12_mon,1.0,-0.020393791,0.054360999,0.0111633,0.15244881
credit_limit,-0.02039379,1.0,0.002476227,-0.42080627,-0.02387299
customer_age,0.054361,0.002476227,1.0,0.01731152,0.01820314
gender,0.0111633,-0.42080627,0.01731152,1.0,0.0372717
attrition_flag,0.15244881,-0.023872995,0.018203139,0.0372717,1.0


![Sheet 4.png](attachment:61a8bfd1-6245-4da1-ae13-272faaf37ea4.png)

Months Inactive in 12 month is kindly close to one more than other factor, so it can summarise to attrition customer and Months Inactive in 12 month have relationship more than credit limit, customer age and gender.