# Credit Card Retention Analysis

## Imports

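```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.graph_objects as go  # plotly.graph_objs is the older, deprecated alias
from plotly.offline import iplot

sns.set()  # apply seaborn's default plot styling
pd.options.display.max_columns = 999  # show every column when displaying DataFrames
```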

```python
data = pd.read_csv('../data/BankChurners_v2.csv')
```

```python
# Keep only the columns used in this analysis; .copy() avoids a
# SettingWithCopyWarning when we assign to columns of this subset below
data = data[['CLIENTNUM', 'Attrition_Flag', 'Customer_Age', 'Gender',
       'Dependent_count', 'Education_Level', 'Marital_Status',
       'Income_Category', 'Card_Category', 'Months_on_book',
       'Total_Relationship_Count', 'Months_Inactive_12_mon',
       'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal',
       'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
       'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio']].copy()
```

```python
# Label missing categorical values explicitly instead of dropping those rows
data['Education_Level'] = data['Education_Level'].fillna('Unknown')
data['Marital_Status'] = data['Marital_Status'].fillna('Unknown')
data['Income_Category'] = data['Income_Category'].fillna('Unknown')
```
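The three `fillna` calls can also be written as a single assignment over a list of columns, followed by a check that no missing values remain. A minimal sketch on a small, made-up DataFrame (the toy values and the `df` and `categorical_cols` names are hypothetical stand-ins for `data`):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values, for illustration only)
df = pd.DataFrame({
    "Education_Level": ["Graduate", np.nan, "High School"],
    "Marital_Status": [np.nan, "Married", "Single"],
    "Income_Category": ["$60K - $80K", "Less than $40K", np.nan],
})

categorical_cols = ["Education_Level", "Marital_Status", "Income_Category"]

# Fill every listed column in one call instead of one line per column
df[categorical_cols] = df[categorical_cols].fillna("Unknown")

# Confirm nothing is still missing in those columns (each count should be 0)
print(df[categorical_cols].isna().sum())
```

On the notebook's own DataFrame, the same `isna().sum()` check should print zeros for all three columns after the fill.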

```python
# https://towardsdatascience.com/data-preprocessing-with-python-pandas-part-5-binning-c5bd5fd1b950
# right=False makes the bins left-closed: [25, 30) -> '20s', [30, 40) -> '30s', ..., [70, 80) -> '70s'
bins = [25, 30, 40, 50, 60, 70, 80]
labels = ['20s', '30s', '40s', '50s', '60s', '70s']
data['Customer_Age_bins'] = pd.cut(data['Customer_Age'], bins=bins, labels=labels, include_lowest=True, right=False)
```
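Because `right=False` makes every bin closed on the left and open on the right, it is worth checking where boundary ages land. The sketch below runs the same `bins` and `labels` on a handful of hand-picked ages (the `ages` values are illustrative, not taken from the dataset):

```python
import pandas as pd

bins = [25, 30, 40, 50, 60, 70, 80]
labels = ['20s', '30s', '40s', '50s', '60s', '70s']

# Boundary ages: just below the lowest edge, on edges, and on the top edge
ages = pd.Series([24, 25, 29, 30, 73, 80])
binned = pd.cut(ages, bins=bins, labels=labels, include_lowest=True, right=False)

# 24 and 80 fall outside every left-closed, right-open bin and become NaN
print(pd.DataFrame({'age': ages, 'bin': binned}))
```

Ages below 25, or of 80 and above, come back as `NaN`, so confirming that `data['Customer_Age_bins'].isna().sum()` is zero is a cheap extra check. With `right=False` the first bin already includes its left edge, so `include_lowest=True` should not change anything here.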

***

## EDA (exploratory vs explanatory)

We will now move into what I consider the most fun part of any analysis: exploring the data in whatever way helps us find the "story" within it. Here we care about attrition and the attributes that drive it, with the goal of advising the company on how to reduce the number of customers it loses.

In Cole Knaflic's Storytelling with Data, she covers the difference between exploratory and explanatory analysis, and we will emphasize that distinction here as well. A typical pitfall I see when building a final deliverable is including visuals that were meant for exploratory analysis. I think of it as visuals for ME versus visuals for YOUR AUDIENCE. Visuals for me can be messy, complex, and anything else that leads me to the next step or to a conclusion. From there, I can build a visual for YOUR AUDIENCE that simplifies the finding so they can understand it in 10 seconds or less.

### Sanity Checks

Let's start by confirming the composition of our client data. Continued sanity checks are part of any good analysis, so let's verify that about 16% of the customers in our dataset have attrited. We can do this by counting the rows in each bucket with the pandas `.value_counts()` method, which returns the number of rows that fall into each category of the column (here, Existing Customer and Attrited Customer).

```python
data['Attrition_Flag'].value_counts()
```

```
Existing Customer    8500
Attrited Customer    1627
Name: Attrition_Flag, dtype: int64
```

To check the 16% figure, we can divide the number of attrited customers by the total number of rows (`data.shape[0]`):

```python
data['Attrition_Flag'].value_counts()['Attrited Customer'] / data.shape[0]
```

```
0.1606596227905599
```
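pandas can also return these shares directly: passing `normalize=True` to `.value_counts()` divides each count by the total. A small sketch on toy labels (the 8/2 split is made up for illustration):

```python
import pandas as pd

# Toy labels (hypothetical): 2 attrited customers out of 10
flags = pd.Series(['Existing Customer'] * 8 + ['Attrited Customer'] * 2, name='Attrition_Flag')

# normalize=True returns each category's share of the total instead of a raw count
shares = flags.value_counts(normalize=True)
print(shares)
print(shares['Attrited Customer'])  # 0.2
```

On the real column, `data['Attrition_Flag'].value_counts(normalize=True)['Attrited Customer']` should give the same value as the division above, provided `Attrition_Flag` has no missing values, since `value_counts` drops `NaN` by default while `data.shape[0]` counts every row.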

To keep this check in the notebook so that it updates automatically if the dataset changes, we can compute the percentage and print it in a sentence:

```python
# Compute the churn percentage from the data so the message updates if the dataset changes
churn_pct = data['Attrition_Flag'].value_counts()['Attrited Customer'] / data.shape[0] * 100
print(f"{churn_pct:.2f}% of our customers have churned, which matches the documentation")
```

```
16.07% of our customers have churned, which matches the documentation
```
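The printed message says "matches the documentation" regardless of the value it prints. If the goal is a check that keeps working as the data changes, an assertion can fail loudly instead. A minimal sketch, assuming the documented rate is 16% and an arbitrary tolerance of half a percentage point (`EXPECTED_CHURN_RATE`, `TOLERANCE`, `churn_rate`, and `check_churn_rate` are hypothetical names, not part of the original notebook):

```python
import pandas as pd

EXPECTED_CHURN_RATE = 0.16  # rate quoted in the dataset documentation
TOLERANCE = 0.005           # hypothetical: allowed drift from the documented rate


def churn_rate(df: pd.DataFrame) -> float:
    """Share of rows flagged as 'Attrited Customer'."""
    return (df['Attrition_Flag'] == 'Attrited Customer').mean()


def check_churn_rate(df: pd.DataFrame, expected: float = EXPECTED_CHURN_RATE,
                     tol: float = TOLERANCE) -> float:
    """Raise AssertionError if the observed churn rate drifts from the expected one."""
    rate = churn_rate(df)
    assert abs(rate - expected) <= tol, (
        f"Churn rate {rate:.2%} differs from the documented {expected:.0%}"
    )
    print(f"{rate:.2%} of our customers have churned, which matches the documentation")
    return rate


# Toy frame (hypothetical): 16 attrited customers out of 100
toy = pd.DataFrame({'Attrition_Flag': ['Attrited Customer'] * 16 + ['Existing Customer'] * 84})
rate = check_churn_rate(toy)
```

Calling `check_churn_rate(data)` on the notebook's DataFrame should pass, since 16.07% is within half a point of 16%; if a future version of the file drifts further, the cell stops with an error instead of printing a misleading message.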


Great! Our sanity check passes, so we can move on.