# Client Churn Prediction
### CRISP-DM Cycle 3
---

Top Bank is a company that operates in Europe, and its main product is a bank account in which clients can receive their salary and make payments. The account is free for the first 12 months; after this trial period, the client must renew the contract for another 12 months, and repeat this process every year. Recently, the Analytics Team noticed that the churn rate is increasing.

As a Data Science Consultant, you need to create an action plan to reduce the number of churning customers and show the financial return of your solution.
At the end of your consultancy, you need to deliver to the Top Bank CEO a model in production that receives a customer base via API and returns that same base with an extra column containing each customer's probability of churning.
In addition, you will need to provide a report describing your model's performance and the financial impact of your solution. The CEO and the Analytics Team would like the report to answer the following questions:

1.  What is Top Bank's current Churn rate?
2.  How does the churn rate vary monthly?
3.  What is the model's performance in classifying customers as churners?
4.  What is the expected return, in terms of revenue, if the company uses your model to prevent customers from churning?

> Disclaimer: This is a fictional business case.

## 0. PREPARATION

### 0.1 Planning

#### Input

- Predict whether each customer will churn;
- Dataset with sales records and customer information.

#### Output

- Which customers will churn;
- Churn rate of the company;
- Performance of the model;
- Action plan.


#### Tasks

1. Which customers will churn:
    - What is the criterion?
        - Downtime
        - Time remaining until the contract ends


2. Current churn rate of the company:
    - Calculate churn rate
    - Calculate monthly churn rate and variation

3. Performance of the model:
    - Precision at K score
    - Recall at K score

4. Action plan:
    - Discount?
    - Voucher?
    - Deposit bonus?
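Tasks 2 and 3 rely on a few simple metrics. The sketch below is a minimal pure-Python version of them, assuming `exited` is the 0/1 target column and that the model outputs one churn probability per customer; the function names are illustrative, not taken from the project code. The churn rate is churned customers divided by total customers. Precision at K is the share of real churners among the K customers with the highest predicted probability, and recall at K is the share of all churners captured in those K. A monthly churn rate would apply the same ratio per month, which requires a date column that the data fields listed below do not include.

```python
from collections.abc import Sequence


def churn_rate(exited: Sequence[int]) -> float:
    """Churn rate = churned customers / total customers (exited is a 0/1 flag)."""
    if len(exited) == 0:
        raise ValueError("exited must not be empty")
    return sum(exited) / len(exited)


def precision_recall_at_k(
    y_true: Sequence[int], y_score: Sequence[float], k: int
) -> tuple[float, float]:
    """Precision and recall among the k customers with the highest churn score."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    if not 1 <= k <= len(y_true):
        raise ValueError("k must be between 1 and the number of customers")
    # Rank customer indices by predicted churn probability, highest first.
    ranked = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    # Count the true churners among the top k customers.
    hits = sum(y_true[i] for i in ranked[:k])
    total_churners = sum(y_true)
    precision = hits / k
    recall = hits / total_churners if total_churners else 0.0
    return precision, recall


# Toy example with made-up values (not the Top Bank data).
y_true = [1, 1, 0, 1, 0]
y_score = [0.95, 0.40, 0.85, 0.90, 0.10]
print(churn_rate(y_true))                         # 0.6
print(precision_recall_at_k(y_true, y_score, 4))  # (0.75, 1.0)
```

Ranking by score and cutting at K matches a setting where a retention campaign can only reach a limited number of customers, so the metric scores exactly the list that would be contacted.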

### 0.2 Settings

In [1]:
# imports
import os
import sys

import numpy as np
import pandas as pd
from dotenv import load_dotenv
from IPython.display import HTML, display
from matplotlib import pyplot as plt

# load .env file
env_path = '../.env'
load_dotenv(dotenv_path=env_path)

# add home path to sys.path
path = os.getenv('HOME_PATH')
if path is None:
    raise RuntimeError('HOME_PATH is not set; check the .env file')
sys.path.append(path)

# import classes
from Class.QueryBuilder import QueryBuilder

In [2]:
def jupyter_settings():
    """
    Configure Jupyter settings for data visualization and display.

    This function sets up the necessary settings for Jupyter notebooks to enhance data visualization and display options.
    """
    %matplotlib inline
    plt.rcParams['figure.figsize'] = [25, 12]
    plt.rcParams['font.size'] = 24
    display(HTML('<style>.container {width:100% !important;}</style>'))
    
jupyter_settings()

# optuna
#optuna.logging.set_verbosity(optuna.logging.WARNING)

# display floats with three decimal places
pd.options.display.float_format = '{:.3f}'.format

### 0.3 Data

This dataset is available [here](https://www.kaggle.com/mervetorkan/churndataset).


**Data fields**

- **RowNumber**: the row number (record index)
- **CustomerID**: unique identifier of the client
- **Surname**: the client's last name
- **CreditScore**: the client's credit score in the financial market
- **Geography**: the client's country
- **Gender**: the client's gender
- **Age**: the client's age
- **Tenure**: number of years the client has been with the bank
- **Balance**: the amount of money in the client's account
- **NumOfProducts**: the number of bank products the client has purchased
- **HasCrCard**: whether the client has a credit card
- **IsActiveMember**: whether the client is active (within the last 12 months)
- **EstimatedSalary**: the client's estimated annual salary
- **Exited**: whether the client churned (*target variable*)
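The field names above are CamelCase, while the interim DuckDB table queried in the next section uses snake_case (`row_number`, `customer_id`, `num_of_products`). The preprocessing step that produced that table is not shown here, so the helper below is only a sketch of one way to do the renaming; `to_snake_case` is a hypothetical name.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase column name such as 'NumOfProducts' to 'num_of_products'."""
    # Split an acronym from a following capitalized word, e.g. 'HTTPServer' -> 'HTTP_Server'.
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following uppercase letter.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.lower()


columns = ["RowNumber", "CustomerID", "CreditScore", "NumOfProducts",
           "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited"]
print([to_snake_case(c) for c in columns])
# ['row_number', 'customer_id', 'credit_score', 'num_of_products',
#  'has_cr_card', 'is_active_member', 'estimated_salary', 'exited']
```

The first substitution keeps acronyms such as `ID` together, so `CustomerID` becomes `customer_id` rather than `customer_i_d`.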

## 1. DATA DESCRIPTION

### 1.1 First Look

In [3]:
qb = QueryBuilder()
conn = qb.get_connection(path + "data/interim/churn.db")
query = qb.select("*").from_table("churn").limit(5).build()

result = conn.execute(query).df()
result

|   | row_number | customer_id | surname | credit_score | geography | gender | age | tenure | balance | num_of_products | has_cr_card | is_active_member | estimated_salary | exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
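`QueryBuilder` is a project class whose source is not part of this notebook. Judging only by the call chain `select(...).from_table(...).limit(...).build()`, it assembles an SQL string that is then passed to `conn.execute`. The sketch below is a hypothetical stand-in (`MiniQueryBuilder`), not the project's implementation; it runs against an in-memory SQLite table instead of the notebook's DuckDB database.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniQueryBuilder:
    """Hypothetical stand-in for the project's QueryBuilder; its real source is not shown."""

    columns: str = "*"
    table: Optional[str] = None
    row_limit: Optional[int] = None

    # Each step returns a new builder, so one instance can be reused for several queries.
    def select(self, columns: str) -> "MiniQueryBuilder":
        return replace(self, columns=columns)

    def from_table(self, table: str) -> "MiniQueryBuilder":
        return replace(self, table=table)

    def limit(self, n: int) -> "MiniQueryBuilder":
        return replace(self, row_limit=int(n))

    def build(self) -> str:
        if self.table is None:
            raise ValueError("call from_table() before build()")
        query = f"SELECT {self.columns} FROM {self.table}"
        if self.row_limit is not None:
            query += f" LIMIT {self.row_limit}"
        return query


qb = MiniQueryBuilder()
query = qb.select("*").from_table("churn").limit(5).build()
print(query)  # SELECT * FROM churn LIMIT 5

# The SQL string runs on any DB-API connection; an in-memory SQLite table stands in here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn (customer_id INTEGER, exited INTEGER)")
conn.executemany("INSERT INTO churn VALUES (?, ?)", [(i, i % 2) for i in range(10)])
print(len(conn.execute(query).fetchall()))  # 5
```

Making the builder immutable matters because the notebook reuses the same `qb` object for later queries. Table and column names are interpolated directly into the string, so this pattern is only safe with trusted identifiers, never with user input.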


### 1.2 Data Dimensions

In [4]:
query = qb.shape("churn").build()
result = conn.execute(query).df()
result

|   | row_count | column_count |
|---|---|---|
| 0 | 10000 | 14 |
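The `shape` helper also belongs to the project's `QueryBuilder`, and its SQL is not shown. One plausible equivalent, sketched with Python's built-in `sqlite3` module standing in for the notebook's DuckDB connection, counts rows with `COUNT(*)` and columns from the table schema. SQLite exposes the schema through `PRAGMA table_info`; DuckDB also provides column metadata through `information_schema.columns`. The function name `table_shape` and the toy rows are hypothetical.

```python
import sqlite3


def table_shape(conn: sqlite3.Connection, table: str) -> tuple[int, int]:
    """Return (row_count, column_count) for a table, like DataFrame.shape."""
    # Identifiers cannot be bound as parameters, so quote the name and escape inner quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    # PRAGMA table_info returns one row per column of the table (SQLite-specific).
    column_count = len(conn.execute(f"PRAGMA table_info({quoted})").fetchall())
    return row_count, column_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn (customer_id INTEGER, age INTEGER, exited INTEGER)")
conn.executemany("INSERT INTO churn VALUES (?, ?, ?)", [(1, 42, 1), (2, 41, 0)])
print(table_shape(conn, "churn"))  # (2, 3)
```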


### 1.3 Check NA

In [5]:
result = qb.build_and_execute_count_nulls(conn, "churn")
result

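|   | row_number | customer_id | surname | credit_score | geography | gender | age | tenure | balance | num_of_products | has_cr_card | is_active_member | estimated_salary | exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |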
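`build_and_execute_count_nulls` is another project helper whose SQL is not shown. A one-row result like the table above can be produced with one aggregate per column, `SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)`. The sketch below assumes that approach; `count_nulls_query` and the toy rows are hypothetical, and SQLite stands in for DuckDB. The `CASE` expression is standard SQL, so the same string should also run on DuckDB.

```python
import sqlite3


def count_nulls_query(table: str, columns: list[str]) -> str:
    """Build a single SELECT that counts NULL values in each listed column."""
    # One SUM(CASE ...) aggregate per column; the alias keeps the column name.
    parts = [f'SUM(CASE WHEN "{c}" IS NULL THEN 1 ELSE 0 END) AS "{c}"' for c in columns]
    return f'SELECT {", ".join(parts)} FROM "{table}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn (tenure INTEGER, balance REAL)")
conn.executemany("INSERT INTO churn VALUES (?, ?)", [(2, 0.0), (None, 83807.86), (8, None)])
print(conn.execute(count_nulls_query("churn", ["tenure", "balance"])).fetchone())  # (1, 1)
```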


There are no NAs in this dataset; however, missing values may have been recorded as zeros instead.

In [6]:
result = qb.build_and_execute_count_zeros(conn, "churn")
result

|   | row_number | customer_id | credit_score | age | tenure | num_of_products | has_cr_card | is_active_member | exited | balance | estimated_salary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 413 | 0 | 2945 | 4849 | 7963 | 3617 | 0 |


Only `tenure` and `balance` are relevant here: the other columns that contain zeros (`has_cr_card`, `is_active_member`, and `exited`) are binary flags, for which zero is a valid value.

The `balance` column has more than one third of its values equal to zero. This does not necessarily mean that they are missing values, but the DS team should gather more information about them.
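Looking at the share of zeros per column, rather than the raw count, makes the observation about `balance` easy to check. The sketch below assumes the same one-aggregate-per-column approach as a null count; `zero_share_query` and the toy rows are hypothetical, and SQLite stands in for the notebook's DuckDB connection.

```python
import sqlite3


def zero_share_query(table: str, columns: list[str]) -> str:
    """Build a SELECT returning, for each column, the fraction of rows equal to zero."""
    # AVG over a 1.0/0.0 indicator gives the share of zeros directly.
    parts = [f'AVG(CASE WHEN "{c}" = 0 THEN 1.0 ELSE 0.0 END) AS "{c}"' for c in columns]
    return f'SELECT {", ".join(parts)} FROM "{table}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn (tenure INTEGER, balance REAL)")
rows = [(2, 0.0), (1, 83807.86), (0, 0.0), (1, 125510.82)]
conn.executemany("INSERT INTO churn VALUES (?, ?)", rows)
print(conn.execute(zero_share_query("churn", ["tenure", "balance"])).fetchone())  # (0.25, 0.5)
```

Applied to the counts in the zero-count table above, the same ratio gives 3617 / 10000 = 36.17% for `balance`, just over one third, and 413 / 10000 = 4.13% for `tenure`.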

### 1.4 Data Types

In [7]:
result = qb.get_types(conn, "churn")
result

[[('row_number', 'BIGINT'),
  ('customer_id', 'BIGINT'),
  ('surname', 'VARCHAR'),
  ('credit_score', 'BIGINT'),
  ('geography', 'VARCHAR'),
  ('gender', 'VARCHAR'),
  ('age', 'BIGINT'),
  ('tenure', 'BIGINT'),
  ('balance', 'DOUBLE'),
  ('num_of_products', 'BIGINT'),
  ('has_cr_card', 'BIGINT'),
  ('is_active_member', 'BIGINT'),
  ('estimated_salary', 'DOUBLE'),
  ('exited', 'BIGINT')]]

### 1.5 Descriptive Statistics