# Client Churn Prediction
### CRISP-DM Cycle 4
---
Top Bank operates in Europe, and its main product is a bank account that clients can use to receive their salary and make payments. The account is free for the first 12 months; after that trial period, the client must renew the contract for the following 12 months and repeat the renewal every year. Recently, the Analytics Team noticed that the churn rate is increasing.

As a Data Science Consultant, you need to create an action plan to reduce the number of churned customers and show the financial return of your solution.
At the end of your consultancy, you need to deliver to the Top Bank CEO a model in production that receives a customer base via API and returns that same base with an extra column holding each customer's probability of churning.
In addition, you need to provide a report on your model's performance and the financial impact of your solution. Questions that the CEO and the Analytics Team would like answered in the report:

1.  What is Top Bank's current churn rate?
2.  How does the churn rate vary monthly?
3.  How well does the model classify customers as churners?
4.  What is the expected return, in terms of revenue, if the company uses your model to prevent customer churn?
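
The deliverable described above (a customer base goes in, the same base comes back with one extra probability column) can be sketched as a plain function. This is only an illustration of the input/output contract, not the production API: the column name `churn_probability`, the helper `add_churn_probability`, and the placeholder `dummy_score` with its arbitrary 0.2 / 0.6 values are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Customer = Dict[str, object]


def add_churn_probability(
    customers: List[Customer],
    score: Callable[[Customer], float],
    column: str = "churn_probability",  # hypothetical column name
) -> List[Customer]:
    """Return a copy of every record with one extra probability field."""
    scored = []
    for customer in customers:
        probability = score(customer)
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"score out of range: {probability}")
        # Copy the record so the caller's input base is left untouched.
        scored.append({**customer, column: probability})
    return scored


def dummy_score(customer: Customer) -> float:
    """Placeholder scorer, NOT a trained model; the values are arbitrary."""
    return 0.2 if customer.get("is_active_member") == 1 else 0.6


# Toy records, not taken from the real dataset.
base = [
    {"customer_id": 1, "is_active_member": 1},
    {"customer_id": 2, "is_active_member": 0},
]
result = add_churn_probability(base, dummy_score)
```

In a real service, `score` would wrap the trained model's probability prediction, and the function would sit behind whatever web framework exposes the API.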

> Disclaimer: This is a fictional business case.

## 0. Preparation

### 0.1 Planning

#### Input

- Predict whether a customer will churn;
- Dataset with sales records and customers info.

#### Output

- Which customers will churn;
- Churn rate of the company;
- Performance of the model;
- Action plan.


#### Tasks

1. Which customers will churn:
    - What is the criterion?
        - Downtime
        - Time remaining until the contract ends


2. Current churn rate of the company:
    - Calculate churn rate
    - Calculate monthly churn rate and variation

3. Performance of the model:
    - Precision at K score
    - Recall at K score

4. Action plan:
    - Discount?
    - Voucher?
    - Deposit bonus?
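
The two ranking metrics named in task 3 can be sketched in plain Python as follows. Precision at K is the share of actual churners among the K customers with the highest predicted probability, and recall at K is the share of all churners that land in that top K. The function name `precision_recall_at_k` and the toy labels and scores are assumptions made for illustration, not results from this project.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(
    y_true: Sequence[int], y_score: Sequence[float], k: int
) -> Tuple[float, float]:
    """Precision@K and recall@K for binary labels (1 = churned)."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    if not 1 <= k <= len(y_true):
        raise ValueError("k must be between 1 and the number of customers")

    # Rank customers from highest to lowest predicted churn probability.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    # Count the actual churners among the top-K ranked customers.
    hits = sum(label for _, label in ranked[:k])
    total_positives = sum(y_true)

    precision = hits / k
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall


# Toy data: 3 churners among 6 customers.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
precision, recall = precision_recall_at_k(y_true, y_score, k=3)
```

Here two of the top three customers are churners, so both metrics come out at 2/3. K is typically set to the number of customers the retention campaign can afford to contact.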

### 0.2 Imports

In [50]:
import polars as pl
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML, Image, display
import optuna
import duckdb
from pathlib import Path


### 0.3 Path

In [51]:
# locate the main project folders
path = Path().resolve().parent
data_path = path / "data"


In [52]:
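def jupyter_settings():
    """
    Apply the notebook's default plot and display settings.
    """

    %matplotlib inline
    # Apply the seaborn defaults first so they do not overwrite the rcParams set below.
    sns.set()
    plt.style.use("seaborn-v0_8-whitegrid")
    plt.rcParams["figure.figsize"] = [25, 12]
    plt.rcParams["font.size"] = 24
    # Widen the notebook container to the full browser width.
    display(HTML("<style>.container {width:100% !important;}</style>"))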


jupyter_settings()

sns.set_style("white")

# optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)

# random state seed
seed = 42

### 0.4 Data

This dataset is available [here](https://www.kaggle.com/mervetorkan/churndataset).


**Data fields**

- **RowNumber**: the row index of the record
- **CustomerID**: unique identifier of the client
- **Surname**: the client's last name
- **CreditScore**: the client's credit score in the financial market
- **Geography**: the client's country
- **Gender**: the client's gender
- **Age**: the client's age
- **Tenure**: number of years the client has been with the bank
- **Balance**: the amount the client has in their account
- **NumOfProducts**: the number of products the client has bought
- **HasCrCard**: whether the client has a credit card
- **IsActiveMember**: whether the client is active (within the last 12 months)
- **EstimatedSalary**: estimate of the client's annual salary
- **Exited**: whether the client has churned (*target variable*)
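
Because **Exited** is a 0/1 flag, the overall churn rate is simply the mean of that field. A minimal sketch on toy values (not the real dataset), using a hypothetical helper named `churn_rate`:

```python
from typing import Iterable


def churn_rate(exited: Iterable[int]) -> float:
    """Share of customers whose exit flag is 1."""
    flags = list(exited)
    if not flags:
        raise ValueError("cannot compute a churn rate on an empty customer base")
    return sum(flags) / len(flags)


# Toy flags for ten customers, two of whom churned.
toy_exited = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
rate = churn_rate(toy_exited)  # 2 churners out of 10 customers
```

On a Polars frame the same quantity would presumably be `df["exited"].mean()`, assuming the target is loaded as a column named `exited`.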

In [53]:
# Load the churn table from the DuckDB database into a Polars DataFrame
conn_path = str(data_path / "interim/churn.db")
conn = duckdb.connect(database=conn_path, read_only=False)
query = conn.execute("SELECT * FROM churn")
df = pl.DataFrame(query.fetchdf())

## 1. Data Description

In [56]:
df.head().transpose(include_header=True)

| column | column_0 | column_1 | column_2 | column_3 | column_4 |
|---|---|---|---|---|---|
| str | str | str | str | str | str |
| "row_number" | "1" | "2" | "3" | "4" | "5" |
| "customer_id" | "15634602" | "15647311" | "15619304" | "15701354" | "15737888" |
| "surname" | "Hargrave" | "Hill" | "Onio" | "Boni" | "Mitchell" |
| "credit_score" | "619" | "608" | "502" | "699" | "850" |
| "geography" | "France" | "Spain" | "France" | "France" | "Spain" |
| … | … | … | … | … | … |
| "num_of_products" | "1" | "1" | "3" | "2" | "1" |
| "has_cr_card" | "1" | "0" | "1" | "0" | "1" |
| "is_active_member" | "1" | "1" | "0" | "0" | "1" |
| "estimated_salary" | "101348.88" | "112542.58" | "113931.57" | "93826.63" | "79084.1" |


### 1.1 Data Dimensions

In [57]:
df.shape

(10000, 14)