# Client Churn Prediction
### CRISP-DM Cycle 3
---

Top Bank operates in Europe, and its main product is a bank account that clients use to receive their salary and make payments. The account is free for the first 12 months. After this trial period, the client must renew the contract for the following 12 months and repeat the renewal every year. Recently, the Analytics Team noticed that the churn rate is increasing.

As a Data Science Consultant, you need to create an action plan to reduce customer churn and show the financial return of your solution.
At the end of the consultancy, you need to deliver to the Top Bank CEO a model in production that receives a customer base via API and returns the same base with an extra column holding each customer's probability of churning.
In addition, you need to provide a report on the model's performance and the financial impact of your solution. The CEO and the Analytics Team would like the report to answer the following questions:

1.  What is Top Bank's current churn rate?
2.  How does the churn rate vary monthly?
3.  How well does the model classify customers as churners?
4.  What is the expected return, in terms of revenue, if the company uses the model to prevent customer churn?

> Disclaimer: This is a fictional business case.

## 0. PREPARATION

### 0.1 Planning

#### Input

- Predict whether a customer will churn;
- Dataset with sales records and customer information.

#### Output

- Which customers will churn;
- Churn rate of the company;
- Performance of the model;
- Action plan.


#### Tasks

1. Which customers will churn:
    - What is the criterion?
        - Downtime
        - Time remaining until the contract ends


2. Current churn rate of the company:
    - Calculate churn rate
    - Calculate monthly churn rate and variation

3. Performance of the model:
    - Precision at K score
    - Recall at K score

4. Action plan:
    - Discount?
    - Voucher?
    - Deposit bonus?
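
The two ranking metrics listed under the model-performance task can be sketched in plain Python. Precision at K is the share of actual churners among the K customers with the highest predicted score, and recall at K is the share of all churners that appear in that top K. The function name `precision_recall_at_k` and the toy labels and scores are assumptions made for illustration, not part of the project code.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(
    y_true: Sequence[int], scores: Sequence[float], k: int
) -> Tuple[float, float]:
    """Return (precision@K, recall@K) for binary labels and churn scores."""
    if not 0 < k <= len(y_true):
        raise ValueError("k must be between 1 and the number of customers")
    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])  # churners among the top K
    total_churners = sum(y_true)
    precision = hits / k
    recall = hits / total_churners if total_churners else 0.0
    return precision, recall


# Toy data for illustration only (not taken from the dataset).
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
precision, recall = precision_recall_at_k(y_true, scores, k=3)
print(precision, recall)
```

In a retention campaign, K would typically correspond to the number of customers the budget allows the bank to contact, which is why these metrics fit the action-plan framing better than plain accuracy.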

### 0.2 Settings

In [4]:
# imports
import os
import sys
import duckdb

from dotenv import load_dotenv
from IPython.display import HTML

import polars as pl


# load .env file
env_path = '../.env'
load_dotenv(dotenv_path=env_path)

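# add home path to sys.path (HOME_PATH must be defined in the .env file)
path = os.getenv('HOME_PATH')
if path is None:
    raise RuntimeError("HOME_PATH is not set; check the .env file")
sys.path.append(path)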

### 0.4 Data

This dataset is available [here](https://www.kaggle.com/mervetorkan/churndataset).


**Data fields**

- **RowNumber**: the row index of the record
- **CustomerID**: the client's unique identifier
- **Surname**: the client's last name
- **CreditScore**: the client's credit score in the financial market
- **Geography**: the client's country
- **Gender**: the client's gender
- **Age**: the client's age
- **Tenure**: the number of years the client has been with the bank
- **Balance**: the amount the client holds in their account
- **NumOfProducts**: the number of products the client has purchased
- **HasCrCard**: whether the client has a credit card
- **IsActiveMember**: whether the client is active (within the last 12 months)
- **EstimateSalary**: an estimate of the client's annual salary
- **Exited**: whether the client churned (*target variable*)

## 1. DATA DESCRIPTION

In [8]:
# Connection to the database
conn_path = os.path.join(path, "data/interim/churn.db")
conn = duckdb.connect(database=conn_path, read_only=False)
cursor = conn.cursor()

# Query the database
query = "SELECT * FROM churn"
cursor.execute(query)
data = cursor.fetchall()

# Get the column names
cols = [description[0] for description in cursor.description]

# Close the connection
cursor.close()
conn.close()

# Create a Polars DataFrame (fetchall returns one tuple per row)
df1 = pl.DataFrame(data, schema=cols, orient="row")

### 1.1 First Look

### 1.2 Data Dimensions

### 1.3 Check NA

### 1.4 Data Types

### 1.5 Descriptive Statistics