# Case Study: Churn prediction project

**Data Science Use Case in Marketing**: Customer Churn Rate Prediction

Customer churn is the tendency of customers to cancel their subscriptions to a service they have been using and, hence, stop being clients of that service. The customer churn rate is the percentage of customers who churned within a predefined time interval. It's the opposite of the customer growth rate, which tracks new clients.
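
To make the definition concrete, here is a minimal sketch of the churn rate calculation. It assumes the rate is measured against the number of customers at the start of the interval; the function name and the numbers are illustrative only.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the churn rate for one time interval as a percentage.

    Assumption: the denominator is the customer count at the start
    of the interval.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical numbers: 1,000 customers at the start of the month, 50 cancel.
rate = churn_rate(1000, 50)
print(f"Monthly churn rate: {rate:.1f}%")
```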

The customer churn rate is an important indicator of customer satisfaction and of the overall health of the business. Apart from natural churn, which takes place in any business, and seasonal churn, which is typical of some services, other factors can signal that something in the company has gone wrong and should be fixed. These factors are:

* lack or low quality of customer support,
* negative customer experiences,
* switching to a competitor with better conditions or pricing strategy,
* customers' priorities changed,
* long-time customers don't feel satisfied,
* the service didn't meet customers' expectations,
* finance issues,
* fraud protection on customers' payments.

A high customer churn rate is a serious problem for any company for the following reasons:

* It correlates with the company's revenue loss.
* It takes much more money to acquire new customers than to retain existing ones. This is especially true for highly competitive markets.
* When customers churn because of poor customer service, the company's reputation may be heavily damaged by negative reviews that dissatisfied ex-customers leave on social media or review websites.

Customer retention is a crucial component of the business strategy for all subscription-based services. To predict customer churn and take the corresponding preventive measures, it's necessary to gather and analyze information on customer behavior (purchase intervals, the overall period of being a client, cancellations, follow-up calls and messages, online activity) and figure out which attributes, and which combinations of them, are characteristic of clients at risk of leaving. Knowing in advance which customers may churn soon, especially high-revenue or long-time customers, helps the company focus on exactly those customers and develop an efficient strategy to convince them to stay. The approach can include calling such clients with a special offer: a gift, a discount, a subscription upgrade for the same price, or any other customized experience.

Technically, customer churn prediction is a typical machine learning classification problem in which clients are labeled "yes" or "no" according to whether they are at risk of churning. Let's investigate this use case in Python on real-world data.

We'll use a decision tree to predict churn.


## 1.1 Dataset

* Dataset: https://drive.google.com/file/d/15C1130YVEymfJltHUFX9nZr-OGns82gJ/view?usp=drive_link



### Dataset information

* **customerID**: Customer ID
* **gender**: Whether the customer is male or female
* **SeniorCitizen**: Whether the customer is a senior citizen or not (1, 0)
* **Partner**: Whether the customer has a partner or not (Yes, No)
* **Dependents**: Whether the customer has dependents or not (Yes, No)
* **tenure**: Number of months the customer has stayed with the company
* **PhoneService**: Whether the customer has a phone service or not (Yes, No)
* **MultipleLines**: Whether the customer has multiple lines or not (Yes, No, No phone service)
* **InternetService**: Customer's internet service provider (DSL, Fiber optic, No)
* **OnlineSecurity**: Whether the customer has online security or not (Yes, No, No internet service)

## 1.2 Data Science Workflow

#### Data preprocessing
* Download the data, read it with pandas
* Look at the data
* Make column names and values look uniform
* Check if all the columns read correctly
* Check if the churn variable needs any preparation

#### EDA

* Check for missing values and treat them with suitable imputation techniques
* Look at the target variable (churn)
* Look at numerical and categorical variables
* Visualize the data to understand relationships and impacts
* Feature selection: requires domain understanding along with technical knowledge
* Feature engineering: encoding the data and creating new features
* Conduct hypothesis testing if required
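
As an illustration of the encoding step, the sketch below one-hot encodes a categorical column with `pd.get_dummies`. The rows are made up for the example, and one-hot encoding is only one possible choice; the notebook does not say which encoding it uses.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the dataset description
# after lowercasing.
df = pd.DataFrame({
    "tenure": [1, 34, 2],
    "internetservice": ["dsl", "fiber_optic", "no"],
})

# One-hot encode the categorical column: one 0/1 column per distinct value.
encoded = pd.get_dummies(df, columns=["internetservice"], dtype=int)
print(encoded)
```

Each distinct value of `internetservice` becomes its own 0/1 column, which is a form that tree-based and linear models in scikit-learn can consume directly.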

#### Build the model
* Setting up the framework
* Model building
* Feature selection
* Model evaluation
* Model optimization
* Model interpretation
* Choosing the right model
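
The model-building and evaluation steps above can be sketched with scikit-learn's `DecisionTreeClassifier`. Everything in this example is an assumption made for illustration: the data is synthetic, `monthly_charge` is a hypothetical feature, and the churn rule is invented so that the tree has something to learn. It is not the notebook's actual model.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: two numeric features and an invented churn rule.
rng = np.random.default_rng(0)
n = 400
tenure = rng.integers(0, 72, size=n)            # months with the company
monthly_charge = rng.uniform(20, 120, size=n)   # hypothetical feature
y = ((tenure < 12) & (monthly_charge > 70)).astype(int)
X = np.column_stack([tenure, monthly_charge])

# Setting up the framework: hold out 20% of the rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Model building: a shallow tree stays easy to interpret.
model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(X_train, y_train)

# Model evaluation on the held-out rows.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Accuracy is used here only because it is simple; on an imbalanced churn target, metrics such as precision, recall, or ROC AUC are usually more informative.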


#### Deployment 
* Understand MLOps (machine learning operations), optional


#### Data preprocessing
* Download the data, read it with pandas
* Look at the data
* Make column names and values look uniform
* Check if all the columns read correctly
* Check if the churn variable needs any preparation

In [3]:
#import libraries


In [4]:
#read data
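
A minimal sketch of the reading step with `pd.read_csv`. The inline CSV text stands in for the downloaded file so the example runs on its own; the rows and the file name `churn.csv` in the comment are made up.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: made-up rows, a subset of the columns.
csv_text = """customerID,gender,SeniorCitizen,tenure,Churn
0001-AAAAA,Female,0,1,No
0002-BBBBB,Male,1,34,Yes
"""

# With the real file this would be something like:
#   df = pd.read_csv("churn.csv")   # hypothetical file name
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to look at the data
print(df.shape)    # (number of rows, number of columns)
```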


In [5]:
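#prepare data

# normalize column names: lowercase, spaces replaced with underscores
df.columns = df.columns.str.lower().str.replace(' ', '_')

# columns with dtype 'object' hold the string (categorical) values
categorical_columns = list(df.dtypes[df.dtypes == 'object'].index)

# apply the same normalization to the values in those columns
for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(' ', '_')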
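
One way to check that the columns were read correctly is to inspect `df.dtypes` and convert any numeric column that was parsed as text. The sketch below assumes a hypothetical `totalcharges` column in which a blank entry became `"_"` after the normalization above; the column and its values are illustrative.

```python
import pandas as pd

# Made-up rows; "totalcharges" is a hypothetical column that should be numeric
# but was read as text because one entry is not a number.
df = pd.DataFrame({
    "tenure": [1, 34, 0],
    "totalcharges": ["29.85", "1889.5", "_"],
})

print(df.dtypes)  # a text dtype on a numeric column signals a parsing problem

# Convert to numbers; entries that cannot be parsed become NaN.
df["totalcharges"] = pd.to_numeric(df["totalcharges"], errors="coerce")
print(df["totalcharges"].isnull().sum())  # how many entries failed to parse
```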

In [11]:
#convert churn variable to int

df.churn = (df.churn == 'yes').astype(int)

#### EDA

* Check for missing values and treat them with suitable imputation techniques
* Look at the target variable (churn)
* Look at numerical and categorical variables
* Visualize the data to understand relationships and impacts
* Feature selection: requires domain understanding along with technical knowledge
* Feature engineering: encoding the data and creating new features
* Conduct hypothesis testing if required

In [None]:
#check for missing data
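
A minimal sketch of the missing-value check, using made-up rows. Filling with the median is shown as one simple imputation option, not as the recommended treatment for this dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows with one missing value in each of two columns.
df = pd.DataFrame({
    "tenure": [1, 34, np.nan],
    "partner": ["yes", None, "no"],
    "churn": [0, 1, 0],
})

# Count missing values per column.
missing = df.isnull().sum()
print(missing)

# One simple option for a numeric column: fill with the median.
df["tenure"] = df["tenure"].fillna(df["tenure"].median())
```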

In [5]:
#check if classes are balanced or not
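
Class balance can be checked with `value_counts`. The sketch assumes `churn` has already been converted to 0/1 as above and uses made-up values.

```python
import pandas as pd

# Made-up target values: 0 = stayed, 1 = churned.
df = pd.DataFrame({"churn": [0, 0, 0, 1]})

counts = df["churn"].value_counts()                # absolute counts per class
shares = df["churn"].value_counts(normalize=True)  # fraction per class
print(counts)
print(shares)
```

If one class has a much smaller share than the other, the target is imbalanced, which matters later when choosing evaluation metrics.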




In [6]:
#check for the mean churn value
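
Because `churn` is encoded as 0/1, its mean equals the fraction of customers who churned. A short sketch with made-up values:

```python
import pandas as pd

# Made-up target values: 0 = stayed, 1 = churned.
df = pd.DataFrame({"churn": [0, 0, 0, 1]})

# The mean of a 0/1 column is the share of ones, i.e. the churn rate.
global_churn = df["churn"].mean()
print(round(global_churn, 2))
```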

In [21]:
#check for numerical columns
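
Numerical columns can be picked out by dtype with `select_dtypes`. The rows below are made up; the column names follow the dataset description after lowercasing.

```python
import pandas as pd

# Made-up rows mixing numeric and text columns.
df = pd.DataFrame({
    "gender": ["female", "male"],
    "seniorcitizen": [0, 1],
    "tenure": [1, 34],
    "partner": ["yes", "no"],
})

numerical_columns = df.select_dtypes(include="number").columns.tolist()
print(numerical_columns)
print(df[numerical_columns].describe())  # summary statistics
```

Note that `seniorcitizen` is stored as 0/1, so it shows up as numeric even though it is conceptually a category.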




In [22]:
#check for categorical columns
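
A matching sketch for the categorical columns, again on made-up rows. It treats every non-numeric column as categorical, which is an assumption that holds for the columns described above but should be checked on the full dataset.

```python
import pandas as pd

# Made-up rows mixing numeric and text columns.
df = pd.DataFrame({
    "gender": ["female", "male"],
    "seniorcitizen": [0, 1],
    "tenure": [1, 34],
    "partner": ["yes", "no"],
})

# Everything that is not numeric is treated as categorical here.
categorical_columns = df.select_dtypes(exclude="number").columns.tolist()
print(categorical_columns)
```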




In [7]:
#check for unique values in each categorical column
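
Unique values per categorical column can be listed with `nunique` and `unique`. The rows are made up, with values written the way they look after the normalization step.

```python
import pandas as pd

# Made-up rows with normalized (lowercase, underscored) values.
df = pd.DataFrame({
    "gender": ["female", "male", "male", "female"],
    "multiplelines": ["no_phone_service", "no", "yes", "no"],
})
categorical_columns = ["gender", "multiplelines"]

# Number of distinct values in each categorical column.
unique_counts = df[categorical_columns].nunique()
print(unique_counts)

# The distinct values themselves, column by column.
for c in categorical_columns:
    print(c, df[c].unique())
```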
