# Problem Background and Motivation



### "Grow fast or Die slow"

As <b>McKinsey & Co</b> put it, companies either <u>grow fast or die slow</u>. They recommend focusing on customer retention to achieve fast growth: 
<I>“Technology and software companies spend millions acquiring new customers, yet <b>customer retention</b> is what separates top performers from their competitors.”</I> 

This indicates that only companies that have successfully dealt with <b>customer churn</b> sustain growth over time. 

### So, What is Customer Churn?

There is not a single business in the world that has never lost a customer. When customers are unhappy, they stop doing business with you. It’s that simple. And every business deals with it differently: some immediately start looking for new customers to replace the loss; others throw all their forces at analyzing what went wrong and how to put a lid on others trying to run away.

This problem is called <u>customer churn</u> – the number of customers who leave a company during a given time period.
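
Churn is usually reported as a rate rather than a raw count. A minimal sketch, assuming the common definition (customers lost during the period divided by customers at the start of the period); the function name and the figures are illustrative and not taken from any dataset:

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Illustrative numbers: 1,000 customers at the start of the month, 50 of them left.
monthly_churn = churn_rate(1000, 50)
print(f"{monthly_churn:.1%}")  # 5.0%
```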

### Impact of Customer Churn

Customer churn has a significant impact on your business as it lowers revenues and profits. Yet surprisingly, more than 2 out of 3 companies have no strategy for preventing customer churn.

![impact%20of%20customer%20churn.png](attachment:impact%20of%20customer%20churn.png)

### Compounding effect of Customer Churn

The image above demonstrates why it's crucial for businesses to track customer churn since it may seriously harm a company's bottom line.

<b>Here's how:</b>

Consider the chart below, which compares the total customers of two companies: one with a five percent churn rate and another with a 10 percent churn rate. Not much difference, right? Just five percentage points. 

![ChurnRate.png](attachment:ChurnRate.png)

So, in just over two years, company A (with the five percent churn rate) has nearly five times the total customers of the other company. That is a huge difference in overall customer count from just a five-percentage-point difference in churn rate.

This shows how badly a company can be hit over time if it does not find a strategy to address customer churn. 

## Let's dive into some facts


### 1. Companies lose 1.6 trillion dollars per year due to customer churn!
According to <a href="https://www.forrester.com/bold">Forrester</a>, it costs <b>5 TIMES MORE</b> to acquire a new customer than it does to keep an existing one.

![cost-of-customer-churn.png](attachment:cost-of-customer-churn.png)

In addition, it costs companies 16 times more to bring a new customer up to the same level as an existing customer.

### 2. The probability of selling to an existing customer is 60-70%, and only 5-20% to sell to a new prospect.

![new-prospects-vs-existing-customers-costs.jpg](attachment:new-prospects-vs-existing-customers-costs.jpg)

This is why reducing churn is paramount: keeping your existing customers is profitable!

### 3. The majority of startups face a 60% churn rate

![Startup-Churn-Churn-Stats.jpg](attachment:Startup-Churn-Churn-Stats.jpg)

Owing to instability in early-stage startups and a highly competitive market, they face a 60% churn rate at the beginning.

## Most Companies Face Customer Churn Issues at Some Point

As a recent example, Netflix disclosed that it had lost 200,000 subscribers in the last quarter. The reasons cited for the increase in churn were that Netflix had cracked down on subscribers sharing their logins, had increased the monthly subscription price, and faced competitors offering more attractive prices. 

![netflix%20churn%20rates.png](attachment:netflix%20churn%20rates.png)

<hr>
So, the facts above make it evident that the longer you work on building a <b>retention strategy</b>, the more it pays off. Otherwise, you risk losing your customers to competitors. 
Hence, one strategy for keeping our customer retention rate strong is to have a <b>prediction model</b> in place that tells us how likely a customer is to cancel their subscription.<br>
The results from the prediction model can help us build and deploy solid retention strategies that protect the bottom line of the business.
<hr>

## Importing Package Dependencies


### Libraries

First things first, let's import the libraries which will be used in our machine learning model.
>**Pandas:**<br> 
Pandas is one of the most commonly used tools in machine learning for data cleaning and analysis. It can handle missing data, clean up data, and read or load data in many formats such as CSV, Excel, and SQL.
In this project, we will load a CSV file using Pandas. 

>**SKLearn:**<br>
Scikit-learn (Sklearn) is one of the most widely used libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python.
>>**Logistic Regression:** It is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.
>>>**Accuracy Score:** The accuracy_score function returns the fraction of predictions that match the true values of the target (Y).

>**Pickle:**<br>
Pickle in Python is primarily used in serializing and deserializing a Python object structure. In other words, it’s the process of converting a Python object into a byte stream to store it in a file/database, maintain program state across sessions, or transport data over the network.

In [1]:
import pandas as pd                                            # Pandas is a Python library used for working with data sets.
                                                               # It has functions for analyzing, cleaning, exploring, 
                                                               # and manipulating data.

from sklearn.linear_model import LogisticRegression            # LogisticRegression is scikit-learn's classifier for
                                                               # predicting a categorical (here binary) target. It follows
                                                               # the same fit()/predict() interface as the other
                                                               # scikit-learn estimators.

from sklearn.metrics import accuracy_score                     # sometimes we just import specific functions from an
                                                               # open source package because that's all we need to use.
                                                               # This keeps our namespace small; Python still loads
                                                               # the module that contains the function.
            
import pickle                                                  # this library is used to translate a Python object (our
                                                               # trained model) into a 'serialized' form...thus making it
                                                               # easy to save and reload in deployment

## Getting Our Data

> -  We will use pandas (pd) to read our **CSV file - telco customer churn**

<hr>

>  - One of the most useful features of the pandas library is the range of file formats it can read: CSV, JSON, HTML, SQL, and more. Once loaded, the data can be analyzed and cleansed with the same set of tools. 
>  - As for capacity, pandas loads a data set into memory, so the practical size limit is set by the RAM available. For files that do not fit, <b>read_csv</b> accepts a <b>chunksize</b> argument that reads the file in pieces.

In [2]:
df = pd.read_csv('telco_customer_churn.csv')                   # The pd.read_csv() function in pandas helps to read 
                                                               # comma-separated values (CSV). It takes multiple arguments. 
                                                               # We can simply pass a filename to it, provided the file
                                                               # is in the notebook's working directory.
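
To see what `read_csv` returns without needing the actual file, here is a minimal sketch that reads a tiny CSV from an in-memory string. The column names and values are hypothetical stand-ins, not the real contents of `telco_customer_churn.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: target first, then two predictors.
csv_text = """churn,senior_citizen,months_as_customer
0,0,12
1,1,2
0,0,40
"""

# read_csv accepts a path or any file-like object, so StringIO works for a demo.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)   # (3, 3)
print(sample.dtypes)  # data type inferred for each column

# With chunksize, read_csv yields DataFrames of at most that many rows each,
# which is one way to process a file that is too large to load at once.
total_rows = sum(len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2))
print(total_rows)     # 3
```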

## Preparing the Data

>Now we will apply the <b>iloc</b> indexer in the next two lines of code below, which helps us select rows and columns from the dataset by position. 
As a result, we will end up with two new objects - X and y.<br>
>><b>X</b> contains the <b>Independent Variables (Input Variables)</b>, which are primarily the drivers of our prediction. These variables are placed from column 2 to column 6 in our csv file. 
>>>Whereas <b>y</b> is the <b>Dependent Variable (Target Variable)</b>; this is what we want to predict. This variable is placed in column 1 in our csv file.

<hr>

> As always, to avoid a garbage-in, garbage-out situation, you should have a detailed checklist for data pre-processing before feeding data into the machine learning model. 
>> - **Check for Data Types**<br>
Check whether every column has the right data type. If not, change the data types.
>> - **Check Column Names**<br>
Do the column names contain lengthy names, special characters, spaces, etc.? If so, rename the columns.
>> - **Missing Values: Drop or Impute?**<br>
If a column has a lot of missing values (e.g. around 60%), it might be advisable to drop that column rather than trying to impute values.
>> - **Data Distribution**<br>
Finding the skew and kurtosis is a good start for figuring out whether the data in any of the columns needs treatment.
>> - **Outliers**<br>
If there are outliers, cap them or remove them from the data set.
>> - **Data Imbalance**<br>
If you are building a classification model, check whether the classes in your data are at least roughly evenly distributed. <br>

> So, adhering to this checklist helps improve the accuracy of our machine learning model.
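
The checklist above can be sketched in a few pandas calls. This is only an illustration on a tiny made-up frame: the column names and values are hypothetical, and the 1.5 × IQR rule used for outliers is one common rule of thumb, not something the original notebook prescribes.

```python
import pandas as pd

# Tiny illustrative frame; the column names and values are hypothetical.
demo = pd.DataFrame(
    {
        "churn": [0, 0, 0, 1, 0, 1],
        "months_as_customer": [1, 24, 36, 2, None, 3],
        "monthly_charges": [20.0, 55.5, 60.0, 95.0, 70.0, 400.0],
    }
)

print(demo.dtypes)                       # data type of each column
print(list(demo.columns))                # column names
missing_share = demo.isna().mean()       # fraction of missing values per column
print(missing_share)
print(demo["monthly_charges"].skew())    # skewness of one numeric column
print(demo["monthly_charges"].kurt())    # excess kurtosis of the same column

# Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = demo["monthly_charges"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (demo["monthly_charges"] < q1 - 1.5 * iqr) | (demo["monthly_charges"] > q3 + 1.5 * iqr)
outliers = demo[is_outlier]
print(outliers)

# Class balance: share of each class in the target column.
class_share = demo["churn"].value_counts(normalize=True)
print(class_share)
```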

In [3]:
X = df.iloc[:, 1:]                                              # iloc is an indexer defined in the Pandas module 
                                                                # that selects rows and columns by integer position.
                                                                # Here: all rows, every column from position 1 onward.
        
                                                                # X is set of independent variables.
            
y = df.iloc[:, 0]                                               # y is the dependent variable or target variable.
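
To make the positional slicing concrete, here is a small sketch on a made-up frame that mirrors the layout described above (target in the first column, predictors after it). The column names are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical miniature of the dataset: target in the first column, predictors after it.
toy = pd.DataFrame(
    {
        "churn": [0, 1, 0],
        "senior_citizen": [0, 1, 0],
        "months_as_customer": [12, 2, 40],
    }
)

X_toy = toy.iloc[:, 1:]   # all rows, every column from position 1 onward
y_toy = toy.iloc[:, 0]    # all rows, only the column at position 0

print(list(X_toy.columns))  # ['senior_citizen', 'months_as_customer']
print(y_toy.name)           # churn
```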

## Building the Machine Learning Model

>**First Line:**<br>
Let's create a new object named 'model' which will predict the probability of our dependent variable Y. 
>>Logistic regression is fitted by maximum likelihood, which is an iterative procedure: starting from an initial set of coefficients, each iteration adjusts them to improve the fit, and the solver stops once the improvement becomes negligible or the iteration limit is reached. scikit-learn's default limit (<b>max_iter</b>) is 100, which is not always enough for the solver to converge. Hence, we have set our maximum number of iterations to 800. 

>**Second Line:**<br>
We will now apply the SKLearn method <b>fit()</b> to our newly built object 'model'. 
>> Fitting our model to (i.e. using the <b>fit()</b> method on) the training data is essentially the training part of the modeling process. It finds the coefficients for the equation specified by the algorithm being used, in our case Logistic Regression.<br>
So, in a nutshell, fitting is equal to training. After the model is trained, it can be used to make predictions, usually with a <b>predict()</b> method call, which we will use in the line below. 

>**Third Line:**<br>
As we trained our model in the line above, the predict() function will enable us to predict the dependent variable (Y) on the basis of the trained model. Note that predict() returns the predicted class (0 or 1), not a probability.

>**Fourth Line:**<br>
Now, let's print out the accuracy of our prediction. Because the predictions are made on the same data the model was trained on, this is a training accuracy.
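
The description above speaks of predicting a probability, while `predict()` returns class labels. A minimal sketch of the difference, using scikit-learn's `predict_proba()` on made-up toy data (the single feature and its values are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical): one feature, months as customer; 1 means the customer churned.
X_demo = np.array([[1], [2], [3], [4], [30], [40], [50], [60]])
y_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0])

demo_model = LogisticRegression(max_iter=800).fit(X_demo, y_demo)

labels = demo_model.predict(X_demo)        # hard class labels, 0 or 1
proba = demo_model.predict_proba(X_demo)   # one row per sample: [P(class 0), P(class 1)]

print(labels)
print(proba[:, 1])          # estimated churn probability for each customer
print(demo_model.n_iter_)   # iterations the solver actually used (at most max_iter)
```

The probabilities are often more useful than the labels when deciding which customers to contact first, since they allow ranking customers by estimated risk.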

<hr>

> So, we want to predict the target variable (Y) with the logistic regression classification algorithm.<br>
But how do we know whether the model we are building is any good? Let's think about how we could evaluate our machine learning model. 
Here are some methods for evaluating classification models. 
>> - **Confusion Matrix**<br>
A confusion matrix, also known as an error matrix, is a summary table used to assess the performance of a classification model. The numbers of correct and incorrect predictions are summarized as counts and broken down by class.
>> - **F1 Score**<br>
The F1 score is the harmonic mean of precision and recall. It has a maximum of 1 (perfect precision and recall) and a minimum of 0, so it summarizes how precise and how complete the model's positive predictions are.
>> - **AUC-ROC Curve**<br>
The AUC-ROC curve is a performance measurement for classification problems that tells us how well a model distinguishes between classes. A higher AUC means that the model is better at ranking positive cases above negative ones.

> So, with these methods we could know how good our machine learning model is. 
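
The three measures listed above are all available in `sklearn.metrics`. Here is a short sketch on ten made-up customers; the labels and scores are hypothetical and chosen only so the numbers are easy to check by hand:

```python
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical true labels and model scores for ten customers (1 = churned).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.35]
y_pred = [int(score >= 0.5) for score in y_score]   # threshold the scores at 0.5

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(cm)                               # [[5 1]
                                        #  [1 3]]
print(f1_score(y_true, y_pred))         # 0.75
print(roc_auc_score(y_true, y_score))   # 22 of the 24 positive/negative pairs are ranked correctly
```

Note that the confusion matrix and F1 score use the thresholded labels, whereas the AUC uses the raw scores.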


In [4]:
model = LogisticRegression(max_iter=800)                           # LogisticRegression is a class from the SKLearn 
                                                                   # library, which is used to predict a binary outcome, 
                                                                   # such as yes or no, based on prior observations of a 
                                                                   # data set. 
            
                                                                   # "model" is our object which will predict the 
                                                                   #  probability of our dependent variable Y.

model.fit(X, y)                                                    # fit() will train our model. 

predictions = model.predict(X)                                     # predict() returns the predicted class for each row,
                                                                   # using the model we just trained. 
    
print(accuracy_score(y, predictions))                              # printing out the accuracy score (on the training data).  

0.7762317194377396
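
The accuracy printed above is computed on the same rows the model was trained on, so it can overstate how well the model handles unseen customers. A common alternative is to hold out part of the data for testing. The sketch below shows the idea on synthetic data generated with `make_classification`; it is not the notebook's dataset, and the split ratio and random seeds are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: 5 features, roughly 25% positive class.
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.75, 0.25], random_state=0,
)

# Keep 20% of the rows aside; stratify so both splits have similar class shares.
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=0
)

holdout_model = LogisticRegression(max_iter=800).fit(X_train, y_train)

train_acc = accuracy_score(y_train, holdout_model.predict(X_train))
test_acc = accuracy_score(y_test, holdout_model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```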


## Deploying our Machine Learning Model

>**pickle_out** is a new file object opened in WRITE BINARY (wb) mode on a new file called 'classifier'.<br>
Once completed, we'll find this new file in the same folder as our Jupyter Notebook.

>**pickle.dump()** serializes our trained model (as defined above) into bytes and writes them through the pickle_out object into the new file "classifier".<br>

>**pickle_out.close()** simply closes the file, which makes sure everything has been written to disk.

<hr>

> Streamlit is a great tool for creating beautiful data applications quite easily, but when it comes to deploying them and making them accessible, we should make sure that our data is secured. Protecting data from internal or external corruption and illegal access protects a company from financial loss, reputational harm, consumer trust degradation, and brand erosion.<br>

In [5]:
pickle_out = open('classifier', mode='wb')                          # We are creating a new object called 'pickle_out'.  
                                                                    # This object is a handle to a new file called
                                                                    # 'classifier', opened in 'write' mode so that we
                                                                    # can write 'binary' data to it (wb).
            
pickle.dump(model, pickle_out)                                      # use the pickle library to serialize our 
                                                                    # previously created ML model object ('model') into the
                                                                    # binary 'classifier' file opened in the code 
                                                                    # immediately preceding this line.
            
pickle_out.close()                                                  # close the file now that we've written our 
                                                                    # model to it.
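
An equivalent and slightly safer pattern is to open the file in a `with` block, which closes it automatically even if an error occurs. The sketch below round-trips a small dictionary that stands in for the trained model; the file name `classifier_demo` and the temporary directory are illustrative choices only:

```python
import os
import pickle
import tempfile

# Any picklable object works; this dict stands in for the trained model.
stand_in_model = {"coef": [0.4, -1.2], "intercept": 0.1}

path = os.path.join(tempfile.mkdtemp(), "classifier_demo")   # hypothetical file name

with open(path, mode="wb") as f:      # the file is closed automatically on exit
    pickle.dump(stand_in_model, f)

with open(path, mode="rb") as f:
    restored = pickle.load(f)         # only unpickle files from a source you trust

print(restored == stand_in_model)     # True
```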

>  - The **%%writefile app.py** cell magic writes the contents of the cell to a Python file named app.py.<br>
>  - In the second line, we import the **pickle library** to load the serialized model.<br>
>  - In the third line, we import the **streamlit** library. Streamlit helps us create web apps for machine learning models in a short time, and the web apps can be customized. <br><br>
>  - Then, we define our function 'prediction', which takes the inputs we enter in the web app and returns the prediction. 
>  - Finally, we launch Streamlit, which lets us work with our model on a webpage. 

In [6]:
%%writefile app.py


import pickle
import streamlit as st

# this function will add a new background image on the Streamlit web app of our choice. 
def add_bg_from_url():
    st.markdown(
         f"""
         <style>
         .stApp {{
             background-image: url("https://res.allmacwallpaper.com/get/iMac-21-inch-wallpapers/Prediction-background-1920x1080/1713-9.jpg");
             background-attachment: fixed;
             -webkit-background-size: cover;
             -moz-background-size: cover;
             -o-background-size: cover;
            background-size: cover;
            height:100%;
         }}
         </style>
         """,
         unsafe_allow_html=True
     )

add_bg_from_url() 

# Load the trained model that was pickled earlier in the notebook.
with open('classifier', 'rb') as pickle_in:
    classifier = pickle.load(pickle_in)


# Defining the function - prediction() which will make the prediction using data
# inputs from users.
# Note: st.cache is deprecated in recent Streamlit releases (st.cache_data replaces it).
@st.cache()
def prediction(senior_citizen, has_dependents,
               months_as_customer, has_internet_service, has_month_to_month_contract):
    
    # Making prediction based on the inputs received from the users.
    # predict() returns an array with one label per input row; we pass a single row.
    predicted = classifier.predict(
        [[senior_citizen, has_dependents, months_as_customer, has_internet_service, has_month_to_month_contract]])
    
    if predicted[0] == 0:
        pred = 'This customer is not likely to churn'
    else:
        pred = 'Warning! This customer might churn'
    return pred

# This is the main function in which we define our webpage
def main():
    
    # Creating the input fields
    senior_citizen = st.number_input("Are they a Senior Citizen? (0,1)",
                                  min_value=0,
                                  max_value=1,
                                  value=0,
                                  step=1
                                 )
    has_dependents = st.number_input("Do they have Dependents? (0,1)",
                              min_value=0,
                              max_value=1,
                              value=0,
                              step=1
                             )

    months_as_customer = st.number_input("Months as customer",
                              min_value=0,
                              max_value=100,
                              value=1,
                              step=1
                             )
    has_internet_service = st.number_input("Do they have internet service? (0,1)",
                          min_value=0,
                          max_value=1,
                          value=0,
                          step=1
                         )
    has_month_to_month_contract = st.number_input("Do they have a month-to-month contract? (0,1)",
                          min_value=0,
                          max_value=1,
                          value=0,
                          step=1
                         )

    result = ""
    
    # When 'Predict' is clicked, make the prediction and store it
    if st.button("Predict"):
        result = prediction(senior_citizen, has_dependents,months_as_customer, has_internet_service, has_month_to_month_contract)
        st.success(result)
        
if __name__=='__main__':
    main()

Overwriting app.py


In [7]:
!streamlit run app.py                                                 # Launching the Streamlit web app!! We are all set :D 

^C


# What's Next??

Well, we created this model in order to forecast customer churn. This is wonderful to know, but if we don't implement a strategy based on what the machine learning model's results indicate, our efforts will be useless.

Hence, here are some suggested strategies we could deploy for customers who are likely to churn!
> - **Better Pricing and Gift Offers:**<br>
If customers find a more cost-effective solution to the problem they want to solve, they may churn. This is why it's important to establish value through customer onboarding and education, so customers feel that the purchase is worth the cost. So, prepare a new price plan or gift offers that make your product attractive, and wait to see the results! 
> - **Better Customer Experience:**<br>
If a customer's experience connecting with you isn't positive, they are more likely to churn. Customers want to feel welcomed and valued by the communities they support, so focusing on a better customer experience will help you avoid churn!
> - **Proactiveness**<br>
You should not wait for customers to ask before solving their problems. As a company, it's your responsibility to be in touch with your clients proactively, which makes them feel valued and ultimately makes them loyal to your company. 
> - **Tracking KPIs of customer success executives**<br>
It's very important to track and analyze the performance metrics of customer success executives, such as 'first call resolution', 'CSAT score', and 'average time of resolution'. Tracking such metrics both overall and per individual keeps you updated and, in case of any anomalies, lets you take action to keep your customer service up to par. 