# <Center>Predicting Customer Churn for a Telecommunications Company</Center>

We will build a model for a telecommunications company that predicts which customers are likely to leave for a competitor, so that the company can take action to retain them.

<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#ref1">Logistic Regression</a></li>
        <li><a href="#ref2">About the dataset</a></li>
        <li><a href="#ref3">Import Libraries</a></li>
        <li><a href="#ref4">Read Telco Churn data</a></li>
        <li><a href="#ref5">Data Selection for modelling</a></li>
            <ol>
                <li><a href="#ref51">Feature Set</a></li>
            </ol>
        <li><a href="#ref6">Normalize the dataset</a></li>
        <li><a href="#ref7">Train/Test dataset</a></li>
        <li><a href="#ref8">Modeling</a></li>
        <li><a href="#ref9">Prediction</a></li>
        <li><a href="#ref10">Evaluation</a></li>
            <ol>
                <li><a href="#ref101">Jaccard index</a></li>
                <li><a href="#ref102">Log loss</a></li>
            </ol>
    </ol>
</div>
<br>
<hr>

<a id="ref1"></a>
# 1. Logistic Regression

Logistic Regression is a variation of Linear Regression, useful when the observed dependent variable, y, is categorical.
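
As a minimal sketch of the idea (not code from the notebook itself): logistic regression passes a linear score through the sigmoid function to get a probability, then thresholds that probability into a class label. The weights, bias, and sample below are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """P(y=1 | x) for one sample: sigmoid of the linear score w.x + b."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical weights and one hypothetical (already scaled) sample.
weights = [0.8, -0.5]
bias = 0.1
sample = [1.2, 0.4]

p_churn = predict_probability(sample, weights, bias)
label = 1 if p_churn >= 0.5 else 0  # 0.5 is the usual default threshold
print(round(p_churn, 3), label)
```

A score of 0 maps to a probability of exactly 0.5, which is why the sign of the linear score decides the predicted class at the default threshold.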



<a id="ref2"></a>
# 2. About the dataset
We will use a telecommunications dataset to predict customer churn. This is a historical customer dataset in which each row represents one customer. It is typically less expensive to keep existing customers than to acquire new ones, so the focus of this analysis is to predict which customers will stay with the company and which are likely to leave. The dataset provides the information needed to understand which behaviors are associated with churn, so that the company can act to retain customers.

The dataset includes information about:

<ul>
    <li>Customers who left within the last month – the column is called <b>Churn</b></li>
<li>Services that each customer has signed up for – <b>phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies</b></li>
<li>Customer account information – <b>how long they had been a customer, contract, payment method, paperless billing, monthly charges, and total charges</b></li>
<li>Demographic info about customers – <b>gender, age range, and if they have partners and dependents</b></li>
</ul>


<a id="ref3"></a>
# 3. Import Libraries

<ul>
<li>import pandas as pd
<li>import numpy as np
<li>from sklearn import preprocessing
<li>from sklearn.model_selection import train_test_split
<li>from sklearn.linear_model import LogisticRegression
<li>from sklearn.metrics import jaccard_similarity_score
<li>from sklearn.metrics import log_loss
</ul>

<a id="ref4"></a>
# 4. Read Telco Churn data 
Telco Churn is a hypothetical data file that concerns a telecommunications company's efforts to reduce turnover in its customer base. Each case corresponds to a separate customer and it records various demographic and service usage information.

To download the data, we will use `!wget` to download it from IBM Object Storage.

!wget -O ChurnData.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/ChurnData.csv

**Import pandas and numpy libraries**

In [1]:
import pandas as pd
import numpy as np

**Read Data**

In [2]:
df_churn=pd.read_csv(r"C:\Users\user\Desktop\Coursera\Module 8- Machine Learning with Python\Data\ChurnData.csv")
print(df_churn.shape)
print(df_churn.columns)
df_churn.head()

(200, 28)
Index(['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip',
       'callcard', 'wireless', 'longmon', 'tollmon', 'equipmon', 'cardmon',
       'wiremon', 'longten', 'tollten', 'cardten', 'voice', 'pager',
       'internet', 'callwait', 'confer', 'ebill', 'loglong', 'logtoll',
       'lninc', 'custcat', 'churn'],
      dtype='object')


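| | tenure | age | address | income | ed | employ | equip | callcard | wireless | longmon | ... | pager | internet | callwait | confer | ebill | loglong | logtoll | lninc | custcat | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 4.4 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.482 | 3.033 | 4.913 | 4.0 | 1.0 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.246 | 3.24 | 3.497 | 1.0 | 1.0 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 6.3 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.841 | 3.24 | 3.401 | 3.0 | 0.0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 6.05 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.8 | 3.807 | 4.331 | 4.0 | 0.0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 7.1 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.96 | 3.091 | 4.382 | 3.0 | 0.0 |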


<a id="ref5"></a>

# 5. Data Selection for modelling

Let's select some features for modeling. We also change the target data type to integer, since scikit-learn expects discrete class labels for classification:

In [3]:
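# Keep a subset of the feature columns plus the target.
df_churn2 = df_churn[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']]
# Note: astype returns a new Series and the result is not stored here, so 'churn'
# stays float (as the outputs below show). To keep the cast, assign the result back.
df_churn2["churn"].astype("int")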
print(df_churn2.shape)
print(df_churn2.columns)
df_churn2.head()

(200, 10)
Index(['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip',
       'callcard', 'wireless', 'churn'],
      dtype='object')


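| | tenure | age | address | income | ed | employ | equip | callcard | wireless | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 0.0 |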


<a id="ref51"></a>
### 5.1 Feature Set (X and y)

Let's define X and y for our dataset:

X as the Feature Matrix (data of df_churn2)<br>
y as the response vector (target)

In [4]:
X=np.asanyarray(df_churn2[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip',   'callcard', 'wireless']])
X[0:5]

array([[ 11.,  33.,   7., 136.,   5.,   5.,   0.,   1.,   1.],
       [ 33.,  33.,  12.,  33.,   2.,   0.,   0.,   0.,   0.],
       [ 23.,  30.,   9.,  30.,   1.,   2.,   0.,   0.,   0.],
       [ 38.,  35.,   5.,  76.,   2.,  10.,   1.,   1.,   1.],
       [  7.,  35.,  14.,  80.,   2.,  15.,   0.,   1.,   0.]])

In [5]:
y=np.asanyarray(df_churn2["churn"])
y[0:5]

array([1., 1., 0., 0., 0.])

<a id="ref6"></a>
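# 6. Preprocessing & Normalization

We standardize the features with `StandardScaler`, which rescales each column to zero mean and unit variance so that features measured on different scales contribute comparably to the model.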

In [6]:
from sklearn import preprocessing

In [7]:
X = preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]

array([[-1.13518441, -0.62595491, -0.4588971 ,  0.4751423 ,  1.6961288 ,
        -0.58477841, -0.85972695,  0.64686916,  1.56469673],
       [-0.11604313, -0.62595491,  0.03454064, -0.32886061, -0.6433592 ,
        -1.14437497, -0.85972695, -1.54590766, -0.63910148],
       [-0.57928917, -0.85594447, -0.261522  , -0.35227817, -1.42318853,
        -0.92053635, -0.85972695, -1.54590766, -0.63910148],
       [ 0.11557989, -0.47262854, -0.65627219,  0.00679109, -0.6433592 ,
        -0.02518185,  1.16316   ,  0.64686916,  1.56469673],
       [-1.32048283, -0.47262854,  0.23191574,  0.03801451, -0.6433592 ,
         0.53441472, -0.85972695,  0.64686916, -0.63910148]])
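
To make the transformation concrete, here is a small pure-Python sketch of the per-column computation, z = (x - mean) / std, using the population standard deviation, which is the estimator `StandardScaler` uses. The helper name `standardize_columns` is hypothetical. The toy input is only the first five rows of `tenure` and `age` shown earlier, so its results will not match the notebook output, which is scaled over all 200 rows.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Return rows with each column shifted to mean 0 and scaled to std 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation (ddof=0), as used by StandardScaler.
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy input: 'tenure' and 'age' for the first five customers only.
toy = [[11.0, 33.0], [33.0, 33.0], [23.0, 30.0], [38.0, 35.0], [7.0, 35.0]]
scaled = standardize_columns(toy)

for row in scaled:
    print([round(v, 3) for v in row])
```

This sketch does not handle constant columns, where the standard deviation is zero.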

<a id="ref7"></a>

# 7. Train/Test dataset

A train/test split divides the dataset into a training set and a testing set that do not overlap. We train the model on the training set and evaluate it on the testing set.
This gives a more realistic estimate of out-of-sample accuracy, because the testing data is not part of the data used to train the model.

We know the true outcome of every data point in the testing set, which makes it well suited for evaluation. And since the model has never seen these data points, it has no knowledge of their outcomes, so this is genuinely out-of-sample testing.

Let's import `train_test_split` from `sklearn.model_selection`.

`train_test_split` returns four arrays, which we will name X_train, X_test, y_train, and y_test.

We call it with the arguments X, y, test_size=0.2, and random_state=4.

X and y are the arrays to be split, test_size is the fraction of the data held out for testing, and random_state fixes the random seed so that we get the same split on every run.

In [8]:
from sklearn.model_selection import train_test_split

In [9]:
X_train,X_test,y_train, y_test=train_test_split(X,y, test_size=0.2,random_state=4)

print("Train data:", X_train.shape,y_train.shape)
print("Test data:", X_test.shape,y_test.shape)

Train data: (160, 9) (160,)
Test data: (40, 9) (40,)
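
The roles of `test_size` and `random_state` can be shown with a simplified stand-in written in plain Python. This is an illustrative sketch, not scikit-learn's actual implementation. The helper `simple_train_test_split` is hypothetical, and the indices it selects will differ from those chosen by `train_test_split`.

```python
import random

def simple_train_test_split(n_samples, test_size, seed):
    """Shuffle row indices with a seeded RNG and cut off the test fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = simple_train_test_split(200, test_size=0.2, seed=4)
print(len(train_idx), len(test_idx))  # 160 40

# Re-running with the same seed reproduces exactly the same split.
again_train, again_test = simple_train_test_split(200, test_size=0.2, seed=4)
print(train_idx == again_train and test_idx == again_test)  # True
```

Because the two index lists never overlap, no test row can leak into training.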


<a id="ref8"></a>
# 8. Modeling

Let's build our model using __LogisticRegression__ from the scikit-learn package. This class implements logistic regression and can use different numerical optimizers to find the parameters, including the **‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, and ‘saga’ solvers**.

The scikit-learn implementation of logistic regression supports regularization, a technique used to reduce overfitting in machine learning models.
The "C" parameter is the __inverse of regularization strength__ and must be a positive float. Smaller values specify stronger regularization.
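
To see why a smaller C means stronger regularization, consider a sketch of an L2-penalized objective of the form 0.5 * ||w||^2 + C * (sum of per-sample logistic losses). This form is assumed here for illustration, and the toy data and candidate weights are hypothetical. Since C multiplies the data term, a small C lets the penalty on the weights dominate.

```python
import math

def regularized_objective(weights, bias, X, y, C):
    """0.5 * ||w||^2 + C * sum of logistic losses, for labels in {0, 1}."""
    penalty = 0.5 * sum(w * w for w in weights)
    data_loss = 0.0
    for features, label in zip(X, y):
        score = sum(w * x for w, x in zip(weights, features)) + bias
        signed = score if label == 1 else -score  # large positive = confident and correct
        data_loss += math.log(1.0 + math.exp(-signed))
    return penalty + C * data_loss

# Hypothetical one-feature toy data that is perfectly separable at 0.
X_toy = [[1.0], [2.0], [-1.0], [-2.0]]
y_toy = [1, 1, 0, 0]

small_w, large_w = [0.1], [5.0]
for C in (0.01, 100.0):
    obj_small = regularized_objective(small_w, 0.0, X_toy, y_toy, C)
    obj_large = regularized_objective(large_w, 0.0, X_toy, y_toy, C)
    preferred = "small weights" if obj_small < obj_large else "large weights"
    print(C, round(obj_small, 4), round(obj_large, 4), preferred)
```

With C=0.01 the small weights give the lower objective, while with C=100 the large weights win because fitting the data outweighs the penalty.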

Now let's fit our model with the training set:

In [10]:
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train,y_train)
LR

LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='liblinear',
          tol=0.0001, verbose=0, warm_start=False)

<a id="ref9"></a>
# 9. Prediction

**Now we can predict using our test set:**

In [11]:
yhat = LR.predict(X_test)
yhat

array([0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1.,
       1., 1., 0., 0., 1., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0.,
       0., 0., 1., 0., 0., 1.])

__predict_proba__ returns probability estimates for all classes, with columns ordered by class label (the order of `LR.classes_`). So the first column is the probability of class 0, P(Y=0|X), and the second column is the probability of class 1, P(Y=1|X):

In [12]:
yhat_prob = LR.predict_proba(X_test)
yhat_prob[0:5]

array([[0.58711718, 0.41288282],
       [0.56650898, 0.43349102],
       [0.5313329 , 0.4686671 ],
       [0.66722528, 0.33277472],
       [0.53481231, 0.46518769]])
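
As a quick consistency check, the sketch below maps the five probability rows printed above back to class labels by picking the column with the larger probability. It assumes the columns follow the sorted class labels, so that column 0 is class 0 and column 1 is class 1. The helper `labels_from_probabilities` is hypothetical.

```python
# First five rows of yhat_prob, copied from the output above.
probabilities = [
    [0.58711718, 0.41288282],
    [0.56650898, 0.43349102],
    [0.5313329, 0.4686671],
    [0.66722528, 0.33277472],
    [0.53481231, 0.46518769],
]
class_labels = [0.0, 1.0]  # assumed column order: class 0 first, then class 1

def labels_from_probabilities(prob_rows, labels):
    """Pick, for each row, the class whose column holds the largest probability."""
    predicted = []
    for row in prob_rows:
        best_column = max(range(len(row)), key=lambda j: row[j])
        predicted.append(labels[best_column])
    return predicted

predicted = labels_from_probabilities(probabilities, class_labels)
print(predicted)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

All five rows come out as class 0, which agrees with the first five entries of `yhat` shown earlier.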

<a id="ref10"></a>

# 10. Evaluation

To evaluate the predicted output against the actual data, we will use:
1. Jaccard index
2. Confusion matrix
3. Log loss

<a id="ref101"></a>
### 10.1 Jaccard index

Let's try the Jaccard index for accuracy evaluation. The Jaccard index is the size of the intersection divided by the size of the union of two label sets, here the predicted labels and the true labels. A value of 1.0 means the two sets match exactly, and 0.0 means they have nothing in common.

In multilabel classification, the __accuracy classification score__ computes subset accuracy: if the entire set of predicted labels for a sample strictly matches the true set of labels, the subset accuracy for that sample is 1.0; otherwise it is 0.0. For binary and multiclass targets such as ours, jaccard_similarity_score returns the same value as this accuracy score, so it measures the fraction of test samples whose predicted label matches the actual label. Note that jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23; its replacement, jaccard_score, computes the true Jaccard index and will generally give a different number.

In [13]:
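# Requires scikit-learn < 0.23, where jaccard_similarity_score still exists;
# on binary labels it returns the same value as accuracy_score.
from sklearn.metrics import jaccard_similarity_score
jaccard_similarity_score(y_test, yhat)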

0.65
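
The difference between plain accuracy and the Jaccard index of the positive class is easiest to see from the confusion-matrix counts. The sketch below uses a small hypothetical set of labels, not the notebook's test set, and computes both by hand.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # share of labels predicted correctly
jaccard = tp / (tp + fp + fn)       # intersection over union of the positive sets
print((tp, fp, fn, tn), round(accuracy, 3), jaccard)
```

Here accuracy is 4/6 while the Jaccard index is 2/4. Accuracy counts true negatives as successes, whereas the Jaccard index ignores them.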

<a id="ref102"></a>
### 10.2 Log loss

Now, let's try __log loss__ for evaluation. In logistic regression, the output can be the probability that customer churn is yes (equal to 1). This probability is a value between 0 and 1.
Log loss (logarithmic loss) measures the performance of a classifier whose predicted output is a probability value between 0 and 1. It penalizes confident wrong predictions most heavily, so a classifier with lower log loss produces probability estimates that are closer to the true labels.

In [14]:
from sklearn.metrics import log_loss
log_loss(y_test, yhat_prob)

0.6155809757244557
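
For reference, binary log loss is the average of -[y * log(p) + (1 - y) * log(1 - p)] over the samples, where p is the predicted probability of class 1. The sketch below is a plain-Python version with hypothetical inputs. The helper `binary_log_loss` and the clipping constant `eps` are illustrative choices, not the notebook's code.

```python
import math

def binary_log_loss(y_true, p_positive, eps=1e-15):
    """Average negative log-likelihood of the true labels (labels in {0, 1})."""
    total = 0.0
    for label, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return -total / len(y_true)

# Hypothetical predictions for two samples with true labels [1, 0].
confident_right = binary_log_loss([1, 0], [0.9, 0.1])
hedged = binary_log_loss([1, 0], [0.6, 0.4])
confident_wrong = binary_log_loss([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(hedged, 4), round(confident_wrong, 4))
```

Confident correct predictions give the lowest loss and confident wrong predictions the highest. The notebook's value of about 0.62 is close to ln 2 (about 0.69), the loss of always predicting 0.5, which indicates probabilities that stay close to 0.5.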