# Introduction

<p>This project deals with customer attrition (churn), i.e. whether customers are retained or lost. Relevant data about the customers have been obtained. The goal is to use the attributes in the dataset to classify customers into those who are likely to leave and those who are likely to stay. A <code>churn</code> value of 1 means the customer leaves; a value of 0 means the customer stays.</p>
<p>The classification is done by logistic regression, which uses the sigmoid function to map a linear combination of the attributes to a probability of churn.</p>
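
As a minimal sketch of how the sigmoid turns a linear score into a class label: the function names and the 0.5 threshold below are illustrative assumptions, not part of the notebook's code (scikit-learn does the equivalent internally).

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_label(z: float, threshold: float = 0.5) -> int:
    """Return 1 (churn) if the probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical linear scores (weights . features + bias) for three customers.
for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), predict_label(score))
```

A score of zero corresponds to a probability of exactly 0.5, so with this threshold the sign of the linear score decides the predicted class.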

# Model 1
<p>The first model uses the attributes <code>employ, equip, ed, callcard, wireless</code>, on the assumption that they will give an accurate model.
Below is the executable code for the logistic regression. The solver used is <code>liblinear</code>. To change the model, change the columns selected in <code>churn_param</code>.</p>

In [1]:
# Import the libraries and load the data

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import matplotlib.pyplot as plt
%matplotlib inline


df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/ChurnData.csv")
display(df)



Unnamed: 0,tenure,age,address,income,ed,employ,equip,callcard,wireless,longmon,...,pager,internet,callwait,confer,ebill,loglong,logtoll,lninc,custcat,churn
0,11.0,33.0,7.0,136.0,5.0,5.0,0.0,1.0,1.0,4.40,...,1.0,0.0,1.0,1.0,0.0,1.482,3.033,4.913,4.0,1.0
1,33.0,33.0,12.0,33.0,2.0,0.0,0.0,0.0,0.0,9.45,...,0.0,0.0,0.0,0.0,0.0,2.246,3.240,3.497,1.0,1.0
2,23.0,30.0,9.0,30.0,1.0,2.0,0.0,0.0,0.0,6.30,...,0.0,0.0,0.0,1.0,0.0,1.841,3.240,3.401,3.0,0.0
3,38.0,35.0,5.0,76.0,2.0,10.0,1.0,1.0,1.0,6.05,...,1.0,1.0,1.0,1.0,1.0,1.800,3.807,4.331,4.0,0.0
4,7.0,35.0,14.0,80.0,2.0,15.0,0.0,1.0,0.0,7.10,...,0.0,0.0,1.0,1.0,0.0,1.960,3.091,4.382,3.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,55.0,44.0,24.0,83.0,1.0,23.0,0.0,1.0,0.0,17.35,...,0.0,0.0,0.0,1.0,0.0,2.854,3.199,4.419,3.0,0.0
196,34.0,23.0,3.0,24.0,1.0,7.0,0.0,1.0,0.0,6.00,...,0.0,0.0,1.0,1.0,0.0,1.792,3.332,3.178,3.0,0.0
197,6.0,32.0,10.0,47.0,1.0,10.0,0.0,1.0,0.0,3.85,...,0.0,0.0,1.0,1.0,0.0,1.348,3.168,3.850,3.0,0.0
198,24.0,30.0,0.0,25.0,4.0,5.0,0.0,1.0,1.0,8.70,...,1.0,1.0,1.0,1.0,1.0,2.163,3.866,3.219,4.0,1.0


In [2]:
churn_param = df[["employ","equip","ed","callcard","wireless"]]
x = np.asanyarray(churn_param)
y = np.asanyarray(df['churn'])

x = preprocessing.StandardScaler().fit(x).transform(x)  # standardise: (x - mean) / standard deviation

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state = 4, test_size = 0.2)

# C is the inverse of the regularisation strength: smaller values mean stronger regularisation.
# C=0.01 is a strongly regularised choice (scikit-learn's default is C=1.0).
LR = LogisticRegression(C=0.01, solver='liblinear').fit(xtrain, ytrain)
yhat = LR.predict(xtest)
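
The standardisation step in the cell above rescales each column to zero mean and unit standard deviation. The following is a small pure-Python sketch of that per-column calculation; the helper name and the sample values are hypothetical, and it assumes the column is not constant (a zero standard deviation would divide by zero).

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return (v - mean) / std for each value, using the population std."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    return [(v - mu) / sigma for v in values]


# Hypothetical values for one column, e.g. years of employment.
employ_years = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(employ_years)
print(z)
```

Note that the divisor is the standard deviation, not the variance; dividing by the variance would leave columns on different scales.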


In [3]:
cm = confusion_matrix(ytest, yhat)  # rows: actual class, columns: predicted class
print(cm)
print("Accuracy =", cm.trace() / cm.sum())  # correct predictions / all test samples

[[18  7]
 [ 9  6]]
Accuracy = 0.6


# Inference

These parameters do not give an accurate model: only 0.6 of the 40 test customers are classified correctly.
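
To put the 0.6 in context, the sketch below recomputes accuracy from the four confusion-matrix counts of this model (18 true negatives, 7 false positives, 9 false negatives, 6 true positives) and compares it with a majority-class baseline that always predicts "stays". The variable names are illustrative and not taken from the notebook.

```python
# Test-set counts for the first model, read off its confusion matrix.
tn, fp, fn, tp = 18, 7, 9, 6

total = tn + fp + fn + tp

# Share of test customers classified correctly.
accuracy = (tn + tp) / total

# Accuracy of always predicting the more common actual class.
baseline = max(tn + fp, fn + tp) / total

# Of the customers who actually churned, how many were caught.
recall = tp / (tp + fn)

# Of the customers flagged as churners, how many actually churned.
precision = tp / (tp + fp)

print("accuracy  =", accuracy)
print("baseline  =", baseline)
print("recall    =", round(recall, 3))
print("precision =", round(precision, 3))
```

With these counts the baseline (25 of 40, i.e. 0.625) is slightly above the model's accuracy, and the model identifies only 6 of the 15 customers who actually churned.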

# Model 2

<p>The age and tenure of the customer are treated as crucial, so both are included. Three other parameters were chosen to improve accuracy: through trial and error, <code>address, callcard</code> and <code>internet</code> were found to bolster it.</p>

In [4]:
churn_param = df[["address","tenure","age","callcard","internet"]]
x = np.asanyarray(churn_param)
y = np.asanyarray(df['churn'])

x = preprocessing.StandardScaler().fit(x).transform(x)  # standardise: (x - mean) / standard deviation

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state = 4, test_size = 0.2)

# C is the inverse of the regularisation strength: smaller values mean stronger regularisation.
# C=0.01 is a strongly regularised choice (scikit-learn's default is C=1.0).
LR = LogisticRegression(C=0.01, solver='liblinear').fit(xtrain, ytrain)
yhat = LR.predict(xtest)

cm = confusion_matrix(ytest, yhat)  # rows: actual class, columns: predicted class
print(cm)
print("Accuracy =", cm.trace() / cm.sum())  # correct predictions / all test samples

[[24  1]
 [ 7  8]]
Accuracy = 0.8


<p>The accuracy has risen from 0.6 to 0.8 on the same 40-customer test set.</p>
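
Because the test set holds only 40 customers, each accuracy figure carries noticeable sampling uncertainty. The sketch below estimates it with a binomial standard error and a rough normal-approximation 95% interval; the function names are hypothetical, and the approximation is coarse at this sample size, so the numbers should be read as indicative only.

```python
import math


def accuracy_standard_error(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy measured on n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


def approx_interval(acc: float, n: int, z: float = 1.96):
    """Rough 95% interval using the normal approximation."""
    half_width = z * accuracy_standard_error(acc, n)
    return acc - half_width, acc + half_width


n_test = 40  # size of the test split used by both models
for name, acc in (("first model", 0.6), ("Model 2", 0.8)):
    low, high = approx_interval(acc, n_test)
    print(f"{name}: {acc} (about {low:.3f} to {high:.3f})")
```

Under this rough approximation the two intervals overlap, so a larger test set or cross-validation would give a firmer comparison of the two feature sets.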