In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(15,12))
plt.imshow(plt.imread("../input/customer1/customer1.png"))

## Content
1. [Introduction section](#1)
2. [Exploratory Data Analysis and Data Preprocessing](#2)
3. [Data Visualization](#3)
4. [Data Preprocessing](#4) 
5. [Training Machine Learning Models and Model Performance Evaluation](#5)


<a id="1"></a> <br>
# 1. Introduction

<font color="blue">
    
Customer churn occurs when customers or subscribers stop doing business with a firm or service.

Customers in the telecom industry can choose from a variety of service providers and actively switch from one to another. In this highly competitive market, the telecommunications industry has an annual churn rate of 15-25 percent.

Individualized customer retention is difficult because most firms have a large number of customers and cannot afford to devote much time to each of them; the costs would outweigh the additional revenue. However, if a company could forecast which customers are likely to leave, it could focus its retention efforts on these "high-risk" customers only. The ultimate goal is to expand coverage and win more customer loyalty. The key to success in this market lies with the customers themselves.

Customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers.


To reduce customer churn, telecom companies need to predict which customers are at high risk of churn.

To detect early signs of potential churn, one must first develop a holistic view of the customers and their interactions across numerous channels, including store/branch visits, product purchase histories, customer service calls, Web-based transactions, and social media interactions, to mention a few.

By addressing churn, these businesses can not only preserve their market position but also grow and thrive. The more customers they have in their network, the lower the initiation cost per customer and the larger the profit. The company's key focus for success is therefore reducing customer attrition and implementing an effective retention strategy.

<iframe src="https://www.kaggle.com/embed/bhartiprasad17/customer-churn-prediction?cellIds=1&kernelSessionId=67014939" height="300" style="margin: 0 auto; width: 100%; max-width: 950px;" frameborder="0" scrolling="auto" title="CUSTOMER CHURN PREDICTION 📈"></iframe>

In [None]:
import seaborn as sns
import plotly.offline as py
import plotly.express as px
import plotly.graph_objects as go
import cufflinks as cf
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

## 2. Exploratory Data Analysis and Data Preprocessing

In [None]:
df = pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head()

In [None]:
df.info()
# Only 3 columns (SeniorCitizen, tenure, MonthlyCharges) are numeric

<font color="blue">
The "TotalCharges" column is stored as a string (object) even though it holds numbers, so I convert it to float64. With errors='coerce', any value that cannot be parsed becomes NaN.

In [None]:
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"],errors = 'coerce')
df.info()
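The next sketch shows what `errors='coerce'` does. It uses a small hypothetical Series; the values are illustrative and not read from the CSV. A blank string cannot be parsed as a number, so it becomes NaN instead of raising an error. This is how TotalCharges ends up with missing values after the conversion.

```python
import pandas as pd

# Hypothetical column mimicking TotalCharges: numbers stored as strings plus one blank entry
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" turns anything that cannot be parsed (here the blank string) into NaN
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)         # float64
print(converted.isna().sum())  # 1
```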

<font color="blue">
We need to transform the non-numerical features into numerical ones in order to explore the data and prepare it for the Machine Learning algorithms:

In [None]:
df.select_dtypes("object").columns

In [None]:
#Here I select the non-numerical columns and print how many unique values each one has
for i in df.select_dtypes("object").columns:
    print(f"Column {i} has {df[i].nunique()} unique values")
    print("***************************************************")

<font color="blue">
Except for the customerID column, every column has only 2, 3 or 4 distinct values, so all of them can be converted into numerical values.

In [None]:
#Let's drop the customerID column: it is a unique identifier and carries no information about the target column
df.drop("customerID",axis=1, inplace=True)

<font color="blue">
Here we use the pd.get_dummies() function in order to transform the categorical columns into dummy (0/1 indicator) variables
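The sketch below shows what `drop_first=True` does on a hypothetical two-column frame, not the churn data. A categorical column with k levels becomes k-1 indicator columns. The level that comes first in sorted order is dropped and becomes the implicit baseline.

```python
import pandas as pd

# Hypothetical toy frame with one binary and one three-level categorical column
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# drop_first=True drops the first (sorted) level of each column,
# so "Partner_No" and "Contract_Month-to-month" become the baseline
dummies = pd.get_dummies(toy, columns=["Partner", "Contract"], drop_first=True)
print(dummies.columns.tolist())
# ['Partner_Yes', 'Contract_One year', 'Contract_Two year']
```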


In [None]:
# I exclude the target column (Churn) before transforming the features into numerical dummy variables
categorical_features = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
       'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
       'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
       'PaperlessBilling', 'PaymentMethod']
categorical_features


In [None]:
df_latest=pd.get_dummies(data=df,columns=categorical_features,drop_first=True)
df_latest

In [None]:
#I also transform the target column "Churn" into numerical columns in order to show the statistical relation between the target and the features
df_latest2 = pd.get_dummies(data=df,columns=df.select_dtypes("object").columns,drop_first=False)
df_latest2.head()

In [None]:
#Let's get overall statistical information on our features; I use the describe() function and transpose the result with transpose()
df_latest2.describe(include="all").transpose()

In [None]:
df_latest2.corr()[["Churn_Yes","Churn_No"]].sort_values(by="Churn_Yes",ascending=False) 
#Here we can see the correlations between features and churning

<font color="red">
According to the values above and the figure below, people tend to churn if
    
    1. their contract type is month-to-month
    2. they have no online security
    3. they have no technical support
    4. their internet service is fiber optic
    5. their payment method is electronic check
    6. they have no online backup
    7. they have no device protection
    8. their monthly charges are high
    9. they use paperless billing

<font color="green">
According to these values, people are less likely to churn if
    
    1. their tenure is longer
    2. they have a two-year contract
    3. they have no internet service
    4. they have no streaming TV because they have no internet service (this overlaps with item 3)

In [None]:
#Let's visualize the pairwise correlations between all columns
corr = df_latest2.corr()
plt.figure(figsize=(25,20))
# Mask the upper triangle so each pair of columns is shown only once
# (np.bool was removed in NumPy 1.24; the built-in bool works everywhere)
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(corr,cmap="jet",annot=True,linewidths=0, linecolor='white',cbar=True,mask=mask)

## 3. Data Visualization:

In [None]:
labels = 'Churn', 'Retained'
sizes = [df_latest.Churn[df_latest["Churn"]=="Yes"].count(), df_latest.Churn[df_latest["Churn"]=="No"].count()]
explode = (0, 0.1)
fig1, ax1 = plt.subplots(figsize=(10, 8))
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Proportion of customer churned and retained", size = 20)
plt.show()

<font color="green">
About 26.5% of the customers have churned, while 73.5% have been retained. A naive baseline model that predicts "No churn" for every customer would therefore already be about 73.5% accurate, so any useful model has to beat that. The imbalanced target can also bias the machine learning algorithms toward the majority class and hurt their predictions if we do not deal with this issue.
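The baseline accuracy is simply the share of the majority class. The sketch below uses a hypothetical helper `majority_baseline_accuracy` and a toy target built to match the 26.5% / 73.5% split.

```python
import pandas as pd

def majority_baseline_accuracy(target: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class (hypothetical helper)."""
    # value_counts(normalize=True) returns class proportions, largest first
    return float(target.value_counts(normalize=True).iloc[0])

# Hypothetical target with the same 26.5% / 73.5% split as the pie chart
toy_churn = pd.Series(["Yes"] * 265 + ["No"] * 735)
print(majority_baseline_accuracy(toy_churn))  # 0.735
```

On the notebook's data, `majority_baseline_accuracy(df["Churn"])` should return roughly 0.735.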

<font color="red">
3.1. Contract Type - Churn Relationship:

In [None]:
plt.figure(figsize=(15,10))
sns.countplot(x="Contract",hue="Churn", data=df, palette="Set1")
plt.title("The Counts of Churns by Contract Type")
plt.legend()
#There was a positive correlation between Churning and Month-to-Month Contract Type, we can see the proof of this relationship below:

<font color="green">
Roughly 40% of month-to-month customers have churned, according to the figure above. The churn rate is very low for customers with one-year and two-year contracts. Therefore, this company should focus on month-to-month customers and design campaigns to retain them.
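You can read the churn rate per contract type directly with `pd.crosstab(..., normalize="index")`, which divides each row by its total. The toy data below is hypothetical. On the notebook's data, the equivalent call would be `pd.crosstab(df["Contract"], df["Churn"], normalize="index")`.

```python
import pandas as pd

# Hypothetical toy data, not the Telco dataset
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 5 + ["One year"] * 4 + ["Two year"] * 4,
    "Churn":    ["Yes", "Yes", "No", "No", "No",
                 "Yes", "No", "No", "No",
                 "No", "No", "No", "No"],
})

# normalize="index" divides each row by its total, giving the churn rate per contract type
rates = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index")
print(rates["Yes"])
# churn rates: Month-to-month 0.4, One year 0.25, Two year 0.0
```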


In [None]:
df["Contract"].value_counts() # Many customers in this organization have this contract type, so the issue needs to be addressed

<font color="red">
3.2. Gender- Churn Relationship:

In [None]:
print(df["Churn"][df["Churn"]=="No"].groupby(by=df["gender"]).count())
print("***************************************")
print(df["Churn"][df["Churn"]=="Yes"].groupby(by=df["gender"]).count())

In [None]:
plt.figure(figsize=(6, 6))
# Outer ring: overall churn split; inner ring: gender split within each churn group
labels =["Churn: Yes","Churn: No"]
# Compute the counts from the data instead of hard-coding them
values = [(df["Churn"] == c).sum() for c in ["Yes", "No"]]
labels_gender = ["F","M","F","M"]
# Order matches labels_gender: (Yes, Female), (Yes, Male), (No, Female), (No, Male)
sizes_gender = [((df["Churn"] == c) & (df["gender"] == g)).sum()
                for c in ["Yes", "No"] for g in ["Female", "Male"]]
colors = ['#ff6666', '#66b3ff']
colors_gender = ['#c2c2f0','#ffb3e6', '#c2c2f0','#ffb3e6']
explode = (0.3,0.3) 
explode_gender = (0.1,0.1,0.1,0.1)
textprops = {"fontsize":15}
#Plot
plt.pie(values, labels=labels,autopct='%1.1f%%',pctdistance=1.08, labeldistance=0.8,colors=colors, startangle=90,frame=True, explode=explode,radius=10, textprops =textprops, counterclock = True)
plt.pie(sizes_gender,labels=labels_gender,colors=colors_gender,startangle=90, explode=explode_gender,radius=7, textprops =textprops, counterclock = True)
#Draw a white circle in the middle to turn the pies into rings
centre_circle = plt.Circle((0,0),5,color='black', fc='white',linewidth=0)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.title('Churn Distribution vs Gender: Male(M), Female(F)', fontsize=15, y=1.1)

# show plot
plt.axis('equal')
plt.tight_layout()
plt.show()

<font color="blue">
There is no positive or negative correlation between gender and churn. Both genders behave similarly when it comes to migrating to another service provider, so there is no need for a special focus on gender in order to retain customers.
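A chi-square test of independence on the 2x2 gender/churn table can back up this conclusion with a number rather than a chart. The sketch below is hand-rolled: the helper `chi_square_2x2` is hypothetical and applies no continuity correction. It relies on the fact that, with one degree of freedom, the chi-square p-value equals erfc(sqrt(x/2)). The counts are illustrative, of the same size as the gender counts printed above. `scipy.stats.chi2_contingency` does the same job and applies Yates' continuity correction to 2x2 tables by default.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Illustrative counts (rows: churn Yes / No, columns: female / male)
counts = [[939, 930], [2544, 2619]]
stat, p = chi_square_2x2(counts)
print(round(stat, 3), round(p, 3))
```

A p-value well above 0.05 means the data gives no evidence of an association between gender and churn. On the notebook's data, the table would come from `pd.crosstab(df["Churn"], df["gender"])`.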

<font color="red">
3.3. Payment - Churn Relationship:

In [None]:
# Take labels and values from the same value_counts() result so they stay aligned
counts = df['PaymentMethod'].value_counts()
labels = counts.index
values = counts.values

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.update_layout(title_text="<b>Payment Method Distribution</b>")
fig.show()


In [None]:
fig = px.histogram(df, x="Churn", color="PaymentMethod", title="<b>Customer Payment Method distribution vs Churn</b>")
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="blue">
The correlation analysis in the previous section already showed that people tend to churn when their payment method is electronic check. This figure confirms that customers paying by electronic check have a higher churn rate. Therefore, the company should look into why these customers leave and take measures to retain them.

<font color="red">
3.4. Internet Service - Churn Relationship:

In [None]:
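# Take labels and values from the same value_counts() result so they stay aligned
# (unique() lists values in order of appearance, value_counts() by frequency)
counts = df['InternetService'].value_counts()
labels = counts.index
values = counts.values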

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.update_layout(title_text="<b>Internet Service Types Distribution</b>")
fig.show()


In [None]:
fig = px.histogram(df, x="Churn", color="InternetService", title="<b>Internet Service Types distribution vs Churn</b>")
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="blue">
I have already found that people with fiber optic internet service tend to churn, and this figure is consistent with that result. People with no internet service are stable and tend to stay with the company. One possible explanation is that customers with faster internet service, such as fiber optic, can more easily find other options that may be better than this company. Hence, this company should run ads and campaigns aimed at attracting and retaining customers with fiber optic internet service.

<font color="red">
3.5. Dependents - Churn Relationship:

In [None]:
# Take labels and values from the same value_counts() result so they stay aligned
counts = df['Dependents'].value_counts()
labels = counts.index
values = counts.values

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.update_layout(title_text="<b>Dependents Distribution</b>")
fig.show()

In [None]:
color_map = {"Yes": "green", "No": "red"}
fig = px.histogram(df, x="Churn", color="Dependents",title="<b>Dependents distribution vs Churn</b>", color_discrete_map=color_map)
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure above points out that customers without dependents are more likely to churn from the company.

<font color="red">
3.6. Partners - Churn Relationship:

In [None]:
color_map = {"Yes": "orange", "No": "red"}
fig = px.histogram(df, x="Churn", color="Partner",title="<b>Partner distribution vs Churn</b>", color_discrete_map=color_map)
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure shows that customers without partners are more likely to churn. The company may therefore prioritize campaigns and ads aimed at customers without partners in order to retain them.

<font color="red">
3.7. Senior Citizen - Churn Relationship:

In [None]:
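# SeniorCitizen is stored as 0/1, so map it to "No"/"Yes" for the color map to apply
color_map = {"Yes": "green", "No": "blue"}
df_senior = df.assign(SeniorCitizen=df["SeniorCitizen"].map({1: "Yes", 0: "No"}))
fig = px.histogram(df_senior, x="Churn", color="SeniorCitizen",title="<b>Senior Citizen vs Churn</b>", color_discrete_map=color_map)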
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure shows that senior citizens make up a noticeably larger share of churned customers than of retained customers, so they churn at a higher rate. The company should take this seriously in order to retain them.

<font color="red">
3.8. Online Security - Churn Relationship:

In [None]:
color_map = {"Yes": "purple", "No": "yellow"}
fig = px.histogram(df, x="Churn", color="OnlineSecurity",title="<b>Online Security vs Churn</b>", color_discrete_map=color_map)
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure shows that customers without online security are more likely to churn. Offering online security with its services may therefore help the company retain its customers.

<font color="red">
3.9. Paperless Billing - Churn Relationship:

In [None]:
color_map = {"Yes": "maroon", "No": "aqua"}
fig = px.histogram(df, x="Churn", color="PaperlessBilling",title="<b>Paperless Billing vs Churn</b>", color_discrete_map=color_map)
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure shows that customers with paperless billing churn more often, so sending bills in paper form might help the company retain customers. Note that this is an association rather than proof of cause: customers with paperless billing may differ in other ways, such as contract type.

<font color="red">
3.10. Technical Support - Churn Relationship:

In [None]:
color_map = {"Yes": "beige", "No": "brown"}
fig = px.histogram(df, x="Churn", color="TechSupport",title="<b>Technical Support vs Churn</b>", color_discrete_map=color_map)
fig.update_layout(width=900, height=600, bargap=0.1)
fig.show()

<font color="green">
The figure shows that customers without technical support are more likely to migrate to other companies. Providing adequate technical support therefore appears to be important for customer retention.

<font color="red">
3.11. Total Charges - Churn Relationship:

In [None]:
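sns.set_style("darkgrid")
# Create the figure before plotting so the size actually applies to this plot
plt.figure(figsize=(20,12))
# fill=True replaces the deprecated shade=True argument (seaborn >= 0.11)
sns.kdeplot(x="TotalCharges",data=df,palette="Set1",hue="Churn",fill=True)
plt.ylabel('Density');
plt.xlabel('Total Charges');
plt.title('Distribution of total charges by churn');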

<font color="green">
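Both churned and retained customers have right-skewed total charges, but churned customers are concentrated at lower total charges. This is consistent with their shorter tenure, since TotalCharges accumulates over the months a customer stays.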

<font color="red">
3.12. Tenure - Churn Relationship:

In [None]:
fig = px.box(df, x='Churn', y = 'tenure')

# Update yaxis properties
fig.update_yaxes(title_text='Tenure (Months)', row=1, col=1)
# Update xaxis properties
fig.update_xaxes(title_text='Churn', row=1, col=1)

# Update size and title
fig.update_layout(autosize=True, width=900, height=700,
    title_font=dict(size=25, family='Courier'),
    title='<b>Tenure vs Churn</b>',
)

fig.show()

<font color="green">
The figure shows that new customers tend to churn more than long-standing customers. New customers are therefore a high-risk group and need more campaigns and incentives in order to keep them as customers in the future.

## 4. Data Preprocessing

In [None]:
df_latest.isnull().sum()

In [None]:
# Fill the missing TotalCharges values with the column mean
# (assigning the result avoids the chained inplace fillna warning in recent pandas)
df_latest["TotalCharges"] = df_latest["TotalCharges"].fillna(df_latest["TotalCharges"].mean())
df_latest.isnull().sum() 

<font color="green">
We use df_latest because all of its feature values are numerical. However, as the check above shows, the TotalCharges column has 11 missing values that we have to fill or drop before applying the algorithms. Here they are filled with the column mean.

In [None]:
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder()
df_latest["Churn"]=encoder.fit_transform(df_latest["Churn"])
df_latest.head()

<font color="green">
We also need to standardize the continuous columns (MonthlyCharges, TotalCharges, tenure) so that all columns have a comparable influence on the ML algorithms:

<font color="green">
In the next cell, the three non-standardized columns are transformed with StandardScaler.

In [None]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
df_latest[["MonthlyCharges","TotalCharges","tenure"]]= ss.fit_transform(df_latest[["MonthlyCharges","TotalCharges","tenure"]])
df_latest.head(3)
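For reference, `StandardScaler` computes z = (x - mean) / std for each column, using the population standard deviation (ddof=0). The sketch below applies the same transformation in NumPy to hypothetical values, with an illustrative helper `standardize`.

```python
import numpy as np

def standardize(column: np.ndarray) -> np.ndarray:
    """Return (x - mean) / std with the population std (ddof=0), as StandardScaler does."""
    return (column - column.mean()) / column.std(ddof=0)

# Hypothetical monthly charges
charges = np.array([20.0, 50.0, 80.0, 110.0])
z = standardize(charges)
print(z)  # symmetric values with mean 0 and standard deviation 1
```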

In [None]:
# We drop target column from data, so rest of the columns become features automatically
y = df_latest["Churn"] # represents the target column
X = df_latest.drop("Churn",axis=1) # X represents all the features

In [None]:
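#Now we split the data into a training set and a test set in order to measure the performance of the algorithms
# Note: test_size=0.05 holds out only about 350 rows, so the test scores below will be fairly noisy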
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.05,random_state = 42)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
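One caveat: the mean imputation and `StandardScaler` above were fit on all rows before `train_test_split`, so statistics from the test rows leak into training. The sketch below shows a leak-free alternative. It assumes scikit-learn's `Pipeline` and `ColumnTransformer`, and it uses synthetic data in place of the churn features (the column layout is hypothetical). The imputer and scaler are fit inside `model.fit`, i.e. on training rows only. `stratify` also keeps the class ratio identical in both splits, which matters with an imbalanced target.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: 3 continuous columns (think tenure, MonthlyCharges, TotalCharges)
# plus one 0/1 dummy column; a few TotalCharges-like values are missing
continuous = rng.normal(size=(200, 3))
continuous[rng.choice(200, size=5, replace=False), 2] = np.nan
dummy = rng.integers(0, 2, size=(200, 1))
X_demo = np.hstack([continuous, dummy])
y_demo = (continuous[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Impute and scale only the continuous columns; pass the dummy column through unchanged
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([("num", numeric_steps, [0, 1, 2])], remainder="passthrough")
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

# stratify keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

# fit() learns the imputation means and scaling statistics from X_tr only
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```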

## 5. Training Machine Learning Models and Model Performance Evaluation

In [None]:
plt.figure(figsize=(15,12))
plt.imshow(plt.imread("../input/customer2/customer2.jpg"))

<font color="red">
5.1. Ensemble Learning 1 - Random Forest Classifier:

In [None]:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier()

<font color="green">
We run a randomized grid search in order to find good hyperparameters for the random forest model

In [None]:
#Choosing best hyperparameters:
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 1200, num = 12)]
# Number of features to consider at every split
max_features = ['sqrt', 'log2']  # 'auto' was removed from RandomForestClassifier in scikit-learn 1.3
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(5, 30, num = 6)]
# max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 100]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 5, 10]

In [None]:
# Create the random grid for these hyperparameters:
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf}

print(random_grid)

In [None]:
from sklearn.model_selection import RandomizedSearchCV
random_grid = RandomizedSearchCV(estimator = forest, param_distributions = random_grid, n_iter = 10, cv = 5, verbose=2, random_state=42, n_jobs = 1)
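Randomized search samples only a handful of the combinations in the grid. The check below uses the grid defined above. The counts per hyperparameter are copied from those lists, and the dictionary exists only for illustration.

```python
from math import prod

# Number of candidate values per hyperparameter in the grid defined above
grid_sizes = {
    "n_estimators": 12,      # 100, 200, ..., 1200
    "max_features": 2,
    "max_depth": 6,          # 5, 10, ..., 30
    "min_samples_split": 5,
    "min_samples_leaf": 4,
}
total_combinations = prod(grid_sizes.values())
n_iter, cv = 10, 5  # values passed to RandomizedSearchCV above

print(total_combinations)  # 2880
print(n_iter * cv)         # 50 cross-validation fits
```

The search therefore tries 10 of the 2,880 combinations, each with 5-fold cross-validation. With `refit=True` (the default), it adds one more fit of the best configuration on the whole training set.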

In [None]:
random_grid.fit(X_train,y_train)

In [None]:
print(random_grid.best_params_)
print(random_grid.best_score_)

In [None]:
predictions = random_grid.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix,accuracy_score
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

<font color="red">
5.2. K Neighbors Classifier:

In [None]:
from sklearn.neighbors import KNeighborsClassifier

<font color="green">
Instead of trying different n_neighbors values one by one, which would be time consuming, we use a for loop to choose the best k.

In [None]:
error_rate=list()
#Here we iterate over many different k values, record their error rates
#and find the one with the lowest error rate.
#Note: k is chosen on the test set here, so the final test score will be optimistic.
for i in range(1,40):
    knn=KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    prediction_i=knn.predict(X_test)
    error_rate.append(np.mean(prediction_i != y_test))

In [None]:
# Now we will plot the prediction error rates of different k values
plt.figure(figsize=(15,10))
plt.plot(range(1,40),error_rate, color="blue", linestyle="--",marker="o",markerfacecolor="red",markersize=10)
plt.title("Error Rate vs K Value")
plt.xlabel("K Value")
plt.ylabel("Error Rate")

<font color="green">
As we can see in the figure above, k=35 gives the lowest error rate, so we use it for the final predictions.
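The loop above picks k by its error on the test set, which makes the final test score optimistic. The sketch below selects k with 5-fold cross-validation on the training data instead. It uses synthetic data from `make_classification` as a stand-in for `X_train`/`y_train`. Only odd k values are tried, which avoids tied votes between the two classes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train / y_train (hypothetical data, roughly 26.5% positives)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     weights=[0.735], random_state=0)

cv_error = {}
for k in range(1, 40, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold CV accuracy on training data only; the test set stays untouched
    scores = cross_val_score(knn, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_error[k] = 1 - scores.mean()

best_k = min(cv_error, key=cv_error.get)
print(best_k, round(cv_error[best_k], 3))
```

On the notebook's data, replace `X_demo, y_demo` with `X_train, y_train`, then evaluate the chosen k once on `X_test`.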

In [None]:
knn=KNeighborsClassifier(n_neighbors=35)
knn.fit(X_train, y_train)
predictions=knn.predict(X_test)

In [None]:
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
# We get almost the same performance as with the random forest

<font color="red">
5.3. Decision Tree:

In [None]:
from sklearn.tree import DecisionTreeClassifier
dtree=DecisionTreeClassifier()
dtree.fit(X_train,y_train)

<font color="green">
We do not expect strong performance from a single decision tree, but its splits show which features are important and shape the tree.

In [None]:
from sklearn import tree
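# Pass the column names so the printed rules show feature names instead of indices
print(tree.export_text(dtree, feature_names=list(X_train.columns)))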

In [None]:
predictions = dtree.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

<font color="green">
As expected, the decision tree performs worse. However, its structure shows that several features play a decisive role:

- Feature 1 (tenure)
- Feature 10 (InternetService_Fiber optic)
- Feature 3 (TotalCharges)
- Feature 20 (StreamingTV_No internet service, which is equivalent to having no internet service)
- Feature 25 (Contract_Two year)

These are therefore the most important features for churn or retention. These insights are consistent with our findings in the previous sections.
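Instead of reading feature indices off the printed tree, you can use the fitted tree's `feature_importances_`. This is the normalized total impurity reduction contributed by each feature. The sketch below uses synthetic data; on the notebook's data, use `dtree.feature_importances_` with `X_train.columns` as the index.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 5 features, of which only 2 carry signal
X_demo, y_demo = make_classification(n_samples=300, n_features=5, n_informative=2,
                                     n_redundant=0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

tree_model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_demo, y_demo)

# feature_importances_ sums each feature's impurity reduction over all splits (normalized to 1)
importances = pd.Series(tree_model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```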

<font color="red">
5.4. Support Vector Machines:

<font color="green">
We run a grid search in order to find the best hyperparameters and increase accuracy:

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
param_grid={"C":[1,2,3,4,5,10,100],"gamma":[1,0.1,0.2,0.5,0.01,0.001,0.0001]} 
#here we select values for grid search to try
grid=GridSearchCV(SVC(),param_grid,verbose=2)
grid.fit(X_train,y_train)

In [None]:
print(grid.best_params_)
print(grid.best_estimator_)

In [None]:
grid_predictions=grid.predict(X_test)
print(confusion_matrix(y_test,grid_predictions))
print(classification_report(y_test, grid_predictions))
print(accuracy_score(y_test, grid_predictions))

<font color="red">
5.5. Ensemble Learning 2 - Voting Classifiers:

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import VotingClassifier
clf1 = KNeighborsClassifier(n_neighbors=35)
clf2 = RandomForestClassifier(n_estimators= 900, min_samples_split = 5, min_samples_leaf = 5, max_depth = 10)
clf3 = AdaBoostClassifier()
eclf1 = VotingClassifier(estimators=[('knc', clf1), ('rfc', clf2), ('abc', clf3)], voting='soft')
eclf1.fit(X_train, y_train)
predictions = eclf1.predict(X_test)
print("Final Accuracy Score ")
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
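Both ensembles in this section use `voting='soft'`, which averages the predicted class probabilities instead of counting votes. The toy probabilities below are hypothetical and show how the two rules can disagree. For two classes, taking the argmax of the averaged probabilities is the same as thresholding at 0.5, ties aside.

```python
import numpy as np

# Hypothetical predicted probabilities of churn (class 1) from three classifiers for 3 customers
proba_class1 = np.array([
    [0.90, 0.40, 0.45],   # classifier A
    [0.40, 0.45, 0.60],   # classifier B
    [0.45, 0.60, 0.30],   # classifier C
])

# Hard voting: each classifier casts a 0/1 vote; the majority wins
votes = (proba_class1 >= 0.5).astype(int)
hard = (votes.sum(axis=0) >= 2).astype(int)

# Soft voting: average the probabilities, then threshold
soft = (proba_class1.mean(axis=0) >= 0.5).astype(int)

print(hard)  # [0 0 0]
print(soft)  # [1 0 0]
```

For the first customer, one very confident classifier outweighs two mildly negative ones under soft voting.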

In [None]:
from sklearn.linear_model import LogisticRegression
clf1 = GradientBoostingClassifier()
clf2 = LogisticRegression()
clf3 = AdaBoostClassifier()
eclf1 = VotingClassifier(estimators=[('gbc', clf1), ('lr', clf2), ('abc', clf3)], voting='soft')
eclf1.fit(X_train, y_train)
predictions = eclf1.predict(X_test)
print("Final Accuracy Score ")
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
#This model has the highest score so far, with 82.7% accuracy

<font color="red">
5.6. Ensemble Learning 3 - Pasting and Bagging:

<font color="green">
Another approach to getting a diverse set of classifiers is to use the same training algorithm for every predictor but train each one on a different random subset of the training set. When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting.
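The only difference between the two is the sampling step. The NumPy sketch below draws one sample of each kind from ten hypothetical row indices. In the cells below, `max_samples=100` means each estimator sees 100 training rows.

```python
import numpy as np

rng = np.random.default_rng(42)
indices = np.arange(10)  # hypothetical training-row indices

# Bagging (bootstrap=True): draw with replacement, so some rows repeat and others are left out
bag_sample = rng.choice(indices, size=10, replace=True)

# Pasting (bootstrap=False): draw without replacement, so every selected row is distinct
paste_sample = rng.choice(indices, size=6, replace=False)

print(sorted(bag_sample.tolist()))
print(sorted(paste_sample.tolist()))
print("distinct rows in bootstrap sample:", len(np.unique(bag_sample)))
```

On average, a bootstrap sample of size n contains about 63% of the distinct rows (1 - 1/e). This is what gives bagged predictors their diversity.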

In [None]:
#We try pasting first (bootstrap=False samples without replacement):
from sklearn.ensemble import BaggingClassifier
pasting_clf = BaggingClassifier(
    RandomForestClassifier(), n_estimators=900,
    max_samples=100, bootstrap=False, n_jobs=-1)
pasting_clf.fit(X_train, y_train)
predictions = pasting_clf.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

In [None]:
#Here we try Bagging:
from sklearn.ensemble import BaggingClassifier
bagging_clf = BaggingClassifier(
RandomForestClassifier(), n_estimators=500,
max_samples=100, bootstrap=True, n_jobs=-1)
bagging_clf.fit(X_train, y_train)
predictions = bagging_clf.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

In [None]:
# We use Support Vector Machines instead of Random Forest
bagging_clf2= BaggingClassifier(SVC(kernel='rbf',C=1, gamma= 0.1, probability=True),n_estimators=500,max_samples=100,bootstrap=True,n_jobs=-1)
bagging_clf2.fit(X_train,y_train)
predictions = bagging_clf2.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

<font color="red">
5.7. Ensemble Learning 4 - XGBoost:

In [None]:
import xgboost
from sklearn.metrics import log_loss
xgb = xgboost.XGBClassifier(learning_rate=0.1,
                                max_depth=20,
                                min_child_weight=30,
                                n_estimators=20)
xgb.fit(X_train, y_train)
predictions = xgb.predict(X_test)
# log_loss needs predicted probabilities, not hard 0/1 labels
print(log_loss(y_test, xgb.predict_proba(X_test)[:, 1]))
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
#This has the second best result among different algorithms
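`log_loss` is meant for predicted probabilities. If you feed it hard 0/1 labels, as `xgb.predict` returns, every misclassification gets the maximum penalty set by the clipping constant. The sketch below is hand-rolled: the helper `binary_log_loss` is hypothetical and the values are illustrative. scikit-learn's own clipping constant has changed between versions, so its exact numbers may differ.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred are predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
probabilities = [0.8, 0.3, 0.45, 0.1]                           # hypothetical predict_proba(...)[:, 1]
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]   # [1, 0, 0, 0]: one mistake

print(round(binary_log_loss(y_true, probabilities), 4))  # ≈ 0.3709
print(round(binary_log_loss(y_true, hard_labels), 4))    # ≈ 8.63, dominated by the single mistake
```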

<font color="red">
5.8. Artificial Neural Networks:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2
ann = Sequential()
ann.add(Dense(units = 200,activation="relu", kernel_regularizer=l2(0.001)))
ann.add(Dropout(0.2))
ann.add(Dense(units = 100, activation="relu",kernel_regularizer=l2(0.001)))
ann.add(Dropout(0.2))
ann.add(Dense(units = 50, activation="relu",kernel_regularizer=l2(0.001)))
ann.add(Dense(1,activation="sigmoid")) 
ann.compile(optimizer = "adam", loss="binary_crossentropy",metrics=["accuracy"])
callback=EarlyStopping(monitor="val_loss", patience=2)
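# Cast to float32: get_dummies can create bool columns that Keras cannot convert to a tensor
history = ann.fit(x = X_train.astype("float32"), y= y_train, validation_data=(X_test.astype("float32"),y_test), batch_size=16, epochs=5,callbacks=[callback])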


In [None]:
sns.set_style("darkgrid")
pd.DataFrame(ann.history.history).plot(figsize=(15,10))

In [None]:
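# predict_classes() was removed in TensorFlow 2.6; threshold the sigmoid output at 0.5 instead
predictions = (ann.predict(X_test.astype("float32")) > 0.5).astype("int32").ravel()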
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

<font color="red">
Among all the models we have tried, the Artificial Neural Network gives the best result, with over 83.6% accuracy. One reason we cannot obtain higher accuracy is that the data is imbalanced: only 26.5% of customers churned, versus 73.5% retained. With such imbalanced data, searching for a better machine learning algorithm is not the best strategy. Instead, we should focus on training the algorithms on balanced data samples.
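One simple way to train on balanced samples is random oversampling of the minority class in the training split. This technique is swapped in here; the notebook itself does not use it. Many scikit-learn estimators (e.g. `LogisticRegression`, `RandomForestClassifier`, `SVC`) also offer `class_weight="balanced"` as an alternative. The NumPy sketch below uses a hypothetical helper `random_oversample` and toy data.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equally frequent."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority)
    n_extra = counts.max() - counts.min()
    # Sample extra minority rows with replacement and append them to the original rows
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra_idx])
    return X[keep], y[keep]

# Hypothetical imbalanced data: 3 churners vs 7 retained customers
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
X_bal, y_bal = random_oversample(X_toy, y_toy)
print(np.bincount(y_bal))  # [7 7]
```

Only the training data should be resampled. The test set keeps its natural churn ratio, so recall and precision on the churn class stay meaningful.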

In [None]:
labels = 'Churn', 'Retained'
sizes = [df.Churn[df["Churn"]=="Yes"].count(), df.Churn[df["Churn"]=="No"].count()]
explode = (0, 0.1)
fig1, ax1 = plt.subplots(figsize=(10, 8))
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Proportion of customer churned and retained", size = 20)
plt.show()