### Aim of the project
The main aim of telecom churn prediction is to build a predictive model that accurately identifies customers who are likely to churn (migrate) to a different service provider. With such a model, telecom businesses can take proactive steps to retain those customers and lower the overall churn rate. Churn prediction also gives insight into the factors that affect customer behaviour, which helps telecom firms design targeted retention strategies that increase customer satisfaction and loyalty. Overall, telecom churn prediction aims to help businesses reduce customer turnover and improve operational efficiency.

### Visit the link for churn prediction: http://armj.asambhav.org.in:5000/

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv(r'churn_clean.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,...,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,tenure_group
0,0,Female,No,Yes,No,No,No phone service,DSL,No,Yes,...,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,0,1 year
1,1,Male,No,No,No,Yes,No,DSL,Yes,No,...,No,No,No,One year,No,Mailed check,56.95,1889.5,0,3 year
2,2,Male,No,No,No,Yes,No,DSL,Yes,Yes,...,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,1,1 year
3,3,Male,No,No,No,No,No phone service,DSL,Yes,No,...,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,0,4 year
4,4,Female,No,No,No,Yes,No,Fiber optic,No,No,...,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,1,1 year


In [3]:
# remove the row whose original index ('Unnamed: 0') is 488
df = df.drop(df[df['Unnamed: 0'] == 488].index)
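The notebook does not say why the row with original index 488 is removed. If the reason is a malformed value, for example a blank `TotalCharges` entry (this is an assumption, not something the notebook states), a rule-based filter avoids hard-coding an index and also catches any other bad rows. The sketch below uses a tiny hypothetical frame in place of `churn_clean.csv`.

```python
import pandas as pd

# Toy frame standing in for churn_clean.csv; the blank TotalCharges string
# is a hypothetical example of a malformed value.
df = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# Convert to numbers; anything unparsable (such as " ") becomes NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Drop every row whose TotalCharges could not be parsed.
df = df.dropna(subset=["TotalCharges"])
print(df)
```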


In [4]:
# drop the 'Unnamed: 0' column (a saved row index, not a feature)
df.drop(columns='Unnamed: 0', inplace=True)

In [5]:
df.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,tenure_group
0,Female,No,Yes,No,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,0,1 year
1,Male,No,No,No,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,0,3 year
2,Male,No,No,No,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,1,1 year
3,Male,No,No,No,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,0,4 year
4,Female,No,No,No,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,1,1 year


In [6]:
# upsampling

from sklearn.utils import resample

# create two dataframes: majority class (Churn == 0) and minority class (Churn == 1)
df_majority = df[df['Churn'] == 0]
df_minority = df[df['Churn'] == 1]

# upsample the minority class (sampling with replacement) to 5174 rows
df_minority_upsampled = resample(df_minority, replace=True, n_samples=5174, random_state=42)

# combine the upsampled minority class with the majority class
df_upsampled = pd.concat([df_minority_upsampled, df_majority])
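Here the minority class is upsampled on the full dataset and the train/test split only happens later (In [12]). Because `resample(..., replace=True)` duplicates rows, copies of the same customer can end up in both `x_train` and `x_test`, which likely inflates the test scores, especially recall for class 1. A common alternative is to split first and upsample only the training portion. The sketch below is not the notebook's code: it uses synthetic data from `make_classification` (with roughly 27% positives, an arbitrary choice) as a stand-in for `churn_clean.csv`, and it sizes the upsampled class with `len(majority)` instead of a hard-coded count.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the churn data; the feature values are
# random and only illustrate the mechanics.
X, y = make_classification(n_samples=2000, n_features=6, weights=[0.73],
                           random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
df["Churn"] = y

# 1. Split first, so the test set contains only original, unduplicated rows.
train, test = train_test_split(df, test_size=0.1, stratify=df["Churn"],
                               random_state=42)

# 2. Upsample the minority class inside the training split only.
majority = train[train["Churn"] == 0]
minority = train[train["Churn"] == 1]
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=42)
train_balanced = pd.concat([minority_up, majority])

print(train_balanced["Churn"].value_counts())  # balanced classes
print(test["Churn"].value_counts())            # original class ratio
```

The test set keeps the real class imbalance, so its scores estimate performance on customers the model has never seen in any form.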

In [7]:
df_upsampled.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,tenure_group
4290,Female,No,Yes,Yes,No,No phone service,DSL,No,No,No,Yes,Yes,No,Month-to-month,No,Electronic check,40.1,40.1,1,1 year
5547,Female,No,No,No,Yes,No,Fiber optic,No,No,No,No,Yes,Yes,Month-to-month,Yes,Electronic check,89.45,240.45,1,1 year
3302,Male,No,No,No,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Electronic check,103.45,3066.45,1,3 year
4949,Male,No,No,No,No,No phone service,DSL,No,Yes,No,No,Yes,Yes,Month-to-month,Yes,Mailed check,51.0,305.95,1,1 year
4307,Female,Yes,No,No,Yes,Yes,Fiber optic,No,No,No,No,Yes,Yes,Month-to-month,Yes,Electronic check,96.55,3580.3,1,4 year


In [8]:
from sklearn.preprocessing import LabelEncoder

# label-encode every object (string) column; one encoder is kept per column
# so that the same category-to-integer mapping can be reused on new data
encoders = {}
for col in df_upsampled.select_dtypes(object).columns:
    encoders[col] = LabelEncoder()
    df_upsampled[col] = encoders[col].fit_transform(df_upsampled[col])

In [9]:
df_upsampled.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,tenure_group
4290,0,0,1,1,0,1,0,0,0,0,2,2,0,0,0,2,40.1,40.1,1,0
5547,0,0,0,0,1,0,1,0,0,0,0,2,2,0,1,2,89.45,240.45,1,0
3302,1,0,0,0,1,2,1,0,2,2,0,2,2,1,1,2,103.45,3066.45,1,2
4949,1,0,0,0,0,1,0,0,2,0,0,2,2,0,1,3,51.0,305.95,1,0
4307,0,1,0,0,1,2,1,0,0,0,0,2,2,0,1,2,96.55,3580.3,1,3
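`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the category strings. That is why, in the output above, `Contract` value "One year" becomes 1 and `PaymentMethod` values "Electronic check" and "Mailed check" become 2 and 3. A single encoder that is refitted column after column only remembers the mapping of the last column it saw, so encoding new customers consistently requires keeping one fitted encoder per column. The sketch below uses two hypothetical toy columns that mirror the notebook's categories.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy columns mirroring two of the notebook's categorical features.
data = {
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "PaymentMethod": ["Electronic check", "Mailed check",
                      "Bank transfer (automatic)", "Credit card (automatic)"],
}

# One fitted encoder per column, kept so the same mapping can be reused later.
encoders = {col: LabelEncoder().fit(values) for col, values in data.items()}

for col, enc in encoders.items():
    # classes_ is sorted; a category's code is its position in this array.
    print(col, {c: i for i, c in enumerate(enc.classes_.tolist())})

# Encode a new customer's value with the stored encoder, not a re-fitted one.
code = encoders["Contract"].transform(["Two year"])[0]
print(code)                                            # 2
print(encoders["Contract"].inverse_transform([code]))  # ['Two year']
```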


In [10]:
# separate the target (y) from the features
y = df_upsampled['Churn']
df_upsampled.drop(['Churn'], axis=1, inplace=True)

In [12]:
# random forest

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# split: hold out 10% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(df_upsampled, y, test_size=0.1)

# fit a random forest with 90 trees and Gini impurity as the split criterion
classifier = RandomForestClassifier(n_estimators=90, criterion="gini")
classifier.fit(x_train, y_train)

# predict
y_pred_rf = classifier.predict(x_test)

# confusion matrix (rows: true label, columns: predicted label)
print(confusion_matrix(y_test, y_pred_rf))

# classification report
print(classification_report(y_test, y_pred_rf, labels=[0, 1]))

[[447  66]
 [ 15 507]]
              precision    recall  f1-score   support

           0       0.97      0.87      0.92       513
           1       0.88      0.97      0.93       522

    accuracy                           0.92      1035
   macro avg       0.93      0.92      0.92      1035
weighted avg       0.93      0.92      0.92      1035
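The figures in the report follow directly from the confusion matrix. In scikit-learn's layout, rows are true labels and columns are predictions, so the matrix reads as [[TN, FP], [FN, TP]] with churn (1) as the positive class. The sketch below recomputes the report by hand from the printed numbers: of the 522 actual churners in the test set, 15 were missed (recall 0.97), while 66 non-churners were flagged as churners (precision 0.88 for class 1).

```python
import numpy as np

# Confusion matrix printed above: rows are true labels, columns are
# predictions, in label order [0, 1].
cm = np.array([[447, 66],
               [15, 507]])
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)     # 507 / 573
recall_1 = tp / (tp + fn)        # 507 / 522
precision_0 = tn / (tn + fn)     # 447 / 462
recall_0 = tn / (tn + fp)        # 447 / 513
accuracy = (tn + tp) / cm.sum()  # 954 / 1035

for name, value in [("precision_1", precision_1), ("recall_1", recall_1),
                    ("precision_0", precision_0), ("recall_0", recall_0),
                    ("accuracy", accuracy)]:
    print(f"{name}: {value:.2f}")
```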



In [13]:
from joblib import dump
dump(classifier, 'random_forest.joblib')

['random_forest.joblib']
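The dumped file contains only the classifier. Any service that scores new customers with it (presumably the app at the link above, although the notebook does not show its code) must encode inputs exactly as during training, so the fitted label encoders and the feature column order have to be available too. A minimal sketch, assuming a bundle dictionary is an acceptable storage format: the toy training frame, the `churn_bundle.joblib` file name and the dictionary keys are all hypothetical.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny hypothetical training frame; the notebook trains on df_upsampled.
train = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "MonthlyCharges": [70.7, 56.95, 42.3, 89.45],
    "Churn": [1, 0, 0, 1],
})

encoders = {"Contract": LabelEncoder().fit(train["Contract"])}
X = train.drop(columns="Churn")
X["Contract"] = encoders["Contract"].transform(X["Contract"])
model = RandomForestClassifier(n_estimators=90, random_state=0).fit(X, train["Churn"])

# Save the model together with its encoders and the feature column order.
path = os.path.join(tempfile.mkdtemp(), "churn_bundle.joblib")  # hypothetical name
dump({"model": model, "encoders": encoders, "columns": list(X.columns)}, path)

# Later, e.g. inside a serving app: load the bundle and score a raw record.
bundle = load(path)
new = pd.DataFrame({"Contract": ["One year"], "MonthlyCharges": [60.0]})
for col, enc in bundle["encoders"].items():
    new[col] = enc.transform(new[col])
pred = bundle["model"].predict(new[bundle["columns"]])
print(pred)
```

Keeping everything in one file means the model and its preprocessing cannot drift apart between training and serving.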