* **Louai AL Jabi**
* **DSA150**
* **Project**

In [2]:
# Import important libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.compose import make_column_transformer

### Getting to know the data
* We want to predict whether a customer will switch to a different provider.
* The labeled data used to train the model is assigned to `churn`, and the data we want to predict is assigned to `churn_unlabeled`.
* The first step is to understand the data and identify which columns are numerical and which are categorical.

In [3]:
# Reading in the labeled data
churn = pd.read_csv("churn_labeled.csv")

In [4]:
# Reading in the unlabeled data
churn_unlabeled = pd.read_csv("churn_unlabeled.csv")

In [5]:
# Let's get a look at the DataFrame
churn

Unnamed: 0,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.50,No
2,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6738,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.50,No
6739,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.90,No
6740,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
6741,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.60,Yes


In [15]:
# Getting more info on the data
churn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6743 entries, 0 to 6742
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Gender            6743 non-null   object 
 1   SeniorCitizen     6743 non-null   int64  
 2   Partner           6743 non-null   object 
 3   Dependents        6743 non-null   object 
 4   TenureMonths      6743 non-null   int64  
 5   PhoneService      6743 non-null   object 
 6   MultipleLines     6743 non-null   object 
 7   InternetService   6743 non-null   object 
 8   OnlineSecurity    6743 non-null   object 
 9   OnlineBackup      6743 non-null   object 
 10  DeviceProtection  6743 non-null   object 
 11  TechSupport       6743 non-null   object 
 12  StreamingTV       6743 non-null   object 
 13  StreamingMovies   6743 non-null   object 
 14  Contract          6743 non-null   object 
 15  PaperlessBilling  6743 non-null   object 
 16  PaymentMethod     6743 non-null   object 


In [7]:
# We can use the describe method to get an idea on the numerical values
churn.describe()

Unnamed: 0,SeniorCitizen,TenureMonths,MonthlyCharges,TotalCharges
count,6743.0,6743.0,6743.0,6743.0
mean,0.163725,32.410945,64.822408,2281.868686
std,0.370054,24.552997,30.067492,2264.688117
min,0.0,0.0,18.25,0.0
25%,0.0,9.0,35.675,401.675
50%,0.0,29.0,70.4,1396.25
75%,0.0,55.0,89.925,3790.4
max,1.0,72.0,118.75,8684.8


* The `SeniorCitizen` column is stored as an integer, so let's investigate it.

In [8]:
# Using value_counts to investigate whether SeniorCitizen has a binary outcome
churn["SeniorCitizen"].value_counts()

0    5639
1    1104
Name: SeniorCitizen, dtype: int64

* Great! The column has a binary outcome, so we will treat it as a categorical column.

In [10]:
# Getting a list of the categorical columns
[col for col in churn.dtypes.index if churn[col].dtypes == "object"]

['Gender',
 'Partner',
 'Dependents',
 'PhoneService',
 'MultipleLines',
 'InternetService',
 'OnlineSecurity',
 'OnlineBackup',
 'DeviceProtection',
 'TechSupport',
 'StreamingTV',
 'StreamingMovies',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod',
 'Churn']

In [11]:
# Getting a list of the numerical columns. SeniorCitizen shows up here, but we treat it as categorical
[col for col in churn.dtypes.index if churn[col].dtypes in ["int64","float64"]]

['SeniorCitizen', 'TenureMonths', 'MonthlyCharges', 'TotalCharges']
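
The two list comprehensions above can also be written with `DataFrame.select_dtypes`. Here is a minimal sketch on a small made-up frame (the `toy` DataFrame and its values are hypothetical; only the column names mirror the dataset), where `SeniorCitizen` is moved into the categorical group by hand:

```python
import pandas as pd

# Hypothetical miniature of the churn data, for illustration only.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "SeniorCitizen": [0, 1, 0],
    "TenureMonths": [1, 34, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "Churn": ["No", "No", "Yes"],
})

# Split the columns by whether pandas stores them as numbers or not.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# SeniorCitizen is a 0/1 flag, so treat it as categorical.
numeric_cols.remove("SeniorCitizen")
categorical_cols.append("SeniorCitizen")

# Churn is the target, not a feature.
categorical_cols.remove("Churn")

print(numeric_cols)
print(categorical_cols)
```

Building the lists this way would let the column transformers reuse them instead of repeating the column names by hand.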

### Building the model
* Now, we will separate the data into the features (X) and the outcome (y), and split them into training and testing data.
* I will be using `SVC` and `LogisticRegression` since they are the models I am most familiar with, and I spent time researching and understanding their properties.
* Both models require preprocessing, so we will build a column transformer for each model with the numerical values scaled by the `StandardScaler` and the categorical values one-hot encoded. The preprocessing will happen in the pipeline.
* **SVC** has `C` and `gamma` as its parameters. `C` is a regularization parameter that limits the importance of each training point; the smaller it is, the stronger the regularization. `gamma` determines how far the influence of a single training example reaches: small values mean a far reach, large values a close one.
* **LogisticRegression** has `C` and `max_iter` as its parameters. `C` is a regularization parameter; the smaller it is, the stronger the regularization. `max_iter` is the maximum number of iterations the solver may take to converge. The scikit-learn default is `100`; here it is set to `1000`.
* Then, we will build the pipeline, naming the preprocessing and classifier steps. One of the models and its transformer can serve as a placeholder, since the grid search swaps in each candidate.
* After that, we will create a range of values for the parameters of our classifiers.
* Finally, we will pass all of that to `GridSearchCV` with 15-fold cross-validation, once with the `roc_auc` scoring method and once with `accuracy`. We also keep the training scores.
* The `roc_auc` scoring method reports the Area Under the ROC Curve (AUC), which always lies between 0 and 1. Predicting randomly gives an expected AUC of 0.50. The AUC can be interpreted as the probability that a randomly selected point of the positive class receives a higher score from the classifier than a randomly selected point of the negative class. I chose it because it is a better metric than simple accuracy for imbalanced classes.
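
The probabilistic reading of the AUC in the last bullet can be checked by hand. The sketch below counts, over every (positive, negative) pair, how often the positive point gets the higher score, with ties counted as half. The `pairwise_auc` helper, the labels, and the scores are all hypothetical and only meant to illustrate the idea:

```python
from itertools import product

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = churned) and classifier scores.
y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.8, 0.35, 0.9]

auc = pairwise_auc(y_true, scores)
print(auc)

# A classifier that gives everyone the same score cannot rank at all.
constant_auc = pairwise_auc(y_true, [0.5] * len(y_true))
print(constant_auc)
```

Four of the six pairs are ranked correctly here, so the AUC is 4/6, while the constant scorer lands exactly on 0.5.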

In [12]:
# Assigning the features and the outcomes. Then, splitting the data into training and testing data.
X = churn.loc[:,"Gender":"TotalCharges"]
y = churn["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X,y,stratify=y)

# Setting up the SVC column transformer: StandardScaler scales the numerical values
# and OneHotEncoder encodes the categorical values.
svc_ct = make_column_transformer(
    (StandardScaler(), ["TenureMonths","MonthlyCharges","TotalCharges"]),
    (OneHotEncoder(handle_unknown='ignore'), ['Gender','SeniorCitizen','Partner','Dependents','PhoneService','MultipleLines',
                                        'InternetService','OnlineSecurity','OnlineBackup','DeviceProtection',
                                        'TechSupport','StreamingTV','StreamingMovies','Contract','PaperlessBilling',
                                        'PaymentMethod']))

# Building the column transformer for LogisticRegression
logreg_ct = make_column_transformer(
    (StandardScaler(), ["TenureMonths","MonthlyCharges","TotalCharges"]),
    (OneHotEncoder(handle_unknown='ignore'), ['Gender','SeniorCitizen','Partner','Dependents','PhoneService','MultipleLines',
                                        'InternetService','OnlineSecurity','OnlineBackup','DeviceProtection',
                                        'TechSupport','StreamingTV','StreamingMovies','Contract','PaperlessBilling',
                                        'PaymentMethod']))

# Making the pipeline
pipe = Pipeline([("preprocessing", svc_ct),
                ("classifier", SVC())])

# Setting the parameter grid with the parameters for each model.
param_grid = [
    {'classifier': [SVC()], 'preprocessing': [svc_ct], 
     'classifier__C': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000],
     'classifier__gamma': [0.0001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 10],
     'classifier__kernel': ['rbf']},
    {'classifier': [LogisticRegression(max_iter=1000)], 'preprocessing': [logreg_ct],
     'classifier__C': [.0001, .001, .01, .1, 1, 10, 100, 1000]}]



# Making two grid searches: one scoring on simple accuracy and the other on roc_auc (area under the curve)
grid_search_accuracy = GridSearchCV(pipe,param_grid, cv=15, scoring="accuracy", return_train_score=True)
grid_search_roc_auc = GridSearchCV(pipe,param_grid, cv=15, scoring="roc_auc", return_train_score=True)
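
To illustrate why the grid spans such a wide range of `C` values, here is a minimal sketch of how `C` changes the fitted `LogisticRegression`. It uses synthetic data from `make_classification` as a stand-in for the preprocessed churn features (an assumption for illustration; `X_demo`, `y_demo`, and `coef_norms` are not part of the project), and tracks the size of the learned coefficients:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the preprocessed churn features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Fit the same model with increasing C and record the coefficient norm.
coef_norms = {}
for C in [0.0001, 0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(model.coef_))

for C, norm in coef_norms.items():
    print(f"C={C:<8} ||coef|| = {norm:.4f}")
```

Smaller `C` should shrink the coefficients toward zero (stronger regularization), while larger `C` lets the model fit the training data more closely.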

**Accuracy**
* Let's see the performance of the model using accuracy as a scoring method.
* Also, find the best estimator, parameters, and score.

In [13]:
# Fitting the grid search on the training data
grid_search_accuracy.fit(X_train,y_train)

GridSearchCV(cv=15,
             estimator=Pipeline(steps=[('preprocessing',
                                        ColumnTransformer(transformers=[('standardscaler',
                                                                         StandardScaler(),
                                                                         ['TenureMonths',
                                                                          'MonthlyCharges',
                                                                          'TotalCharges']),
                                                                        ('onehotencoder',
                                                                         OneHotEncoder(handle_unknown='ignore'),
                                                                         ['Gender',
                                                                          'SeniorCitizen',
                                                                          'Partner',
    

In [13]:
# Finding the best estimator when using accuracy
print(f"Best estimator: {grid_search_accuracy.best_estimator_}")

Best estimator: Pipeline(steps=[('preprocessing',
                 ColumnTransformer(transformers=[('standardscaler',
                                                  StandardScaler(),
                                                  ['TenureMonths',
                                                   'MonthlyCharges',
                                                   'TotalCharges']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['Gender', 'SeniorCitizen',
                                                   'Partner', 'Dependents',
                                                   'PhoneService',
                                                   'MultipleLines',
                                                   'InternetService',
                                                   'OnlineSecurity',
                        

In [30]:
# Finding the best parameters 
print(f"Best parameters: {grid_search_accuracy.best_params_}")

Best parameters: {'classifier': LogisticRegression(C=10, max_iter=1000), 'classifier__C': 0.01, 'preprocessing': ColumnTransformer(transformers=[('standardscaler', StandardScaler(),
                                 ['TenureMonths', 'MonthlyCharges',
                                  'TotalCharges']),
                                ('onehotencoder',
                                 OneHotEncoder(handle_unknown='ignore'),
                                 ['Gender', 'SeniorCitizen', 'Partner',
                                  'Dependents', 'PhoneService', 'MultipleLines',
                                  'InternetService', 'OnlineSecurity',
                                  'OnlineBackup', 'DeviceProtection',
                                  'TechSupport', 'StreamingTV',
                                  'StreamingMovies', 'Contract',
                                  'PaperlessBilling', 'PaymentMethod'])])}


In [31]:
# The best mean cross-validated score
print(f"Best score: {grid_search_accuracy.best_score_:.2f}")

Best score: 0.81


In [32]:
# Taking a look at the results using accuracy
results_accuracy = pd.DataFrame(grid_search_accuracy.cv_results_)
results_accuracy

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_classifier,param_classifier__C,param_classifier__gamma,param_classifier__kernel,param_preprocessing,params,...,split7_train_score,split8_train_score,split9_train_score,split10_train_score,split11_train_score,split12_train_score,split13_train_score,split14_train_score,mean_train_score,std_train_score
0,0.522170,0.010521,0.117228,0.006699,SVC(),0.0001,0.0001,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734625,0.000099
1,0.514259,0.000953,0.113843,0.004761,SVC(),0.0001,0.01,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734625,0.000099
2,0.512395,0.000506,0.112453,0.000231,SVC(),0.0001,0.1,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734625,0.000099
3,0.518606,0.004120,0.129960,0.008778,SVC(),0.0001,0.2,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734625,0.000099
4,0.513030,0.004546,0.123404,0.011918,SVC(),0.0001,0.5,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734534,0.734625,0.000099
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,0.042996,0.003206,0.003897,0.000286,"LogisticRegression(C=10, max_iter=1000)",0.1,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.807203,0.807627,0.807203,0.807839,0.807839,0.808898,0.808051,0.808475,0.807438,0.000834
68,0.058348,0.006228,0.004199,0.001108,"LogisticRegression(C=10, max_iter=1000)",1,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.808263,0.809110,0.808051,0.807627,0.809110,0.809534,0.807839,0.810169,0.808384,0.001042
69,0.048200,0.011660,0.004022,0.000424,"LogisticRegression(C=10, max_iter=1000)",10,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.806992,0.808475,0.807627,0.807839,0.809322,0.809534,0.806992,0.810169,0.808300,0.001116
70,0.043737,0.005113,0.003807,0.000059,"LogisticRegression(C=10, max_iter=1000)",100,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.806780,0.808475,0.807839,0.807839,0.809534,0.809534,0.806780,0.810169,0.808370,0.001167
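
The full `cv_results_` table is wide, so it can help to keep only a few columns and sort by rank. The sketch below does this on a small synthetic problem (the data, the three-value grid, and the `summary` frame are assumptions for illustration, not the project's search), and also compares training and validation scores:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the churn training data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.001, 0.1, 10]},
    cv=5,
    scoring="accuracy",
    return_train_score=True,
)
search.fit(X_demo, y_demo)

# Keep the columns that matter for model selection and sort by rank.
results = pd.DataFrame(search.cv_results_)
summary = (
    results[["param_C", "mean_train_score", "mean_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)

# A large gap between train and validation scores suggests overfitting.
summary["train_minus_test"] = summary["mean_train_score"] - summary["mean_test_score"]
print(summary)
```

In the notebook, the same selection and sort could be applied to `results_accuracy` directly.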


**roc_auc**
* Let's see the performance of the model using roc_auc as a scoring method.
* Also, find the best estimator, parameters, and score.

In [14]:
# Fitting the grid search on the training data
grid_search_roc_auc.fit(X_train,y_train)

GridSearchCV(cv=15,
             estimator=Pipeline(steps=[('preprocessing',
                                        ColumnTransformer(transformers=[('standardscaler',
                                                                         StandardScaler(),
                                                                         ['TenureMonths',
                                                                          'MonthlyCharges',
                                                                          'TotalCharges']),
                                                                        ('onehotencoder',
                                                                         OneHotEncoder(handle_unknown='ignore'),
                                                                         ['Gender',
                                                                          'SeniorCitizen',
                                                                          'Partner',
    

In [33]:
# Finding the best estimator when using roc_auc
print(f"Best estimator: {grid_search_roc_auc.best_estimator_}")

Best estimator: Pipeline(steps=[('preprocessing',
                 ColumnTransformer(transformers=[('standardscaler',
                                                  StandardScaler(),
                                                  ['TenureMonths',
                                                   'MonthlyCharges',
                                                   'TotalCharges']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['Gender', 'SeniorCitizen',
                                                   'Partner', 'Dependents',
                                                   'PhoneService',
                                                   'MultipleLines',
                                                   'InternetService',
                                                   'OnlineSecurity',
                        

In [34]:
# Finding the best parameters 
print(f"Best parameters: {grid_search_roc_auc.best_params_}")

Best parameters: {'classifier': LogisticRegression(C=10, max_iter=1000), 'classifier__C': 10, 'preprocessing': ColumnTransformer(transformers=[('standardscaler', StandardScaler(),
                                 ['TenureMonths', 'MonthlyCharges',
                                  'TotalCharges']),
                                ('onehotencoder',
                                 OneHotEncoder(handle_unknown='ignore'),
                                 ['Gender', 'SeniorCitizen', 'Partner',
                                  'Dependents', 'PhoneService', 'MultipleLines',
                                  'InternetService', 'OnlineSecurity',
                                  'OnlineBackup', 'DeviceProtection',
                                  'TechSupport', 'StreamingTV',
                                  'StreamingMovies', 'Contract',
                                  'PaperlessBilling', 'PaymentMethod'])])}


In [35]:
# The best mean cross-validated score
print(f"Best score: {grid_search_roc_auc.best_score_:.2f}")

Best score: 0.85


In [36]:
# Taking a look at the results using area under the curve
results_roc_auc = pd.DataFrame(grid_search_roc_auc.cv_results_)
results_roc_auc

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_classifier,param_classifier__C,param_classifier__gamma,param_classifier__kernel,param_preprocessing,params,...,split7_train_score,split8_train_score,split9_train_score,split10_train_score,split11_train_score,split12_train_score,split13_train_score,split14_train_score,mean_train_score,std_train_score
0,0.520842,0.004116,0.113833,0.000801,SVC(),0.0001,0.0001,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.842460,0.841360,0.841322,0.841754,0.844737,0.841343,0.842097,0.843102,0.841705,0.001437
1,0.516769,0.004293,0.113453,0.000865,SVC(),0.0001,0.01,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.842653,0.841783,0.842253,0.842650,0.845023,0.841777,0.842688,0.843142,0.842123,0.001524
2,0.517477,0.004593,0.113879,0.000999,SVC(),0.0001,0.1,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.831749,0.831881,0.829698,0.830471,0.835643,0.830247,0.831071,0.829884,0.830429,0.002083
3,0.517409,0.002687,0.113593,0.000940,SVC(),0.0001,0.2,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.835559,0.825634,0.833870,0.832283,0.838096,0.832330,0.834478,0.831169,0.832310,0.003611
4,0.516522,0.002351,0.112984,0.000147,SVC(),0.0001,0.5,rbf,ColumnTransformer(transformers=[('standardscal...,"{'classifier': SVC(), 'classifier__C': 0.0001,...",...,0.898315,0.892206,0.901703,0.894344,0.895146,0.900397,0.895353,0.894701,0.897258,0.004025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,0.044396,0.004100,0.004351,0.000289,"LogisticRegression(C=10, max_iter=1000)",0.1,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.851371,0.850216,0.850297,0.850789,0.853713,0.849522,0.850912,0.851579,0.850400,0.001446
68,0.055396,0.005776,0.004756,0.001497,"LogisticRegression(C=10, max_iter=1000)",1,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.852399,0.851328,0.851420,0.851951,0.854947,0.850544,0.852175,0.852683,0.851518,0.001460
69,0.045853,0.009075,0.004951,0.001660,"LogisticRegression(C=10, max_iter=1000)",10,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.852511,0.851494,0.851546,0.852147,0.855101,0.850685,0.852304,0.852815,0.851663,0.001459
70,0.040953,0.004718,0.004566,0.001190,"LogisticRegression(C=10, max_iter=1000)",100,,,ColumnTransformer(transformers=[('standardscal...,"{'classifier': LogisticRegression(C=10, max_it...",...,0.852519,0.851507,0.851562,0.852150,0.855114,0.850709,0.852341,0.852820,0.851677,0.001460


### Results
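* After running both grid searches, LogisticRegression comes out as the best estimator in each. The `roc_auc` search selects `C` set to `10` with a best score of 0.85, and I use that search for the predictions because `roc_auc` is the better scoring method for imbalanced classes. Note that the 0.85 AUC and the 0.81 accuracy are different metrics, so they are not directly comparable.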
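
The scores above are cross-validation scores on the training split. A held-out test split, like the `X_test` and `y_test` created earlier, can give an additional check. Here is a minimal sketch of that step on synthetic, imbalanced data (the data, the class weights, and the small grid are arbitrary assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the churn data.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.73, 0.27], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.01, 1, 10]}, cv=5, scoring="roc_auc"
)
search.fit(X_tr, y_tr)

# roc_auc needs continuous scores, so use the positive-class probability.
test_scores = search.predict_proba(X_te)[:, 1]
test_auc = roc_auc_score(y_te, test_scores)
test_acc = accuracy_score(y_te, search.predict(X_te))
print(f"held-out AUC: {test_auc:.3f}, held-out accuracy: {test_acc:.3f}")
```

In the notebook itself, the analogous call would be `grid_search_roc_auc.score(X_test, y_test)`, which applies the search's `roc_auc` scorer to the held-out split.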
#### Making a prediction
* As a reminder, we are predicting whether a customer will switch to a different provider.
* We're going to take a look at the unlabeled data to refresh our memories, and then use the `best_estimator_` attribute to predict the target with the best model found by the grid search.

In [16]:
# Let's take a quick look at the unlabeled data
churn_unlabeled

Unnamed: 0,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges
0,Male,0,Yes,Yes,17,Yes,Yes,Fiber optic,No,No,No,No,Yes,Yes,Month-to-month,Yes,Electronic check,92.55,1515.10
1,Male,0,Yes,Yes,58,No,No phone service,DSL,Yes,Yes,Yes,No,Yes,No,One year,No,Mailed check,50.00,2919.85
2,Female,0,Yes,No,61,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Credit card (automatic),20.25,1278.80
3,Male,0,Yes,No,14,Yes,Yes,Fiber optic,No,No,No,No,No,Yes,Month-to-month,Yes,Electronic check,85.15,1139.20
4,Female,0,No,No,12,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,No,Mailed check,20.10,223.60
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,Male,0,No,No,50,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,No,No,One year,Yes,Credit card (automatic),90.10,4549.45
296,Female,0,Yes,No,43,Yes,No,DSL,Yes,No,Yes,Yes,Yes,Yes,Two year,Yes,Electronic check,78.80,3460.30
297,Female,0,Yes,Yes,7,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,No,Mailed check,20.45,150.75
298,Male,0,No,Yes,22,Yes,No,Fiber optic,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,89.40,2001.50


In [17]:
# Making the predictions using the best estimator
predictions = grid_search_roc_auc.best_estimator_.predict(churn_unlabeled)

In [18]:
# Writing it to a csv file
pd.Series(predictions, name='target').to_csv("Predictions_churn.csv", index=False)