# 3.3.4 [Challenge: Logistic Regressions No Penalty, Ridge, Lasso](https://courses.thinkful.com/data-201v1/project/3.3.4)

Pick a dataset of your choice with a binary outcome and the potential for at least 15 features.

Engineer your features, then create three models. Each model will be run on a training set and a test-set (or multiple test-sets, if you take a folds approach). The models should be:

1. Vanilla logistic regression
2. Ridge logistic regression - penalty argument L2
3. Lasso logistic regression - penalty argument L1


Evaluate the 3 models and select the best
* explain why this is the best model
* specify feature selection, regularizatio parameter selection, model evaluation criteria that led me to select my model
* list strengths and limitations of regression as a modeling approach
* Where there things I wished I could do that I couldn't?

# Reflection on Models
I am most satisfied with the Lasso Regression. I used train and test sets before and after downsampling. For model evaluation I used the Accuracy Score. I also printed the confusion matrix but did not look closely at the results (I should have). 
I think I liked Lasso best because it was not as impacted by my poor feature selection. 
I would like to do a better job of evaluating the conribution of features. It would be great to have a pipeline that split my train/test sets, downsamples, and split train/test sets again. I also havent been including variable interaction features as much as I would like. Would be cool if I ran a loop to optimize regularization variable and was more involved in scaling/normalization of features within my model fitting.

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

%matplotlib inline

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.utils import resample
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import scale
from sklearn import preprocessing
from IPython.display import Image
import pydotplus
import graphviz

In [57]:
dt_call = pd.read_csv('unit_3_data/telco_churn_dataset.csv')
# Minutes/Call
dt_call['TotalMinutes'] = dt_call['TotalDayMinutes'] + dt_call['TotalEveMinutes'] + dt_call['TotalNightMinutes']
dt_call['MinutesPerCall'] = dt_call['TotalMinutes']/dt_call['TotalCall']

# Percent of Calls at night
dt_call['PercentNightCalls'] = dt_call['TotalNightCalls']/dt_call['TotalCall']
dt_call['PercentDayCalls'] = dt_call['TotalDayCalls']/dt_call['TotalCall']

# New subscriber flag (in first cycle of contract)
dt_call['tenure_num'] = np.where(dt_call['tenure']==0, 0.0, 
                                 np.where(dt_call['Contract']=='One year', dt_call['tenure']/12.0,
                                          np.where(dt_call['Contract']=='Two year', dt_call['tenure']/24.0,
                                                   dt_call['tenure']
                                                  )))
dt_call['new_sub_flag'] = np.where(dt_call['tenure_num']<=1, 1, 0)
dt_call['messages'] = np.where(dt_call['NumbervMailMessages'] >=1, 1, 0)

dt_call.drop(['customerID','TotalRevenue'],axis=1, inplace=True)



dt_call['Churn'] = np.where(dt_call['Churn']== 'Yes',1,0)

In [58]:
dt_call = dt_call[['Churn',   
                   'MultipleLines', 
                   'TotalIntlCalls', 
                   'TotalCall', 'CustomerServiceCalls',
                   'tenure','MinutesPerCall', 'tenure_num', 
                   'PercentDayCalls', 'PercentNightCalls', 
                   'new_sub_flag', 'SeniorCitizen', 'InternetService',
                   'Contract', 'PaperlessBilling', 'NumbervMailMessages' ]]
dt_call = pd.get_dummies(dt_call, drop_first=True)
#dt_call = pd.DataFrame(scale(dt_call), columns=dt_call.columns)

In [59]:
# Display class counts to see overall distribution of Churn
dt_call.Churn.value_counts()

0    2850
1     483
Name: Churn, dtype: int64

In [75]:
# Identify Churn Flag
y = dt_call.Churn
X = dt_call.drop('Churn', axis=1)

# Define the training and test sizes.

train, test = train_test_split(dt_call,
                               test_size=0.33,
                               stratify=y                                  
)


# Display.
X_holdout = test.drop('Churn',axis=1)
y_holdout = test['Churn']

# Display class counts for training set to determine what number to downsample to
minority_samples = train.Churn.value_counts().min()
print(minority_samples)

324


In [61]:
#

df_majority = train[train.Churn==0]
df_minority = train[train.Churn==1]

# Downsample Majority Class
df_majority_downsampled = resample(df_majority,
                                  replace=False,   # sample without replacement
                                  n_samples=minority_samples,   # to match minority class
                                  random_state=123 # reproducible results
                                  )

# Combine minority class with downsampled majority class
df_downsampled = pd.concat([df_majority_downsampled, df_minority])

# New Class Counts
df_downsampled.Churn.value_counts()

1    324
0    324
Name: Churn, dtype: int64

## Logistic Regression

In [62]:
# Declare logistic regression classifier.
# Parameter regulariation I'm going to leave it
lr = LogisticRegression()
X_train, X_test, y_train, y_test = train_test_split(df_downsampled.drop('Churn', axis=1),
                                                    df_downsampled['Churn'],
                                                    test_size=0.33, random_state=42)



In [76]:
# Fit the model to training set
fit = lr.fit(X_train, y_train)

# Display.
print('Coefficients')
print(X_train.columns.values)
print(fit.coef_)
print(fit.intercept_)

# In Sample Predictions
pred_y_train = lr.predict(X_train)

print('\n In Sample Accuracy by admission status')
print(pd.crosstab(pred_y_train, y_train))

print('\n In Sample Percentage Accuracy')
print(lr.score(X_train, y_train))



# Out of Sample Predictions
pred_y_test = lr.predict(X_test)

print('\n Out of Sample Accuracy by admission status')
print(pd.crosstab(pred_y_test, y_test))

print('\n Out Sample Percentage Accuracy')
print(lr.score(X_test, y_test))


# Original holdout Sample Predictions
pred_y_holdout = lr.predict(X_holdout)

print('\n Original Test Sample Accuracy by admission status')
print(pd.crosstab(pred_y_holdout, y_holdout))

print('\n Original Test Sample Percentage Accuracy')
print(lr.score(X_holdout, y_holdout))

Coefficients
['TotalIntlCalls' 'TotalCall' 'CustomerServiceCalls' 'tenure'
 'MinutesPerCall' 'tenure_num' 'PercentDayCalls' 'PercentNightCalls'
 'new_sub_flag' 'SeniorCitizen' 'NumbervMailMessages' 'MultipleLines_Yes'
 'InternetService_Fiber optic' 'InternetService_No' 'Contract_One year'
 'Contract_Two year' 'PaperlessBilling_Yes']
[[-0.04698645  0.00494297  0.36361792 -0.0386472   0.39980518 -0.00274286
  -0.27780156 -0.45519936  0.54649369  0.04243869 -0.02606389 -1.80094948
   1.31605354 -0.87387106 -0.72187925 -1.88231894  0.53886703]]
[-1.99031778]

 In Sample Accuracy by admission status
Churn    0    1
row_0          
0      159   28
1       47  200

 In Sample Percentage Accuracy
0.8271889400921659

 Out of Sample Accuracy by admission status
Churn   0   1
row_0        
0      83  13
1      35  83

 Out Sample Percentage Accuracy
0.7757009345794392

 Original Test Sample Accuracy by admission status
Churn    0    1
row_0          
0      711   25
1      230  134

 Original Tes

# Ridge Logistic Regression

In [65]:
# Fit Ridge Regression Model
# In Sklearn Alpha is the regularization parameter
# As alpha gets larger, parameter shrinkage becomes more pronounce. 
# Intercept is not regularized

# In the example the data is scaled

# Fit ridgeregr
ridge = LogisticRegression(penalty='l2', C=10, fit_intercept=True)
ridge.fit(X_train, y_train)

LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [73]:
# Fit Ridge Regression Model
# In Sklearn Alpha is the regularization parameter
# As alpha gets larger, parameter shrinkage becomes more pronounce. 
# Intercept is not regularized

# In the example the data is scaled

# Fit ridgeregr
ridge = LogisticRegression(penalty='l2', C=10, fit_intercept=True)
ridge.fit(X_train, y_train)

# Display.
print('Coefficients')
print(X_train.columns.values)
print(fit.coef_)
print(fit.intercept_)

# In Sample Predictions
pred_y_train = ridge.predict(X_train)

print('\n In Sample Accuracy by admission status')
print(pd.crosstab(pred_y_train, y_train))

print('\n In Sample Percentage Accuracy')
print(ridge.score(X_train, y_train))


# Out of Sample Predictions
pred_y_test = ridge.predict(X_test)

print('\n Out of Sample Accuracy by admission status')
print(pd.crosstab(pred_y_test, y_test))

print('\n Out Sample Percentage Accuracy')
print(ridge.score(X_test, y_test))


# Original Holdout Test Set
pred_y_holdout = ridge.predict(X_holdout)

print('\n Original Test Sample Accuracy by admission status')
print(pd.crosstab(pred_y_holdout, y_holdout))

print('\n Original Test Sample Percentage Accuracy')
print(ridge.score(X_holdout, y_holdout))

Coefficients
['TotalIntlCalls' 'TotalCall' 'CustomerServiceCalls' 'tenure'
 'MinutesPerCall' 'tenure_num' 'PercentDayCalls' 'PercentNightCalls'
 'new_sub_flag' 'SeniorCitizen' 'NumbervMailMessages' 'MultipleLines_Yes'
 'InternetService_Fiber optic' 'InternetService_No' 'Contract_One year'
 'Contract_Two year' 'PaperlessBilling_Yes']
[[-0.04698645  0.00494297  0.36361792 -0.0386472   0.39980518 -0.00274286
  -0.27780156 -0.45519936  0.54649369  0.04243869 -0.02606389 -1.80094948
   1.31605354 -0.87387106 -0.72187925 -1.88231894  0.53886703]]
[-1.99031778]

 In Sample Accuracy by admission status
Churn    0    1
row_0          
0      164   33
1       42  195

 In Sample Percentage Accuracy
0.8271889400921659

 Out of Sample Accuracy by admission status
Churn   0   1
row_0        
0      91  14
1      27  82

 Out Sample Percentage Accuracy
0.8084112149532711

 Original Test Sample Accuracy by admission status
Churn    0    1
row_0          
0      715   23
1      226  136

 Original Tes

In [43]:
# Fit Ridge Regression Model
# In Sklearn Alpha is the regularization parameter
# As alpha gets larger, parameter shrinkage becomes more pronounce. 
# Intercept is not regularized

# In the example the data is scaled

# Fit ridgeregr
ridge = LogisticRegression(penalty='l2', C=20, fit_intercept=True)
ridge.fit(X_train, y_train)

# Display.
print('Coefficients')
print(X_train.columns.values)
print(fit.coef_)
print(fit.intercept_)

# In Sample Predictions
pred_y_train = ridge.predict(X_train)

print('\n In Sample Accuracy by admission status')
print(pd.crosstab(pred_y_train, y_train))

print('\n In Sample Percentage Accuracy')
print(ridge.score(X_train, y_train))



# Out of Sample Predictions
pred_y_test = ridge.predict(X_test)

print('\n Out of Sample Accuracy by admission status')
print(pd.crosstab(pred_y_test, y_test))

print('\n Out Sample Percentage Accuracy')
print(ridge.score(X_test, y_test))



# Original Hold out TestPredictions
pred_y_holdout = ridge.predict(X_holdout)

print('\n Original Test Sample Accuracy by admission status')
print(pd.crosstab(pred_y_holdout, y_holdout))

print('\n Original Test Sample Percentage Accuracy')
print(ridge.score(X_holdout, y_holdout))

Coefficients
['TotalIntlCalls' 'TotalCall' 'CustomerServiceCalls' 'tenure'
 'MinutesPerCall' 'tenure_num' 'PercentDayCalls' 'PercentNightCalls'
 'new_sub_flag' 'SeniorCitizen' 'NumbervMailMessages' 'MultipleLines_Yes'
 'InternetService_Fiber optic' 'InternetService_No' 'Contract_One year'
 'Contract_Two year' 'PaperlessBilling_Yes']
[[-0.01961952 -0.00281472  0.41899425 -0.02523088  1.15843322 -0.00260357
  -0.03780221 -0.07143994  0.56389646  0.77026105 -0.02493072 -2.06089247
   0.51320985 -1.31984186 -1.05342253 -1.86660257  0.06812434]]
[-1.06007715]

 In Sample Accuracy by admission status
Churn    0    1
row_0          
0      162   24
1       44  204

 In Sample Percentage Accuracy
0.8433179723502304

 Out of Sample Accuracy by admission status
Churn   0   1
row_0        
0      90  14
1      28  82

 Out Sample Percentage Accuracy
0.8037383177570093

 Original Test Sample Accuracy by admission status
Churn    0    1
row_0          
0      699   16
1      242  143

 Original Tes

# LASSO Logistic Regression

In [71]:
# Declare 
# Fit Ridge Regression Model
# In Sklearn Alpha is the regularization parameter
# As alpha gets larger, parameter shrinkage becomes more pronounce. 
# Intercept is not regularized

# In the example the data is scaled

# Fit ridgeregr
ridge = LogisticRegression(penalty='l1', C=4, fit_intercept=True)
ridge.fit(X_train, y_train)

# Display.
print('Coefficients')
print(X_train.columns.values)
print(fit.coef_)
print(fit.intercept_)

# In Sample Predictions
pred_y_train = ridge.predict(X_train)

print('\n In Sample Accuracy by admission status')
print(pd.crosstab(pred_y_train, y_train))

print('\n In Sample Percentage Accuracy')
print(ridge.score(X_train, y_train))



# Out of Sample Predictions
pred_y_test = ridge.predict(X_test)

print('\n Out of Sample Accuracy by admission status')
print(pd.crosstab(pred_y_test, y_test))

print('\n Out Sample Percentage Accuracy')
print(ridge.score(X_test, y_test))

# Display.
X_holdout = test.drop('Churn',axis=1)
y_holdout = test['Churn']

# Original Hold out TestPredictions
pred_y_holdout = ridge.predict(X_holdout)

print('\n Original Test Sample Accuracy by admission status')
print(pd.crosstab(pred_y_holdout, y_holdout))

print('\n Original Test Sample Percentage Accuracy')
print(ridge.score(X_holdout, y_holdout))

Coefficients
['TotalIntlCalls' 'TotalCall' 'CustomerServiceCalls' 'tenure'
 'MinutesPerCall' 'tenure_num' 'PercentDayCalls' 'PercentNightCalls'
 'new_sub_flag' 'SeniorCitizen' 'NumbervMailMessages' 'MultipleLines_Yes'
 'InternetService_Fiber optic' 'InternetService_No' 'Contract_One year'
 'Contract_Two year' 'PaperlessBilling_Yes']
[[-0.04698645  0.00494297  0.36361792 -0.0386472   0.39980518 -0.00274286
  -0.27780156 -0.45519936  0.54649369  0.04243869 -0.02606389 -1.80094948
   1.31605354 -0.87387106 -0.72187925 -1.88231894  0.53886703]]
[-1.99031778]

 In Sample Accuracy by admission status
Churn    0    1
row_0          
0      165   32
1       41  196

 In Sample Percentage Accuracy
0.8317972350230415

 Out of Sample Accuracy by admission status
Churn   0   1
row_0        
0      91  15
1      27  81

 Out Sample Percentage Accuracy
0.8037383177570093

 Original Test Sample Accuracy by admission status
Churn    0    1
row_0          
0      720   25
1      221  134

 Original Tes