# Logistic Regression Model

Last accessed December 6, 2020.

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
import numpy as np
from KLS_Train_Test_Split import others_blamed_train, others_blamed_test,\
ke_train, ke_test


Fitting an unpenalized logistic regression model on the training data.

In [10]:
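# Logistic regression model with no regularization penalty.
# Note: scikit-learn 1.2 deprecated the string 'none' in favor of penalty=None;
# newer releases may require that spelling.

logreg = LogisticRegression(penalty='none', random_state=1)
logreg.fit(others_blamed_train, ke_train)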

LogisticRegression(penalty='none', random_state=1)

Five-fold cross-validation on the logistic regression model.

In [11]:
# Model selection using five-fold cross-validation on the logistic regression model

five_fold_log_select = cross_validate(
    estimator=logreg,
    X=others_blamed_train,
    y=ke_train,
    cv=5,
    return_estimator=True,
    return_train_score=True,
)
five_fold_log_select

{'fit_time': array([0.00499797, 0.00499654, 0.0039978 , 0.0049963 , 0.00501418]),
 'score_time': array([0.        , 0.        , 0.        , 0.        , 0.00099945]),
 'estimator': (LogisticRegression(penalty='none', random_state=1),
  LogisticRegression(penalty='none', random_state=1),
  LogisticRegression(penalty='none', random_state=1),
  LogisticRegression(penalty='none', random_state=1),
  LogisticRegression(penalty='none', random_state=1)),
 'test_score': array([0.45454545, 0.7       , 0.7       , 0.7       , 0.6       ]),
 'train_score': array([0.7       , 0.65853659, 0.65853659, 0.63414634, 0.68292683])}
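To make the cross-validation step more concrete, the sketch below reproduces what `cross_validate` does with `cv = 5` for a classifier: it splits the data with an unshuffled `StratifiedKFold`, fits a fresh copy of the estimator on each training part, and scores it on the held-out part. The data here (`X_demo`, `y_demo`) is randomly generated stand-in data, not the training set used above, so the scores are for illustration only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical stand-in data: 3 binary features and a binary target
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(51, 3))
y_demo = rng.integers(0, 2, size=51)

model = LogisticRegression(random_state=1)

# Manual five-fold loop: fit a clone on each training part, score on the held-out part
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

# The same procedure through cross_validate
auto_scores = cross_validate(model, X_demo, y_demo, cv=5)["test_score"]

print("Manual fold accuracies:   ", manual_scores)
print("cross_validate accuracies:", auto_scores)
```

With only about ten rows in each held-out fold, a single misclassified row shifts a fold's accuracy by roughly 0.1, which is worth keeping in mind when reading the `test_score` array above.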

In this cell, we find the mean and standard deviation of the logistic regression model's test scores.

In [12]:
# Find the mean and standard deviation of the logistic regression model's test scores
log_select_mean = five_fold_log_select['test_score'].mean()

log_select_std = five_fold_log_select['test_score'].std()

# Print the results of the mean and standard deviation for the logistic regression model's test scores
print('Logistic Regression 5 fold cv results (Accuracy) %.3f +/- %.3f'%(log_select_mean, log_select_std))

Logistic Regression 5 fold cv results (Accuracy) 0.631 +/- 0.096
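As a quick check of the summary above, the sketch below recomputes the mean and standard deviation directly from the five test scores printed earlier. NumPy's `.std()` defaults to the population standard deviation (`ddof=0`); the sample standard deviation (`ddof=1`) is shown alongside it for comparison and is somewhat larger with only five folds.

```python
import numpy as np

# Test scores copied from the cross_validate output above
test_scores = np.array([0.45454545, 0.7, 0.7, 0.7, 0.6])

mean_score = test_scores.mean()
std_population = test_scores.std()       # ddof=0, what the notebook reports
std_sample = test_scores.std(ddof=1)     # ddof=1, sample standard deviation

print('Mean accuracy: %.3f' % mean_score)
print('Std (ddof=0):  %.3f' % std_population)
print('Std (ddof=1):  %.3f' % std_sample)
```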


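Fine-tuning the model. Applying an L2 penalty (scikit-learn's default) raised the mean cross-validated accuracy from 0.631 to 0.667 and reduced the spread across folds from 0.096 to 0.042. The gain is smaller than one standard deviation of the unpenalized scores, so it should be read as a modest improvement rather than a decisive one.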

In [18]:
# Logistic regression model with an L2 penalty (default strength, C = 1.0)

logreg2 = LogisticRegression(penalty='l2', random_state=1)
logreg2.fit(others_blamed_train, ke_train)

five_fold_log_select2 = cross_validate(
    estimator=logreg2,
    X=others_blamed_train,
    y=ke_train,
    cv=5,
    return_estimator=True,
    return_train_score=True,
)

# Find the mean and standard deviation of the logistic regression model's test scores
log_select_mean2 = five_fold_log_select2['test_score'].mean()

log_select_std2 = five_fold_log_select2['test_score'].std()

# Print the results of the mean and standard deviation for the logistic regression model's test scores
print('Logistic Regression 5 fold cv results (Accuracy) %.3f +/- %.3f'%(log_select_mean2, log_select_std2))

Logistic Regression 5 fold cv results (Accuracy) 0.667 +/- 0.042
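The strength of the L2 penalty is controlled by the `C` parameter of `LogisticRegression` (smaller `C` means stronger regularization; the default is 1.0). A possible next step in tuning, sketched below under the assumption that the same five-fold setup is reused, is to compare a few values of `C`. The data (`X_demo`, `y_demo`) and the candidate values are hypothetical placeholders, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data: 3 binary features and a binary target
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(51, 3))
y_demo = rng.integers(0, 2, size=51)

results = {}
for C in (0.01, 0.1, 1.0, 10.0):
    # L2 is the default penalty, so only the strength C is varied here
    model = LogisticRegression(C=C, random_state=1)
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    results[C] = (scores.mean(), scores.std())
    print('C = %-5s accuracy %.3f +/- %.3f' % (C, scores.mean(), scores.std()))
```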


The logistic regression model predicted that Karachi Electric would be blamed (class 1) in all four of the scenarios below, including the one in which only NEPRA is blamed.

In [16]:
print(f'No others blamed: {logreg2.predict([[0, 0, 0]])}')
print(f'Just NEPRA blamed: {logreg2.predict([[1, 0, 0]])}')
print(f'Just Sui Gas blamed: {logreg2.predict([[0, 1, 0]])}')
print(f'Just Tehreeki Insaaf blamed: {logreg2.predict([[0, 0, 1]])}')

No others blamed: [1]
Just NEPRA blamed: [1]
Just Sui Gas blamed: [1]
Just Tehreeki Insaaf blamed: [1]


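According to the logistic regression model, there is a 72% probability that Karachi Electric will be blamed if none of the other entities are blamed, a 63% probability if only NEPRA is blamed, a 70% probability if only Sui Gas is blamed, and a 74% probability if only Tehreeki Insaaf is blamed. These figures are the second column of each `predict_proba` row, which is the probability of class 1 (blamed); the first column is the probability of class 0.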

In [17]:
print(f'No others blamed: {logreg2.predict_proba([[0, 0, 0]])}')
print(f'Just NEPRA blamed: {logreg2.predict_proba([[1, 0, 0]])}')
print(f'Just Sui Gas blamed: {logreg2.predict_proba([[0, 1, 0]])}')
print(f'Just Tehreeki Insaaf blamed: {logreg2.predict_proba([[0, 0, 1]])}')

No others blamed: [[0.27971386 0.72028614]]
Just NEPRA blamed: [[0.3699073 0.6300927]]
Just Sui Gas blamed: [[0.30125282 0.69874718]]
Just Tehreeki Insaaf blamed: [[0.26482475 0.73517525]]
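Because logistic regression models the log-odds as an intercept plus one coefficient per feature, these four probabilities are enough to back out the fitted parameters, assuming the three inputs are 0/1 indicators as in the cells above. The sketch below uses only the printed probabilities (it does not touch `logreg2`): the log-odds of the "no others blamed" row gives the intercept, and each single-entity row minus the intercept gives that entity's coefficient. It also confirms that thresholding at 0.5 yields class 1 in every scenario.

```python
import numpy as np

# P(class 1) values copied from the predict_proba output above
p_blamed = {
    'No others blamed': 0.72028614,
    'Just NEPRA blamed': 0.6300927,
    'Just Sui Gas blamed': 0.69874718,
    'Just Tehreeki Insaaf blamed': 0.73517525,
}

def logit(p):
    """Log-odds of a probability p."""
    return np.log(p / (1.0 - p))

# With all features at 0, the log-odds equal the intercept
intercept = logit(p_blamed['No others blamed'])

# With exactly one feature at 1, log-odds = intercept + that feature's coefficient
implied_coefs = {
    name: logit(p) - intercept
    for name, p in p_blamed.items()
    if name != 'No others blamed'
}

# Class 1 is predicted whenever P(class 1) is at least 0.5
predicted = {name: int(p >= 0.5) for name, p in p_blamed.items()}

print(f'Implied intercept: {intercept:.3f}')
for name, coef in implied_coefs.items():
    print(f'Implied coefficient ({name}): {coef:+.3f}')
print(f'Predicted classes: {predicted}')
```

A negative implied coefficient means that blaming that entity lowers the estimated probability that Karachi Electric is also blamed, and a positive one raises it; in every scenario shown the probability still stays above 0.5.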
