1 Fit the logistic regression classifier to your training sample and transform, 
i.e. make predictions on the training sample

2 Evaluate your in-sample results using the model score, confusion matrix, and classification report.

3 Print and clearly label the following: Accuracy, true positive rate, false positive rate, true negative rate, 
false negative rate, precision, recall, f1-score, and support.

4 Look in the scikit-learn documentation to research the solver parameter. 
Which option(s) are best for the particular problem you are trying to solve and the data to be used?

5 Run through steps 2-4 using another solver (from question 4).
Which performs better on your in-sample data?

In [1]:
#plotting imports
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

# ignore warnings
import warnings
warnings.filterwarnings("ignore")


#modeling imports
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

from acquire import get_iris_data
from prepare import prep_iris

In [2]:
#1 Fit the logistic regression classifier to your training sample and transform, 
#i.e. make predictions on the training sample

In [3]:
#import and clean iris data
df = prep_iris(get_iris_data())
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [4]:
# split the data into 70/30 train/test groups for x and y
x = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']  # 1-D Series; scikit-learn expects a 1-D target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.30, random_state=123)

In [5]:
# Fit the logistic regression classifier on the training sample
logit = LogisticRegression(C=1, class_weight='balanced', random_state=123, solver='saga')
logit.fit(x_train, y_train)

LogisticRegression(C=1, class_weight='balanced', dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=123, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)

In [6]:
# generate predictions
y_pred = logit.predict(x_train)
y_pred_proba=logit.predict_proba(x_train)

y_pred_proba[:5]

array([[1.06581048e-03, 2.85360763e-01, 7.13573426e-01],
       [1.07334021e-03, 2.41758386e-01, 7.57168273e-01],
       [1.88099190e-02, 6.48735010e-01, 3.32455071e-01],
       [8.64033177e-01, 1.35935117e-01, 3.17061926e-05],
       [7.65439950e-01, 2.34438034e-01, 1.22016286e-04]])

In [7]:
# Evaluate in-sample results using the model score, confusion matrix, and classification report

In [8]:
# print model score
print('Accuracy of Logistic regression on training set: {:.2f}'.format(logit.score(x_train,y_train)))

Accuracy of Logistic regression on training set: 0.96


In [9]:
# print confusion matrix (rows = actual class, columns = predicted class)
print(confusion_matrix(y_train, y_pred))

[[32  0  0]
 [ 0 37  3]
 [ 0  1 32]]
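
Exercise 3 asks for true/false positive and negative rates, which the classification report does not include. With three classes these rates are only defined per class, treating each class in turn as "positive" and the other two as "negative". The block below is a minimal sketch of that one-vs-rest calculation. The matrix is copied by hand from the output above, and the helper name `one_vs_rest_rates` is an illustrative assumption, not part of scikit-learn.

```python
# Confusion matrix copied from the saga run above
# (rows = actual class, columns = predicted class).
labels = ['setosa', 'versicolor', 'virginica']
cm = [[32, 0, 0],
      [0, 37, 3],
      [0, 1, 32]]

def one_vs_rest_rates(cm, labels):
    """Per-class TPR, FPR, TNR and FNR, treating each class as 'positive' in turn.

    Hypothetical helper written for this exercise.
    """
    total = sum(sum(row) for row in cm)
    rates = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]                          # actual i, predicted i
        fn = sum(cm[i]) - tp                   # actual i, predicted something else
        fp = sum(row[i] for row in cm) - tp    # predicted i, actually something else
        tn = total - tp - fn - fp              # everything not involving class i
        rates[label] = {
            'true positive rate': tp / (tp + fn),
            'false positive rate': fp / (fp + tn),
            'true negative rate': tn / (tn + fp),
            'false negative rate': fn / (fn + tp),
        }
    return rates

# Overall accuracy is the diagonal (correct predictions) over all samples.
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
print('Accuracy: {:.2f}'.format(accuracy))

rates = one_vs_rest_rates(cm, labels)
for label, class_rates in rates.items():
    print(label)
    for name, value in class_rates.items():
        print('  {}: {:.3f}'.format(name, value))
```

In the notebook itself, `confusion_matrix(y_train, y_pred).tolist()` and `logit.classes_` could be passed in place of the hard-coded values. The per-class true positive rate is the same quantity the classification report calls recall.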


In [10]:
# generate classification report
cr = classification_report(y_train, y_pred, output_dict=True)
cr

{'setosa': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 32},
 'versicolor': {'precision': 0.9736842105263158,
  'recall': 0.925,
  'f1-score': 0.9487179487179489,
  'support': 40},
 'virginica': {'precision': 0.9142857142857143,
  'recall': 0.9696969696969697,
  'f1-score': 0.9411764705882354,
  'support': 33},
 'accuracy': 0.9619047619047619,
 'macro avg': {'precision': 0.96265664160401,
  'recall': 0.9648989898989898,
  'f1-score': 0.9632981397687281,
  'support': 105},
 'weighted avg': {'precision': 0.9630361618331543,
  'recall': 0.9619047619047619,
  'f1-score': 0.9619765855059974,
  'support': 105}}
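
The raw dictionary holds everything exercise 3 asks for, but it is not "clearly labeled". A small formatter along these lines could print accuracy plus each class's precision, recall, f1-score and support. This is only a sketch: `format_report` is a hypothetical helper, and `cr_sample` is a trimmed hand copy of the dictionary above so that the block runs on its own.

```python
# Trimmed copy of the classification report dict shown above (saga run);
# in the notebook, pass the full `cr` object instead.
cr_sample = {
    'setosa': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 32},
    'versicolor': {'precision': 0.9736842105263158, 'recall': 0.925,
                   'f1-score': 0.9487179487179489, 'support': 40},
    'virginica': {'precision': 0.9142857142857143, 'recall': 0.9696969696969697,
                  'f1-score': 0.9411764705882354, 'support': 33},
    'accuracy': 0.9619047619047619,
}

def format_report(report):
    """Build labeled lines for accuracy and each entry's precision, recall, f1-score, support."""
    lines = ['Accuracy: {:.2f}'.format(report['accuracy'])]
    for label, metrics in report.items():
        if not isinstance(metrics, dict):
            continue  # skip the scalar 'accuracy' entry
        lines.append('{}:'.format(label))
        for name in ('precision', 'recall', 'f1-score', 'support'):
            value = metrics[name]
            # support is a sample count; the other three are proportions
            text = str(value) if name == 'support' else '{:.2f}'.format(value)
            lines.append('  {}: {}'.format(name, text))
    return lines

report_lines = format_report(cr_sample)
print('\n'.join(report_lines))
```

Given the full `cr`, the same loop would also print the `macro avg` and `weighted avg` entries, since they are dictionaries with the same four keys.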

In [12]:
# Refit the logistic regression classifier using a different solver (liblinear)
logit = LogisticRegression(C=1, class_weight='balanced', random_state=123, solver='liblinear')
logit.fit(x_train, y_train)

LogisticRegression(C=1, class_weight='balanced', dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=123, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [13]:
# generate predictions
y_pred = logit.predict(x_train)
y_pred_proba=logit.predict_proba(x_train)

y_pred_proba[:5]

array([[8.73154083e-04, 2.15635131e-01, 7.83491715e-01],
       [8.97359857e-04, 1.76734891e-01, 8.22367749e-01],
       [1.67572589e-02, 6.35328913e-01, 3.47913829e-01],
       [8.96226906e-01, 1.03746789e-01, 2.63054825e-05],
       [8.13687747e-01, 1.86187330e-01, 1.24922605e-04]])

In [14]:
# print confusion matrix (rows = actual class, columns = predicted class)
print(confusion_matrix(y_train, y_pred))

[[32  0  0]
 [ 0 36  4]
 [ 0  1 32]]


In [15]:
# generate classification report
cr = classification_report(y_train, y_pred, output_dict=True)
cr

{'setosa': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 32},
 'versicolor': {'precision': 0.972972972972973,
  'recall': 0.9,
  'f1-score': 0.935064935064935,
  'support': 40},
 'virginica': {'precision': 0.8888888888888888,
  'recall': 0.9696969696969697,
  'f1-score': 0.927536231884058,
  'support': 33},
 'accuracy': 0.9523809523809523,
 'macro avg': {'precision': 0.953953953953954,
  'recall': 0.9565656565656565,
  'f1-score': 0.9542003889829976,
  'support': 105},
 'weighted avg': {'precision': 0.9547833547833547,
  'recall': 0.9523809523809523,
  'f1-score': 0.9524885052835362,
  'support': 105}}
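
Exercise 5 asks which solver performs better in-sample. One way to answer it is to put the two training-set confusion matrices side by side and compare accuracy. The sketch below does that with the matrices copied by hand from the outputs above. The names `confusion_by_solver` and `accuracy_from_confusion` are illustrative assumptions.

```python
# Training-set confusion matrices copied from the two runs above
# (rows = actual class, columns = predicted class).
confusion_by_solver = {
    'saga': [[32, 0, 0], [0, 37, 3], [0, 1, 32]],
    'liblinear': [[32, 0, 0], [0, 36, 4], [0, 1, 32]],
}

def accuracy_from_confusion(cm):
    """Accuracy = correctly classified samples (the diagonal) / all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

accuracy_by_solver = {
    solver: accuracy_from_confusion(cm)
    for solver, cm in confusion_by_solver.items()
}
for solver, acc in accuracy_by_solver.items():
    print('{}: in-sample accuracy {:.4f}'.format(solver, acc))

# Pick the solver with the highest training accuracy.
best_solver = max(accuracy_by_solver, key=accuracy_by_solver.get)
print('Best on the training sample:', best_solver)
```

On these numbers saga comes out ahead, but only by one sample (101 versus 100 correct out of 105): liblinear labels one more versicolor flower as virginica. A gap that small is weak evidence for preferring either solver, and it says nothing yet about performance on the test set.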