# <font color=blue>Assignment</font>

In this assignment, you are going to measure the performance of the model you created with the Titanic dataset in the previous lesson. To complete this assignment, send a link to a Jupyter notebook containing solutions to the following tasks.

- Evaluate your model's performance using cross-validation and several different metrics.
- Use hyperparameter tuning to find the model with the most appropriate parameters.

# <font color=blue>Solution</font>

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

import warnings
warnings.filterwarnings('ignore')


data=pd.read_csv('cleveland-0_vs_4.dat')

In [2]:
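# Coerce non-numeric entries to NaN, then fill them with a default value
# (assigning the result back avoids chained in-place modification)
data['ca'] = pd.to_numeric(data['ca'], errors='coerce').fillna(0)
data['thal'] = pd.to_numeric(data['thal'], errors='coerce').fillna(3)
# Binary target: a single indicator column derived from `num`
data['Class'] = pd.get_dummies(data.num, drop_first=True)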

In [14]:
X = data.drop(['num', 'Class'], axis=1)
y= data.Class
logreg_model = LogisticRegression()

In [15]:
from sklearn.model_selection import cross_validate, cross_val_score

cv = cross_validate(estimator=logreg_model,
                    X=X,
                    y=y,
                    cv=10,
                    return_train_score=True,
                    scoring=['accuracy', 'precision', 'r2']  # note: r2 is a regression metric
                   )

In [16]:
print('Train Set Mean Accuracy  : {:.2f}  '.format(cv['train_accuracy'].mean()))
print('Train Set Mean R-square  : {:.2f}  '.format(cv['train_r2'].mean()))
print('Train Set Mean Precision : {:.2f}\n'.format(cv['train_precision'].mean()))

print('Test Set Mean Accuracy   : {:.2f}  '.format(cv['test_accuracy'].mean()))
print('Test Set Mean R-square   : {:.2f}  '.format(cv['test_r2'].mean()))
print('Test Set Mean Precision  : {:.2f}  '.format(cv['test_precision'].mean()))



Train Set Mean Accuracy  : 0.97  
Train Set Mean R-square  : 0.58  
Train Set Mean Precision : 0.88

Test Set Mean Accuracy   : 0.95  
Test Set Mean R-square   : 0.30  
Test Set Mean Precision  : 0.63  
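
R-squared is a regression metric, so the values above are hard to interpret for a binary classifier. Recall, F1 and ROC AUC are usually more informative, particularly when one class is rare. The following is a minimal sketch of that idea, not the notebook's own evaluation: `X_demo` and `y_demo` are hypothetical stand-in data generated with `make_classification`, and `StratifiedKFold` is used on the assumption that each fold should keep the same class ratio.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical stand-in data: an imbalanced binary problem (roughly 10% positives).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0
)

# Stratified folds preserve the class ratio in every train/test split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv_demo = cross_validate(
    estimator=LogisticRegression(max_iter=1000),
    X=X_demo,
    y=y_demo,
    cv=skf,
    scoring=metrics,
    return_train_score=True,
)

# Report the mean of each metric over the five test folds.
for metric in metrics:
    print("Test Set Mean {:<9}: {:.2f}".format(metric, cv_demo["test_" + metric].mean()))
```

To try this on the notebook's data, `X_demo` and `y_demo` would be swapped for `X` and `y`; the scores will differ from the ones printed above because the folds are shuffled and stratified.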


### Hyperparameter Tuning

In [4]:
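# Note: 'l1' needs a solver that supports it (e.g. liblinear or saga);
# the default LogisticRegression solver (lbfgs) only handles 'l2'.
parameters = {"C": [10 ** x for x in range(-5, 5)],
              "penalty": ['l1', 'l2']
             }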

In [6]:
from sklearn.model_selection import GridSearchCV

grid_cv = GridSearchCV(estimator=logreg_model,
                       param_grid = parameters,
                       cv = 10
                      )

grid_cv.fit(X, y)

GridSearchCV(cv=10, estimator=LogisticRegression(),
             param_grid={'C': [1e-05, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100,
                               1000, 10000],
                         'penalty': ['l1', 'l2']})

In [7]:
print("Best Parameters : ", grid_cv.best_params_)
print("Best Score      : ", grid_cv.best_score_)

Best Parameters :  {'C': 1, 'penalty': 'l2'}
Best Score      :  0.9549019607843137
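
In the scikit-learn releases this notebook appears to target, the default `lbfgs` solver accepts only the `l2` penalty (or none), so `l1` candidates in a grid like the one above fail to fit and `GridSearchCV` records NaN for them by default. The sketch below shows one way to have both penalties evaluated, assuming `solver="liblinear"` is acceptable for the problem. `X_demo`, `y_demo`, `param_grid` and `grid_demo` are hypothetical names, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in data for illustration only.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

param_grid = {
    "C": [10 ** x for x in range(-3, 3)],
    "penalty": ["l1", "l2"],
}

# liblinear supports both the l1 and l2 penalties.
grid_demo = GridSearchCV(
    estimator=LogisticRegression(solver="liblinear"),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
grid_demo.fit(X_demo, y_demo)

# Count candidates whose fits failed (these would show up as NaN scores).
n_failed = int(np.isnan(grid_demo.cv_results_["mean_test_score"]).sum())

print("Failed candidates :", n_failed)
print("Best Parameters   :", grid_demo.best_params_)
```

Checking `n_failed` is a cheap safeguard when warnings are suppressed, since failed fits would otherwise pass unnoticed.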


In [11]:
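# Collect the grid search results and rank the candidates by mean CV accuracy
# (pandas is already imported as pd in the first cell)
results = grid_cv.cv_results_

df = pd.DataFrame(results)
df = df[['param_penalty', 'param_C', 'mean_test_score']]
df = df.sort_values(by='mean_test_score', ascending=False)
df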

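|    | param_penalty | param_C | mean_test_score |
|----|---------------|---------|-----------------|
| 11 | l2            | 1.0     | 0.954902        |
| 13 | l2            | 10.0    | 0.954902        |
| 15 | l2            | 100.0   | 0.949346        |
| 17 | l2            | 1000.0  | 0.943464        |
| 19 | l2            | 10000.0 | 0.943464        |
| 9  | l2            | 0.1     | 0.943137        |
| 1  | l2            | 1e-05   | 0.926797        |
| 3  | l2            | 0.0001  | 0.926797        |
| 5  | l2            | 0.001   | 0.926797        |
| 7  | l2            | 0.01    | 0.926797        |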
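
Several candidates in the table tie or nearly tie on mean score, so it can help to look at the fold-to-fold spread as well. The sketch below pulls `std_test_score` and `rank_test_score` from `cv_results_` alongside the mean. It runs on hypothetical synthetic data (`X_demo`, `y_demo`) with a reduced grid, so its numbers are unrelated to the results above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in data and a small grid, for illustration only.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
grid_demo = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
grid_demo.fit(X_demo, y_demo)

# Keep the mean, the spread across folds, and the rank of each candidate.
summary = (
    pd.DataFrame(grid_demo.cv_results_)
    [["param_C", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(summary)
```

When two candidates have the same mean, a smaller `std_test_score` or a simpler model (here, a smaller `C`, meaning stronger regularization) is a reasonable tie-breaker.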
