Perform hyperparameter tuning on the prepared **Titanic dataset** using:
1. `GridSearchCV`
2. `RandomizedSearchCV`

Tune the hyperparameters of `LogisticRegression` as follows:
- target metric: F1-score
- hyperparameters: `penalty` (either L1 or L2) and `C` between 0.01 and 10
- 8-fold CV

For both grid and randomized search, check 200 combinations of hyperparameters. Pick an appropriate `solver` and `max_iter`. Note that the boundaries for the `C` hyperparameter must be the same for both approaches, but the way the 200 combinations are enforced will differ between them.

Print the best hyperparameters (`C` and `penalty`) for both `GridSearchCV` and `RandomizedSearchCV`. Are they similar?
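You can check both ways of reaching 200 candidates before fitting any model. Here is a minimal sketch that uses scikit-learn's `ParameterGrid` and `ParameterSampler`, the helpers `GridSearchCV` and `RandomizedSearchCV` use to enumerate candidates. The grid size is the product of the list lengths, while the randomized count is set directly by `n_iter`. The seed is an arbitrary choice.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Grid search: candidates = product of grid sizes -> 2 penalties x 100 C values.
grid = ParameterGrid({"penalty": ["l1", "l2"], "C": np.linspace(0.01, 10, 100)})

# Randomized search: candidates = n_iter, with C drawn from the same [0.01, 10] bounds.
sampler = ParameterSampler(
    {"penalty": ["l1", "l2"], "C": loguniform(0.01, 10)},
    n_iter=200,
    random_state=0,
)
samples = list(sampler)

n_folds = 8
print(len(grid), len(samples))  # candidates per approach: 200 200
print(len(grid) * n_folds)      # CV fits per search, not counting the final refit: 1600
```

Either way, 200 candidates with 8-fold CV means 1600 cross-validation fits per search.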

## Data preprocessing 

In [74]:
from functools import partial

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression as LR
from scipy.stats import loguniform #distribution for Randomized Search
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC
from tqdm import tqdm #visualize the process
tqdm = partial(tqdm, position=0, leave=True)
plt.style.use("bmh")

In [75]:
dataset = pd.read_csv(
    "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv",
    sep=",",
    header=0,
)
dataset.drop(columns="Name", inplace=True)
dataset.Pclass = dataset.Pclass.astype(str)
ohe = OneHotEncoder(sparse_output=False)
ohe_data = ohe.fit_transform(dataset.select_dtypes("O"))
ohe_df = pd.DataFrame(data=ohe_data, columns=ohe.get_feature_names_out())

In [76]:
dataset = pd.concat([dataset.select_dtypes(exclude="O"), ohe_df], axis=1)

In [77]:
X = dataset.drop(columns="Survived")
y = dataset.Survived
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=42
)

## Hyperparameter tuning

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

The `liblinear` solver supports both the L1 and L2 penalties (as does `saga`). `max_iter` is set high enough that the solver converges for every candidate.
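One way to sanity-check the solver choice is to fit both penalties with `liblinear` and compare the iterations actually used (`n_iter_`) with `max_iter`. The sketch below uses synthetic data from `make_classification` as a stand-in for the Titanic features, and `C=0.1` is an arbitrary value. With L1, some coefficients are typically driven exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the Titanic features (hypothetical data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

fitted = {}
for penalty in ("l1", "l2"):
    # liblinear accepts both penalties; a generous max_iter leaves room to converge.
    model = LogisticRegression(penalty=penalty, C=0.1, solver="liblinear", max_iter=10000)
    model.fit(X, y)
    fitted[penalty] = model
    n_zero = int(np.sum(model.coef_ == 0))
    # n_iter_ holds the iterations actually used; well below max_iter means converged.
    print(f"{penalty}: iterations={model.n_iter_[0]}, zero coefficients={n_zero}")
```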

In [78]:
log_reg = LR(solver='liblinear', max_iter=10000)

The target metric is the F1-score. For a binary target, the corresponding scoring string is `'f1'`.

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
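For reference, F1 is the harmonic mean of precision P and recall R: F1 = 2·P·R / (P + R). The sketch below uses made-up labels and checks `f1_score` against that hand computation.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: TP = 2, FP = 1, FN = 1 for the positive class (1).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]

p = precision_score(y_true, y_pred)  # 2 / (2 + 1)
r = recall_score(y_true, y_pred)     # 2 / (2 + 1)
manual_f1 = 2 * p * r / (p + r)

print(p, r, manual_f1, f1_score(y_true, y_pred))
```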

The number of folds is 8, as required by the task.
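When `cv` is an integer and the estimator is a classifier, scikit-learn's search classes split the data with `StratifiedKFold`, which does not shuffle by default. Each of the 8 folds therefore keeps roughly the class balance of the training set. The sketch below makes this explicit with `check_cv`. It uses a synthetic, imbalanced label vector, not the Titanic labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, check_cv

# Hypothetical binary target: 320 negatives, 200 positives (about 38% positive).
y = np.array([0] * 320 + [1] * 200)
X = np.zeros((len(y), 1))  # feature values do not matter for splitting

# cv=8 with classifier=True resolves to StratifiedKFold(n_splits=8).
cv = check_cv(8, y, classifier=True)
print(type(cv).__name__, cv.get_n_splits())

for train_idx, test_idx in cv.split(X, y):
    # Each validation fold keeps the overall positive rate (25 of 65 here).
    print(len(test_idx), round(y[test_idx].mean(), 3))
```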

In [79]:
user_scoring = 'f1'
user_cv = 8 

In [80]:
def print_res(search_obj, user_scoring=user_scoring, user_cv=user_cv):
    search_type = 'GridSearchCV' if isinstance(search_obj, GridSearchCV) else 'RandomizedSearchCV'
    
    print(
        f'According to {search_type} procedure with {str(user_cv)} folds, '
        f'logistic regression with {search_obj.best_params_["penalty"]} penalty and '
        f'C (strength of regularization) equaled to {round(search_obj.best_params_["C"], 4)} '
        f'is the best based on {user_scoring}={round(search_obj.best_score_, 4)}.'
    )

## Grid Search 

For Grid Search, checking 200 combinations of hyperparameters means 100 evenly spaced `C` values for each of the two penalties.
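`np.linspace` spaces the 100 values evenly on a linear scale, with a step of about 0.1009. As a result, the grid is sparse at small `C`, where the strength of regularization changes fastest. The sketch below counts how many grid points fall below C = 0.5. For comparison, it does the same for a log-spaced grid over the same bounds, `np.logspace(-2, 1, 100)`. The log-spaced grid is an alternative this notebook does not use.

```python
import numpy as np

linear_grid = np.linspace(0.01, 10, 100)  # grid used in the notebook
log_grid = np.logspace(-2, 1, 100)        # same bounds: 10**-2 = 0.01, 10**1 = 10

step = linear_grid[1] - linear_grid[0]    # (10 - 0.01) / 99
n_small_linear = int(np.sum(linear_grid < 0.5))
n_small_log = int(np.sum(log_grid < 0.5))

print(round(step, 4), n_small_linear, n_small_log)
```

The linear grid has only 5 points below 0.5, while the log-spaced grid has 57.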

In [81]:
c_values = np.linspace(0.01, 10, 100) #list of possible C values
param_grid = {'penalty': ['l1', 'l2'],
              'C':c_values}
param_grid

{'penalty': ['l1', 'l2'],
 'C': array([ 0.01      ,  0.11090909,  0.21181818,  0.31272727,  0.41363636,
         0.51454545,  0.61545455,  0.71636364,  0.81727273,  0.91818182,
         1.01909091,  1.12      ,  1.22090909,  1.32181818,  1.42272727,
         1.52363636,  1.62454545,  1.72545455,  1.82636364,  1.92727273,
         2.02818182,  2.12909091,  2.23      ,  2.33090909,  2.43181818,
         2.53272727,  2.63363636,  2.73454545,  2.83545455,  2.93636364,
         3.03727273,  3.13818182,  3.23909091,  3.34      ,  3.44090909,
         3.54181818,  3.64272727,  3.74363636,  3.84454545,  3.94545455,
         4.04636364,  4.14727273,  4.24818182,  4.34909091,  4.45      ,
         4.55090909,  4.65181818,  4.75272727,  4.85363636,  4.95454545,
         5.05545455,  5.15636364,  5.25727273,  5.35818182,  5.45909091,
         5.56      ,  5.66090909,  5.76181818,  5.86272727,  5.96363636,
         6.06454545,  6.16545455,  6.26636364,  6.36727273,  6.46818182,
         6.56909091, ...])}

In [82]:
grid_model = GridSearchCV(estimator=log_reg,
                          param_grid=param_grid,
                          scoring=user_scoring,
                          cv=user_cv)

In [83]:
grid_model.fit(X_train,y_train)

In [84]:
grid_model.best_params_

{'C': 0.41363636363636364, 'penalty': 'l1'}

In [85]:
grid_model.best_score_

0.783022279828099

In [86]:
print_res(grid_model)

According to GridSearchCV procedure with 8 folds, logistic regression with l1 penalty and C (strength of regularization) equaled to 0.4136 is the best based on f1=0.783.


## Randomized Search

For Randomized Search, I also check 200 combinations, but `C` is sampled from a distribution. I use a log-uniform distribution over the same bounds as the grid (0.01 to 10), which is useful for exploring values that span several orders of magnitude.
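Log-uniform means that log10(C) is uniformly distributed. Each decade in [0.01, 10] (0.01 to 0.1, 0.1 to 1, and 1 to 10) therefore receives about a third of the draws. A plain uniform distribution on the same interval would put only about 0.9% of draws below 0.1. Here is a quick Monte Carlo check with `scipy.stats`; the sample size and seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import loguniform, uniform

n = 30_000
log_draws = loguniform(0.01, 10).rvs(size=n, random_state=0)
# scipy's uniform is parameterized by loc and scale: support [loc, loc + scale].
lin_draws = uniform(loc=0.01, scale=9.99).rvs(size=n, random_state=0)

decades = [(0.01, 0.1), (0.1, 1), (1, 10)]
log_share = [np.mean((log_draws >= lo) & (log_draws < hi)) for lo, hi in decades]
lin_share = [np.mean((lin_draws >= lo) & (lin_draws < hi)) for lo, hi in decades]

print("log-uniform:", np.round(log_share, 3))
print("uniform:    ", np.round(lin_share, 3))
```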

In [87]:
c_values = loguniform(0.01, 10) #definition of distribution
param_distr = {'penalty': ['l1', 'l2'],
               'C': c_values}
param_distr

{'penalty': ['l1', 'l2'],
 'C': <scipy.stats._distn_infrastructure.rv_continuous_frozen at 0x25a6e543b60>}

In [89]:
random_model = RandomizedSearchCV(estimator=log_reg,
                                  param_distributions=param_distr,
                                  scoring=user_scoring,
                                  cv=user_cv,
                                  n_iter=200) #enforcing 200 combinations
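Unlike the grid, `n_iter=200` fixes only the total number of candidates. Each candidate draws `penalty` uniformly from the list, so the L1/L2 split is random rather than exactly 100/100. Without `random_state`, the draws also change from run to run. The sketch below reproduces the sampling with `ParameterSampler`, which `RandomizedSearchCV` uses internally, and passes a seed. The seed value is an arbitrary choice and is not the one behind the results shown in this notebook.

```python
from collections import Counter

from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

param_distr = {"penalty": ["l1", "l2"], "C": loguniform(0.01, 10)}


def sample_candidates(seed):
    # Same sampling scheme as RandomizedSearchCV(n_iter=200, random_state=seed).
    return list(ParameterSampler(param_distr, n_iter=200, random_state=seed))


first = sample_candidates(42)
again = sample_candidates(42)

penalty_counts = Counter(c["penalty"] for c in first)
print(penalty_counts)  # the l1/l2 split is random, not fixed at 100/100
print(first == again)  # a fixed seed reproduces the same candidates
```

Passing `random_state` to `RandomizedSearchCV` in the same way makes its candidates reproducible.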

In [90]:
random_model.fit(X_train,y_train)

In [91]:
random_model.best_params_

{'C': 0.5344496974437439, 'penalty': 'l1'}

In [92]:
random_model.best_score_

0.783022279828099

In [93]:
print_res(random_model)

According to RandomizedSearchCV procedure with 8 folds, logistic regression with l1 penalty and C (strength of regularization) equaled to 0.5344 is the best based on f1=0.783.


## Conclusion

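Both searches choose the L1 (lasso) penalty, so the results are similar. The randomized search picks a larger `C` (0.5344 vs 0.4136), which means weaker regularization and typically more non-zero coefficients, that is, a slightly more complex model. However, the best cross-validated F1-score is identical for both approaches (0.783), so on this data the two values of `C` perform equally well.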
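Identical best scores suggest that the F1 surface may be flat around the optimum, so several values of `C` could tie. One way to check is to count the candidates that share rank 1 in `cv_results_`. When scores tie, `best_index_` is the first rank-1 candidate in search order. The sketch below shows the pattern on synthetic data with a coarser 20-value grid. On the notebook's data, the same `rank_test_score` check would be applied to `grid_model` and `random_model`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (hypothetical); the notebook would use X_train, y_train.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=1)

search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=10000),
    param_grid={"penalty": ["l1", "l2"], "C": np.linspace(0.01, 10, 20)},
    scoring="f1",
    cv=8,
)
search.fit(X, y)

results = search.cv_results_
# All candidates whose mean F1 equals the best one (ranks use method="min").
tied = np.flatnonzero(results["rank_test_score"] == 1)
print(len(tied), search.best_index_)
for i in tied:
    print(results["params"][i], round(results["mean_test_score"][i], 4))
```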