# Table of contents

- [Titanic](#Titanic)
  - [Setup](#Setup)
  - [Data](#Data)
    - [Download](#Download)
    - [Explore with QuickDA](#Explore-with-QuickDA)
    - [Custom Explorations](#Custom-Explorations)
    - [Split Data](#Split-Data)
  - [Model's Common Functions](#Model's-Common-Functions)
  - [Baseline Only Females Survived : 0.76315](#Baseline-Only-Females-Survived-:-0.76315)
  - [Log Sex Pclass : 0.76555](#Log-Sex-Pclass-:-0.76555)
    - [Transformations](#Transformations)
    - [Model](#Model)
    - [Submission](#Submission)
- [Titanic - Advanced](#Titanic---Advanced)
  - [Advanced Models' Common Functions](#Advanced-Models'-Common-Functions)
  - [Custom Transformers](#Custom-Transformers)
  - [Pipelines](#Pipelines)
  - [Logistic Regression : 0.76555](#Logistic-Regression-:-0.76555)

# Titanic

This notebook was inspired by the book [*Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow*](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/).

Thanks to the author, [Aurélien Géron](https://github.com/ageron).

## Setup

What does the environment require?

In [264]:
# Python ≥3.5 is required
from pathlib import Path
import sys

import numpy as np
import pandas as pd
import sklearn

assert sklearn.__version__ >= '0.20'
assert sys.version_info >= (3, 5)

np.random.seed(42)
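
Note that `sklearn.__version__ >= '0.20'` compares strings character by character, so a version such as `'0.9'` would pass the check even though it is older than 0.20. Below is a minimal sketch of a numeric comparison. `version_tuple` is a hypothetical helper, not part of this notebook, and it deliberately ignores suffixes such as `rc1`.

```python
import re


def version_tuple(version, parts=2):
    """Return the leading numeric components of a version string as ints."""
    numbers = []
    for chunk in version.split('.')[:parts]:
        match = re.match(r'\d+', chunk)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


# Lexicographic string comparison gets this case wrong...
assert '0.9' >= '0.20'
# ...while comparing integer tuples does not.
assert version_tuple('0.9') < (0, 20)
assert version_tuple('0.24.2') >= (0, 20)
assert version_tuple('1.0rc1') >= (0, 20)
```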

*You will also need QuickDA (see [below](#Explore-with-QuickDA)).*

## Data

### Download

Make sure you have a directory named `titanic_dataset` with the .csv files in it.

In [265]:
for file_path in Path.cwd().joinpath('titanic_dataset').glob('**/*'):
    print(file_path)

/home/evc/Desktop/git/kaggle-titanic-template/titanic_dataset/gender_submission.csv
/home/evc/Desktop/git/kaggle-titanic-template/titanic_dataset/test.csv
/home/evc/Desktop/git/kaggle-titanic-template/titanic_dataset/train.csv


In [266]:
def load_titanic_dataset(filename, path='titanic_dataset'):
    csv_path = Path(path) / filename
    return pd.read_csv(csv_path)


data = load_titanic_dataset('train.csv')
submit = load_titanic_dataset('test.csv')
gender_submission = load_titanic_dataset('gender_submission.csv')

### Explore with QuickDA

Now that the data is loaded, we can explore it. Let's use QuickDA, which makes data exploration easy. Don't forget to install it from your terminal with the following command : `py -m pip install quickda`

In [267]:
# import quickda.explore_data as qda

# qda.explore(data, method='profile', report_name='Design Report')

### Custom Explorations

QuickDA is quick & easy, but you might need to go further in your data analysis.

In [268]:
data[['Sex', 'Survived']].groupby(['Sex']).mean()

| Sex | Survived |
| --- | --- |
| female | 0.742038 |
| male | 0.188908 |


In [269]:
data[['Pclass', 'Survived']].groupby(['Pclass']).mean()

| Pclass | Survived |
| --- | --- |
| 1 | 0.62963 |
| 2 | 0.472826 |
| 3 | 0.242363 |


### Split Data

Before you modify your data, you need to create a test set!

Sex is an important feature, so we keep the same proportions of males and females in both sets with `stratify=data['Sex']`.

In [270]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data,
                               test_size=0.2,
                               random_state=42,
                               stratify=data['Sex'])
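
To see what `stratify` does, here is a minimal sketch on a synthetic frame. The 35/65 female/male mix and the `Survived` values are made up for illustration and are not taken from the Titanic data. The sketch checks that both splits keep the same share of each sex.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 35 females and 65 males.
toy = pd.DataFrame({'Sex': ['female'] * 35 + ['male'] * 65,
                    'Survived': [1, 0] * 50})

toy_train, toy_test = train_test_split(toy,
                                       test_size=0.2,
                                       random_state=42,
                                       stratify=toy['Sex'])

# Share of each sex in the two splits; stratification keeps them equal.
train_share = toy_train['Sex'].value_counts(normalize=True)
test_share = toy_test['Sex'].value_counts(normalize=True)
print(train_share)
print(test_share)
```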

## Model's Common Functions

A few helper functions to keep the code [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). Don't mind them for now; come back if you need to.

In [271]:
from sklearn.metrics import accuracy_score


def show_performance(predictions, ground_truth):
    # accuracy_score expects (y_true, y_pred), in that order
    accuracy = accuracy_score(ground_truth, predictions)
    print(f'Test set\'s accuracy : {accuracy:.5f}.')


def submit_csv(submission, file_name):
    output_dir = Path('submissions')
    output_dir.mkdir(parents=True, exist_ok=True)

    submission.to_csv(output_dir.joinpath(file_name), index=False)

## Baseline Only Females Survived : 0.76315

Since the survival rate is 74.2% for females and 18.9% for males,
a quick & easy baseline is to predict that every female survived and every male died.

In [272]:
def baseline_female(data):
    predictions = np.zeros(data.shape[0])
    predictions[data['Sex'] == 'male'] = 0
    predictions[data['Sex'] == 'female'] = 1
    return predictions

In [273]:
predictions = baseline_female(test)
show_performance(predictions, test['Survived'])

Test set's accuracy : 0.77654.
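
As a sanity check, the baseline's accuracy on the full `train.csv` can be worked out by hand from the two survival rates. The sketch below assumes the usual Kaggle counts of 314 females (233 survivors) and 577 males (109 survivors). These counts are not computed in this notebook, but they do reproduce the rates shown earlier.

```python
# Assumed counts for the Kaggle train.csv (not computed in this notebook).
females, female_survivors = 314, 233
males, male_survivors = 577, 109

# Rates reported by the groupby on 'Sex'.
female_rate = female_survivors / females   # about 0.742
male_rate = male_survivors / males         # about 0.189

# "Every female survives, every male dies" is right for the females who
# survived and for the males who did not.
correct = female_survivors + (males - male_survivors)
baseline_accuracy = correct / (females + males)
print(f'{baseline_accuracy:.5f}')
```

The result is close to 0.79, in the same range as the accuracy measured on the 20% test split. The two figures differ because they are computed on different rows.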


Not bad, let's do a submission :

In [274]:
predictions = baseline_female(submit)
predictions[1] = 0  # Otherwise, Kaggle won't compute your score...

submission = pd.DataFrame({
    'PassengerId': submit['PassengerId'],
    'Survived': predictions
})

FILE_NAME = 'baseline_female.csv'
submit_csv(submission, FILE_NAME)
submission.head()

|  | PassengerId | Survived |
| --- | --- | --- |
| 0 | 892 | 0.0 |
| 1 | 893 | 0.0 |
| 2 | 894 | 0.0 |
| 3 | 895 | 0.0 |
| 4 | 896 | 1.0 |


## Log Sex Pclass : 0.76555

Now, let's do a *machine learning* model.

In [275]:
train_copy = train.copy()
test_copy = test.copy()
submit_copy = submit.copy()

### Transformations

Here, you can add some attributes to `x_att`.

In [276]:
from sklearn.preprocessing import OneHotEncoder

x_att = ['Sex', 'Pclass']
y_att = ['Survived']

x_train = train_copy[x_att]
y_train = train_copy[y_att]
x_test = test_copy[x_att]
y_test = test_copy[y_att]
x_submit = submit_copy[x_att]

**You can do some data preprocessing here.**

Machine learning algorithms don't understand strings, but do understand vectors.

Therefore you should use `OneHotEncoder()` to transform your data this way :
* `'female' -> [1, 0]`
* `'male' -> [0, 1]`

(`OneHotEncoder` sorts the categories, so `'female'` gets the first column.)

Can you tell why we do the same thing with `Pclass`?
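
Here is a minimal sketch of what the encoder produces on a toy frame (the four rows are made up). The fitted `categories_` attribute tells you which output column stands for which value.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

toy = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male'],
                    'Pclass': [3, 1, 2, 3]})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() makes it printable.
encoded = encoder.fit_transform(toy).toarray()

# One array of sorted categories per input column.
print(encoder.categories_)
# Columns: Sex=female, Sex=male, Pclass=1, Pclass=2, Pclass=3
print(encoded)
```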

In [277]:
one_hot = OneHotEncoder()

# Fit the encoder on the training set only, then reuse it as-is
x_train_tfm = one_hot.fit_transform(x_train)
x_test_tfm = one_hot.transform(x_test)
x_submit_tfm = one_hot.transform(x_submit)
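
It is usually safer to fit the encoder on the training set and only call `transform` on the other sets. Below is a minimal sketch with made-up data in which the second set lacks one class, to show how refitting changes the column layout.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

toy_train = pd.DataFrame({'Pclass': [1, 2, 3, 3]})
toy_test = pd.DataFrame({'Pclass': [1, 3]})  # class 2 is absent here

encoder = OneHotEncoder()
encoder.fit(toy_train)

# Reusing the fitted encoder keeps the three training columns.
reused = encoder.transform(toy_test).toarray()

# Refitting on the smaller set silently yields two columns instead.
refit = OneHotEncoder().fit_transform(toy_test).toarray()

print(reused.shape, refit.shape)
```

With `Sex` and `Pclass` every category happens to appear in all three Titanic files, so the problem stays hidden, but it would surface with rarer categories.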

### Model

Once you have transformed the data into something *edible* for your algorithm, you can train it.

In [278]:
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(random_state=42)
log_reg.fit(x_train_tfm, np.array(y_train).ravel())
test_pred = log_reg.predict(x_test_tfm)

show_performance(test_pred, y_test['Survived'])

Test set's accuracy : 0.77654.


This is the same result as the one we got with the baseline. How should you interpret this accuracy?

### Submission

In [279]:
submit_pred = log_reg.predict(x_submit_tfm)
submission = pd.DataFrame({
    'PassengerId': submit['PassengerId'],
    'Survived': submit_pred
})

FILE_NAME = 'log_sex_pclass.csv'
submit_csv(submission, FILE_NAME)

**You're now done with the first part. Congrats!**

**Before going further, try to get a higher score by :**

* Adding some features
* Doing some data preprocessing
* Tweaking the model's hyperparameters
* Trying another model

Also, don't forget to share your score with us!

# Titanic - Advanced


Here, you are going to create *custom transformers* and use *Pipeline* and *GridSearchCV*.

## Advanced Models' Common Functions

Once again, some functions for [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) code. Don't mind them, come back if you need to.

In [280]:
import joblib
from IPython.display import Audio

SOUND_FILE_NAME = './no_sound.wav'
USE_SOUND_FILE = False


def ring(use_sound_file=USE_SOUND_FILE, sound_file=SOUND_FILE_NAME):
    if use_sound_file:
        return Audio(sound_file, rate=1, autoplay=True)


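def print_model_stats(cv_clf):
    # Body reconstructed to match the output printed by the last cell.
    values = [(param[0], param[1]) for param in cv_clf.best_params_.items()]
    lines = ',\n'.join(f'    {name!r}: [{value!r}]' for name, value in values)
    print(f'params = {{\n{lines}\n}}')
    print(f'CV\'s best accuracy : {cv_clf.best_score_:.5f}')


def save_and_load_model(cv_clf, model_name, x_test_tfm):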
    predictions = cv_clf.predict(x_test_tfm)

    submission = pd.DataFrame({
        'PassengerId': submit['PassengerId'],
        'Survived': predictions
    })

    file_name = f'{model_name}.csv'

    output_dir = Path('submissions')
    output_dir.mkdir(parents=True, exist_ok=True)

    submission.to_csv(output_dir.joinpath(file_name), index=False)

    joblib.dump(cv_clf, output_dir.joinpath(f'{model_name}.pkl'))
    return joblib.load(output_dir.joinpath(f'{model_name}.pkl'))
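
The `joblib.dump` / `joblib.load` round trip used above can be tried in isolation. Here is a minimal sketch, assuming a throwaway model and a temporary directory instead of `submissions`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression(random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / 'toy_model.pkl'
    joblib.dump(model, model_path)       # serialize the fitted estimator
    restored = joblib.load(model_path)   # read it back from disk

# The restored model should predict exactly like the original one.
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```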

## Custom Transformers

In [281]:
from sklearn.base import TransformerMixin, BaseEstimator


class tfm_example(TransformerMixin, BaseEstimator):
    def __init__(self, do_tfm=False):
        self.do_tfm = do_tfm

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        if self.do_tfm:
            return X
        return X
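
As an illustration of how this template can be filled in, here is a sketch of a transformer that adds a family-size column. `FamilySizeAdder` is a hypothetical name, the example assumes the `SibSp` and `Parch` columns of the Kaggle files, and the rows are made up.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class FamilySizeAdder(TransformerMixin, BaseEstimator):
    """Optionally add FamilySize = SibSp + Parch + 1 (hypothetical example)."""

    def __init__(self, do_tfm=True):
        self.do_tfm = do_tfm

    def fit(self, X, y=None):
        # Nothing to learn from the data.
        return self

    def transform(self, X, y=None):
        if not self.do_tfm:
            return X
        X = X.copy()  # leave the caller's frame untouched
        X['FamilySize'] = X['SibSp'] + X['Parch'] + 1
        return X


toy = pd.DataFrame({'SibSp': [1, 0, 3], 'Parch': [0, 0, 2]})
print(FamilySizeAdder().fit_transform(toy))
print(FamilySizeAdder(do_tfm=False).fit_transform(toy))
```

Because `do_tfm` is a constructor argument, a grid search can switch the step on and off like any other hyperparameter.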

## Pipelines

In [282]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

x_att = ['Sex']
y_att = ['Survived']

x_train = train[x_att]
y_train = train[y_att]
x_test = test[x_att]
y_test = test[y_att]
x_submit = submit[x_att]

categorical_tfm = Pipeline([('one_hot', OneHotEncoder())])

sex_idx = 0  # position of 'Sex' in x_att
pipeline = ColumnTransformer([('cat', categorical_tfm, [sex_idx])],
                             remainder='drop')
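
A `ColumnTransformer` can also select columns by name when it receives a DataFrame, which avoids index bookkeeping. Here is a minimal end-to-end sketch on made-up rows; the `Age` column is only there to show that `remainder='drop'` discards it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

toy_X = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male',
                              'female', 'male'],
                      'Age': [22, 38, 26, 35, 27, 54]})
toy_y = [0, 1, 1, 0, 1, 0]

# Encode 'Sex' and drop every other column, selecting by column name.
preprocess = ColumnTransformer([('cat', OneHotEncoder(), ['Sex'])],
                               remainder='drop')
model = Pipeline([('pipe', preprocess),
                  ('log', LogisticRegression(random_state=42))])

model.fit(toy_X, toy_y)
toy_pred = model.predict(pd.DataFrame({'Sex': ['female', 'male'],
                                       'Age': [30, 30]}))
print(toy_pred)
```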

## Logistic Regression : 0.76555


In [283]:
from sklearn.linear_model import LogisticRegression

log_pipeline = Pipeline([('pipe', pipeline),
                         ('log', LogisticRegression(random_state=42))])

print(log_pipeline.get_params().keys())

dict_keys(['memory', 'steps', 'verbose', 'pipe', 'log', 'pipe__n_jobs', 'pipe__remainder', 'pipe__sparse_threshold', 'pipe__transformer_weights', 'pipe__transformers', 'pipe__verbose', 'pipe__cat', 'pipe__cat__memory', 'pipe__cat__steps', 'pipe__cat__verbose', 'pipe__cat__one_hot', 'pipe__cat__one_hot__categories', 'pipe__cat__one_hot__drop', 'pipe__cat__one_hot__dtype', 'pipe__cat__one_hot__handle_unknown', 'pipe__cat__one_hot__sparse', 'log__C', 'log__class_weight', 'log__dual', 'log__fit_intercept', 'log__intercept_scaling', 'log__l1_ratio', 'log__max_iter', 'log__multi_class', 'log__n_jobs', 'log__penalty', 'log__random_state', 'log__solver', 'log__tol', 'log__verbose', 'log__warm_start'])
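
Each key follows the pattern `<step name>__<parameter>`, nested as deep as the pipeline goes. Below is a minimal sketch of reading and changing a parameter through that naming scheme. It uses a simplified pipeline, rebuilt here so that the block is self-contained.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

toy_pipeline = Pipeline([('one_hot', OneHotEncoder()),
                         ('log', LogisticRegression(random_state=42))])

# 'log__C' addresses parameter C of the step named 'log'.
print(toy_pipeline.get_params()['log__C'])
toy_pipeline.set_params(log__C=10)
print(toy_pipeline.named_steps['log'].C)
```

These are the same names that `GridSearchCV` expects as keys of its parameter grid.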


In [284]:
from sklearn.model_selection import GridSearchCV

params = {
    'log__C': [1, 10],
    'log__dual': [True, False],
    'log__fit_intercept': [True, False],
    'log__max_iter': [10**4],
    'log__penalty': ['l1', 'l2', 'elasticnet', 'none'],
    'log__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
}
cv_log = GridSearchCV(log_pipeline, params, verbose=2, scoring='accuracy')
cv_log.fit(x_train, y_train.values.ravel())
ring()

Fitting 5 folds for each of 160 candidates, totalling 800 fits
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=newton-cg; total time=   0.0s
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=newton-cg; total time=   0.0s
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=newton-cg; total time=   0.0s
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=newton-cg; total time=   0.0s
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=newton-cg; total time=   0.0s
[CV] END log__C=1, log__dual=True, log__fit_intercept=True, log__max_iter=10000, log__penalty=l1, log__solver=lbfgs; total time=   0.0s
[...]

Traceback (most recent call last):
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 598, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got l1 penalty.

[...]

ValueError: Solver sag supports only dual=False, got dual=True
ValueError: Solver newton-cg supports only dual=False, got dual=True
ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got elasticnet penalty.
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got elasticnet penalty.

[...]

Traceback (most recent call last):
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 598, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py", line 1358, in fit
    self.coef_, self.intercept_, n_iter_ = _fit_liblinear(
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/svm/_base.py", line 974, in _fit_liblinear
    solver_type = _get_liblinear_solver_type(multi_class, penalty, loss, dual)
  File "/home/evc/Desktop/git/kaggle-titanic-template/env/lib/python3.9/site-packages/sklearn/svm/_base.py", line 830, in _get_liblinear_solver_type
    raise 

[...]

ValueError: Solver saga supports only dual=False, got dual=True
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

[...]
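
Most of the failures in the log come from combinations of solver, penalty and `dual` that scikit-learn rejects. `GridSearchCV` also accepts a list of dictionaries, so each solver can be paired only with settings it supports. The sketch below runs on synthetic data, and its grid is an illustrative assumption, not the grid used for the score reported in this notebook. With the pipeline above, each key would carry the `log__` prefix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(42)
X = rng.normal(size=(200, 3))
# The label depends on the first feature plus a little noise.
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

param_grid = [
    # liblinear handles both l1 and l2
    {'solver': ['liblinear'], 'penalty': ['l1', 'l2'], 'C': [1, 10]},
    # lbfgs and newton-cg are restricted to l2 here
    {'solver': ['lbfgs', 'newton-cg'], 'penalty': ['l2'], 'C': [1, 10]},
    # saga handles both l1 and l2
    {'solver': ['saga'], 'penalty': ['l1', 'l2'], 'C': [1, 10]},
]

search = GridSearchCV(LogisticRegression(max_iter=10**4, random_state=42),
                      param_grid, scoring='accuracy',
                      error_score='raise')  # fail loudly on invalid combos
search.fit(X, y)
print(len(search.cv_results_['params']), search.best_params_)
```

Every candidate in such a grid is actually fitted, so no cross-validation time is spent on combinations that can only raise an error.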

In [285]:
print_model_stats(cv_log)
model_name = 'log_000'
joblib_log = save_and_load_model(cv_log, model_name, x_submit)
score = round(joblib_log.score(x_test, y_test), 5)
print(f'Test set\'s score : {score:.5f}')

params = {
    'log__C': [1],
    'log__dual': [True],
    'log__fit_intercept': [True],
    'log__max_iter': [10000],
    'log__penalty': ['l2'],
    'log__solver': ['liblinear']
}
CV's best accuracy : 0.78928
Test set's score : 0.77654
