# Предсказание выживет или нет пассажир Титаника

**Описание задачи:** Обучить группу моделей (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, MLPClassifier) с поиском их гиперпарамтров.

**Исходные данные:** датасет для обучающей выборки train_students.csv и даасет для тестовой выборки tast_students.csv

**Формат решения:**   
Модели  Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, MLPClassifier с точностью предсказания выше 80% 

### Оглавление
[1. Загрузка/изучение/предобработка данных](#id-section1_)  
[2. Строим модели обучения](#id-section2_)  
[2.1. RandomForestClassifier](#id-section2.1._)  
[2.2. DecisionTreeClassifier](#id-section2.2._)  
[2.3. GradientBoostingClassifier](#id-section2.3._)  
[2.4. Logistic Regression](#id-section2.4._)   
[2.5. MLPClassifier](#id-section2.5._)   
[3. Вывод](#id-section3_)

## Общие функции для работы 

In [6]:
import os
import sys
import random
import pandas as pd
import numpy as np

from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

from sklearn.metrics import recall_score, accuracy_score, f1_score, precision_score

from matplotlib import pyplot as plt

In [7]:
#фиксируем параметры псевдослучайного генератора
seed = 42
random.seed(seed)
np.random.seed(seed)

<div id='id-section1_'/> 

## 1. Загрузка/изучение/предобработка данных

In [8]:
train = pd.read_csv('train_students.csv')
test = pd.read_csv ('test_students.csv')

In [9]:
train


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
1,440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31.00,0,0,C.A. 18723,10.5000,,S
2,841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20.00,0,0,SOTON/O2 3101287,7.9250,,S
3,721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6.00,0,1,248727,33.0000,,S
4,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.00,1,0,2651,11.2417,,C
...,...,...,...,...,...,...,...,...,...,...,...,...
708,640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1000,,S
709,879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S
710,825,0,3,"Panula, Master. Urho Abraham",male,2.00,4,1,3101295,39.6875,,S
711,804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C


In [10]:
train.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [11]:
#преобразуем категориальные переменные в числовые в train
train = pd.get_dummies(train,columns=["Embarked"])

In [12]:
train

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked_C,Embarked_Q,Embarked_S
0,710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,1,0,0
1,440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31.00,0,0,C.A. 18723,10.5000,,0,0,1
2,841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20.00,0,0,SOTON/O2 3101287,7.9250,,0,0,1
3,721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6.00,0,1,248727,33.0000,,0,0,1
4,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.00,1,0,2651,11.2417,,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
708,640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1000,,0,0,1
709,879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,0,0,1
710,825,0,3,"Panula, Master. Urho Abraham",male,2.00,4,1,3101295,39.6875,,0,0,1
711,804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,1,0,0


In [13]:
#преобразуем категориальные переменные в числовые в test
test = pd.get_dummies(test,columns=["Embarked"])

In [14]:
#Замена пола на численные значения
train['Sex'].replace({'male':0, 'female':1}, inplace=True)
test['Sex'].replace({'male':0, 'female':1}, inplace=True)

In [15]:
#разбиение на train и test
train_X = train.drop(columns=['Survived']) # создаем train_X без колонки survived 
train_y = train['Survived'] #создаем train_y c колонкой survived 

test_X = test.drop(columns=['Survived']) 
test_y = test['Survived']

In [16]:
#выбор колонок для использования в виде входных признаков 
features = ['Pclass','Age', 'Sex','SibSp', 'Parch', 'Fare', 'Embarked_C','Embarked_Q', 'Embarked_S']
train_X = train_X[features]
test_X = test_X[features]

In [17]:
# суммируем пропуски по колонкам в train
train_X.isna().sum() 

Pclass          0
Age           144
Sex             0
SibSp           0
Parch           0
Fare            0
Embarked_C      0
Embarked_Q      0
Embarked_S      0
dtype: int64

In [18]:
#ищем средний возраст пассажиров мужского пола 1-го класса в train
mask1 = (train_X['Pclass'] == 1) & (train_X['Sex']==0) 
avg_filler1 = train_X.loc[mask1, 'Age'].median()

In [19]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask1, 'Age'] = avg_filler1

In [20]:
#ищем средний возраст пассажиров женского пола 1-го класса в train
mask1_1 = (train_X['Pclass'] == 1) & (train_X['Sex'] == 1) 
avg_filler1_1 = train_X.loc[mask1_1, 'Age'].median()

In [21]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask1_1, 'Age'] = avg_filler1_1

In [22]:
#ищем средний возраст пассажиров мужского пола 2-го класса в train
mask2 = (train_X['Pclass'] == 2) & (train_X['Sex'] == 0)
avg_filler2 = train_X.loc[mask2, 'Age'].median()

In [23]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask2, 'Age'] = avg_filler2

In [24]:
#ищем средний возраст пассажиров женского пола 2-го класса в train
mask2_1 = (train_X['Pclass'] == 2) & (train_X['Sex'] == 1)
avg_filler2_1 = train_X.loc[mask2_1, 'Age'].median()

In [25]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask2_1, 'Age'] = avg_filler2_1

In [26]:
#ищем средний возраст пассажиров мужского пола 3-го класса в train
mask3 = (train_X['Pclass'] == 3) & (train_X['Sex']==0 )
avg_filler3 = train_X.loc[mask3, 'Age'].median()

In [27]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask3, 'Age'] = avg_filler3

In [28]:
#ищем средний возраст пассажиров женского пола 3-го класса в train
mask3_1 = (train_X['Pclass'] == 3) & (train_X['Sex'] == 1)
avg_filler3_1 = train_X.loc[mask3_1, 'Age'].median()

In [29]:
#обновляем значеня в train
train_X.loc[train_X['Age'].isnull() & mask3_1, 'Age'] = avg_filler3_1

In [30]:
test_X.isna().sum()

Pclass         0
Age           33
Sex            0
SibSp          0
Parch          0
Fare           0
Embarked_C     0
Embarked_Q     0
Embarked_S     0
dtype: int64

In [31]:
#заполняем средний возраст пассажиров мужского пола 1-го класса в tast данными из train
mask1 = (test_X['Pclass'] == 1)& (test_X['Sex'] == 0)


In [32]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask1, 'Age'] = avg_filler1

In [33]:
#ищем средний возраст пассажиров женского пола 1-го класса в tast данными из train
mask1_1 = (test_X['Pclass'] == 1)& (test_X['Sex'] == 1)


In [34]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask1_1, 'Age'] = avg_filler1_1

In [35]:
#ищем средний возраст пассажиров мужского пола 2-го класса в tast данными из train
mask2 = (test_X['Pclass'] == 2)& (test_X['Sex'] == 0)


In [36]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask2, 'Age'] = avg_filler2

In [37]:
#ищем средний возраст пассажиров женского пола 2-го класса в tast данными из train
mask2_1 = (test_X['Pclass'] == 2)& (test_X['Sex'] == 1)


In [38]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask2_1, 'Age'] = avg_filler2_1

In [39]:
#ищем средний возраст пассажиров мужского пола 3-го класса в tast данными из train
mask3 = (test_X['Pclass'] == 3)& (test_X['Sex'] == 0)


In [40]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask3, 'Age'] = avg_filler3

In [41]:
#ищем средний возраст пассажиров женского пола 3-го класса в tast данными из train
mask3_1 = (test_X['Pclass'] == 3)& (test_X['Sex'] == 1)


In [42]:
#обновляем значеня в tast
test_X.loc[test_X['Age'].isnull() & mask3_1, 'Age'] = avg_filler3_1

In [43]:
test_X.isna().sum()

Pclass        0
Age           0
Sex           0
SibSp         0
Parch         0
Fare          0
Embarked_C    0
Embarked_Q    0
Embarked_S    0
dtype: int64

<div id='id-section2_'/>

## 2. Строим модели обучения

<div id='id-section2.1._'/>

### 2.1. RandomForestClassifier

In [44]:
#RandomForestClassifier
clf = RandomForestClassifier(random_state=seed, max_depth=13) #  с учетом параметров наилучшей модел
clf.fit(train_X, train_y)
preds_train = clf.predict(train_X) # predictions based on train_X 
preds_test = clf.predict(test_X) # predictions based on test_X

print('####### TRAIN RESULTS #######')
print('Recall: ', round(recall_score(train_y, preds_train), 2))
print('Accuracy: ', round(accuracy_score(train_y, preds_train), 2))
print('Precision: ', round(precision_score(train_y, preds_train), 2))
print('F1 score: ', round(f1_score(train_y, preds_train), 2))

print('####### TEST RESULTS #######')
print('Recall: ', round(recall_score(test_y, preds_test), 2))
print('Accuracy: ', round(accuracy_score(test_y, preds_test), 2))
print('Precision: ', round(precision_score(test_y, preds_test), 2))
print('F1 score: ', round(f1_score(test_y, preds_test), 2))
results = pd.DataFrame()
for max_depth in range(1, 100, 2): # строим несколько моделей и смотрим их результаты 
    clf = RandomForestClassifier(random_state=seed, max_depth=max_depth)
    clf.fit(train_X, train_y)
    preds_train = clf.predict(train_X)
    preds_test = clf.predict(test_X)
    
    results_dict = {
                    'max_depth':[max_depth],
                    'recall_train':[recall_score(train_y, preds_train)], 
                    'acc_train':[accuracy_score(train_y, preds_train)],
                    'prec_train':[precision_score(train_y, preds_train)],
                    'f1_train':[f1_score(train_y, preds_train)],
                    'recall_test':[recall_score(test_y, preds_test)],
                    'acc_test':[accuracy_score(test_y, preds_test)],
                    'prec_test':[precision_score(test_y, preds_test)],
                    'f1_test':[f1_score(test_y, preds_test)]
                   }
    
    results_df = pd.DataFrame.from_dict(results_dict)
    results = results.append(results_df)

results.reset_index(inplace=True, drop=True)

####### TRAIN RESULTS #######
Recall:  0.94
Accuracy:  0.97
Precision:  0.99
F1 score:  0.96
####### TEST RESULTS #######
Recall:  0.72
Accuracy:  0.81
Precision:  0.76
F1 score:  0.74


In [45]:
results[results.recall_test == results.recall_test.max()]

Unnamed: 0,max_depth,recall_train,acc_train,prec_train,f1_train,recall_test,acc_test,prec_test,f1_test
1,3,0.700361,0.820477,0.811715,0.751938,0.723077,0.825843,0.783333,0.752
3,7,0.808664,0.903226,0.933333,0.866538,0.723077,0.842697,0.824561,0.770492
4,9,0.873646,0.934081,0.952756,0.911488,0.723077,0.842697,0.824561,0.770492
5,11,0.913357,0.959327,0.98062,0.945794,0.723077,0.825843,0.783333,0.752
6,13,0.938628,0.97195,0.988593,0.962963,0.723077,0.814607,0.758065,0.740157


<div id='id-section2.2._'/>

### 2.2. DecisionTreeClassifier

In [46]:
#DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
clf.fit(train_X, train_y)
preds_train = clf.predict(train_X) # predictions based on train_X 
preds_test = clf.predict(test_X) # predictions based on test_X

print('####### TRAIN RESULTS #######')
print('Recall: ', round(recall_score(train_y, preds_train), 2))
print('Accuracy: ', round(accuracy_score(train_y, preds_train), 2))
print('Precision: ', round(precision_score(train_y, preds_train), 2))
print('F1 score: ', round(f1_score(train_y, preds_train), 2))

print('####### TEST RESULTS #######')
print('Recall: ', round(recall_score(test_y, preds_test), 2))
print('Accuracy: ', round(accuracy_score(test_y, preds_test), 2))
print('Precision: ', round(precision_score(test_y, preds_test), 2))
print('F1 score: ', round(f1_score(test_y, preds_test), 2))
results = pd.DataFrame()
for max_depth in range(1, 100, 2):
    clf = DecisionTreeClassifier(random_state=seed, max_depth=max_depth)
    clf.fit(train_X, train_y)
    preds_train = clf.predict(train_X)
    preds_test = clf.predict(test_X)
    
    results_dict = {
                    'max_depth':[max_depth],
                    'recall_train':[recall_score(train_y, preds_train)], 
                    'acc_train':[accuracy_score(train_y, preds_train)],
                    'prec_train':[precision_score(train_y, preds_train)],
                    'f1_train':[f1_score(train_y, preds_train)],
                    'recall_test':[recall_score(test_y, preds_test)],
                    'acc_test':[accuracy_score(test_y, preds_test)],
                    'prec_test':[precision_score(test_y, preds_test)],
                    'f1_test':[f1_score(test_y, preds_test)]
                   }
    
    results_df = pd.DataFrame.from_dict(results_dict)
    results = results.append(results_df)

results.reset_index(inplace=True, drop=True)

####### TRAIN RESULTS #######
Recall:  0.79
Accuracy:  0.85
Precision:  0.81
F1 score:  0.8
####### TEST RESULTS #######
Recall:  0.8
Accuracy:  0.8
Precision:  0.7
F1 score:  0.75


In [47]:
results[results.recall_test == results.recall_test.max()]

Unnamed: 0,max_depth,recall_train,acc_train,prec_train,f1_train,recall_test,acc_test,prec_test,f1_test
2,5,0.790614,0.845722,0.808118,0.79927,0.8,0.803371,0.702703,0.748201


<div id='id-section2.3._'/>

### 2.3. GradientBoostingClassifier

In [48]:
# GradientBoostingClassifier
clf = GradientBoostingClassifier(random_state=seed, max_depth=5)  
clf.fit(train_X, train_y) 
preds_train = clf.predict(train_X) 
preds_test = clf.predict(test_X) 

print('####### TRAIN RESULTS #######')
print('Recall: ', round(recall_score(train_y, preds_train), 2))
print('Accuracy: ', round(accuracy_score(train_y, preds_train), 2))
print('Precision: ', round(precision_score(train_y, preds_train), 2))
print('F1 score: ', round(f1_score(train_y, preds_train), 2))

print('####### TEST RESULTS #######')
print('Recall: ', round(recall_score(test_y, preds_test), 2))
print('Accuracy: ', round(accuracy_score(test_y, preds_test), 2))
print('Precision: ', round(precision_score(test_y, preds_test), 2))
print('F1 score: ', round(f1_score(test_y, preds_test), 2))
results = pd.DataFrame()
for max_depth in range(1, 50, 2):
    clf = GradientBoostingClassifier(random_state=seed, max_depth=max_depth)
    clf.fit(train_X, train_y)
    preds_train = clf.predict(train_X)
    preds_test = clf.predict(test_X)
    
    results_dict = {
                    'max_depth':[max_depth],
                    'recall_train':[recall_score(train_y, preds_train)], 
                    'acc_train':[accuracy_score(train_y, preds_train)],
                    'prec_train':[precision_score(train_y, preds_train)],
                    'f1_train':[f1_score(train_y, preds_train)],
                    'recall_test':[recall_score(test_y, preds_test)],
                    'acc_test':[accuracy_score(test_y, preds_test)],
                    'prec_test':[precision_score(test_y, preds_test)],
                    'f1_test':[f1_score(test_y, preds_test)]
                   }
    
    results_df = pd.DataFrame.from_dict(results_dict)
    results = results.append(results_df)

results.reset_index(inplace=True, drop=True)


####### TRAIN RESULTS #######
Recall:  0.94
Accuracy:  0.97
Precision:  0.98
F1 score:  0.96
####### TEST RESULTS #######
Recall:  0.77
Accuracy:  0.82
Precision:  0.75
F1 score:  0.76


In [49]:
results[results.recall_test == results.recall_test.max()]

Unnamed: 0,max_depth,recall_train,acc_train,prec_train,f1_train,recall_test,acc_test,prec_test,f1_test
2,5,0.938628,0.967742,0.977444,0.957643,0.769231,0.820225,0.746269,0.757576


<div id='id-section2.4._'/>

### 2.4. Logistic Regression

In [50]:
#Logistic Regression
clf = LogisticRegression(random_state=seed, C=27, max_iter=500)  
clf.fit(train_X, train_y) 
preds_train = clf.predict(train_X) 
preds_test = clf.predict(test_X) 

print('####### РЕЗУЛЬТАТЫ ДЛЯ ОБУЧЕНИЯ #######')
print('Recall: ', round(recall_score(train_y, preds_train), 2))
print('Accuracy: ', round(accuracy_score(train_y, preds_train), 2))
print('Precision: ', round(precision_score(train_y, preds_train), 2))
print('F1 score: ', round(f1_score(train_y, preds_train), 2))

print('####### РЕЗУЛЬТАТЫ ДЛЯ ТЕСТА #######')
print('Recall: ', round(recall_score(test_y, preds_test), 2))
print('Accuracy: ', round(accuracy_score(test_y, preds_test), 2))
print('Precision: ', round(precision_score(test_y, preds_test), 2))
print('F1 score: ', round(f1_score(test_y, preds_test), 2))

results = pd.DataFrame()
for C_search in range(1, 100, 2):
    clf = LogisticRegression(random_state=seed, C=C_search, max_iter=1000)
    clf.fit(train_X, train_y)
    preds_train = clf.predict(train_X)
    preds_test = clf.predict(test_X)
    
    results_dict = {
                    'C_search':[C_search],
                    'recall_train':[recall_score(train_y, preds_train)], 
                    'acc_train':[accuracy_score(train_y, preds_train)],
                    'prec_train':[precision_score(train_y, preds_train)],
                    'f1_train':[f1_score(train_y, preds_train)],
                    'recall_test':[recall_score(test_y, preds_test)],
                    'acc_test':[accuracy_score(test_y, preds_test)],
                    'prec_test':[precision_score(test_y, preds_test)],
                    'f1_test':[f1_score(test_y, preds_test)]
                   }
    
    results_df = pd.DataFrame.from_dict(results_dict)
    results = results.append(results_df)

results.reset_index(inplace=True, drop=True)


####### РЕЗУЛЬТАТЫ ДЛЯ ОБУЧЕНИЯ #######
Recall:  0.7
Accuracy:  0.81
Precision:  0.79
F1 score:  0.74
####### РЕЗУЛЬТАТЫ ДЛЯ ТЕСТА #######
Recall:  0.71
Accuracy:  0.8
Precision:  0.73
F1 score:  0.72


In [51]:
results[results.recall_test == results.recall_test.max()]

Unnamed: 0,C_search,recall_train,acc_train,prec_train,f1_train,recall_test,acc_test,prec_test,f1_test
0,1,0.703971,0.810659,0.78629,0.742857,0.707692,0.792135,0.71875,0.713178
1,3,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
2,5,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
3,7,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
4,9,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
5,11,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
6,13,0.703971,0.809257,0.783133,0.741445,0.707692,0.797753,0.730159,0.71875
7,15,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
8,17,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875
9,19,0.703971,0.810659,0.78629,0.742857,0.707692,0.797753,0.730159,0.71875


<div id='id-section2.5._'/>

### 2.5. MLPClassifier

In [52]:
#MLPClassifier
nn = MLPClassifier(random_state=seed, max_iter=10000, hidden_layer_sizes=(10,),)
nn.fit(train_X, train_y)
preds_train = nn.predict(train_X) # предсказания для train_X 
preds_test = nn.predict(test_X) # предсказания для test_X
print('####### РЕЗУЛЬТАТЫ ДЛЯ ОБУЧЕНИЯ #######')
#print('hidden_size: ', hidden_layer_sizes)
print('Recall: ', round(recall_score(train_y, preds_train), 2))
print('Accuracy: ', round(accuracy_score(train_y, preds_train), 2))
print('Precision: ', round(precision_score(train_y, preds_train), 2))
print('F1 score: ', round(f1_score(train_y, preds_train), 2))

print('####### РЕЗУЛЬТАТЫ ДЛЯ ТЕСТА #######')
print('Recall: ', round(recall_score(test_y, preds_test), 2))
print('Accuracy: ', round(accuracy_score(test_y, preds_test), 2))
print('Precision: ', round(precision_score(test_y, preds_test),2))
print('F1 score: ', round(f1_score(test_y, preds_test), 2))

results = pd.DataFrame()
for hidden_size in range(1, 15):
    nn = MLPClassifier(hidden_layer_sizes=(hidden_size, ), random_state=seed, max_iter=10000)
    nn.fit(train_X, train_y)
    preds_train = nn.predict(train_X)
    preds_test = nn.predict(test_X)
    
    results_dict = {
                    'hidden_size':[hidden_size],
                    'recall_train':[recall_score(train_y, preds_train)], 
                    'acc_train':[recall_score(train_y, preds_train)],
                    'prec_train':[precision_score(train_y, preds_train)],
                    'f1_train':[f1_score(train_y, preds_train)],
                    'recall_test':[recall_score(test_y, preds_test)],
                    'acc_test':[accuracy_score(test_y, preds_test)],
                    'prec_test':[precision_score(test_y, preds_test)],
                    'f1_test':[f1_score(test_y, preds_test)]
                   }
    
    results_df = pd.DataFrame.from_dict(results_dict)
    results = results.append(results_df)

results.reset_index(inplace=True, drop=True)

####### РЕЗУЛЬТАТЫ ДЛЯ ОБУЧЕНИЯ #######
Recall:  0.7
Accuracy:  0.79
Precision:  0.75
F1 score:  0.73
####### РЕЗУЛЬТАТЫ ДЛЯ ТЕСТА #######
Recall:  0.75
Accuracy:  0.82
Precision:  0.75
F1 score:  0.75


In [53]:
results[results.recall_test == results.recall_test.max()]

Unnamed: 0,hidden_size,recall_train,acc_train,prec_train,f1_train,recall_test,acc_test,prec_test,f1_test
9,10,0.700361,0.700361,0.751938,0.725234,0.753846,0.820225,0.753846,0.753846


<div id='id-section3_'/>

## 3. Вывод

Рейтинг результатов теста Accuracy для тестовой выборки:
1. Лучше всего себя проявили модели GradientBoostingClassifier и MLPClassifier. Точность на тестовой выборке 0.82
2. Точность в 0.81 показала модель RandomForestClassifitest
3. Модели DecisionTreeClassifier и Logistic Regression показали одинаковую точность - 0.8