# <center>Композиции алгоритмов

Будем решать задачу кредитного скоринга.

#### Данные по кредитному скорингу представлены следующим образом:

##### Прогнозируемая  переменная
* SeriousDlqin2yrs	– наличие длительных просрочек выплат платежей за 2 года.

##### Независимые признаки
* age – возраст заёмщика (число полных лет);
* NumberOfTime30-59DaysPastDueNotWorse	– количество раз, когда заёмщик имел просрочку выплаты других кредитов 30-59 дней в течение последних двух лет;
* NumberOfTime60-89DaysPastDueNotWorse – количество раз, когда заёмщик имел просрочку выплаты других кредитов 60-89 дней в течение последних двух лет;
* NumberOfTimes90DaysLate – количество раз, когда заёмщик имел просрочку выплаты других кредитов более 90 дней;
* DebtRatio – ежемесячные отчисления на задолжености (кредиты, алименты и т.д.) / совокупный месячный доход;
* MonthlyIncome	– месячный доход в долларах;
* NumberOfDependents – число человек в семье кредитозаёмщика.

In [1]:
import numpy as np
import pandas as pd

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.metrics import roc_auc_score
from sklearn.decomposition import PCA

import random

import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
def impute_nan_with_median(table):
    for col in table.columns:
        table[col]= table[col].fillna(table[col].median())
    return table

In [3]:
data = pd.read_csv('credit_scoring_sample.csv', sep=";")
data.head()

Unnamed: 0,SeriousDlqin2yrs,age,NumberOfTime30-59DaysPastDueNotWorse,DebtRatio,NumberOfTimes90DaysLate,NumberOfTime60-89DaysPastDueNotWorse,MonthlyIncome,NumberOfDependents
0,0,64,0,0.249908,0,0,8158.0,0.0
1,0,58,0,3870.0,0,0,,0.0
2,0,41,0,0.456127,0,0,6666.0,0.0
3,0,43,0,0.00019,0,0,10500.0,2.0
4,1,49,0,0.27182,0,0,400.0,0.0


In [4]:
independent_columns_names = data.columns.values
independent_columns_names = [x for x in data if x != 'SeriousDlqin2yrs']
independent_columns_names

['age',
 'NumberOfTime30-59DaysPastDueNotWorse',
 'DebtRatio',
 'NumberOfTimes90DaysLate',
 'NumberOfTime60-89DaysPastDueNotWorse',
 'MonthlyIncome',
 'NumberOfDependents']

In [5]:
table = impute_nan_with_median(data)

In [6]:
X = table[independent_columns_names]
y = table['SeriousDlqin2yrs']

In [7]:
y.value_counts()

0    35037
1    10026
Name: SeriousDlqin2yrs, dtype: int64

Задайте решающее дерево, пользуясь встроенной функцией `DecisionTreeClassifier` с параметрами `random_state=17` и `class_weight='balanced'`.

In [8]:
model = DecisionTreeClassifier(random_state=17, class_weight='balanced')

Используйте функцию `GridSearchCV` для выбора оптимального набора гиперпараметров для указанной задачи. В качестве метрики качества возьмите ROC AUC.

In [9]:
max_depth_values = [3, 5, 6, 7, 9]
max_features_values = [4, 5, 6, 7]
tree_params = {'max_depth': max_depth_values,
               'max_features': max_features_values}

In [10]:
grid = GridSearchCV(model, tree_params, scoring='roc_auc')
grid.fit(X, y)

GridSearchCV(estimator=DecisionTreeClassifier(class_weight='balanced',
                                              random_state=17),
             param_grid={'max_depth': [3, 5, 6, 7, 9],
                         'max_features': [4, 5, 6, 7]},
             scoring='roc_auc')

In [11]:
grid.best_params_

{'max_depth': 6, 'max_features': 7}

In [12]:
dataOfResGrid = pd.DataFrame(grid.cv_results_)
dataOfResGrid.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_max_depth,param_max_features,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.029404,0.0004915384,0.003804,0.0004008576,3,4,"{'max_depth': 3, 'max_features': 4}",0.763372,0.776734,0.755287,0.769073,0.763516,0.765597,0.007095,20
1,0.031806,0.0003996621,0.003602,0.0004893369,3,5,"{'max_depth': 3, 'max_features': 5}",0.773655,0.785205,0.772289,0.777337,0.775345,0.776766,0.004545,17
2,0.036007,5.722046e-07,0.003403,0.0004900186,3,6,"{'max_depth': 3, 'max_features': 6}",0.773609,0.786241,0.771464,0.776643,0.775233,0.776638,0.005102,18
3,0.037555,0.0004582664,0.003402,0.0004899996,3,7,"{'max_depth': 3, 'max_features': 7}",0.773661,0.785173,0.771464,0.776643,0.775233,0.776435,0.004696,19
4,0.039681,0.001107365,0.004002,9.93379e-07,5,4,"{'max_depth': 5, 'max_features': 4}",0.806437,0.82515,0.804452,0.814051,0.813761,0.81277,0.007283,11


In [13]:
best_roc_auc = dataOfResGrid['mean_test_score'].max()
best_roc_auc

0.8196783763511417

Зафиксируйте кросс-валидацию с помощью функции `StratifiedKFold` на 5 разбиений с перемешиванием, `random_state=17`.

In [14]:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
skf.get_n_splits(X, y)

5

In [15]:
type(X.values[[0,1,2,3]])

numpy.ndarray

In [16]:
r_auc_best = 0
for train_index, test_index in skf.split(X, y):
    mdl = DecisionTreeClassifier(random_state=17, class_weight='balanced', max_depth=6, max_features=7)
    mdl.fit((X.values)[train_index], (y.values)[train_index])
    y_pred = mdl.predict((X.values)[test_index])
    r_auc = roc_auc_score((y.values)[test_index], y_pred)
    print(r_auc)
    if r_auc > r_auc_best:
        r_auc_best = r_auc

0.7598875955089446
0.7555734664480351
0.7503783070653888
0.7493027457045982
0.7414323474886354


Какое максимальное значение ROC AUC получилось?

In [17]:
r_auc_best

0.7598875955089446

# Реализация случайного леса

Теперь реализуйте случайный лес. В качестве базового алгоритма здесь всегда выступает `DecisionTreeClassifier`.

На каждом шаге алгоритма необходимо запоминать индексы признаков, которые участвовали в обучении леса.

- В методе `fit` в цикле (`i` от 0 до `n_estimators-1`) фиксируйте seed, равный (`random_state + i`). Это нужно для того, чтобы на каждой итерации seed был новый, при этом все значения можно было бы воспроизвести.
- Зафиксировав seed, сделайте bootstrap-выборку (т.е. с замещением) из множества id объектов. Размер bootstrap-выборки = размеру исходной.
- Зафиксировав seed, выберите **без замещения** `max_features` признаков, сохраните список выбранных id признаков в `self.feat_ids_by_tree`.
- Обучите дерево с теми же `max_depth`, `max_features` и `random_state`, что и у `RandomForestClassifierCustom` на выборке с нужным подмножеством объектов и признаков.
- В методе `predict_proba` у тестовой выборки нужно взять те признаки, на которых соответствующее дерево обучалось, и сделать прогноз вероятностей (`predict_proba` уже для дерева, вернуть вероятности класса 1). Метод должен вернуть усреднение прогнозов по всем деревьям.

In [47]:
from sklearn.base import BaseEstimator

class RandomForestClassifierCustom(BaseEstimator):
    def __init__(self, n_estimators=10, max_depth=3, max_features=5, random_state=17):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.max_features = max_features
        self.random_state = random_state
        # в данном списке будем хранить отдельные деревья
        self.trees = []
        # тут будем хранить списки индексов признаков, на которых обучалось каждое дерево 
        self.feat_ids_by_tree = []

        
    def fit(self, X, y):
        for i in range(self.n_estimators-1):
            seed = self.random_state + i
            
            indices = np.random.randint( 0, len(X), (len(X.columns), len(X)) )
            bootstrap = (X.values)[indices]
            
            col_list = list(range(len(X.columns)))
            random.shuffle(col_list)
            col_list = col_list[:self.max_features]
            self.feat_ids_by_tree.append(col_list)
            
            tree = DecisionTreeClassifier(max_depth=self.max_depth, max_features=self.max_features, random_state=self.random_state)
            tree.fit(bootstrap[col_list][0], y)
            self.trees.append(tree)
            
        
    def predict_proba(self, X):
        all_pred = []
        for i in range(self.n_estimators-1):
            test_data = (X.values)[(self.feat_ids_by_tree)[i]] 
            pred_proba = (self.trees[i]).predict_proba(test_data)
            all_pred.append(pred_proba)
            print((self.feat_ids_by_tree)[i])
        return all_pred #.mean()

Проведите кросс-валидацию. Какое получилось среднее значение ROC AUC на кросс-валидации? Сравните качество вашей реализации с реализацией `RandomForestClassifier` из `sklearn`. Аналогично предыдущему заданию, подберите гиперпараметры для случайного леса.

In [19]:
model = RandomForestClassifierCustom()
grid = GridSearchCV(model, tree_params, scoring='roc_auc')
grid.fit(X, y)

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
AttributeError: 'RandomForestClassifierCustom' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "c:\program files\python39\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    y_pred = 

GridSearchCV(estimator=RandomForestClassifierCustom(),
             param_grid={'max_depth': [3, 5, 6, 7, 9],
                         'max_features': [4, 5, 6, 7]},
             scoring='roc_auc')

In [20]:
grid.best_params_

{'max_depth': 3, 'max_features': 4}

In [21]:
dataOfResGrid = pd.DataFrame(grid.cv_results_)
dataOfResGrid.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_max_depth,param_max_features,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.271147,0.006854,0.002203,0.0004,3,4,"{'max_depth': 3, 'max_features': 4}",,,,,,,,1
1,0.337614,0.009291,0.001833,0.000333,3,5,"{'max_depth': 3, 'max_features': 5}",,,,,,,,18
2,0.369872,0.011701,0.002137,0.000452,3,6,"{'max_depth': 3, 'max_features': 6}",,,,,,,,17
3,0.39262,0.004846,0.00205,0.000285,3,7,"{'max_depth': 3, 'max_features': 7}",,,,,,,,16
4,0.362036,0.010343,0.00195,0.000131,5,4,"{'max_depth': 5, 'max_features': 4}",,,,,,,,15


In [22]:
best_roc_auc = dataOfResGrid['mean_test_score'].max()
best_roc_auc

nan

In [48]:
model = RandomForestClassifierCustom()
model.fit(X,y)

In [49]:
model.predict_proba(X)

[6, 0, 5, 1, 3]
[6, 2, 0, 3, 1]
[5, 0, 1, 2, 6]
[3, 2, 5, 0, 1]
[0, 1, 2, 6, 3]
[5, 2, 0, 1, 3]
[0, 6, 4, 3, 1]
[2, 6, 5, 4, 1]
[6, 4, 5, 1, 0]


[array([[0.77103929, 0.22896071],
        [0.78046709, 0.21953291],
        [0.78046709, 0.21953291],
        [0.78046709, 0.21953291],
        [0.78046709, 0.21953291]]),
 array([[0.77508103, 0.22491897],
        [0.77508103, 0.22491897],
        [0.77508103, 0.22491897],
        [0.77508103, 0.22491897],
        [0.78503337, 0.21496663]]),
 array([[0.77565809, 0.22434191],
        [0.77565809, 0.22434191],
        [0.78400223, 0.21599777],
        [0.77565809, 0.22434191],
        [0.77565809, 0.22434191]]),
 array([[0.78357438, 0.21642562],
        [0.77240597, 0.22759403],
        [0.78357438, 0.21642562],
        [0.78357438, 0.21642562],
        [0.77240597, 0.22759403]]),
 array([[0.77811473, 0.22188527],
        [0.75612672, 0.24387328],
        [0.77811473, 0.22188527],
        [0.77811473, 0.22188527],
        [0.77811473, 0.22188527]]),
 array([[0.77731126, 0.22268874],
        [0.77731126, 0.22268874],
        [0.77731126, 0.22268874],
        [0.77731126, 0.22268874],
    

In [54]:
l = X.values
l[0]

array([6.40000e+01, 0.00000e+00, 2.49908e-01, 0.00000e+00, 0.00000e+00,
       8.15800e+03, 0.00000e+00])

In [55]:
l[0,1,3]

IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

In [None]:
indices = np.random.randint( 0, len(X), (len(X.columns), len(X)) )
bootstrap = (X.values)[indices]

In [None]:
len(X), len(indices)

In [None]:
indices

In [None]:
bootstrap = X.values[indices]

In [None]:
bootstrap[0][0]

In [None]:
X.columns

In [None]:
bootstrap[:,[6, 5, 0, 1, 2]][0,0]

In [None]:
bootstrap[0,0]

In [None]:
(bootstrap[:, [0, 1, 2, 5, 6]])[0,0] 

In [None]:
l = [1,2,3,4,5]
l[:3]

In [None]:
col_list = list(range(len(X.columns)))
random.shuffle(col_list)

In [None]:
list(col_list1)

In [None]:
col_list