<center>
<img src="../../img/ods_stickers.jpg">
## Open Machine Learning Course. Session No. 2
Author: Yury Kashnitsky, research programmer at Mail.ru Group and senior lecturer at the HSE Faculty of Computer Science. This material is distributed under the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. It may be used for any purpose (editing, correcting, and building upon it) except commercial ones, provided the author is credited.

# <center>Topic 10. Boosting
## <center>Part 8. Evaluating XGBoost results

## Loading libraries

In [1]:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

## Loading and preparing the data

We will use customer churn data from a telecom company as the running example.

In [2]:
df = pd.read_csv("../../data/telecom_churn.csv")

In [3]:
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,No,No,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,Yes,No,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,Yes,No,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


**We simply number the states, and make the International plan (international roaming enabled) and Voice mail plan (voicemail enabled) features, as well as the Churn target, binary.**

In [4]:
state_enc = LabelEncoder()
df["State"] = state_enc.fit_transform(df["State"])
df["International plan"] = (df["International plan"] == "Yes").astype("int")
df["Voice mail plan"] = (df["Voice mail plan"] == "Yes").astype("int")
df["Churn"] = (df["Churn"]).astype("int")
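
For intuition, the encoding step above can be sketched without scikit-learn. This is a minimal illustration, assuming integer codes are assigned to the sorted unique labels (the order `LabelEncoder` uses for its classes); the helper names `label_encode` and `yes_no_to_int` are hypothetical and not part of any library.

```python
def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes


def yes_no_to_int(values):
    """Turn 'Yes'/'No' strings into 1/0 flags."""
    return [int(v == "Yes") for v in values]


# The first five states from the table above (a toy subset, not the full column).
codes, classes = label_encode(["KS", "OH", "NJ", "OH", "OK"])
print(classes)
print(codes)
print(yes_no_to_int(["No", "No", "No", "Yes", "Yes"]))
```

On the full column the codes differ (KS becomes 16 rather than 0, for example) because every state in the dataset takes part in the ordering.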

In [5]:
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,16,128,415,0,1,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,35,107,415,0,1,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
2,31,137,415,0,0,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,0
3,35,84,408,1,0,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,0
4,36,75,415,1,0,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,0


**Split the data into training and test sets in a 7:3 ratio and create the corresponding DMatrix objects.**

In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop("Churn", axis=1), df["Churn"], test_size=0.3, random_state=42
)
dtrain = xgb.DMatrix(X_train, y_train)
dtest = xgb.DMatrix(X_test, y_test)
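
The idea behind a shuffled hold-out split can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `split_indices` is hypothetical, it rounds the test size to the nearest integer, and it does not reproduce the exact shuffle that `train_test_split` performs for `random_state=42`.

```python
import random


def split_indices(n_samples, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) for a reproducible shuffled hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed, same permutation
    n_test = round(test_size * n_samples)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```

The two index lists are disjoint and together cover every row, which is the property that keeps the test score an honest estimate.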

**Set the XGBoost parameters.**

In [7]:
params = {"objective": "binary:logistic", "max_depth": 3, "silent": 1, "eta": 0.5}

num_rounds = 10

**We will track model quality on both the training set and the held-out (test) set.**

In [8]:
watchlist = [(dtest, "test"), (dtrain, "train")]

## Using built-in metrics
XGBoost implements most of the popular metrics for classification, regression, and ranking:

- `rmse` - [root mean square error](https://www.wikiwand.com/en/Root-mean-square_deviation)
- `mae` - [mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error?oldformat=true)
- `logloss` - [negative log-likelihood](https://en.wikipedia.org/wiki/Likelihood_function?oldformat=true)
- `error` - error rate in binary classification (the default for `binary:logistic` before XGBoost 1.3; newer versions default to `logloss`, as the log below shows)
- `merror` - error rate in multiclass classification
- `auc` - [area under curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic?oldformat=true)
- `ndcg` - [normalized discounted cumulative gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain?oldformat=true)
- `map` - [mean average precision](https://en.wikipedia.org/wiki/Information_retrieval?oldformat=true)
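
To make three of the classification metrics from the list concrete, here is a minimal pure-Python sketch of `logloss`, `error`, and `auc`. The function names and the toy labels and probabilities are illustrative assumptions; XGBoost computes these metrics internally, and this code is not its implementation.

```python
import math


def logloss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def error_rate(y_true, y_prob, threshold=0.5):
    """Share of objects whose thresholded prediction differs from the label."""
    wrong = sum((p > threshold) != (y == 1) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(logloss(y_true, y_prob))
print(error_rate(y_true, y_prob))
print(auc(y_true, y_prob))
```

The pairwise AUC loop is quadratic in the number of objects, so it is suitable only for illustration on small inputs.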

In [9]:
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-logloss:0.43152	train-logloss:0.42606
[1]	test-logloss:0.32608	train-logloss:0.31925
[2]	test-logloss:0.26807	train-logloss:0.26154
[3]	test-logloss:0.23736	train-logloss:0.23201
[4]	test-logloss:0.21130	train-logloss:0.20370
[5]	test-logloss:0.19646	train-logloss:0.19084
[6]	test-logloss:0.18922	train-logloss:0.18075
[7]	test-logloss:0.18332	train-logloss:0.17447
[8]	test-logloss:0.18060	train-logloss:0.16928
[9]	test-logloss:0.17848	train-logloss:0.16587




**To track log loss explicitly, just add it to the `params` dictionary.**

In [10]:
params["eval_metric"] = "logloss"
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-logloss:0.43152	train-logloss:0.42606
[1]	test-logloss:0.32608	train-logloss:0.31925
[2]	test-logloss:0.26807	train-logloss:0.26154
[3]	test-logloss:0.23736	train-logloss:0.23201
[4]	test-logloss:0.21130	train-logloss:0.20370
[5]	test-logloss:0.19646	train-logloss:0.19084
[6]	test-logloss:0.18922	train-logloss:0.18075
[7]	test-logloss:0.18332	train-logloss:0.17447
[8]	test-logloss:0.18060	train-logloss:0.16928
[9]	test-logloss:0.17848	train-logloss:0.16587


**Several metrics can be tracked at once.**

In [13]:
params

{'objective': 'binary:logistic',
 'max_depth': 3,
 'silent': 1,
 'eta': 0.5,
 'eval_metric': ['logloss', 'auc']}

In [11]:
params["eval_metric"] = ["logloss", "auc"]
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-logloss:0.43152	test-auc:0.83111	train-logloss:0.42606	train-auc:0.83474
[1]	test-logloss:0.32608	test-auc:0.89705	train-logloss:0.31925	train-auc:0.88842
[2]	test-logloss:0.26807	test-auc:0.90229	train-logloss:0.26154	train-auc:0.89562
[3]	test-logloss:0.23736	test-auc:0.91246	train-logloss:0.23201	train-auc:0.90125
[4]	test-logloss:0.21130	test-auc:0.91926	train-logloss:0.20370	train-auc:0.90869
[5]	test-logloss:0.19646	test-auc:0.92112	train-logloss:0.19084	train-auc:0.91147
[6]	test-logloss:0.18922	test-auc:0.92284	train-logloss:0.18075	train-auc:0.91454
[7]	test-logloss:0.18332	test-auc:0.92410	train-logloss:0.17447	train-auc:0.91686
[8]	test-logloss:0.18060	test-a

## Creating a custom quality metric

**To create your own quality metric, define a function that takes two arguments: the vector of predicted probabilities and a `DMatrix` object holding the true labels.  
In this example the function returns a `(name, value)` pair whose value is simply the number of objects the classifier gets wrong when it assigns class 1 whenever the predicted probability of class 1 exceeds the 0.5 threshold.  
Then pass the function to `xgb.train` (the `feval` parameter); if lower values of the metric are better, also specify `maximize=False`.**


In [14]:
# custom evaluation metric
def misclassified(pred_probs, dmatrix):
    labels = dmatrix.get_label()  # obtain true labels
    preds = pred_probs > 0.5  # obtain predicted values
    return "misclassified", np.sum(labels != preds)
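
A function with the same two-argument shape can be tried out without training a model. The sketch below is an illustration under assumptions: `make_error_rate_metric` is a hypothetical factory that reports the error rate at a configurable threshold instead of a raw count, and `FakeDMatrix` is a hypothetical stand-in that exposes only `get_label()`; during real training XGBoost would pass a genuine `DMatrix`.

```python
def make_error_rate_metric(threshold=0.5):
    """Build a metric with the (predictions, dmatrix) -> (name, value) shape."""

    def error_rate(pred_probs, dmatrix):
        labels = dmatrix.get_label()  # true labels
        wrong = sum((p > threshold) != (y == 1) for p, y in zip(pred_probs, labels))
        return "error_at_{}".format(threshold), wrong / len(labels)

    return error_rate


class FakeDMatrix:
    """Hypothetical stand-in for a DMatrix, used only for this dry run."""

    def __init__(self, labels):
        self._labels = labels

    def get_label(self):
        return self._labels


metric = make_error_rate_metric(0.3)
name, value = metric([0.1, 0.4, 0.35, 0.8], FakeDMatrix([0, 0, 1, 1]))
print(name, value)
```

Returning a rate rather than a count keeps the values comparable between the training and test sets, which differ in size.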

In [15]:
xgb_model = xgb.train(
    params, dtrain, num_rounds, watchlist, feval=misclassified, maximize=False
)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-logloss:0.43152	test-auc:0.83111	test-misclassified:99.00000	train-logloss:0.42606	train-auc:0.83474	train-misclassified:216.00000
[1]	test-logloss:0.32608	test-auc:0.89705	test-misclassified:108.00000	train-logloss:0.31925	train-auc:0.88842	train-misclassified:228.00000
[2]	test-logloss:0.26807	test-auc:0.90229	test-misclassified:86.00000	train-logloss:0.26154	train-auc:0.89562	train-misclassified:179.00000
[3]	test-logloss:0.23736	test-auc:0.91246	test-misclassified:87.00000	train-logloss:0.23201	train-auc:0.90125	train-misclassified:177.00000
[4]	test-logloss:0.21130	test-auc:0.91926	test-misclassified:77.00000	train-logloss:0.20370	train-auc:0.90869	train-misclassif



**The `evals_result` parameter lets you save the metric values for every iteration.**

In [16]:
evals_result = {}
xgb_model = xgb.train(
    params,
    dtrain,
    num_rounds,
    watchlist,
    feval=misclassified,
    maximize=False,
    evals_result=evals_result,
)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-logloss:0.43152	test-auc:0.83111	test-misclassified:99.00000	train-logloss:0.42606	train-auc:0.83474	train-misclassified:216.00000
[1]	test-logloss:0.32608	test-auc:0.89705	test-misclassified:108.00000	train-logloss:0.31925	train-auc:0.88842	train-misclassified:228.00000
[2]	test-logloss:0.26807	test-auc:0.90229	test-misclassified:86.00000	train-logloss:0.26154	train-auc:0.89562	train-misclassified:179.00000
[3]	test-logloss:0.23736	test-auc:0.91246	test-misclassified:87.00000	train-logloss:0.23201	train-auc:0.90125	train-misclassified:177.00000
[4]	test-logloss:0.21130	test-auc:0.91926	test-misclassified:77.00000	train-logloss:0.20370	train-auc:0.90869	train-misclassif

In [17]:
evals_result

{'test': OrderedDict([('logloss',
               [0.4315233405232429,
                0.3260817745625973,
                0.2680735386013985,
                0.2373635537661612,
                0.21130389650538564,
                0.19646011152490975,
                0.18922280770353972,
                0.18332423341833054,
                0.18059794115647673,
                0.17848075662180782]),
              ('auc',
               [0.8311070493100832,
                0.8970469437213895,
                0.9022855790650423,
                0.9124609346312964,
                0.9192581047890266,
                0.921122634658224,
                0.9228362069668954,
                0.9241009865280577,
                0.9342885818965165,
                0.9348312131275959]),
              ('misclassified',
               [99.0,
                108.0,
                86.0,
                87.0,
                77.0,
                67.0,
                72.0,
                65.0,
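
A dictionary shaped like `evals_result` can be queried directly, for instance to find the round with the best value of a metric. The sketch below assumes the nested `dataset -> metric -> list of values` layout shown in the output; the helper `best_round` is hypothetical, and the `history` values are the test `logloss` figures from the training log above, rounded to five decimals.

```python
def best_round(history, dataset, metric, maximize=False):
    """Return (round_index, value) of the best recorded value of a metric."""
    values = history[dataset][metric]
    pick = max if maximize else min
    best_value = pick(values)
    return values.index(best_value), best_value  # first round reaching the best


history = {
    "test": {
        "logloss": [0.43152, 0.32608, 0.26807, 0.23736, 0.21130,
                    0.19646, 0.18922, 0.18332, 0.18060, 0.17848],
    }
}
print(best_round(history, "test", "logloss"))
```

For metrics where larger is better, such as `auc`, the same call would be made with `maximize=True`.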
      

## Early stopping
**Early stopping halts training when the monitored error has not decreased for a given number of consecutive iterations.** When several datasets are passed in the watchlist, XGBoost monitors the last one, which here is `train`.

In [18]:
params["eval_metric"] = "error"
num_rounds = 1500

xgb_model = xgb.train(params, dtrain, num_rounds, watchlist, early_stopping_rounds=10)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	test-error:0.10000	train-error:0.09130
[1]	test-error:0.09000	train-error:0.08830
[2]	test-error:0.07300	train-error:0.06772
[3]	test-error:0.06700	train-error:0.06044
[4]	test-error:0.05800	train-error:0.04629
[5]	test-error:0.05600	train-error:0.04972
[6]	test-error:0.05700	train-error:0.04629
[7]	test-error:0.05600	train-error:0.04329
[8]	test-error:0.05200	train-error:0.04415
[9]	test-error:0.05300	train-error:0.04243
[10]	test-error:0.05700	train-error:0.04201
[11]	test-error:0.05500	train-error:0.04158
[12]	test-error:0.05200	train-error:0.04029
[13]	test-error:0.05400	train-error:0.03943
[14]	test-error:0.05100	train-error:0.03986
[15]	test-error:0.05400	train-error:0

In [19]:
params["eval_metric"]

'error'

In [21]:
print("Booster best train score: {}".format(xgb_model.best_score))
print("Booster best iteration: {}".format(xgb_model.best_iteration))

Booster best train score: 0.00085726532361766
Booster best iteration: 122
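
The stopping rule itself is easy to sketch in plain Python. This is a simplified illustration, assuming that only a strict improvement resets the counter; the function `early_stopping`, its `patience` argument, and the toy `errors` list are hypothetical and do not come from XGBoost.

```python
def early_stopping(scores, patience, maximize=False):
    """Return (best_round, best_score, rounds_run) for a sequence of metric values."""
    best_round, best_score = 0, scores[0]
    for i, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if i == 0 or improved:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            return best_round, best_score, i + 1  # stopped early
    return best_round, best_score, len(scores)


errors = [0.10, 0.09, 0.07, 0.08, 0.08, 0.06, 0.07, 0.07, 0.07, 0.05]
print(early_stopping(errors, patience=3))
```

With `patience=3` the loop stops after nine rounds and reports round 5 as the best, so the even lower value in the last position is never reached. This is the usual trade-off of early stopping: a short patience saves time but can miss a late improvement.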


## Cross-validation with XGBoost
**Let us demonstrate the `xgboost.cv` function.**

In [22]:
num_rounds = 10
hist = xgb.cv(params, dtrain, num_rounds, nfold=10, metrics={"error"}, seed=42)
hist

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.



Unnamed: 0,train-error-mean,train-error-std,test-error-mean,test-error-std
0,0.095156,0.006252,0.101587,0.016502
1,0.087727,0.004761,0.093436,0.013484
2,0.072486,0.006521,0.087871,0.013049
3,0.055294,0.00447,0.069,0.01166
4,0.047578,0.004258,0.060012,0.010868
5,0.046911,0.003211,0.057876,0.010774
6,0.042721,0.003302,0.055297,0.008692
7,0.041196,0.003075,0.056152,0.010409
8,0.039768,0.002707,0.05701,0.009981
9,0.038768,0.002038,0.054008,0.009813


Notes:

- by default the output is a DataFrame (this can be changed with the `as_pandas` parameter),
- metrics are passed as a parameter (several can be given),
- custom metrics can be used as well (the `feval` and `maximize` parameters),
- early stopping can also be used (`early_stopping_rounds`).
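
The mean and std columns of the table above summarize per-fold scores. The mechanics can be sketched in plain Python under a few assumptions: the helpers `kfold_indices` and `summarize` are hypothetical, the folds are taken without shuffling for brevity, and the population standard deviation is an assumption about how the `-std` columns are computed.

```python
import statistics


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def summarize(fold_scores):
    """Mean and population standard deviation of per-fold scores."""
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


folds = list(kfold_indices(10, 3))
print([len(test_idx) for _, test_idx in folds])
print(summarize([0.10, 0.12, 0.08]))
```

Each object lands in exactly one test fold, so every row contributes to the test estimate once.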