![](./img/chinahadoop.png)
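# Risk Control Hands-On Project -- Telecom Industry Risk Modeling
**Hands-on project case study from the [小象学院](http://www.chinahadoop.cn/course/landpage/15) "Machine Learning Bootcamp" (《机器学习集训营》), by [@寒小阳](http://www.chinahadoop.cn/user/49339/about)**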

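## Cross-Validated Hyperparameter Tuning for Each Model
Tips:
- Do not give the grid search too many candidate parameters at once; otherwise it will run very slowly.
- This is a class-imbalanced modeling problem, so pay particular attention to stratified sampling during cross-validation.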
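
The two tips above can be sketched as follows. This is a minimal illustration, not the project's solution: the data is synthetic (`make_classification`), the model is a `RandomForestClassifier`, and the grid values are arbitrary assumptions chosen to keep the search small. `StratifiedKFold` keeps the positive-class rate roughly constant across folds, which matters for imbalanced labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data (roughly 15% positives) standing in for the real features.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=2020)

# Stratified folds preserve the class ratio in every train/validation split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=2020)

# A deliberately small grid: 2 x 2 = 4 candidates, each fitted on 3 folds.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=2020),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=cv,
    n_jobs=1,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV AUC:", round(search.best_score_, 4))
```

Tuning a couple of parameters at a time and then fixing them before moving on to the next group is one way to keep each search small; the same `cv` object can be reused for every model so that their scores are comparable.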

In [None]:
import pandas as pd
import numpy as np
from datetime import date

import warnings
warnings.filterwarnings('ignore')

import lightgbm
import xgboost
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score, classification_report
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline

# Prepare the training and test sets
# your code here



# Build each model
# Example: LightGBM

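lgb1 = lightgbm.LGBMClassifier(random_state=2020)
lgb = lightgbm.LGBMClassifier(boosting_type='gbdt',
          objective='binary',
          metric='auc',
          min_child_weight=2,
          num_leaves=2**5,
          reg_lambda=10,           # L2 regularization (alias: lambda_l2)
          subsample=0.7,
          subsample_freq=1,        # LightGBM applies subsample only when subsample_freq > 0
          colsample_bytree=0.5,    # colsample_bylevel is an XGBoost parameter, so it is dropped here
          learning_rate=0.1,
          scale_pos_weight=20,
          random_state=2020,       # scikit-learn style name for seed
          n_jobs=4,                # scikit-learn style name for nthread
          verbose=-1)              # replaces the deprecated silent=True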

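# Grid search + cross-validation tuning
# your code here
# ...
# Blend the two sets of predicted probabilities (weights 0.55 / 0.45)
lgb_y_proba = lgb_y_proba*0.55 + lgb_y_proba1*0.45
# Convert probabilities to 0/1 labels at a 0.5 threshold
lgb_y_pred = (lgb_y_proba >= 0.5)*1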

f1score = f1_score(y_train, lgb_y_pred)
aucscore = roc_auc_score(y_train, lgb_y_proba)
print('F1:', f1score,
      'AUC:', aucscore,
      'Score:', f1score*0.4 + aucscore*0.6)
print(classification_report(y_train, lgb_y_pred))
print("LGB:", sum(lgb_y_pred))

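# Take an explicit copy so the column assignments below do not trigger SettingWithCopyWarning
train_with_proba = train[['uid','label']].copy()
train_with_proba['proba'] = lgb_y_proba
train_with_proba['pred'] = lgb_y_pred
train_with_proba = train_with_proba.sort_values('proba', ascending=False)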

# LGB
# F1: 0.763783510369 AUC: 0.95628093573 Score: 0.879281965585
# F1: 0.776717557252 AUC: 0.967378883283 Score: 0.891114352871
# F1: 0.772875058059 AUC: 0.967378883283 Score: 0.889577353193

"""
F1: 0.766902119072 AUC: 0.955587541677 Score: 0.880113372635
Num: 1082

(150) 0.847886
F1: 0.771614192904 AUC: 0.959340760619 Score: 0.884250133533
             precision    recall  f1-score   support

          0       0.97      0.92      0.94      4099
          1       0.70      0.86      0.77       900

avg / total       0.92      0.91      0.91      4999

1101
0    2392
1     608

(150) 0.851817
F1: 0.77878643096 AUC: 0.967494744498 Score: 0.892011419083
             precision    recall  f1-score   support

          0       0.97      0.94      0.95      5037
          1       0.72      0.85      0.78       963

avg / total       0.93      0.92      0.93      6000
"""

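## Early-Stopping Tuning for xgboost/lightGBM
Tips and requirements:
- Starting from the preliminary parameters found by the cross-validation above, split off a validation set and use early stopping to learn the best number of boosting rounds.
- Output the feature importances and plot a feature importance chart (see the provided template for reference).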
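
The idea behind early stopping can be sketched without xgboost or lightGBM installed. The example below is a stand-in under stated assumptions: it uses scikit-learn's `GradientBoostingClassifier` on synthetic data, scores the validation set after every boosting round with `staged_predict_proba`, and applies a hypothetical helper `pick_best_round` that mimics a patience rule. With LightGBM or XGBoost the libraries' own early-stopping support on a held-out `eval_set` would normally do this during training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data; the real project would use its own feature table.
X, y = make_classification(n_samples=800, n_features=12, n_informative=5,
                           weights=[0.85, 0.15], random_state=2020)

# Stratified hold-out split so the validation set keeps the class ratio.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2020)

gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=2020)
gbdt.fit(X_tr, y_tr)

# Validation AUC after each boosting round (index 0 is round 1).
val_auc = [roc_auc_score(y_val, proba[:, 1])
           for proba in gbdt.staged_predict_proba(X_val)]


def pick_best_round(scores, patience=20):
    """Return the 1-based round a patience-based early-stopping rule would keep."""
    best_round, best_score = 1, scores[0]
    for i, score in enumerate(scores[1:], start=2):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop scanning
    return best_round


best_round = pick_best_round(val_auc, patience=20)
print("best round:", best_round,
      "validation AUC:", round(val_auc[best_round - 1], 4))

# Rank features by importance (highest first) and show the top five.
order = np.argsort(gbdt.feature_importances_)[::-1]
for idx in order[:5]:
    print(f"feature_{idx}: {gbdt.feature_importances_[idx]:.4f}")
```

The ranked importances can be passed to a horizontal bar chart (for example `matplotlib.pyplot.barh`) to produce the requested plot; `lightgbm.plot_importance` and `xgboost.plot_importance` offer a shortcut for models from those libraries.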

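## Model Ensembling and Prediction
Tips and requirements:
- Ensemble the models by assigning each a suitable weight based on its cross-validation performance (take a weighted average of the predicted probabilities).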
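
A weighted average of predicted probabilities can be sketched as below. The probabilities and cross-validation scores are made-up numbers, and making the weights proportional to the CV score is only one possible choice; hand-picked weights such as the 0.55 / 0.45 split used in the LightGBM cell are equally valid. `weighted_average` is a hypothetical helper name.

```python
import numpy as np


def weighted_average(probas, cv_scores):
    """Blend per-model probabilities with weights proportional to CV score."""
    probas = np.asarray(probas, dtype=float)    # shape: (n_models, n_samples)
    weights = np.asarray(cv_scores, dtype=float)
    weights = weights / weights.sum()           # normalize so the weights sum to 1
    return weights @ probas                     # shape: (n_samples,)


# Hypothetical positive-class probabilities from three models on four samples.
probas = [[0.9, 0.2, 0.7, 0.1],
          [0.8, 0.3, 0.6, 0.2],
          [0.7, 0.1, 0.5, 0.3]]
cv_scores = [0.89, 0.88, 0.85]   # made-up cross-validation scores

blended = weighted_average(probas, cv_scores)
labels = (blended >= 0.5).astype(int)   # same 0.5 threshold as the cell above
print(blended.round(3), labels)
```

Because the weights are normalized, the blended values stay in the range of the input probabilities and can be thresholded or ranked exactly like a single model's output.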

## Model Persistence
Tips and requirements:
- Persist the models, i.e. save them to a subfolder of the project directory for later use.
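
One way to persist a fitted scikit-learn style model is `joblib`, sketched below. The helper `save_model`, the subfolder name `models`, and the file name `rf_demo` are illustrative assumptions; a temporary directory is used so the sketch leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def save_model(model, directory, name):
    """Save `model` as <directory>/<name>.joblib, creating the folder if needed."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.joblib"
    joblib.dump(model, path)
    return path


# Small synthetic model standing in for the tuned project models.
X, y = make_classification(n_samples=200, n_features=8, random_state=2020)
model = RandomForestClassifier(n_estimators=20, random_state=2020).fit(X, y)

# In the project this would be a real subfolder such as "models/".
with tempfile.TemporaryDirectory() as tmp:
    path = save_model(model, Path(tmp) / "models", "rf_demo")
    restored = joblib.load(path)   # reload while the file still exists

same = (restored.predict_proba(X) == model.predict_proba(X)).all()
print("saved as:", path.name, "| predictions identical after reload:", same)
```

Files written this way should be loaded with the same library versions that produced them, since pickled models are not guaranteed to be compatible across releases.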