
stacking notes #20

Closed
JiaxiangBU opened this issue Jan 7, 2020 · 7 comments
Labels: enhancement, good first issue

Comments


JiaxiangBU commented Jan 7, 2020

10.2 stacking
https://jiaxiangbu.github.io/learn_kaggle/learning_notes.html

During training, the only requirement on the first-layer models is that no single sample is used to train both layers.
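A minimal sketch of this out-of-fold rule, assuming scikit-learn and synthetic data (the models and names here are illustrative, not from the notes): each sample's first-layer prediction comes from a fold that did not train on it, so the second layer never sees a leaked value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, random_state=0)

base_model = RandomForestClassifier(n_estimators=100, random_state=0)
oof = np.zeros(len(y))  # out-of-fold predictions, one per sample

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in kf.split(X):
    base_model.fit(X[train_idx], y[train_idx])
    # Each sample's first-layer prediction comes from a model that
    # never saw that sample, so the second layer trains on clean data.
    oof[valid_idx] = base_model.predict_proba(X[valid_idx])[:, 1]

meta_model = LogisticRegression()
meta_model.fit(oof.reshape(-1, 1), y)  # second layer sees only OOF values
```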

JiaxiangBU added the enhancement and good first issue labels on Jan 7, 2020

JiaxiangBU commented Jan 7, 2020

1. How many models can the first layer have?

No restriction.

2. Can the second-layer model itself be an ensemble?

Preferably not. See the Huatai Securities research report; I will send it to you later.

3. How do we judge whether the stacked model overfits?

The usual way: compare the evaluation metrics on the training set and the test set.

4. How should the first-layer models be tuned?

With at least k-fold cross-validation. (A sketch pulling these four points together follows below.)
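The sketch, again assuming scikit-learn and synthetic data (the specific models, parameter grid, and metric are illustrative choices, not from the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, KFold, cross_val_predict,
                                     train_test_split)

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Point 4: tune each first-layer model with (at least) k-fold CV.
rf = GridSearchCV(RandomForestClassifier(random_state=0),
                  {"max_depth": [3, 6]}, cv=kf).fit(X_tr, y_tr).best_estimator_
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Point 1: any number of first-layer models; stack their OOF predictions.
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=kf, method="predict_proba")[:, 1]
    for m in (rf, gb)
])

# Point 2: keep the second layer simple (logistic regression, not an ensemble).
meta = LogisticRegression().fit(oof, y_tr)

# Point 3: judge overfitting by comparing training and test metrics.
test_feats = np.column_stack([m.predict_proba(X_te)[:, 1] for m in (rf, gb)])
print("train AUC:", roc_auc_score(y_tr, meta.predict_proba(oof)[:, 1]))
print("test  AUC:", roc_auc_score(y_te, meta.predict_proba(test_feats)[:, 1]))
```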


JiaxiangBU commented Jan 7, 2020

Differing model principles and sample separation are both workable ways to get diversity; a more direct check, when stacking, is to show that the individual models' predictions have low correlation.
See the Huatai Securities research report on this; we used the technique at Phoenix Finance before.

林晓明, 陈烨, and 李子钰. 2018. 人工智能选股之stacking集成学习. 华泰证券股份有限公司.

For details, see https://jiaxiangbu.github.io/phoenix-finance/output/fcontest_output30.html

https://github.com/JiaxiangBU/tutoring/issues/54
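A minimal sketch of the low-correlation check, with synthetic stand-ins for the out-of-fold predictions of three first-layer models (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.random(500)
oof_a = signal + 0.1 * rng.standard_normal(500)  # e.g. random forest OOF scores
oof_b = signal + 0.1 * rng.standard_normal(500)  # e.g. gradient boosting OOF scores
oof_c = rng.random(500)                          # a genuinely different model

corr = np.corrcoef(np.column_stack([oof_a, oof_b, oof_c]), rowvar=False)
print(corr.round(2))
# High off-diagonal values (oof_a vs oof_b) mean those models add little
# diversity; stacking benefits most when the correlations are low.
```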

JiaxiangBU commented

https://www.kaggle.com/lijiaxiang/stacking — I have open-sourced this.

JiaxiangBU commented

https://jiaxiangbu.github.io/learn_fe/target_encoding_learning_notes.html#%E6%80%BB%E7%BB%93
The idea behind stacking is very similar to target encoding. I give an example there: doing target encoding incorrectly can make a pure-noise variable look significant.
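A sketch of the leakage being warned about, assuming pandas/scikit-learn and a purely random category column (all names illustrative): naive target encoding includes each row's own label in its category mean, so even noise looks predictive, while out-of-fold encoding does not.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat": rng.integers(0, 500, size=2000),  # random category, no real signal
    "y": rng.integers(0, 2, size=2000),
})

# Naive encoding: the category mean includes the row's own target.
naive = df.groupby("cat")["y"].transform("mean")
print("naive corr with y:", np.corrcoef(naive, df["y"])[0, 1])  # clearly > 0

# Out-of-fold encoding: compute each row's encoding without its own fold.
oof = pd.Series(index=df.index, dtype=float)
for tr, va in KFold(5, shuffle=True, random_state=0).split(df):
    means = df.iloc[tr].groupby("cat")["y"].mean()
    oof.iloc[va] = df["cat"].iloc[va].map(means).to_numpy()
oof = oof.fillna(df["y"].mean())  # categories unseen in the training folds
print("OOF corr with y:", np.corrcoef(oof, df["y"])[0, 1])      # near 0
```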


JiaxiangBU commented

@Ricardo627721141 What I told you last time about the stacking procedure was slightly off. The correct understanding is: "do not reuse the same training set" means do not reuse it across the two layers; within the same layer it can be reused.
https://www.kaggle.com/lijiaxiang/stacking is a demo.
