Question about claims in RECIPE paper #586

Closed
PGijsbers opened this issue Oct 5, 2017 · 2 comments

@PGijsbers
Contributor

This is just a question about the current state of TPOT, I hope that's fine.

I was reading a paper introducing another GP-based AutoML tool, RECIPE.
The paper is from March 2017, and also talks about TPOT.
Having worked on TPOT, I found that some of the claims contradicted my findings, and I wanted to double-check them.
I think it is probably just the case that, since the paper is from March 2017, some of the claims are only true for older versions of TPOT.
They most likely used version 0.4-0.6. Perhaps the authors contacted you as a result of their findings and that's why things already seem fixed for me :)

The claims are as follows:

  1. "One of the major drawbacks of TPOT is that it can create ML pipelines that
    are arbitrary/invalid, i.e., it can create a ML pipeline that fails to solve a
    classification problem, as there are no constraints in which type of components can
    be combined."
    Furthermore, Fig. 5 shows that the number of invalid pipelines is sometimes as high as 30 out of 100.

Ways to create invalid pipelines still exist, for example using invalid hyperparameter combinations (e.g. calling Logistic Regression with dual=True and penalty='l1'), but this should be mostly mitigated by the pre_test decorator, which evaluates such an individual on a small test set.
The number of invalid pipelines that remain should be nowhere near the number suggested in the paper. Was this simply the case in the old version? Or was this perhaps a misinterpretation of the bug that caused a high number of individuals (10~40%) to be duplicates?
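
To make the "evaluate on a small test set" idea concrete, here is a minimal sketch. It is not TPOT's actual pre_test code: `ToyLogisticRegression`, `passes_pre_test`, and the tiny dataset are hypothetical names made up for this example, and the toy estimator only mimics the rule that dual=True cannot be combined with an L1 penalty.

```python
class ToyLogisticRegression:
    """Hypothetical stand-in for an estimator that rejects some hyperparameter combinations."""

    def __init__(self, penalty="l2", dual=False):
        self.penalty = penalty
        self.dual = dual

    def fit(self, X, y):
        # Mimic the constraint discussed above: the dual formulation
        # is not available together with an L1 penalty.
        if self.dual and self.penalty == "l1":
            raise ValueError("dual=True is not supported with penalty='l1'")
        # Trivial "model": remember the most frequent label.
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def passes_pre_test(estimator, X_small, y_small):
    """Return True if the estimator can be fitted and used on a tiny sample."""
    try:
        estimator.fit(X_small, y_small).predict(X_small)
    except Exception:
        # Any failure on the small sample marks the candidate as invalid.
        return False
    return True


# Tiny made-up dataset, just large enough to call fit/predict.
X_small = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
y_small = [0, 1, 0, 1]

candidates = [
    ToyLogisticRegression(penalty="l2", dual=True),
    ToyLogisticRegression(penalty="l1", dual=True),   # invalid combination
    ToyLogisticRegression(penalty="l1", dual=False),
]
valid = [c for c in candidates if passes_pre_test(c, X_small, y_small)]
print(f"{len(valid)} of {len(candidates)} candidates passed the pre-test")
```

The point of a check like this is that a cheap fit on a handful of rows is enough to catch configuration errors before the individual is evaluated on the full data.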

  2. "For example, TPOT can create a pipeline without a classification algorithm."

At the root of the GP tree, there needs to be an operator which performs classification. Classification algorithm operators are added twice here, once marked as root, indicating that they are the only valid options for the final pipeline step.
Thus I believe this is no longer the case.
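
A toy illustration of how a typed root constraint rules out pipelines without a classifier is sketched below. It is not TPOT's code: the type tags, the `PRIMITIVES` table, and `random_pipeline` are hypothetical, and pipelines are simplified to a linear chain. Classifiers are registered twice, mirroring the description above: once as an ordinary step and once with the return type that only the root accepts.

```python
import random

# Hypothetical type tags for a strongly typed primitive set.
FEATURES, OUTPUT = "features", "output"

PREPROCESSORS = ["StandardScaler", "PCA", "SelectKBest"]
CLASSIFIERS = ["LogisticRegression", "RandomForestClassifier"]

# Each primitive is (name, input type, return type).
PRIMITIVES = [(name, FEATURES, FEATURES) for name in PREPROCESSORS]
# Classifiers are registered twice: once as an ordinary (non-root) step
# and once with the special return type required at the root.
PRIMITIVES += [(name, FEATURES, FEATURES) for name in CLASSIFIERS]
PRIMITIVES += [(name, FEATURES, OUTPUT) for name in CLASSIFIERS]


def random_pipeline(rng, max_steps=4):
    """Build a linear pipeline root-first, picking primitives by return type."""
    required = OUTPUT  # the root must return the special output type
    steps = []
    for _ in range(rng.randint(1, max_steps)):
        options = [p for p in PRIMITIVES if p[2] == required]
        name, input_type, _return_type = rng.choice(options)
        steps.append(name)
        required = input_type  # the next (earlier) step must supply this type
    return steps[::-1]  # execution order: earliest step first, root last


pipeline = random_pipeline(random.Random(0))
print(" -> ".join(pipeline))
```

Because only the root-marked classifier entries return `OUTPUT`, every generated pipeline ends in a classifier, regardless of which steps precede it.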

I once again want to make clear that this is not an attack on the credibility of the authors.
I assume the findings they made were true at the time, for the version of TPOT they used.
However, I want to make sure I do understand the current version of TPOT, and that the issues have (mostly) been fixed.

PGijsbers changed the title from "Are claims made in this paper still true?" to "Question about claims in RECIPE paper" on Oct 5, 2017
@weixuanfu
Contributor

weixuanfu commented Oct 5, 2017

@PG-TUe Thank you for these findings. I agree that the RECIPE paper's claims are true for old versions of TPOT. I think the invalid hyperparameter combinations were fixed in version 0.5 (e.g. Logistic Regression), even without the pre_test decorator, but not in version 0.4.
Also, the root of the GP tree, as you noticed, has been in place since version 0.5 (here).

@PGijsbers
Contributor Author

Alright, thank you :) good to know for sure problems have been fixed
