Question about claims in RECIPE paper #586

PGijsbers · 2017-10-05T13:36:28Z

This is just a question about the current state of TPOT, I hope that's fine.

I was reading a paper introducing another GP-based AutoML tool, RECIPE.
The paper is from March 2017, and also talks about TPOT.
Having worked on TPOT, I found that some of the claims contradicted my findings, and I wanted to double-check them.
I think it is probably just the case that, since the paper is from March 2017, some of the claims only hold for older versions of TPOT.
They most likely used version 0.4-0.6. Perhaps the authors contacted you as a result of their findings and that's why things already seem fixed for me :)

The claims are as follows:

"*One of the major drawbacks of TPOT is that it can create ML pipelines that
are arbitrary/invalid, i.e., it can create a ML pipeline that fails to solve a
classification problem, as there are no constraints in which type of components can
be combined."
Furthermore, Fig. 5 shows that the number of invalid pipelines is sometimes as high as 30 out of 100.

Ways to create invalid pipelines still exist, for example using invalid hyperparameter combinations (e.g. calling Logistic Regression with dual=True and penalty=L1), but this should be mostly mitigated by the pre_test decorator, which evaluates such an individual on a small test set.
The number of invalid pipelines that remain should be nowhere near the number suggested in the paper. Was this simply the case in the old version? Or was this perhaps a misinterpretation of the bug that caused a high number of individuals (10~40%) to be duplicates?
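
To make the pre-test idea concrete, here is a minimal sketch in plain Python. It is not TPOT's actual code: `trial_fit`, `pre_test_sketch`, `mutate` and `GRID` are hypothetical names, the trial fit is a stand-in that only mimics the dual/L1 restriction mentioned above, and the retry-then-fall-back-to-parent behaviour is an assumption made for illustration.

```python
import random
from functools import wraps

# Hypothetical hyperparameter grid for a single classifier.
GRID = {"penalty": ["l1", "l2"], "dual": [True, False]}


def trial_fit(params):
    """Stand-in for fitting the candidate on a small test set.

    Only mimics the restriction discussed above: the dual formulation
    is rejected when combined with an L1 penalty.
    """
    if params["dual"] and params["penalty"] != "l2":
        raise ValueError("dual=True is only supported with penalty='l2'")


def pre_test_sketch(operator, max_tries=20):
    """Wrap a variation operator so that invalid offspring are rejected."""
    @wraps(operator)
    def checked(parent, rng):
        for _ in range(max_tries):
            child = operator(parent, rng)
            try:
                trial_fit(child)       # cheap check before real evaluation
            except ValueError:
                continue               # invalid combination: try again
            return child
        return dict(parent)            # assumed fallback: keep the parent
    return checked


@pre_test_sketch
def mutate(parent, rng):
    """Change one randomly chosen hyperparameter."""
    child = dict(parent)
    key = rng.choice(sorted(GRID))
    child[key] = rng.choice(GRID[key])
    return child
```

Starting from a valid parent such as `{"penalty": "l2", "dual": False}`, repeated calls to `mutate(parent, random.Random(0))` should never hand back the dual/L1 combination, because such offspring fail the trial fit and are discarded.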

"For example, TPOT can create a pipeline without a classification algorithm.*"

At the root of the GP tree, there needs to be an operator which performs classification. Classification algorithm operators are added twice here, once of which marked as root, indicating that they are the only valid options for the final pipeline step.
Thus I believe that currently this is no longer the case.
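
A rough sketch of why a typed root rules out classifier-free pipelines, again in plain Python rather than TPOT's or DEAP's real API. The type names `Features` and `Prediction`, the operator lists and `generate_pipeline` are all illustrative assumptions; the only thing taken from the description above is that classifiers are registered twice, once as a root variant.

```python
import random

# Illustrative operator names, not TPOT's actual configuration.
CLASSIFIERS = ["LogisticRegression", "DecisionTreeClassifier"]
PREPROCESSORS = ["StandardScaler", "PCA"]

# Each primitive is (name, input_type, return_type).
PRIMITIVES = []
for name in PREPROCESSORS:
    PRIMITIVES.append((name, "Features", "Features"))
for name in CLASSIFIERS:
    # Registered twice: once as an intermediate step ...
    PRIMITIVES.append((name, "Features", "Features"))
    # ... and once as the root variant, the only kind returning "Prediction".
    PRIMITIVES.append((name, "Features", "Prediction"))


def generate_pipeline(rng, max_steps=4):
    """Grow a pipeline from the root down, honouring the type constraints."""
    steps = []
    needed = "Prediction"              # the root must return a prediction
    while len(steps) < max_steps:
        candidates = [p for p in PRIMITIVES if p[2] == needed]
        name, input_type, _ = rng.choice(candidates)
        steps.append(name)
        needed = input_type            # the child must produce this type
        if rng.random() < 0.5:
            break
    return steps[::-1]                 # data flows from first to last step
```

Because only the root variants return `Prediction`, every pipeline produced by `generate_pipeline(random.Random(seed))` ends in one of the entries of `CLASSIFIERS`, whatever the earlier steps are.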

I once again want to make clear that this is not an attack on the credibility of the authors.
I assume the findings they made were true at the time, for the version of TPOT they used.
However, I want to make sure I do understand the current version of TPOT, and that the issues have (mostly) been fixed.

weixuanfu · 2017-10-05T13:53:09Z

@PG-TUe Thank you for these findings. I agree that the RECIPE paper's claims are true for old versions of TPOT. I think the invalid hyperparameter combinations were fixed in version 0.5 (e.g. Logistic Regression), even without the pre_test decorator, but not in version 0.4.
Also, the root of the GP tree, as you noticed, has been in place since version 0.5 (here).

PGijsbers · 2017-10-05T13:57:29Z

Alright, thank you :) good to know for sure problems have been fixed

PGijsbers changed the title ~~Are claims made in this paper still true?~~ Question about claims in RECIPE paper Oct 5, 2017

weixuanfu added the question label Oct 5, 2017

PGijsbers closed this as completed Oct 5, 2017
