This is just a question about the current state of TPOT, I hope that's fine.
I was reading a paper introducing another GP-based AutoML tool, RECIPE.
The paper is from March 2017, and also talks about TPOT.
Some of the claims contradicted my own findings from working on TPOT, and I wanted to double-check them. I think it is probably just the case that, since the paper is from March 2017, some of the claims only hold for older versions of TPOT.
They most likely used version 0.4-0.6. Perhaps the authors contacted you as a result of their findings and that's why things already seem fixed for me :)
The claims are as follows:
*"One of the major drawbacks of TPOT is that it can create ML pipelines that are arbitrary/invalid, i.e., it can create a ML pipeline that fails to solve a classification problem, as there are no constraints in which type of components can be combined."*
Furthermore, Fig. 5 shows that the number of invalid pipelines is sometimes as high as 30 out of 100.
Ways to create invalid pipelines still exist, for example invalid hyperparameter combinations (e.g., calling Logistic Regression with `dual=True` and `penalty='l1'`), but these should be mostly caught by the `pre_test` decorator, which evaluates each such individual on a small test set.
The number of invalid pipelines that can still occur should by no means be close to the numbers suggested in the paper. Was this simply the case in an old version? Or was it perhaps a misinterpretation of the bug that caused a high fraction of individuals (10~40%) to be duplicates?
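The guard described above can be sketched in a few lines. This is a hedged illustration, not TPOT's actual implementation: `pre_test`, `build_candidate`, and `FakeLogisticRegression` are hypothetical names, and the fake estimator merely mimics scikit-learn's rejection of `dual=True` combined with an L1 penalty at fit time.

```python
def pre_test(make_candidate):
    """Reject candidates whose fit fails on a tiny sample (sketch)."""
    def wrapper(*args, **kwargs):
        candidate = make_candidate(*args, **kwargs)
        tiny_X = [[0.0], [1.0], [2.0], [3.0]]
        tiny_y = [0, 0, 1, 1]
        try:
            candidate.fit(tiny_X, tiny_y)
        except Exception:
            return None  # invalid hyperparameter combination: drop it
        return candidate
    return wrapper


class FakeLogisticRegression:
    """Stand-in that mimics sklearn's parameter validation on fit."""
    def __init__(self, penalty="l2", dual=False):
        self.penalty = penalty
        self.dual = dual

    def fit(self, X, y):
        # The real estimator raises ValueError for this combination.
        if self.dual and self.penalty == "l1":
            raise ValueError("penalty='l1' does not support dual=True")
        return self


@pre_test
def build_candidate(**params):
    return FakeLogisticRegression(**params)


print(build_candidate(penalty="l1", dual=True))   # None: rejected up front
print(build_candidate(penalty="l2", dual=False))  # a valid candidate object
```

The point is only that a cheap fit on a handful of rows filters out most parameter-level invalidity before a pipeline ever reaches full evaluation.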
*"For example, TPOT can create a pipeline without a classification algorithm."*
At the root of the GP tree there must be an operator that performs classification. Classification algorithm operators are added twice here, once marked as root, indicating that they are the only valid options for the final pipeline step.
Thus I believe that currently this is no longer the case.
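The constraint above amounts to a simple structural rule: a pipeline is only valid if its root (final step) is a classifier. A minimal sketch, with illustrative operator names that are not TPOT's actual internals:

```python
# Operators registered as valid root nodes (classifiers) vs. the rest.
# In a typed-GP setup, classifiers would be registered a second time
# with a distinct "root" return type so tree generation can only place
# them at the final position.
CLASSIFIERS = {"LogisticRegression", "DecisionTree"}
PREPROCESSORS = {"StandardScaler", "PCA"}

def valid_pipeline(steps):
    """A pipeline is valid iff it is non-empty and ends in a classifier."""
    return bool(steps) and steps[-1] in CLASSIFIERS

print(valid_pipeline(["PCA", "LogisticRegression"]))  # True
print(valid_pipeline(["StandardScaler", "PCA"]))      # False: no classifier at root
```

With the root type enforced at generation time, pipelines that end without a classifier simply cannot be constructed, rather than being filtered out afterwards.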
I once again want to make clear that this is not an attack on the credibility of the authors.
I assume the findings they made were true at the time, for the version of TPOT they used.
However, I want to make sure I do understand the current version of TPOT, and that the issues have (mostly) been fixed.
PGijsbers changed the title from "Are claims made in this paper still true?" to "Question about claims in RECIPE paper" on Oct 5, 2017.
@PG-TUe Thank you for these findings. I agree that the RECIPE paper's claims are true for older versions of TPOT. I think the invalid hyperparameter combinations (e.g., Logistic Regression) were fixed in version 0.5 even without the `pre_test` decorator, but not in version 0.4.
Also, the root of the GP tree, as you noticed, has been in place since version 0.5 (here).