
TPOT one-line pipeline representation #101

Closed
jni opened this issue Mar 2, 2016 · 7 comments

jni commented Mar 2, 2016

Just a quick (I hope) question. When I interrupt TPOT training, I get the following line output:

Best pipeline: _random_forest(ARG0, mul(41, mul(38, 5)), sub(3, 94))

But when I export the pipeline, I just get a simple RF with 500 estimators. Is this expected? What's the format of these one-line summaries?

rhiever commented Mar 2, 2016

That's simply because the maximum number of trees allowed in a TPOT RF is 500. So when the mathematical expression evaluates to >500, we clamp it to 500.
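For reference, here is a minimal sketch of how that evaluation and clamping works. The `mul`/`sub` helpers mirror the names in the one-line pipeline string, and the 500-tree cap is the clamp described above; this is illustrative, not TPOT's actual source.

```python
# Hedged sketch of evaluating TPOT's one-line expression:
#   _random_forest(ARG0, mul(41, mul(38, 5)), sub(3, 94))
# The helper names mirror the pipeline string; the 500 cap is the
# clamp mentioned in this thread. Not TPOT's real implementation.

def mul(a, b):
    return a * b

def sub(a, b):
    return a - b

raw_trees = mul(41, mul(38, 5))     # 41 * 38 * 5 = 7790
n_estimators = min(raw_trees, 500)  # clamped to TPOT's maximum of 500
print(n_estimators)  # 500
```

So the exported pipeline's 500 estimators is exactly this clamp kicking in.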


jni commented Mar 2, 2016

Ah, sure. So those expressions (41 * (38 * 5) and 3 - 94?) evaluate to parameters for the sklearn RandomForestClassifier? Which parameters?

rhiever commented Mar 3, 2016

That's right. The first one is the number of trees; the second one is max_features: https://github.com/rhiever/tpot/blob/master/tpot/tpot.py#L439
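Putting the two together, a hedged sketch of how those evaluated expressions could map onto sklearn's RandomForestClassifier. The floor of 1 used for the out-of-range max_features value is my assumption for illustration; the linked tpot.py defines the actual bounds.

```python
from sklearn.ensemble import RandomForestClassifier

# First expression -> number of trees, capped at 500 (per this thread).
n_estimators = min(41 * 38 * 5, 500)   # 7790 -> 500

# Second expression -> max_features. 3 - 94 = -91 is invalid, so it must
# be clamped to something usable; the floor of 1 here is an assumed
# placeholder, not TPOT's real handling (see the linked tpot.py).
max_features = max(3 - 94, 1)          # -91 -> 1

clf = RandomForestClassifier(n_estimators=n_estimators,
                             max_features=max_features)
```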


jni commented Mar 3, 2016

Ah, thanks for the source pointer, that clarifies things a lot!

But LOL it's a little bit depressing that TPOT came back to me with "use a random forest with as many trees as you can afford and automatic feature selection". =D

jni closed this as completed Mar 3, 2016
rhiever commented Mar 3, 2016

How long did you run it for? Throwing a random forest at it usually comes out on top early on, depending on the data. Random forests have been an industry standard for a long time for a reason. :-)

jni commented Mar 3, 2016

@rhiever in this case, I think it was like 30 generations (and the best hadn't changed for a big chunk of those).

My hope using TPOT was that it would find something that could match RF's accuracy but have better test-time speed. This result was with a toy dataset though, I'll keep fiddling. =)

rhiever commented Mar 3, 2016

Gotcha. I'm currently doing a comprehensive analysis of TPOT on about 180 data sets. It'll be interesting to see what comes out of that.
