Is creating an ensemble out of the TPOT population useful? #105

Open
rhiever opened this issue Mar 6, 2016 · 17 comments

Comments

@rhiever
Contributor

rhiever commented Mar 6, 2016

One of the common arguments against population-based optimization methods is that they are significantly slower than methods that work with one (or a few) solutions at a time. I think one smart way to turn that argument on its head would be to see if creating an ensemble out of the TPOT population would be useful.

An initial exploration could be to run TPOT as normal, and collect additional statistics about the performance of the population as an ensemble. This could be done with a very "hacky" version of TPOT; no need to engineer it before we prove this idea's efficacy.

Basically, for every generation:

  1. Store the classifications of every individual

  2. Use various ensemble methods to combine their classifications into a single classification (min, max, threshold, majority, weighted based on performance on training set)

  3. Plot the effectiveness of all of these population ensemble methods over time

What to look for:

  • Does the population ensemble perform better than the absolute best individual (early on, later on, always)?
  • Does the population ensemble perform better as more generations pass?
  • What ensemble method(s) perform best?
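
The per-generation procedure above could be sketched roughly as follows. This is a minimal illustration, not TPOT code: `ensemble_guesses` assumes each individual's per-instance class guesses have been collected into a matrix, and `accs` (per-individual training accuracies) is a name I'm introducing for the weighted variant.

```python
import numpy as np

def ensemble_guesses(guesses, accs=None):
    """Combine per-individual class guesses (n_individuals x n_samples)
    into one guess per test instance, in a couple of the ways listed above."""
    guesses = np.asarray(guesses)
    n_classes = guesses.max() + 1
    # Vote counts per test instance: shape (n_samples, n_classes).
    counts = np.stack([(guesses == c).sum(axis=0) for c in range(n_classes)], axis=1)
    out = {"majority": counts.argmax(axis=1)}
    if accs is not None:
        # Weight each individual's vote by its training-set accuracy.
        w = np.asarray(accs)[:, None]
        weighted = np.stack([((guesses == c) * w).sum(axis=0) for c in range(n_classes)], axis=1)
        out["weighted"] = weighted.argmax(axis=1)
    return out
```

Running this once per generation and storing the accuracy of each combined guess against the test labels would give the time series for step 3.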
@bartleyn
Contributor

bartleyn commented Mar 7, 2016

Should we choose a couple of data sets to test on, to make the analysis more robust? Which ones might be most appropriate? MNIST, wine, breast cancer?

@rhiever
Contributor Author

rhiever commented Mar 7, 2016

If you run this script, you'll have access to a whole bunch of 'em. Take your pick. :-)

I think just one data set is fine to start with though, as a proof of concept.

@rhiever
Contributor Author

rhiever commented Mar 12, 2016

Ping. Run into any issues with this?

@bartleyn
Contributor

I've created my own version of the eaSimple algorithm where we can dig into the individual/ensemble statistics, but I had difficulty exposing and aggregating each individual pipeline's guesses. In the last day or so I've broken through that and have started getting numbers. I think I'm gonna spin up a cheap AWS instance and just run a ton of tests.

@rhiever
Contributor Author

rhiever commented Mar 14, 2016

I don't think you'll need to roll your own version of eaSimple. Here's some code from another project of mine where you can store a copy of the population in the log every generation, then do the post-analysis on those stored populations.

import copy

import numpy as np
from deap import algorithms, tools

stats = tools.Statistics(lambda ind: (int(ind.fitness.values[0]), round(ind.fitness.values[1], 2)))
stats.register("Minimum", np.min, axis=0)
stats.register("Maximum", np.max, axis=0)
# This should store a copy of pop every generation
stats.register("Population", lambda x: copy.deepcopy(pop))

# Use normal TPOT settings, of course -- not these settings
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0., mutpb=0.5, ngen=1000,
                               stats=stats, halloffame=hof, verbose=False)

Let me know if that works. Alternatively, you can modify the HOF to store the 100 best pipelines discovered so far, and change

stats.register("Population", lambda x: copy.deepcopy(pop))

to

stats.register("HOF", lambda x: copy.deepcopy(hof))

and that will only change the analysis slightly -- using the 100 best pipelines ever discovered as the ensemble instead of the pipelines currently in the population.
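
For the post-analysis step, the stored populations can be pulled back out of the DEAP logbook with `Logbook.select`. A sketch, where `predict_all`, `y_test`, and `log` are stand-ins for however each pipeline's test-set guesses and the finished run are actually obtained:

```python
import numpy as np

def majority_vote(guess_matrix):
    """Column-wise majority vote over an (n_individuals, n_samples) array
    of non-negative integer class labels."""
    votes = np.asarray(guess_matrix)
    # For each test instance, pick the most frequently predicted class.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Hypothetical usage once a run has finished:
# populations = log.select("Population")  # one population snapshot per generation
# for gen, population in enumerate(populations):
#     guesses = np.vstack([predict_all(ind) for ind in population])
#     ensemble_acc = np.mean(majority_vote(guesses) == y_test)
#     print(gen, ensemble_acc)
```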

@bartleyn
Contributor

Interesting -- I had convinced myself that the Statistics object wouldn't be able to give us access to the population directly. I'll test this out; it shouldn't change my post-hoc analysis much.

@bartleyn
Contributor

Got it working with the statistics object, thanks for the tips. I'm gonna spin up these tests.

@rhiever
Contributor Author

rhiever commented Mar 14, 2016

Great! 👍

@bartleyn
Contributor

Some of the shorter tests are wrapping up and I think I have enough for some preliminary results -- I'll try to clean things up and link them here in the next couple of days.

@rhiever
Contributor Author

rhiever commented Mar 20, 2016

Let me know if you want to schedule another video chat. I'm excited to hear how this turned out!

@bartleyn
Contributor

Hey so I've cleaned up some of my data and made it available here. I've been trying to come up with useful visualizations, and figured it'd be more productive to share it in the meantime.

@rhiever
Contributor Author

rhiever commented Mar 23, 2016

What does each of the new columns mean? I'm looking at the data this morning.

@bartleyn
Contributor

Alright, so I took the same ideas from the consensus operators we tried and applied them here. Each individual / population ensemble is evaluated on the test dataset.

Weights

acc_* – each individual's guess is weighted according to their individual accuracy.
uni_* – each individual's guess has the same weight.

Selection

*_max_class – the class with the highest weight (or, in the uni case, the highest frequency) is the ensemble's guess for that test instance.
*_mean_class – the class whose weight / frequency is closest to the mean is the ensemble's guess.
*_median_class – the class whose weight / frequency is closest to the median is the ensemble's guess.
*_min_class – the class with the minimum weight / frequency is the ensemble's guess.
*_threshold_class – the first class whose share of the total weight passes a certain threshold is the ensemble's guess.
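
For a single test instance, those selection rules might look like the sketch below (the function and parameter names are mine, not the actual analysis code, and `class_weights` is the summed per-class weight or frequency vector described above):

```python
import numpy as np

def select_class(class_weights, rule="max", threshold=0.5):
    """Pick the ensemble's guess from summed per-class weights for one instance."""
    w = np.asarray(class_weights, dtype=float)
    if rule == "max":
        return int(w.argmax())
    if rule == "min":
        return int(w.argmin())
    if rule == "median":
        # Class whose weight is closest to the median weight.
        return int(np.abs(w - np.median(w)).argmin())
    if rule == "mean":
        # Class whose weight is closest to the mean weight.
        return int(np.abs(w - w.mean()).argmin())
    if rule == "threshold":
        # First class whose share of the total weight passes the threshold.
        frac = w / w.sum()
        for c, f in enumerate(frac):
            if f >= threshold:
                return c
        return int(w.argmax())  # fall back if nothing passes
    raise ValueError(rule)
```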

@bartleyn
Contributor

I'm getting a lot of variance in a few of the columns, so I may run some more trials.

@rhiever
Contributor Author

rhiever commented Mar 24, 2016

What benchmarks are you running it on? It looks like the classification accuracy for many of the runs is fairly high, so there probably isn't much room for ensembles to improve. What about a harder data set, e.g., GAMETES-hard? Maybe we should just run a large benchmark on the HPCC?

@bartleyn
Contributor

You're right that the data I was using was perhaps too easy -- my testing code was using the sklearn digits dataset rather than MNIST! This is embarrassing, to say the least. On the bright side, at least these tests suggest that the operators are somewhat robust in the smaller-data, slightly-longer, slightly-bigger-population 'regime'.

In the interest of time, how about I run the same tests on random samples from GAMETES-hard and MNIST proper to see if there's promise, and in the meantime we can prep for a larger HPCC benchmark? I can run my tests in a more parallel manner so it's not a week's turnaround.

@rhiever
Contributor Author

rhiever commented Mar 24, 2016

Sounds good to me. Want to send in a PR on this branch?

I'm currently finishing up some other TPOT benchmarks -- shouldn't take
more than the weekend -- but I can slate this benchmark for the next batch.
