Add Consensus operator #77

Closed

rhiever opened this issue Feb 3, 2016 · 13 comments
Comments

@rhiever
Contributor

rhiever commented Feb 3, 2016

Currently, the only way to combine the classifications of two classifier operators is through the combine_dfs() method, which keeps the classifications of only one of the classifiers and throws out the other.

We should add a Consensus operator that accepts an arbitrary number of DataFrames and uses various ensemble decision criteria (max, mean, majority, etc. -- this would be an evolvable parameter) to combine the DataFrames' classifications in some meaningful way.
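
Roughly what I'm picturing, as a sketch rather than a real implementation (the function signature and the 'guess' column are just placeholders for however we end up representing classifications):

```python
import pandas as pd
from collections import Counter

def _consensus(method, *dataframes):
    """Hypothetical sketch: combine the 'guess' column of several
    classified DataFrames into a single set of classifications."""
    guesses = pd.concat([df['guess'] for df in dataframes], axis=1)
    combined = dataframes[0].copy()

    if method == 'majority':
        # most common guess per row
        combined['guess'] = guesses.apply(
            lambda row: Counter(row).most_common(1)[0][0], axis=1)
    elif method == 'max':
        combined['guess'] = guesses.max(axis=1)
    elif method == 'mean':
        # only sensible for numeric class labels
        combined['guess'] = guesses.mean(axis=1).round()

    return combined
```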

@rhiever
Contributor Author

rhiever commented Feb 3, 2016

It's not immediately clear to me how to create a pipeline operator that takes an arbitrary number of DataFrames. At least from a high-level look, this seems like a non-trivial task. It may be necessary to implement multiple versions of the Consensus operator, each taking a different fixed number of DataFrames as input.

@kadarakos
Contributor

I'm not sure if this is helpful, but in sklearn the VotingClassifier takes an arbitrary number of estimators as input and returns a single estimator.

http://scikit-learn.org/stable/auto_examples/ensemble/plot_voting_decision_regions.html#example-ensemble-plot-voting-decision-regions-py
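
For reference, a minimal usage example of VotingClassifier (the estimator choices here are arbitrary, just to show the shape of the API):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X, y = iris.data, iris.target

# 'hard' voting = majority rule over the predicted class labels
voter = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier()),
                ('gnb', GaussianNB())],
    voting='hard')
voter.fit(X, y)
print(voter.predict(X[:5]))
```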

@bartleyn
Contributor

bartleyn commented Feb 4, 2016

I think the primary constraint is whether or not DEAP's PrimitiveSetTyped will allow for operators that take in iterables as parameters, no? I can dig around to see if there's something obvious we're missing.
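
For what it's worth, registering a typed primitive in DEAP looks roughly like this; I haven't verified how tree generation behaves when an argument type is a plain list, so treat it as a sketch:

```python
import pandas as pd
from deap import gp

def consensus(dataframes):
    """Placeholder: would combine the classifications of several DataFrames."""
    return dataframes[0]

# A GP tree that takes the input data as one DataFrame and returns a DataFrame.
pset = gp.PrimitiveSetTyped('MAIN', [pd.DataFrame], pd.DataFrame)

# Registration itself accepts `list` as an argument type; the open question is
# whether evolution can ever construct a useful list of DataFrames to feed it.
pset.addPrimitive(consensus, [list], pd.DataFrame)
```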

@rhiever
Contributor Author

rhiever commented Feb 4, 2016

I'm pretty sure it would allow lists as an input, but how could we then make an easy-to-evolve list of pipelines to pass to it?

@bartleyn
Contributor

bartleyn commented Feb 5, 2016

Would it be naive to implement two additional 'helper' operators alongside the consensus one, one of type [list, DataFrame] -> [list] and the other of type [DataFrame, DataFrame] -> [list]? It would balloon the number of operators, but might give us that flexibility.
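
Something like this, sketched out (names hypothetical):

```python
import pandas as pd

def _start_list(df1, df2):
    """[DataFrame, DataFrame] -> [list]: start a list from two DataFrames."""
    return [df1, df2]

def _extend_list(dfs, df):
    """[list, DataFrame] -> [list]: append one more DataFrame to the list."""
    return dfs + [df]
```

Together they work like a cons-list, so a single Consensus operator of type [list] -> [DataFrame] could receive any number of DataFrames without needing a variadic primitive.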

@rhiever
Contributor Author

rhiever commented Feb 7, 2016

That seems to be one way to do it, but I think it would balloon the number of operators and make it more difficult for evolution to work with. I suspect the "best" way to do this is to add a bunch of Consensus operators with increasing numbers of DataFrames as input.

Alternatively, we can think of the GP population of pipelines as the ensemble and add an evolvable parameter to allow evolution to pick the best way to combine their classifications.
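
For the fixed-arity route, the variants could all share one implementation -- reusing the _consensus sketch from above, with hypothetical names:

```python
import pandas as pd

def consensus_two(method, df1, df2):
    return _consensus(method, df1, df2)

def consensus_three(method, df1, df2, df3):
    return _consensus(method, df1, df2, df3)

# Each wrapper would be registered as its own typed primitive, e.g. in DEAP:
# pset.addPrimitive(consensus_two,
#                   [str, pd.DataFrame, pd.DataFrame], pd.DataFrame)
```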

@bartleyn
Contributor

bartleyn commented Feb 7, 2016

I suppose that we could reliably constrain the total number of pipelines being combined, as it's not like we'll be combining hundreds of pipelines.

As for the ensemble approach, would it be as simple as adding a parameter, or would we have to roll our own version of eaSimple?

@rhiever
Contributor Author

rhiever commented Feb 7, 2016

Right. It's not 100% clear how large the ensembles should be.

The population ensemble approach would require a custom version of eaSimple because the population is evaluated together. Probably worth looking into learning classifier systems (specifically, Michigan-style LCS) for inspiration on that end.

In the near future, I think the former approach is more promising. Just need to make sure that evolution can actually make use of those Consensus operators.

@bartleyn
Contributor

bartleyn commented Feb 7, 2016

Agreed. I was actually thinking about a similar ensemble approach to see if it helps address overfitting even more than we have in #64, so I wonder if we should just implement a set number of Consensus operators for now and flesh out this ensemble approach elsewhere.

As for how to meaningfully combine the classifications, beyond the criteria you mentioned, I bet we can take inspiration from meta-learning algorithms like AdaBoost. I'll look into it.

@rhiever
Contributor Author

rhiever commented Feb 7, 2016

👍 Looking forward to seeing what we can do with this idea.

@rhiever
Contributor Author

rhiever commented Feb 17, 2016

Hey @bartleyn, I wanted to check in to see how this issue is coming along. Want to video chat about it?

@bartleyn
Contributor

bartleyn commented Feb 18, 2016

Sure, if you'd like. I've got most of the logic in for the Consensus pipeline operator(s), but I'm getting some major memory blowups (which I suppose was to be expected to some degree). I've taken the approach of allowing each DataFrame to be weighted by some metric (accuracy of the guesses, uniform weights, etc.) before combining them with some evolvable method (max, mean, etc.).
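
To make the weighting idea concrete, it's roughly this shape (a sketch, not the actual code; the 'guess' and 'class' columns are assumptions about the DataFrame layout):

```python
import pandas as pd

def _weighted_consensus(method, weighting, *dataframes):
    """Sketch: weight each DataFrame's guesses, then combine them."""
    if weighting == 'uniform':
        weights = [1.0] * len(dataframes)
    elif weighting == 'accuracy':
        # assumes each DataFrame also carries the true class for scoring its guesses
        weights = [(df['guess'] == df['class']).mean() for df in dataframes]

    guesses = pd.concat([df['guess'] for df in dataframes], axis=1)

    if method == 'mean':
        # weighted average of the guesses (numeric class labels assumed)
        combined = (guesses * weights).sum(axis=1) / sum(weights)
    elif method == 'max':
        # take the guesses from the single highest-weighted DataFrame
        combined = guesses.iloc[:, weights.index(max(weights))]

    result = dataframes[0].copy()
    result['guess'] = combined
    return result
```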

@rhiever
Contributor Author

rhiever commented Feb 19, 2016

Interesting. Is that even with Pareto optimization (as implemented in the latest version of TPOT)? We should video chat to discuss what's going on.
