
Trim out data transformation operators that are downstream of the last classification step #70

Closed
dmarx opened this issue Jan 4, 2016 · 5 comments

dmarx commented Jan 4, 2016

Sometimes the optimized pipeline will look something like this:

transformation -> transformation -> classification -> transformation

The last transformation step adds nothing. We should clean up the pipeline by adding a post-processing step to tpot.fit that trims out unnecessary operators from the optimized pipeline. This will be trivial after incorporating the refactor in #63, since we could just add an attribute to the base classes identifying whether or not an operator can be the pipeline terminus. Something like:

class BasicOperator(object):
    ...
    # Transformation operators cannot terminate a pipeline
    _terminal_operator = False
    ...

class LearnerOperator(object):
    ...
    # Learners (classifiers) can terminate a pipeline
    _terminal_operator = True
    ...
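
With that flag in place, the trim itself could just be a backwards scan over the evolved pipeline. A rough sketch, assuming the pipeline is materialized as an ordered list of operator instances (the representation here is hypothetical):

def trim_trailing_operators(operators):
    # Walk backwards to the last operator that is allowed to end a
    # pipeline (i.e., a learner); everything after it has no effect
    # on the final predictions and can be dropped.
    for i in range(len(operators) - 1, -1, -1):
        if operators[i]._terminal_operator:
            return operators[:i + 1]
    return operators  # no terminal operator found; leave unchanged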

I felt it'd probably be better to create a new issue for this topic rather than unilaterally adding a commit downstream of the #63 HEAD.

kadarakos commented Jan 5, 2016
Could we solve this with a multi-objective fitness function? It could combine:

  • reward accuracy/F-score
  • penalize the number of operators
  • penalize runtime

As far as I understand, DEAP supports multi-objective optimization out of the box.
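
For reference, a minimal sketch of what that could look like with DEAP's fitness machinery (the weight values here are just one possible encoding of one reward and two penalties):

from deap import base, creator, tools

# Maximize accuracy; minimize operator count and runtime
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)

# The evaluation function would then return a matching tuple,
# e.g. (accuracy, n_operators, runtime), and selection over the
# resulting Pareto front could use tools.selNSGA2.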

rhiever commented Jan 6, 2016

I'm currently developing and testing a multi-objective version of TPOT. Including various measures of model complexity as an axis to minimize does indeed eliminate cases like this.


kadarakos commented Jan 6, 2016
What kind of measures are you using for complexity?

rhiever commented Jan 6, 2016

The two you mentioned -- number of pipeline operators and runtime -- but also the number of features in the pipeline. Interested to hear more ideas if you have some.
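
Putting those axes together, the evaluation could return one value per objective, e.g. with weights=(1.0, -1.0, -1.0, -1.0) in the DEAP sketch above. Illustrative only -- compile_pipeline and score_with_timing are hypothetical helpers, not TPOT API:

def evaluate(individual):
    pipeline = compile_pipeline(individual)          # hypothetical helper
    accuracy, runtime = score_with_timing(pipeline)  # hypothetical helper
    return (accuracy,
            len(individual),        # number of pipeline operators
            runtime,
            pipeline.n_features)    # features in the pipeline (assumed attribute)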

On Wed, Jan 6, 2016 at 12:49 AM, kadarakos notifications@github.com wrote:

What kind of measures are you using for complexity?


Reply to this email directly or view it on GitHub
#70 (comment).

Randal S. Olson, Ph.D.
Postdoctoral Researcher, Institute for Biomedical Informatics
University of Pennsylvania

E-mail: rso@randalolson.com | Twitter: @randal_olson
https://twitter.com/randal_olson
http://www.randalolson.com

rhiever commented Aug 13, 2016

This is now encapsulated in #206, so I'm going to close this issue.
