Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an abstract function for the classifier pipeline operators #43

Closed
rhiever opened this issue Dec 5, 2015 · 13 comments
Closed

Create an abstract function for the classifier pipeline operators #43

rhiever opened this issue Dec 5, 2015 · 13 comments

Comments

@rhiever
Copy link
Contributor

rhiever commented Dec 5, 2015

There's quite a bit of repeated code in the classifier functions. Abstract the common code to a single function then have the classifiers call that function and pass their custom parameters/model.

@rhiever rhiever changed the title Abstract classifier pipeline operators Abstract the classifier pipeline operators Dec 5, 2015
@rhiever rhiever changed the title Abstract the classifier pipeline operators Create an abstract function for the classifier pipeline operators Dec 5, 2015
@bartleyn
Copy link
Contributor

I've got a long plane ride tomorrow evening that I could use to knock this out, especially now that I'm more familiar with DEAP.

@rhiever
Copy link
Contributor Author

rhiever commented Dec 10, 2015

Awesome!

On Wednesday, December 9, 2015, Nathan notifications@github.com wrote:

I've got a long plane ride tomorrow evening that I could use to knock this
out, especially now that I'm more familiar with DEAP.


Reply to this email directly or view it on GitHub
#43 (comment).

Randal S. Olson, Ph.D.
Postdoctoral Researcher, Institute for Biomedical Informatics
University of Pennsylvania

E-mail: rso@randalolson.com | Twitter: @randal_olson
https://twitter.com/randal_olson
http://www.randalolson.com

@dmarx
Copy link

dmarx commented Dec 17, 2015

I think a better approach would be to create a class for each operator, where the separate operators would inherit from base classes that define the expected API similar to how scikit-learn is organized. If done correctly, this would greatly simplify adding any arbitrary operator in the future, including empowering users to create customized operators. I'm thinking of a framework that would look something like this (non-working demo to roughly demonstrate the idea): https://gist.github.com/dmarx/7d72263a02b82cd276e1

@bartleyn
Copy link
Contributor

Agreed with the oo approach, as among other things it'll also avoid bloating the main TPOT class down the line. Just to flesh this idea out, where would you propose handling input validation? An abstract function that each operator implements?

@dmarx
Copy link

dmarx commented Dec 17, 2015

We could probably just attach that to the base class. In the gist, I included inputtypes as an input parameter to the base class with the intention of using that to pass the appropriate values to pset.addPrimitive. I imagine we could add a simple input validation function to the base class as well that would access these values. Probably wouldn't need to be overridden in most cases.

Personally, I'm more concerned about what the best way would be to tie instances of these "operator" objects with deap pipeline nodes (i.e. what we're currently referring to as "operators" in the tpot code). I'm new to deap (and only just started looking at tpot today as well) so that might actually be a fairly trivial consideration.

Do we even need a separate input validation function for each class? Or does deap.gp.PrimitiveSetTyped already handle input validation for us?

@bartleyn
Copy link
Contributor

I should specify that I mean validating that the model parameters and input dataframe are usable (e.g., non-negative parameters and a minimum number of columns in the dataframe), rather than just type validation, which is the only thing I think addPrimitive provides.

As for tying together the instances of operator objects to the pipeline, that's a good question. Perhaps the tpot object would have a function that wraps addPrimitive and allows the user to add additional operators to a default set of operators. As far as I know, all we would really need to do is come up with the 'contract' (in the functional programming sense) for each operator in order to add it to the deap pipeline.

@dmarx
Copy link

dmarx commented Dec 17, 2015

Ok, I see. An abstract function sounds like a good approach. You're right about treating this as a "contract" as well. I was starting to get into attaching learned hyperparameters to operator instances (which would move towards satisfying issue #11) but that might be getting ahead of ourselves. Just abstracting out the "contract" would significantly simplify the code for the TPOT class, and is really all I was doing in the demo I linked anyway.

In fact, while we're keeping the scope constrained to the "contract" here, maybe it'd be simpler to just let each respective model's "fit" method be responsible for input validation.

@dmarx
Copy link

dmarx commented Dec 18, 2015

I've started playing with this in my fork: https://github.com/dmarx/tpot/tree/modularize_operators

This is obviously going to be a significant refactoring of the project so I'd appreciate the feedback of anyone who cares enough to poke around (here's lookin at you, @rhiever) as I'd rather not make significant unilateral decisions on a project I haven't previously been involved with.

I've managed to get it working for one operator (random forest), shouldn't be too hard to port the rest over now that I'm past that hurdle.

@rhiever
Copy link
Contributor Author

rhiever commented Dec 18, 2015

Oooohh, this is interesting @dmarx! I'm poking around in your fork now to see how things work. I like how this could also allow us to tie the export() code to the individual objects rather than having a giant single function as we do now.

wrt input validation: DEAP doesn't currently perform input validation. All PrimitiveSetTyped does is ensure that the right type is passed, but doesn't perform any validation such as ensuring that the integer is positive.

@rhiever
Copy link
Contributor Author

rhiever commented Dec 18, 2015

Also: I'll note that I'm currently trying to nail down a semi-stable version of TPOT for the next conference paper, so I probably won't pursue a major refactor until ~late January. But that doesn't mean we can't develop it in a fork in the meantime.

@rhiever
Copy link
Contributor Author

rhiever commented Dec 18, 2015

@dmarx, I'm 👍 on this refactor. It looks fantastic! How much do you think it's going to be able to reduce the code size of export()?

@dmarx
Copy link

dmarx commented Dec 18, 2015

@rhiever quite a bit. My goal is to push all code generation particular to operators to two loops: one that constructs the import statement and one that constructs the pipeline. So basically everything below the comment # Replace the TPOT functions with their corresponding Python code is going to go, as is the block that constructs the import statements. Call it a reduction of about 250 lines (for that one function. Much more for the TPOT class in general)? After I'm done the function should look something like this: https://gist.github.com/dmarx/a35a94cb0e42b3cb7811

@rhiever
Copy link
Contributor Author

rhiever commented Dec 20, 2015

#62 is now merged if you want to merge that into your fork, @dmarx.

Pretty stoked to see this refactor!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants