Create an abstract function for the classifier pipeline operators #43

rhiever · 2015-12-05T01:22:24Z

There's quite a bit of repeated code in the classifier functions. Abstract the common code to a single function then have the classifiers call that function and pass their custom parameters/model.
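A minimal sketch of what that abstraction could look like, assuming each classifier exposes a scikit-learn-style fit/predict interface. The names `train_and_predict`, `majority_operator`, and `MajorityClassifier` are hypothetical stand-ins and are not taken from the TPOT code base.

```python
from collections import Counter


class MajorityClassifier:
    """Stand-in model with a scikit-learn-style fit/predict interface."""

    def fit(self, features, labels):
        # Remember the most common training label.
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        # Predict the remembered label for every row.
        return [self.label_ for _ in features]


def train_and_predict(train_x, train_y, test_x, model_factory, **params):
    """Hypothetical shared helper: the steps every classifier operator repeats."""
    model = model_factory(**params)
    model.fit(train_x, train_y)
    return model.predict(test_x)


def majority_operator(train_x, train_y, test_x):
    """Each operator now only chooses its model and custom parameters."""
    return train_and_predict(train_x, train_y, test_x, MajorityClassifier)
```

With this shape, adding another classifier would mean writing one short wrapper rather than repeating the training and prediction steps.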

bartleyn · 2015-12-10T04:46:01Z

I've got a long plane ride tomorrow evening that I could use to knock this out, especially now that I'm more familiar with DEAP.

rhiever · 2015-12-10T14:14:09Z

Awesome!

dmarx · 2015-12-17T20:46:40Z

I think a better approach would be to create a class for each operator, where the separate operators would inherit from base classes that define the expected API similar to how scikit-learn is organized. If done correctly, this would greatly simplify adding any arbitrary operator in the future, including empowering users to create customized operators. I'm thinking of a framework that would look something like this (non-working demo to roughly demonstrate the idea): https://gist.github.com/dmarx/7d72263a02b82cd276e1
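The class-based idea can be roughly sketched as follows. This is not the code from the linked gist: `BaseOperator`, `ScaleOperator`, `evaluate`, and the attribute names are assumptions chosen only to illustrate a base class that fixes the expected API.

```python
from abc import ABC, abstractmethod


class BaseOperator(ABC):
    """Hypothetical base class defining the API every operator must follow."""

    def __init__(self, name, input_types, return_type):
        self.name = name
        self.input_types = input_types
        self.return_type = return_type

    @abstractmethod
    def evaluate(self, data, *args):
        """Apply the operator to the data and return the result."""


class ScaleOperator(BaseOperator):
    """Example operator: multiplies every value by a factor."""

    def __init__(self):
        super().__init__("scale", input_types=[list, float], return_type=list)

    def evaluate(self, data, factor):
        return [value * factor for value in data]
```

Because `evaluate` is abstract, a subclass that forgets to implement it cannot be instantiated, which is one way to enforce the shared API.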

bartleyn · 2015-12-17T20:59:42Z

Agreed with the OO approach; among other things, it'll also avoid bloating the main TPOT class down the line. Just to flesh this idea out, where would you propose handling input validation? An abstract function that each operator implements?

dmarx · 2015-12-17T21:10:47Z

We could probably just attach that to the base class. In the gist, I included inputtypes as an input parameter to the base class with the intention of using that to pass the appropriate values to pset.addPrimitive. I imagine we could add a simple input validation function to the base class as well that would access these values. Probably wouldn't need to be overridden in most cases.

Personally, I'm more concerned about what the best way would be to tie instances of these "operator" objects with deap pipeline nodes (i.e. what we're currently referring to as "operators" in the tpot code). I'm new to deap (and only just started looking at tpot today as well) so that might actually be a fairly trivial consideration.

Do we even need a separate input validation function for each class? Or does deap.gp.PrimitiveSetTyped already handle input validation for us?

bartleyn · 2015-12-17T21:27:27Z

I should specify that I mean validating that the model parameters and input dataframe are usable (e.g., non-negative parameters and a minimum number of columns in the dataframe), rather than just type validation, which is the only thing I think addPrimitive provides.
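As an illustration of the difference, here is a sketch of value-level validation that a type check alone would not cover. The function name `validate_operator_input` and its arguments are hypothetical; a column count stands in for the dataframe to keep the example dependency-free.

```python
def validate_operator_input(n_columns, params, min_columns=1):
    """Return a list of problems; an empty list means the input is usable."""
    problems = []
    # Data check: the dataframe must have enough columns to work with.
    if n_columns < min_columns:
        problems.append(f"need at least {min_columns} columns, got {n_columns}")
    # Parameter check: correctly typed values can still be out of range.
    for name, value in params.items():
        if value < 0:
            problems.append(f"parameter {name!r} must be non-negative, got {value}")
    return problems
```

Here `max_depth=-1` is a perfectly valid integer as far as typing goes, yet the function reports it as unusable.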

As for tying together the instances of operator objects to the pipeline, that's a good question. Perhaps the tpot object would have a function that wraps addPrimitive and allows the user to add additional operators to a default set of operators. As far as I know, all we would really need to do is come up with the 'contract' (in the functional programming sense) for each operator in order to add it to the deap pipeline.
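One possible shape for such a wrapper is sketched below. `OperatorSpec` and `register_operators` are hypothetical names; the only assumption about DEAP is that the primitive set offers `addPrimitive(primitive, in_types, ret_type, name=None)`, as `deap.gp.PrimitiveSetTyped` does.

```python
from collections import namedtuple

# Hypothetical description of an operator's "contract":
# a name, the callable, its input types, and its return type.
OperatorSpec = namedtuple("OperatorSpec", ["name", "func", "in_types", "ret_type"])


def register_operators(pset, default_specs, extra_specs=()):
    """Add the default operators plus any user-supplied ones to a primitive set."""
    registered = []
    for spec in list(default_specs) + list(extra_specs):
        # Delegate to the primitive set; only the contract is needed here.
        pset.addPrimitive(spec.func, spec.in_types, spec.ret_type, name=spec.name)
        registered.append(spec.name)
    return registered
```

A user would extend the default set by passing their own specs as `extra_specs`, without touching the code that builds the defaults.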

dmarx · 2015-12-17T21:39:59Z

Ok, I see. An abstract function sounds like a good approach. You're right about treating this as a "contract" as well. I was starting to get into attaching learned hyperparameters to operator instances (which would move towards satisfying issue #11) but that might be getting ahead of ourselves. Just abstracting out the "contract" would significantly simplify the code for the TPOT class, and is really all I was doing in the demo I linked anyway.

In fact, while we're keeping the scope constrained to the "contract" here, maybe it'd be simpler to just let each respective model's "fit" method be responsible for input validation.

dmarx · 2015-12-18T03:35:41Z

I've started playing with this in my fork: https://github.com/dmarx/tpot/tree/modularize_operators

This is obviously going to be a significant refactoring of the project so I'd appreciate the feedback of anyone who cares enough to poke around (here's lookin at you, @rhiever) as I'd rather not make significant unilateral decisions on a project I haven't previously been involved with.

I've managed to get it working for one operator (random forest), shouldn't be too hard to port the rest over now that I'm past that hurdle.

rhiever · 2015-12-18T16:31:18Z

Oooohh, this is interesting @dmarx! I'm poking around in your fork now to see how things work. I like how this could also allow us to tie the export() code to the individual objects rather than having a giant single function as we do now.

wrt input validation: DEAP doesn't currently perform input validation. All PrimitiveSetTyped does is ensure that the right type is passed, but doesn't perform any validation such as ensuring that the integer is positive.

rhiever · 2015-12-18T16:33:33Z

Also: I'll note that I'm currently trying to nail down a semi-stable version of TPOT for the next conference paper, so I probably won't pursue a major refactor until ~late January. But that doesn't mean we can't develop it in a fork in the meantime.

rhiever · 2015-12-18T16:53:11Z

@dmarx, I'm 👍 on this refactor. It looks fantastic! How much do you think it's going to be able to reduce the code size of export()?

dmarx · 2015-12-18T17:09:04Z

@rhiever quite a bit. My goal is to push all code generation particular to operators into two loops: one that constructs the import statements and one that constructs the pipeline. So basically everything below the comment "# Replace the TPOT functions with their corresponding Python code" is going to go, as is the block that constructs the import statements. Call it a reduction of roughly 250 lines for that one function, and much more for the TPOT class in general. After I'm done, the function should look something like this: https://gist.github.com/dmarx/a35a94cb0e42b3cb7811
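The two-loop structure could look roughly like the sketch below. It is not the code from the linked gist: `OperatorExport`, `export_pipeline`, and the idea of storing an import line and a code template on each operator are assumptions made for illustration.

```python
class OperatorExport:
    """Hypothetical per-operator export data: an import line and a code template."""

    def __init__(self, name, import_line, template):
        self.name = name
        self.import_line = import_line
        self.template = template


def export_pipeline(steps, operators):
    """Build script text from (operator_name, kwargs) steps using two loops."""
    # Loop 1: collect each required import once, preserving first-use order.
    imports = []
    for name, _ in steps:
        line = operators[name].import_line
        if line not in imports:
            imports.append(line)

    # Loop 2: render the code for each pipeline step from its template.
    body = []
    for name, kwargs in steps:
        body.append(operators[name].template.format(**kwargs))

    return "\n".join(imports + [""] + body) + "\n"
```

Since each operator carries its own export data, supporting a new operator would not require editing the export function itself.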

rhiever · 2015-12-20T16:13:56Z

#62 is now merged if you want to merge that into your fork, @dmarx.

Pretty stoked to see this refactor!

rhiever added the enhancement label Dec 5, 2015

rhiever changed the title ~~Abstract classifier pipeline operators~~ Abstract the classifier pipeline operators Dec 5, 2015

rhiever changed the title ~~Abstract the classifier pipeline operators~~ Create an abstract function for the classifier pipeline operators Dec 5, 2015

bartleyn mentioned this issue Dec 14, 2015

Abstract function for the classifier pipeline operators #57

Merged

rhiever mentioned this issue Dec 18, 2015

Break export() down into 3 separate functions #44

Closed

pronojitsaha mentioned this issue Dec 19, 2015

Break export() into sub-functions #62

Merged

dmarx mentioned this issue Dec 22, 2015

Refactored operators into separate classes to simplify TPOT class and future incorporation of new operators #63

Closed

rhiever closed this as completed Feb 9, 2016

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed

saddy001 mentioned this issue Mar 20, 2018

Segfault on optimization process #676

Closed
