Create an abstract function for the classifier pipeline operators #43
Comments
I've got a long plane ride tomorrow evening that I could use to knock this out, especially now that I'm more familiar with DEAP. |
Awesome! On Wednesday, December 9, 2015, Nathan notifications@github.com wrote:
Randal S. Olson, Ph.D. E-mail: rso@randalolson.com | Twitter: @randal_olson |
I think a better approach would be to create a class for each operator, where the separate operators would inherit from base classes that define the expected API similar to how scikit-learn is organized. If done correctly, this would greatly simplify adding any arbitrary operator in the future, including empowering users to create customized operators. I'm thinking of a framework that would look something like this (non-working demo to roughly demonstrate the idea): https://gist.github.com/dmarx/7d72263a02b82cd276e1 |
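For illustration, here is a minimal sketch of the class-per-operator idea described above. It is not the code from the linked gist or from TPOT: the names `Operator`, `ScaleOperator`, `parameter_names`, and `apply` are hypothetical, and the toy operator works on plain lists rather than dataframes so the example stays self-contained.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Hypothetical base class: the contract every pipeline operator follows."""

    # Names of the keyword parameters this operator accepts.
    parameter_names = ()

    def __init__(self, **params):
        unknown = set(params) - set(self.parameter_names)
        if unknown:
            raise TypeError(f"unexpected parameters: {sorted(unknown)}")
        self.params = params

    @abstractmethod
    def apply(self, rows):
        """Transform the input rows and return the result."""


class ScaleOperator(Operator):
    """Toy operator that multiplies every value by a constant factor."""

    parameter_names = ("factor",)

    def apply(self, rows):
        factor = self.params.get("factor", 1)
        return [[value * factor for value in row] for row in rows]


scaled = ScaleOperator(factor=2).apply([[1, 2], [3, 4]])
```

Under this layout, adding an operator means writing one small subclass; the base class cannot be instantiated directly because `apply` is abstract.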
Agreed with the object-oriented approach; among other things, it will also avoid bloating the main TPOT class down the line. Just to flesh this idea out, where would you propose handling input validation? An abstract function that each operator implements? |
We could probably just attach that to the base class. In the gist, I included Personally, I'm more concerned about what the best way would be to tie instances of these "operator" objects with Do we even need a separate input validation function for each class? Or does |
I should specify that I mean validating that the model parameters and input dataframe are usable (e.g., non-negative parameters and a minimum number of columns in the dataframe), rather than just type validation, which is the only thing I think As for tying together the instances of operator objects to the pipeline, that's a good question. Perhaps the |
Ok, I see. An abstract function sounds like a good approach. You're right about treating this as a "contract" as well. I was starting to get into attaching learned hyperparameters to operator instances (which would move towards satisfying issue #11) but that might be getting ahead of ourselves. Just abstracting out the "contract" would significantly simplify the code for the TPOT class, and is really all I was doing in the demo I linked anyway. In fact, while we're keeping the scope constrained to the "contract" here, maybe it'd be simpler to just let each respective model's "fit" method be responsible for input validation. |
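As a rough sketch of the validation "contract" discussed in the last few comments (usable parameters and a minimum number of input columns), one option is an abstract validation hook that the base class calls before running the operator. Everything named here (`ValidatedOperator`, `validate_params`, `min_columns`, `TopColumns`) is an assumption made for the example, not an existing TPOT or DEAP API.

```python
from abc import ABC, abstractmethod


class ValidatedOperator(ABC):
    """Hypothetical base class that checks parameters and input before running."""

    min_columns = 1  # fewest input columns the operator can work with

    def __init__(self, **params):
        self.params = params

    @abstractmethod
    def validate_params(self):
        """Raise ValueError if a parameter is unusable."""

    @abstractmethod
    def apply(self, rows):
        """Do the actual work on already-validated input."""

    def __call__(self, rows):
        self.validate_params()
        if not rows or any(len(row) < self.min_columns for row in rows):
            raise ValueError(f"need at least {self.min_columns} columns")
        return self.apply(rows)


class TopColumns(ValidatedOperator):
    """Toy operator that keeps the first `k` columns of every row."""

    min_columns = 2

    def validate_params(self):
        if self.params.get("k", 1) < 1:
            raise ValueError("k must be a positive integer")

    def apply(self, rows):
        return [row[: self.params.get("k", 1)] for row in rows]


kept = TopColumns(k=1)([[1, 2, 3], [4, 5, 6]])
```

The alternative raised above, letting each model's own `fit` method reject bad input, would drop `validate_params` entirely and keep only the shared call path.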
I've started playing with this in my fork: https://github.com/dmarx/tpot/tree/modularize_operators . This is obviously going to be a significant refactoring of the project, so I'd appreciate feedback from anyone who cares enough to poke around (here's lookin at you, @rhiever), as I'd rather not make significant unilateral decisions on a project I haven't previously been involved with. I've managed to get it working for one operator (random forest), so it shouldn't be too hard to port the rest over now that I'm past that hurdle. |
Oooohh, this is interesting @dmarx! I'm poking around in your fork now to see how things work. I like how this could also allow us to tie the wrt input validation: DEAP doesn't currently perform input validation. All |
Also: I'll note that I'm currently trying to nail down a semi-stable version of TPOT for the next conference paper, so I probably won't pursue a major refactor until ~late January. But that doesn't mean we can't develop it in a fork in the meantime. |
@dmarx, I'm 👍 on this refactor. It looks fantastic! How much do you think it's going to be able to reduce the code size of |
@rhiever quite a bit. My goal is to push all code generation particular to operators to two loops: one that constructs the import statement and one that constructs the pipeline. So basically everything below the comment |
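To make the "two loops" idea concrete, here is a hedged sketch of what operator-driven code generation could look like: one loop gathers import lines, the other emits one line of pipeline code per step. The `OPERATORS` table, its `"import"`/`"call"` keys, and `export_pipeline` are invented for this example and do not reflect the actual export code in TPOT or in the fork.

```python
# Hypothetical operator descriptions: each one knows its own import and call.
OPERATORS = {
    "random_forest": {
        "import": "from sklearn.ensemble import RandomForestClassifier",
        "call": "RandomForestClassifier(n_estimators={n_estimators})",
    },
    "decision_tree": {
        "import": "from sklearn.tree import DecisionTreeClassifier",
        "call": "DecisionTreeClassifier(max_depth={max_depth})",
    },
}


def export_pipeline(steps):
    """Build source text for a list of (operator_name, params) steps."""
    # Loop 1: collect each import once, in first-use order.
    imports = []
    for name, _ in steps:
        line = OPERATORS[name]["import"]
        if line not in imports:
            imports.append(line)

    # Loop 2: emit one line of pipeline code per step.
    body = []
    for index, (name, params) in enumerate(steps):
        call = OPERATORS[name]["call"].format(**params)
        body.append(f"step_{index} = {call}")

    return "\n".join(imports + [""] + body)


source = export_pipeline([
    ("random_forest", {"n_estimators": 100}),
    ("decision_tree", {"max_depth": 3}),
    ("random_forest", {"n_estimators": 10}),
])
```

With this shape, supporting a new operator in the exporter only requires a new table entry (or, in the class-based design, two attributes on the operator class) rather than another branch in the export function.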
There's quite a bit of repeated code in the classifier functions. Abstract the common code into a single function, then have each classifier call that function and pass in its custom parameters/model.
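A minimal sketch of that refactoring, assuming each model exposes scikit-learn-style `fit` and `predict` methods. The two toy model classes and the helper name `train_and_predict` are placeholders for illustration, not TPOT code; real operators would pass in actual classifiers instead.

```python
class MajorityClassifier:
    """Toy stand-in for a real model: always predicts the most common label."""

    def fit(self, features, labels):
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


class ThresholdClassifier:
    """Toy stand-in: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, features, labels):
        return self  # nothing to learn in this toy model

    def predict(self, features):
        return [int(row[0] > self.threshold) for row in features]


def train_and_predict(model, train_x, train_y, test_x):
    """Shared code path: fit the given model, then predict on the test rows."""
    model.fit(train_x, train_y)
    return model.predict(test_x)


# Each classifier operator shrinks to a thin wrapper around the shared helper.
def majority(train_x, train_y, test_x):
    return train_and_predict(MajorityClassifier(), train_x, train_y, test_x)


def threshold(train_x, train_y, test_x, cutoff):
    return train_and_predict(ThresholdClassifier(cutoff), train_x, train_y, test_x)


train_x, train_y = [[0.2], [0.9], [0.7]], [0, 1, 1]
majority_guess = majority(train_x, train_y, [[0.1], [0.8]])
threshold_guess = threshold(train_x, train_y, [[0.1], [0.8]], cutoff=0.5)
```

Only the model construction differs between the wrappers; the fit/predict sequence lives in one place.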