Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract function for the classifier pipeline operators #57

Merged
merged 2 commits into from
Dec 16, 2015

Conversation

bartleyn
Copy link
Contributor

Per #43 , this abstracts the shared code between static models into a static method that each model can call. In addition, I built a test for testing all the static models / this new abstract method indirectly, and tweaked the num_trees parameter in random_forest to n_estimators to facilitate testing.

Potential improvements:

  1. Maybe it's just me, but It feels a little odd to call a static function from another static function. Does it make sense to move this method (and perhaps some others) to a utility function module? Either way, is it faster to call this abstract method as a static method inside the class, a class method inside the class, or something else?
  2. Is there a more appropriate name for the method?
  3. Is there a better place to do the input validation (i.e. if there are only three columns in the input_df) than within this nested function call?

input_df.loc[:, sf_identifier] = input_df['guess'].values

return input_df
return TPOT._train_model_and_predict(input_df, DecisionTreeClassifier, max_features=max_features, max_depth=max_depth, random_state=42)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the TPOT. is extraneous: It should be possible to call _train_model_and_predict() without it.

@rhiever
Copy link
Contributor

rhiever commented Dec 15, 2015

Great work as always! Thank you for adding the test.

Addressing your questions:

  1. Maybe it's just me, but It feels a little odd to call a static function from another static function. Does it make sense to move this method (and perhaps some others) to a utility function module? Either way, is it faster to call this abstract method as a static method inside the class, a class method inside the class, or something else?

AFAIK, there is no performance difference between the two. The difference between a static and regular class function is whether the function relies on the current state of the object, i.e., it affects the scope. Static functions always have their own scope, whereas class functions share the scope of the object they're called on.

I think we have to ask ourselves: Do we want users using these functions outside of TPOT? e.g.,

from tpot import random_forest

result = random_forest(...)
...

I originally envisioned the export() function doing something like that, but I ended up exporting the pipelines directly to sklearn code (even if it looks ugly). Perhaps we should change all of the functions to regular class methods since we don't really want users doing that. I don't think any of the functions in TPOT were really made to be used outside of TPOT.

  1. Is there a more appropriate name for the method?

Name looks fine to me!

  1. Is there a better place to do the input validation (i.e. if there are only three columns in the input_df) than within this nested function call?

That's a tough one, since we're never really sure where the pipeline operator functions will be getting their input from and where they're sending it to. I think putting in that quick check at the beginning of the function is the best we can do.

@rhiever
Copy link
Contributor

rhiever commented Dec 15, 2015

IMO we should keep _train_model_and_predict() within the TPOT class. It's simply an abstraction of the TPOT classifier operators.

@bartleyn
Copy link
Contributor Author

I settled on moving _train_model_and_predict() out of the class to see if we could remove the TPOT. that prefaced the function call in the static models; I agree that ideally the function should remain in the TPOT class. However, without passing an instantiated TPOT object into the static model call, I'm not sure how to avoid the static TPOT._train_model_and_predict().

@rhiever
Copy link
Contributor

rhiever commented Dec 15, 2015

I see. We should definitely remove all of the @staticfunction decorators then, and add self as the first parameter to these functions. That will allow us to call the internal functions as self.function_name().

…sing to the sklearn model easier, changed static model test name

removed static decorators for static models and
_train_model_and_predict()

Cleaning up.
@bartleyn
Copy link
Contributor Author

I'll try to squash some of my commits for brevity's sake.

@rhiever
Copy link
Contributor

rhiever commented Dec 15, 2015

This looks ready to merge now. Anything else you planning on adding to this PR?

@bartleyn
Copy link
Contributor Author

I think I'm good for now. If you think it's ready, then great!

rhiever pushed a commit that referenced this pull request Dec 16, 2015
Abstract function for the classifier pipeline operators
@rhiever rhiever merged commit c946b15 into EpistasisLab:master Dec 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants