
[Major refactor] Incorporate OO redesign #91

Closed
rhiever opened this issue Feb 25, 2016 · 5 comments

rhiever commented Feb 25, 2016

See #63 for an example of the new OO design on an old version of TPOT. This will require a large overhaul of TPOT.

rhiever changed the title from "Incorporate OO redesign" to "[Major refactor] Incorporate OO redesign" on Jun 1, 2016

rhiever commented Jun 1, 2016

@tonyfast: I just remembered that we have this issue open to discuss the major refactor coming up after the 0.4 release.

@teaearlgraycold, please link your WIP refactor here so we can take a look at it.


danthedaniel commented Jun 2, 2016

Don't try to run this code - atm it's just a structural layout

Edit: Some of it will kinda work now

Edit 2: You can actually do a fit_predict run now

https://github.com/teaearlgraycold/tpot/tree/refactor

Code of interest is in tpot/operators and tpot/tpot.py#134


danthedaniel commented Jun 2, 2016

So with this setup, if you want to do something like refactor TPOT so it just uses NumPy matrices instead of pandas DataFrames, you can edit the Operator class, Classifier class, and PreProcessor class, and leave all of the actual classifiers and preprocessors untouched.
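A minimal sketch of that layering, assuming hypothetical class and method names rather than the actual code in the branch - the point is that the data-representation logic lives only in the base classes:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


class Operator:
    """Shared glue between the evolved pipeline and the wrapped sklearn object."""

    sklearn_class = None   # set by concrete subclasses
    default_params = {}

    def _to_matrix(self, features):
        # Single choke point for the DataFrame -> ndarray conversion; change
        # the data representation here and the concrete operators stay untouched.
        return features.values if isinstance(features, pd.DataFrame) else np.asarray(features)


class Classifier(Operator):
    def fit_predict(self, training_features, training_classes, testing_features, **params):
        model = self.sklearn_class(**{**self.default_params, **params})
        model.fit(self._to_matrix(training_features), np.asarray(training_classes))
        return model.predict(self._to_matrix(testing_features))


class PreProcessor(Operator):
    def fit_transform(self, training_features, **params):
        transformer = self.sklearn_class(**{**self.default_params, **params})
        return transformer.fit_transform(self._to_matrix(training_features))


class DecisionTree(Classifier):
    # Concrete operator: only declares which sklearn class it wraps.
    sklearn_class = DecisionTreeClassifier
```

With this layout, switching from DataFrames to plain arrays would only mean editing `_to_matrix` (or its equivalent) in the base classes.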

Edit:

Something I'm interested in doing is largely forgoing the preprocess_args() method (from the refactor) as it is now, and instead implementing some general rules for arguments that will be applied based on which argument names are used.

So for example:

If you have a Classifier that takes the arguments 'max_features' and 'max_depth', there will be a general rule that says max_features should be between 1 and len(training_features.columns), and another rule that states max_depth should be at least 1.

So when you add a new classifier or pre-processor you don't need to add extra code that thresholds values we've already determined reasonable limits for. You'd just need to say which set of parameters you want, and if any of them have pre-defined limits it'll use those. Failing that, it would run any code you specify to limit the arguments.

This, however, assumes that an argument's name can reliably be used to determine what kind of thresholding is useful for that argument.

Doing this would also mean that instead of testing each operator with extreme values covered by the argument preprocessing code, you can test the general rules once.
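As a rough sketch of what such name-based rules might look like (`PARAM_RULES` and `clamp_params` are made-up names, not code from the branch):

```python
# Hypothetical registry of per-argument rules, keyed by argument name. Each
# rule clamps a raw value produced by the GP operators into a sensible range.
PARAM_RULES = {
    # max_features: between 1 and the number of feature columns
    'max_features': lambda value, training_features: max(1, min(int(value), len(training_features.columns))),
    # max_depth: at least 1
    'max_depth': lambda value, training_features: max(1, int(value)),
}


def clamp_params(raw_params, training_features):
    """Apply the generic rules to every parameter that has one defined.

    Parameters without a rule pass through untouched, so an operator can
    still run its own custom argument-limiting code for those.
    """
    clamped = {}
    for name, value in raw_params.items():
        rule = PARAM_RULES.get(name)
        clamped[name] = rule(value, training_features) if rule else value
    return clamped
```

Adding a new classifier would then only require listing which parameter names it accepts; the thresholding logic gets tested once, against the rules, rather than per operator.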


tonyfast commented Jun 2, 2016

I started looking into a refactor myself to understand the project a little bit more. I haven't put this into scripts yet, but the idea is drawn out in the notebook.

There were a few main design opinions to study here:

  • Training and testing class information is contained in a pandas DataFrame MultiIndex.
  • Make a custom BaseEstimator that has a fit_predict classmethod. This method fits the model on the training data and then applies a transform, predict, or selection/support operation.
  • New models are created by subclassing an existing sklearn model with some defaults (a sketch of these last two points is below).

With a limited corpus of models so far, this gets all the way through. There is a problem with the scoring function at the moment.

Are there reasons not to subclass the sklearn models directly? BaseEstimator provides get_params and set_params methods, which may help here.
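A minimal sketch of those two ideas together, with hypothetical names (`fit_predict_op`, `TPOTDecisionTree`) since this isn't code from the notebook:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class FitPredictMixin:
    """Hypothetical mixin adding the fit/predict-style entry point discussed above."""

    @classmethod
    def fit_predict_op(cls, training_features, training_classes, testing_features):
        # Instantiate with the subclass's defaults, fit on the training split,
        # then predict (a transform or selection/support operation would go
        # here for the other operator types).
        model = cls()
        model.fit(training_features, training_classes)
        return model.predict(testing_features)


class TPOTDecisionTree(FitPredictMixin, DecisionTreeClassifier):
    """New model created by subclassing an existing sklearn model with some defaults."""

    def __init__(self, max_depth=3, min_samples_split=2):
        super().__init__(max_depth=max_depth, min_samples_split=min_samples_split)


# The BaseEstimator machinery still works on the subclass:
# TPOTDecisionTree().get_params() -> {'max_depth': 3, 'min_samples_split': 2}
preds = TPOTDecisionTree.fit_predict_op(
    np.random.rand(20, 4), np.random.randint(0, 2, 20), np.random.rand(5, 4))
```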


danthedaniel commented Jun 2, 2016

@tonyfast I pushed out a commit so your line number is off. You were referring to the _apply_default_params() method though, correct?

That method is largely there so that certain parameters can be blindly applied to all estimators (regardless of whether they're applicable), and behaves differently from set_params.
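For anyone skimming the thread, one plausible reading of that behaviour, sketched with a made-up body rather than the actual implementation:

```python
def _apply_default_params(estimator, default_params):
    """Illustrative only: apply a shared bag of defaults to any estimator,
    silently skipping keys the estimator doesn't accept.

    sklearn's set_params, by contrast, raises ValueError on unknown keys,
    so it can't be called blindly with the same dict for every operator.
    """
    applicable = {k: v for k, v in default_params.items() if k in estimator.get_params()}
    estimator.set_params(**applicable)
    return estimator
```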

Also, my refactor branch is currently in a state where it can be run - albeit with only two classifiers and one preprocessor at the moment.
