
instantiate sklearn model or get_params from Configuration object #886

Closed
chrisbarber opened this issue Jun 27, 2020 · 2 comments
Labels: enhancement (A new improvement or feature)

@chrisbarber

I was looking around in the code for a way to instantiate an sklearn model from a Configuration object. My use case is implementing a generalized way of getting standard metadata about a completed auto-sklearn run. For example, I call autosklearn_model.get_models_with_weights(), and the result contains some Configuration objects. These may simply describe an sklearn model, although I understand it is also possible to extend and register other model types. In either case, I would like access to a model instance with the matching configuration, so that I can call get_params() on it and see whether it supports that interface. Maybe this is simply accessible somewhere else, but my idea was to re-instantiate a dummy model from the Configuration and then call get_params() on it. Ideally I could dynamically instantiate whatever model is described by the __choice__ hyperparameter (even if it's not from sklearn), the same way auto-sklearn does it internally.

I was poking around in the code and found, e.g.:

    from sklearn.ensemble import RandomForestClassifier

I was expecting, though, to find some place where this import is dynamically selected based on the choice; or maybe this is just a wrapper class that is itself chosen dynamically?

Then I found bits like this:

    classifier = self.module
    config = configuration_space.sample_configuration()
    classifier = classifier(
        random_state=np.random.RandomState(1),
        **{hp_name: config[hp_name] for hp_name in config
           if config[hp_name] is not None})
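The mechanism the snippet above hints at can be sketched generically: a registry maps each `__choice__` value to a component class, and the matching hyperparameters (identified by their colon-separated name prefix) are passed to that class's constructor. The following is a simplified, stand-alone sketch with a hypothetical registry and a dummy wrapper class; it is not auto-sklearn's actual internals.

```python
class RandomForestWrapper:
    """Dummy stand-in for a component wrapper class."""
    def __init__(self, criterion="gini", max_features=1.0, random_state=None):
        self.criterion = criterion
        self.max_features = max_features
        self.random_state = random_state

# Hypothetical registry: the __choice__ value selects the class.
COMPONENT_REGISTRY = {"random_forest": RandomForestWrapper}

def instantiate(config, random_state=1):
    """Build the component named by classifier:__choice__ from a flat
    {hyperparameter_name: value} config mapping."""
    choice = config["classifier:__choice__"]
    cls = COMPONENT_REGISTRY[choice]
    prefix = "classifier:%s:" % choice
    # Keep only this component's hyperparameters, stripping the prefix.
    hps = {name[len(prefix):]: value for name, value in config.items()
           if name.startswith(prefix) and value is not None}
    return cls(random_state=random_state, **hps)

model = instantiate({
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:criterion": "entropy",
    "classifier:random_forest:max_features": 0.5,
})
print(model.criterion, model.max_features)  # entropy 0.5
```

The prefix-based filtering mirrors how the hyperparameter names in a Configuration are namespaced (e.g. `classifier:random_forest:criterion`).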

Could you point me in the right direction? Or advise if I am missing some fundamental point about how this should work.

To make sure I'm clear, I'll also include an example. I have a config object like this:

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'gini'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.5
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 2
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.01
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'standardize'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'

And I want something that produces the equivalent of this (but without hard-coding the model choice, removing parameters that sklearn doesn't accept, etc.):

    from sklearn.ensemble import RandomForestClassifier

    hps = {hp_name.rsplit(':')[-1]: config[hp_name]
           for hp_name in config if config[hp_name] is not None}
    hps = {k: v for k, v in hps.items()
           if k not in ['strategy', '__choice__', 'minimum_fraction']}
    return to_mls_sklearn(RandomForestClassifier(**hps))
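One generic way to drop parameters the target class doesn't accept, without hard-coding a deny-list like the one above, is to filter against the constructor signature with `inspect`. A stand-alone sketch follows, with a dummy estimator class standing in for RandomForestClassifier (an assumption for illustration); note that, as pointed out later in the thread, name-matching alone doesn't guarantee the values mean the same thing in sklearn.

```python
import inspect

class DummyEstimator:
    """Dummy stand-in for an sklearn estimator such as RandomForestClassifier."""
    def __init__(self, criterion="gini", max_depth=None, max_features=1.0):
        self.criterion = criterion
        self.max_depth = max_depth
        self.max_features = max_features

def accepted_kwargs(cls, params):
    """Keep only the entries of `params` that cls.__init__ accepts."""
    sig = inspect.signature(cls.__init__)
    return {k: v for k, v in params.items() if k in sig.parameters}

config = {
    "balancing:strategy": "none",
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_features": 0.5,
}
# Strip the colon-separated prefixes, then drop anything the constructor rejects.
flat = {name.rsplit(':')[-1]: v for name, v in config.items() if v is not None}
est = DummyEstimator(**accepted_kwargs(DummyEstimator, flat))
print(est.criterion, est.max_features)  # gini 0.5
```

Here `strategy` and `__choice__` are silently discarded because `DummyEstimator.__init__` doesn't declare them, so no per-model deny-list is needed.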
mfeurer (Contributor) commented Jul 3, 2020

I think you are looking for the pipeline which accepts a configuration: https://github.com/automl/auto-sklearn/blob/master/autosklearn/pipeline/classification.py

Regarding your second request: unfortunately, this won't be easy, as we have redefined a lot of hyperparameters to have a different meaning than in sklearn, and have also introduced a lot of our own.

mfeurer (Contributor) commented Apr 13, 2021

We just added this feature via #1096 and it will be available in the next release.

@mfeurer mfeurer closed this as completed Apr 13, 2021