
instantiate sklearn model or get_params from Configuration object #886

Closed
chrisbarber opened this issue Jun 27, 2020 · 2 comments
Labels: enhancement (A new improvement or feature)

@chrisbarber

I was looking around in the code for a way to instantiate an sklearn model from a Configuration object. My use case is implementing a generalized way of getting standard metadata about a completed auto-sklearn run. For example, I call autosklearn_model.get_models_with_weights(), and the result contains some Configuration objects. These may simply describe an sklearn model, although I understand it is also possible to extend and register other model types. In either case, I would like access to a model instance with the matching configuration, so that I can call get_params() on it and see whether it supports that interface. Maybe this is simply accessible somewhere else, but my idea was to re-instantiate a dummy model from the Configuration and then call get_params() on it. Ideally I could dynamically instantiate whatever model is described by the __choice__ hyperparameter (even if it's not from sklearn), the same way auto-sklearn does it internally.

I was poking around in the code and found, e.g.:

    from sklearn.ensemble import RandomForestClassifier

I was expecting, though, to find some place where this import is dynamically selected based on the choice; or maybe this is just a wrapper class that is itself chosen dynamically?

Then I found bits like this:

    classifier = self.module
    config = configuration_space.sample_configuration()
    classifier = classifier(
        random_state=np.random.RandomState(1),
        **{hp_name: config[hp_name] for hp_name in config
           if config[hp_name] is not None})
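The mechanism the snippet above hints at can be sketched generically: a registry maps each `__choice__` value to a component class, and the matching hyperparameters (identified by their colon-separated name prefix) are passed to that class's constructor. The following is a simplified, stand-alone sketch with a hypothetical registry and a dummy wrapper class; it is not auto-sklearn's actual internals.

```python
class RandomForestWrapper:
    """Dummy stand-in for a component wrapper class."""
    def __init__(self, criterion="gini", max_features=1.0, random_state=None):
        self.criterion = criterion
        self.max_features = max_features
        self.random_state = random_state

# Hypothetical registry: the __choice__ value selects the class.
COMPONENT_REGISTRY = {"random_forest": RandomForestWrapper}

def instantiate(config, random_state=1):
    """Build the component named by classifier:__choice__ from a flat
    {hyperparameter_name: value} config mapping."""
    choice = config["classifier:__choice__"]
    cls = COMPONENT_REGISTRY[choice]
    prefix = "classifier:%s:" % choice
    # Keep only this component's hyperparameters, stripping the prefix.
    hps = {name[len(prefix):]: value for name, value in config.items()
           if name.startswith(prefix) and value is not None}
    return cls(random_state=random_state, **hps)

model = instantiate({
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:criterion": "entropy",
    "classifier:random_forest:max_features": 0.5,
})
print(model.criterion, model.max_features)  # entropy 0.5
```

The prefix-based filtering mirrors how the hyperparameter names in a Configuration are namespaced (e.g. `classifier:random_forest:criterion`).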

Could you point me in the right direction? Or advise if I am missing some fundamental point about how this should work.

To make sure I'm clear, I'll also include an example. I have a config object like this:

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'gini'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.5
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 2
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.01
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'standardize'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'

And I want something that produces the equivalent of this (but without hard-coding the model choice, removing parameters that sklearn doesn't accept, etc.):

    from sklearn.ensemble import RandomForestClassifier

    hps = {hp_name.rsplit(':')[-1]: config[hp_name]
           for hp_name in config if config[hp_name] is not None}
    hps = {k: v for k, v in hps.items()
           if k not in ['strategy', '__choice__', 'minimum_fraction']}
    return to_mls_sklearn(RandomForestClassifier(**hps))
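One generic way to drop parameters the target class doesn't accept, without hard-coding a deny-list like the one above, is to filter against the constructor signature with `inspect`. A stand-alone sketch follows, with a dummy estimator class standing in for RandomForestClassifier (an assumption for illustration); note that, as pointed out later in the thread, name-matching alone doesn't guarantee the values mean the same thing in sklearn.

```python
import inspect

class DummyEstimator:
    """Dummy stand-in for an sklearn estimator such as RandomForestClassifier."""
    def __init__(self, criterion="gini", max_depth=None, max_features=1.0):
        self.criterion = criterion
        self.max_depth = max_depth
        self.max_features = max_features

def accepted_kwargs(cls, params):
    """Keep only the entries of `params` that cls.__init__ accepts."""
    sig = inspect.signature(cls.__init__)
    return {k: v for k, v in params.items() if k in sig.parameters}

config = {
    "balancing:strategy": "none",
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_features": 0.5,
}
# Strip the colon-separated prefixes, then drop anything the constructor rejects.
flat = {name.rsplit(':')[-1]: v for name, v in config.items() if v is not None}
est = DummyEstimator(**accepted_kwargs(DummyEstimator, flat))
print(est.criterion, est.max_features)  # gini 0.5
```

Here `strategy` and `__choice__` are silently discarded because `DummyEstimator.__init__` doesn't declare them, so no per-model deny-list is needed.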
mfeurer (Contributor) commented Jul 3, 2020

I think you are looking for the pipeline which accepts a configuration: https://github.com/automl/auto-sklearn/blob/master/autosklearn/pipeline/classification.py

Regarding your second request: unfortunately, this won't be easy, as we have redefined a lot of hyperparameters to have a different meaning than in sklearn, and have also introduced a lot of our own.

mfeurer (Contributor) commented Apr 13, 2021

We just added this feature via #1096 and it will be available in the next release.

@mfeurer mfeurer closed this as completed Apr 13, 2021