
Generate a model file and reuse model to classify new samples (eg streaming big data) #30

Closed
giorgio79 opened this issue Jan 13, 2016 · 6 comments

@giorgio79

Can auto-sklearn generate a model file that can be reused to classify new data? This would be useful for classifying big data streams.

mfeurer commented Jan 13, 2016

Yes, that would be useful, but so far it can't. What you can do is use show_models(), which outputs something like:

[(weight, constructor),
 (weight, constructor)]

which describes the final ensemble. You can use that to retrain the models on your full data and pickle the result in your own code.
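A minimal sketch of that suggestion, assuming scikit-learn-style estimators: rebuild the ensemble from the (weight, constructor) pairs, refit on the full data, and pickle it. The WeightedEnsemble class and the stand-in estimators are illustrative, not part of auto-sklearn.

```python
import pickle
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

class WeightedEnsemble:
    """Soft-voting ensemble over (weight, estimator) pairs."""
    def __init__(self, weighted_models):
        self.weighted_models = weighted_models

    def fit(self, X, y):
        for _, model in self.weighted_models:
            model.fit(X, y)
        return self

    def predict(self, X):
        # weighted sum of class probabilities, then argmax
        votes = sum(w * m.predict_proba(X) for w, m in self.weighted_models)
        return np.argmax(votes, axis=1)

# (weight, constructor) pairs, as reported by show_models();
# plain scikit-learn estimators stand in for the actual pipelines
ensemble = WeightedEnsemble([
    (0.6, RandomForestClassifier(n_estimators=100, random_state=0)),
    (0.4, LogisticRegression(max_iter=1000)),
])

X, y = make_classification(n_samples=200, random_state=0)
ensemble.fit(X, y)
model_blob = pickle.dumps(ensemble)  # persist the retrained ensemble
```

The pickled blob can then be loaded in a separate process to classify new samples without rerunning the search.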

@giorgio79

It looks like scikit-learn relies on external libraries for model persistence:
http://stackoverflow.com/questions/10592605/save-classifier-to-disk-in-scikit-learn
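The linked answer boils down to two standard options, sketched here with a stand-in LogisticRegression model:

```python
# Persisting a trained scikit-learn model: pickle (stdlib) or joblib.
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Option 1: plain pickle (in-memory round trip shown here)
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# Option 2: joblib, more efficient for models holding large numpy arrays
# from joblib import dump, load
# dump(clf, "model.joblib"); restored = load("model.joblib")
```

joblib is generally preferred for estimators that carry large numpy arrays internally, such as forests.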

Motorrat commented May 6, 2016

Is there a simple programmatic way to convert the output of show_models() into a string that can be used to construct the classifiers in code? Currently it comes out as:

(0.040000, SimpleClassificationPipeline(configuration={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:bootstrap': 'False',
  'classifier:random_forest:criterion': 'entropy',
  'classifier:random_forest:max_depth': 'None',
  'classifier:random_forest:max_features': 1.6519823800472522,
  'classifier:random_forest:max_leaf_nodes': 'None',
  'classifier:random_forest:min_samples_leaf': 14,
  'classifier:random_forest:min_samples_split': 13,
  'classifier:random_forest:min_weight_fraction_leaf': 0.0,
  'classifier:random_forest:n_estimators': 100,
  'imputation:strategy': 'mean',
  'one_hot_encoding:use_minimum_fraction': 'False',
  'preprocessor:__choice__': 'no_preprocessing',
  'rescaling:__choice__': 'min/max'})),
(0.040000, SimpleClassificationPipeline(configuration={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'sgd',
  'classifier:sgd:alpha': 8.157889958167601e-05,
  'classifier:sgd:average': 'False',
  'classifier:sgd:eta0': 0.042599381735495594,
  'classifier:sgd:fit_intercept': 'True',
  'classifier:sgd:learning_rate': 'optimal',
  'classifier:sgd:loss': 'perceptron',
  'classifier:sgd:n_iter': 25,
  'classifier:sgd:penalty': 'l2',
  'imputation:strategy': 'median',
  'one_hot_encoding:minimum_fraction': 0.040130045634589266,
  'one_hot_encoding:use_minimum_fraction': 'True',
  'preprocessor:__choice__': 'no_preprocessing',
  'rescaling:__choice__': 'normalize'})),
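One ad-hoc approach (a sketch, not an auto-sklearn API): scrape the printed show_models() text with a regex to recover each member's weight and chosen classifier. This only extracts two fields; it does not reconstruct full pipeline objects.

```python
import re

# Abbreviated sample of the show_models() output quoted above
shown = """(0.040000, SimpleClassificationPipeline(configuration={
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:n_estimators': 100})),
(0.040000, SimpleClassificationPipeline(configuration={
  'classifier:__choice__': 'sgd',
  'classifier:sgd:loss': 'perceptron'})),"""

# capture the ensemble weight and the classifier choice per member
pattern = re.compile(
    r"\((\d+\.\d+), SimpleClassificationPipeline.*?"
    r"'classifier:__choice__': '(\w+)'",
    re.DOTALL,
)
members = [(float(w), name) for w, name in pattern.findall(shown)]
print(members)  # [(0.04, 'random_forest'), (0.04, 'sgd')]
```

Extending the regex to capture the remaining hyperparameter keys would be needed before the configurations could actually be re-instantiated.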

mfeurer commented May 6, 2016

Have a look at this.

Motorrat commented May 6, 2016

Also, show_models() can be very slow and use a lot of memory; in my case it takes tens of minutes and tens of GB.

Instead I am using:

ats=salted_temp_dir_of_autosklearn
for quality in $(grep obj $ats/log-run* | sed -e 's/^.*obj\ \(.*$\)/\1/' | sort | uniq | head -10); do
  grep final -A 1 $(grep -l "$quality" $ats/log-run* | sort | head -1)
done

to get the top 10 classifiers with the best scores from the log files, which takes virtually no time at all.
I wonder if there is a reason show_models() does what it does.
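The same log-scanning idea can be sketched in Python. Note the "obj <value>" line format here is an assumption inferred from the shell one-liner above, not a documented auto-sklearn log format.

```python
import glob
import re

def best_runs(log_glob, top_n=10):
    """Scan run logs matching log_glob for 'obj <value>' lines and
    return the top_n unique (score, path) pairs, lowest score first."""
    scores = []
    for path in glob.glob(log_glob):
        with open(path) as f:
            for line in f:
                m = re.search(r"obj\s+(\S+)", line)
                if m:
                    scores.append((float(m.group(1)), path))
    seen, best = set(), []
    for score, path in sorted(scores):  # assumes lower objective is better
        if score not in seen:
            seen.add(score)
            best.append((score, path))
    return best[:top_n]
```

This avoids reconstructing any model objects, which is presumably why it is so much faster than show_models().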

mfeurer commented Oct 17, 2016

The latest version of auto-sklearn features pickleable classifiers/regressors. If there is still an issue with model persistence, please open a new issue.
