Pre-processing the training data #45

hjweide · 2015-02-13T17:43:05Z

I want to pre-process my training data by subtracting the mean. I could subtract the mean from the full data set before passing it to nolearn.lasagne.NeuralNet, but then the mean would be computed over rows that later end up in the validation set, which contaminates it. Instead, it would be nice to be able to pass a StandardScaler to the NeuralNet, which would fit it on the training split, apply it to the validation split, and keep the fitted StandardScaler for when the NeuralNet predicts on a held-out test set.

This could be done in train_loop, right after the call to train_test_split.
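To make the contamination concern concrete, here is a minimal sketch of mean subtraction that uses training rows only. It uses plain NumPy on made-up toy data; the 80/20 split and all variable names are assumptions for illustration, not anything taken from nolearn.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # toy data, 3 features

# Hold out the last 20 rows as a validation set.
X_train, X_valid = X[:80], X[80:]

# Compute the per-feature mean on the training rows only.
train_mean = X_train.mean(axis=0)

# Apply the same training mean to both splits.
X_train_centered = X_train - train_mean
X_valid_centered = X_valid - train_mean
```

The training split ends up with zero mean per feature, while the validation split generally does not, because its rows never influenced `train_mean`. scikit-learn's `StandardScaler(with_std=False)` packages the same computation behind `fit` and `transform`.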

dnouri · 2015-02-19T22:28:29Z

One way to do this is to subclass NeuralNet and override the train_test_split method to use a StandardScaler in the way you describe. In that method, store the fitted StandardScaler as an attribute on self, then apply it in predict_proba, which you'll have to override as well.

I'll be happy to hear any suggestions on making this more dynamic. In #42, I briefly discussed making train_test_split overridable with a parameter, but in your case, it seems you'd need to subclass for predict_proba anyway.
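The subclassing approach can be sketched roughly as follows. To keep the example runnable without Theano or Lasagne installed, `BaseNet` is a hypothetical stand-in for `nolearn.lasagne.NeuralNet`, and `MeanScaler` is a stand-in for a scikit-learn scaler. The sketch assumes that `train_test_split(X, y, eval_size)` returns `X_train, X_valid, y_train, y_valid` in that order, which should be checked against the nolearn version in use.

```python
import numpy as np


class MeanScaler:
    """Stand-in for sklearn's StandardScaler(with_std=False)."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class BaseNet:
    """Hypothetical stand-in for nolearn.lasagne.NeuralNet (not the real class)."""

    def train_test_split(self, X, y, eval_size):
        # Simple ordered split; the return order is an assumption.
        n_valid = int(round(len(X) * eval_size))
        n_train = len(X) - n_valid
        return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

    def predict_proba(self, X):
        # Placeholder "model": echoes its input so the scaling is visible.
        return X


class ScaledNet(BaseNet):
    def train_test_split(self, X, y, eval_size):
        X_train, X_valid, y_train, y_valid = super().train_test_split(
            X, y, eval_size)
        # Fit on the training rows only and remember the scaler on self.
        self.scaler_ = MeanScaler()
        X_train = self.scaler_.fit_transform(X_train)
        X_valid = self.scaler_.transform(X_valid)
        return X_train, X_valid, y_train, y_valid

    def predict_proba(self, X):
        # Reuse the training-set statistics at prediction time.
        return super().predict_proba(self.scaler_.transform(X))


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
net = ScaledNet()
X_train, X_valid, y_train, y_valid = net.train_test_split(X, y, eval_size=0.2)
```

With the real class, only `ScaledNet` would be needed, deriving from `NeuralNet` and using a scikit-learn scaler in place of `MeanScaler`.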

hjweide · 2015-02-21T22:18:22Z

Thanks for getting back to me. I think your suggestion of subclassing NeuralNet and overriding train_test_split and predict_proba is perfectly fine for my situation.

An alternative, though more involved, solution would be to add a standard_scaler=None parameter to NeuralNet. Then train_test_split could check whether self.standard_scaler is not None and, if so, use it to fit_transform the training set X_train and transform the validation set X_valid. predict_proba would need the same check and a transform call.

hjweide · 2015-03-19T21:18:49Z

For my use case, I decided that it would be simpler to implement it as described in my post above. Here is a link to the code in case anyone else wants to do something similar: hjweide@7f30634

Any suggestions for improvements are also welcome.

dnouri · 2016-03-26T04:06:10Z

The TrainSplit interface has since been added, which should give you a good place to apply the scaling correctly.
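One possible way to use a split hook for this is a wrapper around the split callable, sketched below. It assumes the split object is called as `split(X, y, net)` and returns `X_train, X_valid, y_train, y_valid`; that signature is an assumption about TrainSplit and should be verified against the installed nolearn version. `ScalingSplit` and `last_rows_split` are hypothetical names introduced only for this example.

```python
import numpy as np


class ScalingSplit:
    """Wraps a split callable and centers both splits with training statistics.

    Assumes the wrapped callable takes (X, y, net) and returns
    X_train, X_valid, y_train, y_valid.
    """

    def __init__(self, base_split):
        self.base_split = base_split
        self.mean_ = None

    def __call__(self, X, y, net):
        X_train, X_valid, y_train, y_valid = self.base_split(X, y, net)
        self.mean_ = X_train.mean(axis=0)  # training rows only
        return X_train - self.mean_, X_valid - self.mean_, y_train, y_valid

    def transform(self, X):
        # Apply the stored training mean to new data before predicting.
        return X - self.mean_


def last_rows_split(X, y, net, eval_size=0.25):
    # Hypothetical stand-in for a TrainSplit instance.
    n_train = len(X) - int(round(len(X) * eval_size))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


split = ScalingSplit(last_rows_split)
X = np.array([[1.0], [3.0], [5.0], [100.0]])
y = np.array([0, 1, 0, 1])
X_train, X_valid, y_train, y_valid = split(X, y, net=None)
```

A wrapper like this only covers the split step. Test data would still need to go through `split.transform` before prediction, so the wrapper has to stay reachable, for example as an attribute on the net.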

dnouri mentioned this issue Feb 19, 2015

Improve train_test_split #12

Closed

dnouri closed this as completed Mar 26, 2016
