
Pre-processing the training data #45

Closed
hjweide opened this issue Feb 13, 2015 · 4 comments


@hjweide

hjweide commented Feb 13, 2015

I want to pre-process my training data by subtracting the mean. I could do this by subtracting the mean from my training data before I pass it to nolearn.lasagne.NeuralNet, but this would contaminate my validation set. Instead, it would be nice if one could pass a StandardScaler to the NeuralNet, which could compute the mean on the training set, apply it to the validation set, and store the StandardScaler for when the NeuralNet is used to predict on a held-out test set.

This might be done in the train_loop just after the train_test_split happens.
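The core idea is to compute the statistics on the training rows only and then reuse them unchanged for the validation and test rows, which is what fit followed by transform does in sklearn.preprocessing.StandardScaler. A minimal NumPy sketch of that behaviour follows; the helper names fit_standardizer and standardize are hypothetical and not part of nolearn or scikit-learn.

```python
import numpy as np


def fit_standardizer(X_train):
    """Hypothetical helper: per-feature mean/std from the training rows only."""
    X_train = np.asarray(X_train, dtype=np.float64)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std


def standardize(X, mean, std):
    """Hypothetical helper: apply previously fitted statistics to any split."""
    return (np.asarray(X, dtype=np.float64) - mean) / std


rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_train, X_valid = X[:80], X[80:]

mean, std = fit_standardizer(X_train)  # the validation rows are never used here
X_train_s = standardize(X_train, mean, std)
X_valid_s = standardize(X_valid, mean, std)
```

The same (mean, std) pair would be kept around and applied to a held-out test set at prediction time.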

@dnouri
Owner

dnouri commented Feb 19, 2015

One way to do this is to subclass NeuralNet and override the train_test_split method to use a StandardScaler in the way you describe. In this method, store the StandardScaler as an attribute on self, and access it in the predict_proba method, which you'll have to override as well.

I'll be happy to hear any suggestions on making this more dynamic. In #42, I briefly discussed making train_test_split overridable with a parameter, but in your case, it seems you'd need to subclass for predict_proba anyway.
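The subclassing approach can be sketched as below. So that it runs without nolearn installed, BaseNet is a hypothetical stand-in for nolearn.lasagne.NeuralNet; the sketch assumes a train_test_split(X, y, eval_size) hook returning (X_train, X_valid, y_train, y_valid) and a predict_proba(X) method, so check both against your installed nolearn version. ScaledNet is likewise a hypothetical name.

```python
import numpy as np


class BaseNet(object):
    """Hypothetical stand-in for nolearn.lasagne.NeuralNet (assumed hooks)."""

    def __init__(self, eval_size=0.2):
        self.eval_size = eval_size

    def train_test_split(self, X, y, eval_size):
        n_valid = int(round(len(X) * eval_size))
        return X[n_valid:], X[:n_valid], y[n_valid:], y[:n_valid]

    def predict_proba(self, X):
        # Placeholder "model": returns its inputs so the scaling is visible.
        return X


class ScaledNet(BaseNet):
    """Fits mean/std on the training split and reuses them at predict time."""

    def train_test_split(self, X, y, eval_size):
        X_train, X_valid, y_train, y_valid = super().train_test_split(
            X, y, eval_size)
        # Store the training statistics on self for later use.
        self.mean_ = X_train.mean(axis=0)
        self.std_ = X_train.std(axis=0)
        self.std_[self.std_ == 0] = 1.0
        return ((X_train - self.mean_) / self.std_,
                (X_valid - self.mean_) / self.std_,
                y_train, y_valid)

    def predict_proba(self, X):
        # Apply the stored training statistics before predicting.
        X = (np.asarray(X, dtype=np.float64) - self.mean_) / self.std_
        return super().predict_proba(X)


X = np.arange(20, dtype=np.float64).reshape(10, 2)
y = np.arange(10)
net = ScaledNet()
X_tr, X_va, y_tr, y_va = net.train_test_split(X, y, net.eval_size)
```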

@hjweide
Author

hjweide commented Feb 21, 2015

Thanks for getting back to me. I think your suggestion of subclassing NeuralNet and overriding train_test_split and predict_proba is perfectly fine for my situation.

An alternative, but more involved, solution could be to add a standard_scaler=None parameter to the NeuralNet. Then, in train_test_split, one could check whether self.standard_scaler is not None and, if so, use it to fit_transform the training set X_train and then transform the validation set X_valid. The same check and transform would have to be done in predict_proba.
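The standard_scaler=None parameter idea could look roughly like this. ScalerAwareNet is a hypothetical wrapper, not nolearn's API, and MeanStdScaler is a minimal stand-in that only mirrors the fit_transform/transform method names of sklearn.preprocessing.StandardScaler, so an actual StandardScaler() instance could be passed in its place.

```python
import numpy as np


class MeanStdScaler(object):
    """Minimal stand-in with StandardScaler-style fit_transform/transform."""

    def fit_transform(self, X):
        X = np.asarray(X, dtype=np.float64)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0
        return (X - self.mean_) / self.scale_

    def transform(self, X):
        return (np.asarray(X, dtype=np.float64) - self.mean_) / self.scale_


class ScalerAwareNet(object):
    """Hypothetical network wrapper with an optional scaler parameter."""

    def __init__(self, eval_size=0.2, standard_scaler=None):
        self.eval_size = eval_size
        self.standard_scaler = standard_scaler

    def train_test_split(self, X, y, eval_size):
        n_valid = int(round(len(X) * eval_size))
        X_train, X_valid = X[n_valid:], X[:n_valid]
        y_train, y_valid = y[n_valid:], y[:n_valid]
        if self.standard_scaler is not None:
            # Fit on the training rows only, then reuse for validation.
            X_train = self.standard_scaler.fit_transform(X_train)
            X_valid = self.standard_scaler.transform(X_valid)
        return X_train, X_valid, y_train, y_valid

    def predict_proba(self, X):
        if self.standard_scaler is not None:
            X = self.standard_scaler.transform(X)
        # A real network would run a forward pass here; this sketch
        # returns the (possibly scaled) inputs so the effect is visible.
        return np.asarray(X, dtype=np.float64)


X = np.arange(20, dtype=np.float64).reshape(10, 2)
y = np.arange(10)
net = ScalerAwareNet(standard_scaler=MeanStdScaler())
X_tr, X_va, y_tr, y_va = net.train_test_split(X, y, net.eval_size)
```

Compared with subclassing, this keeps the scaling decision in the constructor arguments, at the cost of adding a preprocessing concern to the network class itself.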

@hjweide
Author

hjweide commented Mar 19, 2015

For my use case, I decided that it would be simpler to implement it as described in my post above. Here is a link to the code in case anyone else wants to do something similar: hjweide@7f30634

Any suggestions for improvements are also welcome.

@dnouri
Owner

dnouri commented Mar 26, 2016

The TrainSplit interface has since been added, which should give you a good opportunity to apply correct scaling.
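A custom splitter in the style of TrainSplit might look like the sketch below. It assumes the splitter is a callable taking (X, y, net) and returning (X_train, X_valid, y_train, y_valid), and that it is passed to NeuralNet through a train_split argument; verify both against your nolearn version. ScalingTrainSplit and its transform method are hypothetical names, and it uses a simple seeded random permutation rather than TrainSplit's own (possibly stratified) splitting.

```python
import numpy as np


class ScalingTrainSplit(object):
    """Hypothetical TrainSplit-style callable that also standardizes X."""

    def __init__(self, eval_size=0.2, seed=0):
        self.eval_size = eval_size
        self.seed = seed

    def __call__(self, X, y, net):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        idx = np.random.RandomState(self.seed).permutation(len(X))
        n_valid = int(round(len(X) * self.eval_size))
        valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
        # Statistics come from the training rows only.
        self.mean_ = X[train_idx].mean(axis=0)
        self.std_ = X[train_idx].std(axis=0)
        self.std_[self.std_ == 0] = 1.0
        return (self.transform(X[train_idx]), self.transform(X[valid_idx]),
                y[train_idx], y[valid_idx])

    def transform(self, X):
        """Apply the stored training statistics, e.g. to a test set."""
        return (np.asarray(X, dtype=np.float64) - self.mean_) / self.std_


split = ScalingTrainSplit(eval_size=0.2)
X = np.arange(20, dtype=np.float64).reshape(10, 2)
y = np.arange(10)
X_tr, X_va, y_tr, y_va = split(X, y, None)  # net is unused in this sketch
```

Note that the splitter only covers training and validation; a test set would still need split.transform(X_test) before calling the network's predict_proba.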

@dnouri dnouri closed this as completed Mar 26, 2016