
Option to make the nets more deterministic #26

Closed
run2 opened this issue Jan 21, 2015 · 8 comments

@run2 commented Jan 21, 2015

What do you think about introducing an option to seed the random number generator before doing the KFold train/test split? That way the net's predictions and loss details would be more deterministic over multiple runs on the same set.
Thanks

@dnouri (Owner) commented Jan 22, 2015

I think right now you can do a np.random.seed(42) before training and it'll be deterministic?
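
A minimal sketch of that suggestion: seeding NumPy's global RNG makes anything that draws from it repeat exactly across runs (the commented `net.fit` call is a hypothetical nolearn `NeuralNet`, not code from this repo):

```python
import numpy as np

# Seed NumPy's global RNG before building/fitting the net, so weight
# initialization and NumPy-based shuffling repeat exactly across runs.
np.random.seed(42)
print(np.random.permutation(5))  # same output on every run

np.random.seed(42)
print(np.random.permutation(5))  # identical to the line above

# net.fit(X, y)  # hypothetical NeuralNet: with the seed fixed, repeated
#                # runs on the same data should produce the same losses
```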

@run2 (Author) commented Jan 22, 2015

Sure, yes. I just wanted to check whether you want to add it as an option. I will close this, and you can reopen it if and when you want to.

run2 closed this as completed on Jan 22, 2015
@cancan101 (Contributor) commented

The default for KFold is actually NOT to shuffle (i.e. it is deterministic):
https://github.com/scikit-learn/scikit-learn/blob/38104ff4e8f1c9c39a6f272eff7818b75a27da46/sklearn/cross_validation.py#L320
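
A quick illustration using scikit-learn's current `sklearn.model_selection` module (the link above points at the older `cross_validation` module, where the default was the same):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)

# Default is shuffle=False: fold boundaries are fixed, so every run
# produces exactly the same train/test indices.
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print(train_idx, test_idx)

# Shuffling is only nondeterministic if requested without a seed;
# passing random_state keeps it reproducible.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
```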

@dnouri (Owner) commented Feb 11, 2015

Ah yes, very useful if your data is not independently distributed.

@cancan101 (Contributor) commented

It might be worth tossing a note in the nolearn docs along the lines of:

By default no shuffling occurs, including for the (stratified) K-fold cross-validation... Keep in mind that train_test_split still returns a random split.

Related: #12
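
For contrast, a sketch of pinning down the `train_test_split` behavior mentioned in the quote; `random_state` is a real scikit-learn parameter:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(20).reshape(10, 2), np.arange(10)

# Without random_state this split differs between runs; fixing it makes
# the otherwise-random split reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
```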

@dnouri (Owner) commented Feb 19, 2015

Yes, you're right. And that reminds me that I should be working on proper docstrings soon.

@neilsummers commented
From what I can see, there is more randomness than just in the train/test splits. From the tests I have done, there is randomness whenever I include a dropout layer: I can't make runs identical by doing a np.random.seed(42) before the run. I have tried tracing it back through the source, and it appears that a seed is set by default in RandomStreams from Theano, which is used in the DropoutLayer. I still get changes from run to run when I include a DropoutLayer, but no changes when there is no DropoutLayer.
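
One way to pin this down (a sketch, assuming a Lasagne version that ships the `lasagne.random` module; `DropoutLayer` seeds its `RandomStreams` from `lasagne.random.get_rng()`):

```python
import numpy as np
import lasagne.random

# np.random.seed alone does not reach Theano's RandomStreams. DropoutLayer
# seeds its stream from lasagne.random.get_rng(), so swapping in a seeded
# RandomState should make the dropout masks reproducible as well.
np.random.seed(42)                                 # weight init, shuffling
lasagne.random.set_rng(np.random.RandomState(42))  # dropout mask seeds

# Build and fit the net after setting both seeds.
```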

@dnouri (Owner) commented Apr 22, 2015

There's an issue for that in Lasagne: Lasagne/Lasagne#6
