
Use dropout correctly #6

Open
andimarafioti opened this issue Mar 7, 2018 · 9 comments

@andimarafioti
Owner

I added a dropout feature to the sequential model. Preliminary tests on it are a bit hard to assess.

I trained two equivalent networks for 800k steps with a learning rate of 1e-3. In orange is a network with dropout = 0.3 for the linear layer and 0.1 for all conv and deconv layers except the last deconv. In blue is the same network without any dropout.
I think the sudden jump in the orange training SNR comes from when I restarted the training with dropout = 0.3 for the linear layer (before that it was 0.5, I'm not entirely sure).
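For reference, the dropout mechanics I'm relying on look roughly like this (a minimal NumPy sketch, not the repo's actual implementation; the function name and signature are made up):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units at train time
    and scale the survivors by 1/(1-rate), so the expected activation
    is preserved and nothing needs rescaling at test time.
    At eval time (training=False) this is the identity."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate   # keep each unit with prob. 1-rate
    return x * mask / (1.0 - rate)
```

In the experiment above this would be applied with rate = 0.3 after the linear layer and 0.1 after the conv/deconv layers (except the last deconv).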

[plots: training and validation SNR, with dropout (orange) vs. without (blue)]

It seems to work as intended: the performance on the validation set is better with dropout and worse on the training set.

What do you think? Should I run more tests? Are these parameters good for you? (30% on the linear layer and 10% on the convs)

I also tried the same net with only dropout = 0.5 on the convs (blue):

[plot: same network with dropout = 0.5 on the convs only (blue)]

@andimarafioti
Owner Author

I will also change the dropout implementation to be a little more explicit and descriptive.

@andimarafioti
Owner Author

I changed it here:
a8208b7

@andimarafioti
Owner Author

andimarafioti commented Mar 7, 2018

According to the original paper, dropout should come after the activation (ReLU).

@andimarafioti
Owner Author

And here F. Chollet says the ReLU should go before batch normalization.
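Putting the last two comments together, a block would be ordered conv → ReLU → batch norm → dropout. A hedged sketch (the `conv`, `batch_norm`, and `dropout` callables are placeholders, not our actual layers, and putting batch norm before dropout is my assumption, since the two comments only constrain the position of the ReLU):

```python
import numpy as np

def block(x, conv, batch_norm, dropout, training):
    # Dropout after the activation, per the original dropout paper;
    # ReLU before batch norm, per Chollet's comment.
    x = conv(x)
    x = np.maximum(x, 0.0)        # ReLU
    x = batch_norm(x)
    x = dropout(x, training)
    return x
```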

@andimarafioti
Owner Author

andimarafioti commented Mar 7, 2018

When I change the learning rate, the validation SNR improves drastically for the network without dropout, making it work much better than the one with dropout, and similarly to how it behaves on training:

[plot: validation SNR after the learning-rate change]

I don't know why the learning rate has this effect, but it's been happening for a while now. The weird part: in blue I added dropout and it did worse; in orange I removed dropout and it did better. Of course, dropout is (or at least should be) disabled at testing/validation time.
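One quick sanity check for that "should be": in eval mode the network must be deterministic, so two forward passes on the same batch should give identical outputs. A sketch, assuming a `model(x, training=...)` interface (that interface is hypothetical, not the repo's actual API):

```python
import numpy as np

def dropout_disabled_at_eval(model, x):
    """Return True if the model is deterministic in eval mode,
    i.e. dropout (and any other train-time noise) is switched off."""
    a = model(x, training=False)
    b = model(x, training=False)
    return np.array_equal(a, b)
```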

Maybe this small network is not able to overfit the training set?

@andimarafioti
Owner Author

I may be seeing a problem that arises from not having separate graphs for training and evaluation.

@andimarafioti
Owner Author

I did some tests setting the dropout to really high values; the performance on the testing set is really affected, but not on the validation set, so it's probably not a matter of having separate graphs.

@nperraud
Collaborator

nperraud commented Mar 7, 2018

According to the plot, it seems to work. To be discussed in the next meeting.

@nperraud
Collaborator

nperraud commented Mar 8, 2018

Please, use 20% dropout before the fully connected layer only.
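If I read that right, the model keeps a single dropout of 0.2 just ahead of the fully connected layer. A minimal sketch of that head (shapes, names, and the flatten step are illustrative, not the repo's code):

```python
import numpy as np

def fc_head(features, weights, bias, training, rng):
    x = features.reshape(features.shape[0], -1)   # flatten conv features
    if training:                                   # 20% dropout, FC input only
        mask = rng.random(x.shape) >= 0.2
        x = x * mask / 0.8                         # inverted dropout
    return x @ weights + bias
```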
