Use dropout correctly #6
I will also change the implementation of the dropout to be a little more explicit and descriptive.
I changed it here:
According to the original paper, dropout should be applied after the activation (ReLU).
And here F. Chollet says the ReLU should go before batch normalization.
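Taken together, the two comments above suggest an ordering of conv → ReLU → batch norm → dropout within a block. A minimal sketch of one such block, assuming PyTorch (the thread doesn't name the framework) and arbitrary channel sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

# One reading of the two comments above: ReLU right after the conv
# (and before batch norm), dropout after the activation.
block = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv
    nn.ReLU(),                                    # ReLU before batch norm
    nn.BatchNorm2d(32),
    nn.Dropout2d(0.1),                            # dropout after the activation
)

x = torch.randn(4, 16, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([4, 32, 8, 8])
```

The channel counts and the 0.1 rate here are placeholders; only the relative ordering of the layers reflects the discussion.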
When I change the learning rate, the SNR on validation improves drastically for the network without dropout, making it work much better than the one with dropout, and similarly to how it behaves on training. I don't know why the learning rate has this effect, but it's been happening for a while now. The weirdness: in blue I added dropout and it did worse; in orange I removed dropout and it did better. Of course, dropout is (or should be) disabled at test/validation time. Maybe this small network is not able to overfit the training set?
I may be seeing a problem that arises from not having separate graphs for training and evaluation.
I did some tests setting the dropout to really high values, and the performance on the test set is strongly affected but not on validation, so it's probably not a matter of having separate graphs.
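On the point that dropout should be disabled at test/validation time: with inverted dropout, the kept activations are scaled up at train time so that evaluation is a pure identity. A small NumPy sketch (the function and shapes are illustrative, not the project's code):

```python
import numpy as np

def dropout(x, rate, training, rng=None):
    """Inverted dropout: active only at train time; identity at eval time."""
    if not training or rate == 0.0:
        return x  # validation/test path: no-op
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)  # rescale so the expected activation is unchanged

x = np.ones((1000,))
train_out = dropout(x, 0.5, training=True, rng=np.random.default_rng(0))
eval_out = dropout(x, 0.5, training=False)

print(np.allclose(eval_out, x))  # True: dropout is removed at validation time
print(train_out.mean())          # close to 1.0: scaling preserves the mean
```

If validation SNR still looked dropout-dependent, the first thing to check would be that the evaluation path really takes the `training=False` branch.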
According to the plot, it seems to work. To be discussed in the next meeting.
Please, use 20% dropout before the fully connected layer only.
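A sketch of that suggestion, again assuming PyTorch and hypothetical layer sizes (the real network's dimensions aren't given in the thread): a single 20% dropout placed right before the fully connected layer, with no dropout on the convs.

```python
import torch
import torch.nn as nn

# Hypothetical shapes; only the placement of the single Dropout(0.2)
# reflects the request above.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Dropout(0.2),            # the only dropout: 20%, before the fully connected layer
    nn.Linear(8 * 8 * 8, 10),   # fully connected layer
)

y = model(torch.randn(2, 1, 8, 8))
print(y.shape)  # torch.Size([2, 10])
```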
I added a dropout feature to the sequential model. Preliminary tests on it are a bit hard to assess.
I trained two equivalent networks for 800k steps with a learning rate of 1e-3. In orange is a network with dropout = 0.3 for the linear layer and 0.1 for all conv and deconv layers except the last deconv. In blue is the same network without any dropout.
I think the sudden change in the training SNR of the orange run came from when I restarted training with dropout = 0.3 for the linear layer (before it was 0.5, I'm not really sure).
It seems to work well, since performance on the validation set is better with dropout and worse on the training set.
What do you think? Should I run more tests? Are these parameters good for you? (30% on the linear layer and 10% on the convs.)
I also tried the same net with only dropout = 50% on the convs (blue):