
Use dropout correctly #6

Open
andimarafioti opened this issue Mar 7, 2018 · 9 comments

@andimarafioti
Owner

I added a dropout feature to the sequential model. Preliminary tests on it are a bit hard to assess.

I trained two equivalent networks for 800k steps with a learning rate of 1e-3. In orange is a network with dropout = 0.3 for the linear layer and 0.1 for all conv and deconv layers except the last deconv. In blue is the same network without any dropout.
I think the sudden jump in the orange training SNR comes from when I restarted the training with dropout = 0.3 for the linear layer (before that it was 0.5, I'm not entirely sure).
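For reference, the dropout mechanics I'm relying on look roughly like this (a minimal NumPy sketch, not the repo's actual implementation; the function name and signature are made up):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units at train time
    and scale the survivors by 1/(1-rate), so the expected activation
    is preserved and nothing needs rescaling at test time.
    At eval time (training=False) this is the identity."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate   # keep each unit with prob. 1-rate
    return x * mask / (1.0 - rate)
```

In the experiment above this would be applied with rate = 0.3 after the linear layer and 0.1 after the conv/deconv layers (except the last deconv).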

[plots: training and validation SNR, with dropout (orange) vs. without (blue)]

It seems to work as intended: the performance on the validation set is better with dropout and worse on the training set.

What do you think? Should I run more tests? Are these parameters good for you? (30% on the linear layer and 10% on the convs)

I also tried the same net with only dropout = 0.5 on the convs (blue):

[plot: same network with dropout = 0.5 on the convs only (blue)]

@andimarafioti
Owner Author

I will also change the dropout implementation to be a little more explicit and descriptive.

@andimarafioti
Owner Author

I changed it here:
a8208b7

@andimarafioti
Owner Author

andimarafioti commented Mar 7, 2018

According to the original paper, dropout should come after the activation (ReLU).

@andimarafioti
Owner Author

And here F. Chollet says the ReLU should go before batch normalization.
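Putting the last two comments together, a block would be ordered conv → ReLU → batch norm → dropout. A hedged sketch (the `conv`, `batch_norm`, and `dropout` callables are placeholders, not our actual layers, and putting batch norm before dropout is my assumption, since the two comments only constrain the position of the ReLU):

```python
import numpy as np

def block(x, conv, batch_norm, dropout, training):
    # Dropout after the activation, per the original dropout paper;
    # ReLU before batch norm, per Chollet's comment.
    x = conv(x)
    x = np.maximum(x, 0.0)        # ReLU
    x = batch_norm(x)
    x = dropout(x, training)
    return x
```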

@andimarafioti
Owner Author

andimarafioti commented Mar 7, 2018

When I change the learning rate, the validation SNR improves drastically for the network without dropout, making it work much better than the one with dropout, and similarly to how it behaves on training:

[plot: validation SNR after the learning-rate change]

I don't know why the learning rate has this effect, but it's been happening for a while now. The weird part: in blue I added dropout and it did worse; in orange I removed dropout and it did better. Of course, dropout is (or at least should be) disabled at testing/validation time.
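One quick sanity check for that "should be": in eval mode the network must be deterministic, so two forward passes on the same batch should give identical outputs. A sketch, assuming a `model(x, training=...)` interface (that interface is hypothetical, not the repo's actual API):

```python
import numpy as np

def dropout_disabled_at_eval(model, x):
    """Return True if the model is deterministic in eval mode,
    i.e. dropout (and any other train-time noise) is switched off."""
    a = model(x, training=False)
    b = model(x, training=False)
    return np.array_equal(a, b)
```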

Maybe this small network is not able to overfit the training set?

@andimarafioti
Owner Author

I may be seeing a problem that arises from not having separate graphs for training and evaluation.

@andimarafioti
Owner Author

I did some tests setting the dropout to really high values; the performance on the testing set is really affected, but not on the validation set, so it's probably not a matter of having separate graphs.

@nperraud
Collaborator

nperraud commented Mar 7, 2018

According to the plot, it seems to work. To be discussed in the next meeting.

@nperraud
Collaborator

nperraud commented Mar 8, 2018

Please, use 20% dropout before the fully connected layer only.
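If I read that right, the model keeps a single dropout of 0.2 just ahead of the fully connected layer. A minimal sketch of that head (shapes, names, and the flatten step are illustrative, not the repo's code):

```python
import numpy as np

def fc_head(features, weights, bias, training, rng):
    x = features.reshape(features.shape[0], -1)   # flatten conv features
    if training:                                   # 20% dropout, FC input only
        mask = rng.random(x.shape) >= 0.2
        x = x * mask / 0.8                         # inverted dropout
    return x @ weights + bias
```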
