adding missing dropout layers to VGG16 and VGG19 #43
Conversation
Good spot! I think this wasn't done (or noticed) because the main goal was to replicate the predictions, but when fine-tuning for another task, it shouldn't hurt to faithfully replicate the architecture originally used in training.
I just renamed the dropout layers based on your suggestions.
Thanks! I'll leave it to @ebenolson to have a look and/or merge.
Yeah, I was focused on inference/feature extraction, so I thought it would be simpler to leave these out; similarly, the googlenet model is missing the auxiliary classifier arms. I also didn't want to confuse people with the need to set `deterministic=True`. It's true though that these would be important for fine-tuning, so I'm fine with the change. However, I'm pretty sure the dropout layers should follow fc6 and fc7, not precede them?
You're right, the dropout layers should follow, not precede, the dense layers. I updated it. I found Lasagne's recipes very useful as a base model for feature extraction and training new models. Perhaps we should add a comment at the top saying: "If you want to build your own model, you should use the dropout layers to reduce overfitting. Otherwise, comment them out." Or add `deterministic=True`. What do you think?
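For reference, a minimal sketch of the layout being discussed: the tail of a Lasagne VGG-style `build_model()` with dropout layers inserted after fc6 and fc7. The dict keys (e.g. `fc6_dropout`) and the stub input layer are assumptions for illustration, not necessarily the exact names that were merged.

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer, NonlinearityLayer
from lasagne.nonlinearities import softmax

net = {}
# stand-in for the conv/pool stack of the real modelzoo script, up to pool5
net['pool5'] = InputLayer((None, 512, 7, 7))
net['fc6'] = DenseLayer(net['pool5'], num_units=4096)
net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5)   # dropout *after* fc6
net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096)
net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5)   # dropout *after* fc7
net['fc8'] = DenseLayer(net['fc7_dropout'], num_units=1000, nonlinearity=None)
net['prob'] = NonlinearityLayer(net['fc8'], softmax)
```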
Are you sure? Section 3.1 says the training was regularised by "dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5)".
I'd say dropout comes before the fully-connected layers, not after. The biggest layer (in terms of inputs and params) is fc6, so that's where dropping inputs makes the most sense. And it sometimes helps to not drop any inputs of the final output layer, i.e., not place a dropout layer before fc8. So I think dff8363 was correct. Is there any training code to compare to?
Or maybe add a boolean parameter to `build_model`?
I would agree with your interpretation, but the Caffe prototxt indicates dropout after fc6 and fc7. I like the idea of an argument for `build_model`.
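A rough sketch of what such an argument could look like, kept to the classifier head for brevity; the function name `build_classifier_head` and the flag name `include_dropout` are hypothetical, not anything that exists in the repo:

```python
from lasagne.layers import DenseLayer, DropoutLayer

def build_classifier_head(incoming, include_dropout=True):
    # `include_dropout` is a hypothetical flag: skip dropout for pure feature
    # extraction, keep it (the default) when fine-tuning to reduce overfitting.
    net = {}
    net['fc6'] = top = DenseLayer(incoming, num_units=4096)
    if include_dropout:
        net['fc6_dropout'] = top = DropoutLayer(top, p=0.5)
    net['fc7'] = top = DenseLayer(top, num_units=4096)
    if include_dropout:
        net['fc7_dropout'] = top = DropoutLayer(top, p=0.5)
    net['fc8'] = DenseLayer(top, num_units=1000, nonlinearity=None)
    return net
```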
On the one hand I agree it makes the most sense to put the dropout before the layers with the most parameters; on the other hand, it's kind of weird not to have one before the output layer as well in that case (where have you seen this before?). Also, even if the first dropout layer is after the first dense layer, its parameters will still get regularized somewhat, because dropout affects all activations coming after it, and all gradients in the network (the effect is global). So it's not unlikely that the Caffe interpretation is right.
Yes, ImageNet Pretrained Network (VGG_S).ipynb also has the dropout layers after the dense layers. I think it's more in line with Lasagne to use `deterministic=True`.
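As a reminder of how that works, a minimal sketch of the usual Lasagne pattern, assuming `net` is the layer dict from a `build_model()`-style definition like the one sketched above:

```python
import theano
import theano.tensor as T
import lasagne

input_var = T.tensor4('inputs')

# training expression: dropout layers are active (default behaviour)
train_probs = lasagne.layers.get_output(net['prob'], input_var)

# inference/feature-extraction expression: deterministic=True turns the
# dropout layers into identities, so no units are dropped
test_probs = lasagne.layers.get_output(net['prob'], input_var, deterministic=True)
predict_fn = theano.function([input_var], test_probs)
```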
Ok, I'm alright with the current version also. Perhaps when I get around to #18 we will revisit the idea of a `build_model` argument.
Merging, thank you for contributing.
Phew, not sure... but I think I've seen instances where they don't drop out the inputs of the final classification layer. Probably that was with a lot fewer hidden units and a lot fewer classes, though (not an ImageNet model).
According to the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition", there are two dropout layers (ratio 0.5) for the first two fully-connected layers in VGG16 and VGG19.
I added the two dropout layers, and they drastically reduce overfitting.