
adding missing dropout layers to VGG16 and VGG19 #43

Merged: 3 commits into Lasagne:master on Feb 1, 2016

Conversation

@webeng (Contributor) commented Jan 28, 2016

According to the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition", there are two dropout layers (dropout ratio 0.5) for the first two fully-connected layers in VGG16 and VGG19.

I added the two dropout layers, and doing so drastically reduces overfitting.

@f0k (Member) commented Jan 28, 2016

Good spot! I think this wasn't done (or noticed) because the main goal was to replicate the predictions, but when fine-tuning for another task, it shouldn't hurt to faithfully replicate the architecture originally used in training.
My only complaint would be that the existing layer names should probably stay the same as they're referenced in other notebooks. Maybe the dropout layers should be called fc6_dropout and fc7_dropout, respectively.

@webeng (Contributor, Author) commented Jan 28, 2016

I just renamed the dropout layers based on your suggestions.

@f0k (Member) commented Jan 28, 2016

Thanks! I'll leave it to @ebenolson to have a look and/or merge.

@ebenolson (Member) commented

Yeah, I was focused on inference/feature extraction, so I thought it would be simpler to leave these out; similarly, the GoogLeNet model is missing the auxiliary classifier arms. I also didn't want to confuse people with the need to set deterministic=True in get_output.

It's true though that these would be important for fine-tuning so I'm fine with the change. However, I'm pretty sure the dropout layers should follow fc6 and fc7, not precede them?
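
For reference, a minimal Lasagne sketch of that ordering, assuming the dict-of-layers style used in the VGG recipes (the convolutional stack is elided, and the fc6_dropout / fc7_dropout names follow the suggestion above); this is an illustration, not the actual diff:

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
from lasagne.nonlinearities import softmax

net = {}
net['input'] = InputLayer((None, 3, 224, 224))
# ... the VGG16 conv/pool stack goes here; fc6 is attached to 'input' only to keep the sketch short ...
net['fc6'] = DenseLayer(net['input'], num_units=4096)
net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5)   # dropout follows fc6
net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096)
net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5)   # dropout follows fc7
net['fc8'] = DenseLayer(net['fc7_dropout'], num_units=1000, nonlinearity=softmax)
```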

@webeng (Contributor, Author) commented Jan 29, 2016

You're right, the dropout layers should follow, not precede, the dense layers. I've updated it.

I found Lasagne's recipes very useful as base models for feature extraction and for training new models. Perhaps we should add a comment at the top saying: "If you want to build your own model, you should use the dropout layers to reduce overfitting. Otherwise, comment them out." Or add deterministic=True in get_output.

What do you think?

@f0k (Member) commented Jan 29, 2016

However, I'm pretty sure the dropout layers should follow fc6 and fc7, not precede them?

Are you sure? Section 3.1 says:

The training was regularised by weight decay (the L2 penalty multiplier set to 5e-4) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5).

I'd say dropout comes before the fully-connected layers, not after. The biggest layer (in terms of inputs and params) is fc6, so that's where dropping inputs makes the most sense. And it sometimes helps to not drop any inputs of the final output layer, i.e., not place a dropout layer before fc8. So I think dff8363 was correct. Is there any training code to compare to?

Perhaps we should add a comment at the top saying: "If you want to build your own model, you should use the dropout layers to reduce overfitting. Otherwise, comment them out."

Or maybe add a boolean parameter to build_model(), something like for_training=False. If set to True, it will include the dropout layers, and maybe eventually also the auxiliary classifiers for GoogLeNet.
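
A hypothetical sketch of what such a flag could look like; for_training is not an existing parameter in the recipes, and the structure below is only illustrative (the convolutional stack is again elided):

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
from lasagne.nonlinearities import softmax

def build_model(for_training=False):
    """Build the VGG-style layer dict; include dropout layers only for training."""
    net = {}
    net['input'] = InputLayer((None, 3, 224, 224))
    # ... conv/pool stack omitted; fc6 is attached to 'input' only to keep the sketch short ...
    net['fc6'] = DenseLayer(net['input'], num_units=4096)
    last = net['fc6']
    if for_training:
        net['fc6_dropout'] = DropoutLayer(last, p=0.5)
        last = net['fc6_dropout']
    net['fc7'] = DenseLayer(last, num_units=4096)
    last = net['fc7']
    if for_training:
        net['fc7_dropout'] = DropoutLayer(last, p=0.5)
        last = net['fc7_dropout']
    net['fc8'] = DenseLayer(last, num_units=1000, nonlinearity=softmax)
    return net
```

With the default for_training=False, the layer dict would match the inference-only model, so notebooks that reference the existing fc6/fc7/fc8 names keep working.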

@ebenolson (Member) commented

I would agree with your interpretation, but the Caffe prototxt indicates dropout after fc6 and fc7.

I like the idea of an argument for build_model; @webeng, would you mind adding that?

@benanne (Member) commented Jan 30, 2016

On the one hand, I agree it makes the most sense to put the dropout before the layers with the most parameters; on the other hand, it's kind of weird not to have one before the output layer as well in that case (where have you seen this before?).

Also, even if the first dropout layer comes after the first dense layer, that layer's parameters will still get regularized somewhat, because dropout affects all activations coming after it and all gradients in the network (the effect is global). So it's not unlikely that the Caffe interpretation is right.

@webeng (Contributor, Author) commented Jan 30, 2016

Yes, ImageNet Pretrained Network (VGG_S).ipynb also has the dropout layers after the dense layers.

I think it's more in line with Lasagne to use deterministic=True when you call get_output, since that already disables the dropout layers, don't you think? This is how it's done in the previous example.
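
A short sketch of that usage, with a toy graph standing in for the VGG layer dict; in Lasagne, passing deterministic=True to get_output makes DropoutLayer act as the identity:

```python
import theano.tensor as T
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer, get_output

# Tiny stand-in graph; in the recipes this would be the dict returned by build_model().
l_in = InputLayer((None, 4096))
l_fc = DenseLayer(l_in, num_units=4096)
l_drop = DropoutLayer(l_fc, p=0.5)
l_out = DenseLayer(l_drop, num_units=1000)

X = T.matrix('X')
train_expr = get_output(l_out, X)                     # dropout active (training)
test_expr = get_output(l_out, X, deterministic=True)  # dropout layers pass inputs through unchanged
```

Training code would compile its expressions from the non-deterministic output, while feature extraction and prediction use the deterministic one.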

@ebenolson (Member) commented

Ok, I'm alright with the current version also. Perhaps when I get around to #18 we will revisit the idea of a for_training parameter. I will merge tonight if there are no further comments.

@ebenolson (Member) commented

Merging, thank you for contributing.

ebenolson added a commit that referenced this pull request on Feb 1, 2016: adding missing dropout layers to VGG16 and VGG19
@ebenolson merged commit 0ccd547 into Lasagne:master on Feb 1, 2016
@f0k (Member) commented Feb 1, 2016

where have you seen this before?

Phew, not sure... but I think I've seen instances where they don't drop out the inputs of the final classification layer. Probably that was with a lot fewer hidden units and a lot fewer classes, though (not an ImageNet model).
