Tweak GoogLeNet to match the torchvision implementations #205

Merged
9 commits merged on Nov 17, 2022

Conversation

pri1311
Contributor

@pri1311 commented Nov 16, 2022

Changed the Conv layers to conv_norm layers.

However, I couldn't find a way to specify eps for BatchNorm. The default value Flux uses is 1f-5, while the torchvision version of GoogLeNet has eps set to 0.001.

Closes #196

PR Checklist

  • Tests are added
  • Documentation, if applicable

Also, PS: unrelated, but the link to the 'contributing docs' in the README seems to be broken.

@ToucheSir
Member

ToucheSir commented Nov 16, 2022

The BatchNorm constructor (see https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.BatchNorm) does have an option to specify an epsilon (it's spelled with the symbol ϵ instead of the abbreviation eps), but it does not seem like conv_norm has an option to pass any additional args through to the norm layer it creates. I think it would be fine to add an extra keyword arg to conv_norm for this, but maybe other maintainers have different proposals.
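For illustration, a minimal sketch of what such a pass-through could look like. conv_norm_sketch and its norm_kwargs keyword are hypothetical names, not part of Metalhead's current API; only the BatchNorm ϵ keyword is taken from the Flux docs linked above.

using Flux

# Flux's BatchNorm takes the epsilon as the keyword ϵ:
bn = BatchNorm(64, relu; ϵ = 1.0f-3)

# Hypothetical pass-through keyword on a conv_norm-like helper:
function conv_norm_sketch(kernel_size, inplanes, outplanes, activation = relu;
                          norm_kwargs = (;), kwargs...)
    conv = Conv(kernel_size, inplanes => outplanes; kwargs...)   # conv kwargs: stride, pad, ...
    norm = BatchNorm(outplanes, activation; norm_kwargs...)      # norm kwargs: ϵ, momentum, ...
    return [conv, norm]
end

layers = conv_norm_sketch((3, 3), 3, 64; norm_kwargs = (; ϵ = 1.0f-3), pad = 1)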

@theabhirath
Member

However, I couldn't find a way to specify eps for BatchNorm. The default value Flux uses is 1f-5, while the torchvision version of GoogLeNet has eps set to 0.001.

We have basic_conv_bn, which already does this. It is what's currently being used for the other Inception models.

I think it would be fine to add an extra keyword arg to conv_norm for this, but maybe other maintainers have different proposals.

I'd tried this during the refactor, but it added too much clutter to the docstring and I had to handle other norm layers manually (although for eps in particular that might not be a problem).

@darsnack
Member

it does not seem like conv_norm has an option to pass any additional args through to the norm layer it creates

I think the rationale here is that the conv layer will have its keywords customized more often than the batch norm, so it gets preference for pass-through keywords. This avoids having different subsets of keywords going to different layers, which I think is confusing. The normalization layer then has to use the slightly more verbose syntax of a closure: (planes, act) -> BatchNorm(planes, act; ϵ = 1f-3).
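For concreteness, a hedged sketch of that closure being handed to conv_norm, assuming conv_norm's norm_layer argument accepts any (planes, activation) -> layer callable as the existing code suggests:

# Fix ϵ in a closure; conv_norm supplies the plane count and activation.
norm_layer = (planes, act) -> BatchNorm(planes, act; ϵ = 1.0f-3)
# Then, roughly: conv_norm((3, 3), inplanes, outplanes, relu; norm_layer)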

Though as Abhirath mentioned, using basic_conv_bn is preferred here.

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

Would doing this work?

norm_layer = batchnorm ? (args...; kwargs...) -> BatchNorm(args...; ϵ = 1.0f-3) : identity

Because basic_conv_bn would require if/else statements for every conv block.
Or a wrapper function like the one in #196 (comment)?

@theabhirath
Member

Because basic_conv_bn would require if/else statements for every conv block.

I didn't quite follow. basic_conv_bn calls conv_norm under the hood, so this should just be a matter of adding a toggle to basic_conv_bn that switches off the batch normalisation (and passes identity as the norm_layer argument to conv_norm in that case). So the wrapper function you are suggesting is, in fact, basic_conv_bn. What you've done in the PR works too, but the reason I introduced basic_conv_bn was so that the Inception family could have one function that handles setting these values, making it clear to users that the models are related.

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

So you are suggesting tweaking the basic_conv_bn function to have the toggle, right?

function basic_conv_bn(kernel_size::Dims{2}, inplanes, outplanes, activation = relu;
                       batchnorm::Bool = true, kwargs...)
    # TensorFlow uses a default epsilon of 1e-3 for BatchNorm
    norm_layer = batchnorm ? (args...; kwargs...) -> BatchNorm(args...; ϵ = 1.0f-3, kwargs...) : identity
    return conv_norm(kernel_size, inplanes, outplanes, activation; norm_layer, kwargs...)
end
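If that lands, usage inside the model builders could look roughly like this (hedged sketch; the layer sizes are illustrative, and basic_conv_bn is assumed to return the vector of layers that conv_norm produces, as the current Metalhead helpers do):

conv_block_bn   = basic_conv_bn((3, 3), 64, 192, relu)                     # Conv followed by BatchNorm(ϵ = 1e-3)
conv_block_only = basic_conv_bn((3, 3), 64, 192, relu; batchnorm = false)  # Conv only, norm layer is identity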

@theabhirath
Member

Yep, that seems perfect!

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

Thank you for all the help @theabhirath!

I had one more question though: the current implementation of GoogLeNet does not have relu activations after the Conv blocks, but to my knowledge the paper mentions relu activations after the convolutions as well, and torchvision has relu activations there. Should I make that change too?

@theabhirath
Member

I had one more question though: the current implementation of GoogLeNet does not have relu activations after the Conv blocks, but to my knowledge the paper mentions relu activations after the convolutions as well, and torchvision has relu activations there. Should I make that change too?

Sure, go ahead! The closer we are to paper parity the better 😄
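For what it's worth, a hedged sketch of what that change might look like in the GoogLeNet stem, assuming the builder is switched over to basic_conv_bn; the sizes shown follow the torchvision stem but are illustrative only:

using Flux

# relu is passed as the activation so the conv block ends in a ReLU,
# matching the paper and the torchvision implementation.
stem = Chain(basic_conv_bn((7, 7), 3, 64, relu; stride = 2, pad = 3)...,
             MaxPool((3, 3); stride = 2, pad = 1))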


@theabhirath left a comment


LGTM! Could you just run the formatter locally once and commit so that the alignment/spacing for the code matches the rest of the repository? Otherwise everything seems great, thank you so much for the contribution!
