Tweak GoogLeNet to match the torchvision implementations #205

Merged
9 commits merged on Nov 17, 2022

Conversation

pri1311
Contributor

@pri1311 commented Nov 16, 2022

Changed the Conv layers to conv_norm layers.

However, I couldn't find a way to specify eps for BatchNorm. The default value Flux uses is 1f-5, while the torchvision version of GoogLeNet has eps set to 0.001.

Closes #196

PR Checklist

  • Tests are added
  • Documentation, if applicable

Also, PS: unrelated, but the link to the 'contributing docs' in the README seems to be broken.

@ToucheSir
Member

ToucheSir commented Nov 16, 2022

The BatchNorm constructor (see https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.BatchNorm) does have an option to specify an epsilon (it's spelled with the symbol ϵ instead of the abbreviation eps), but it does not seem like conv_norm has an option to pass any additional args through to the norm layer it creates. I think it would be fine to add an extra keyword arg to conv_norm for this, but maybe other maintainers have different proposals.
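For illustration, a minimal sketch of what such a pass-through could look like. conv_norm_sketch and its norm_kwargs keyword are hypothetical names, not part of Metalhead's current API; only the BatchNorm ϵ keyword is taken from the Flux docs linked above.

using Flux

# Flux's BatchNorm takes the epsilon as the keyword ϵ:
bn = BatchNorm(64, relu; ϵ = 1.0f-3)

# Hypothetical pass-through keyword on a conv_norm-like helper:
function conv_norm_sketch(kernel_size, inplanes, outplanes, activation = relu;
                          norm_kwargs = (;), kwargs...)
    conv = Conv(kernel_size, inplanes => outplanes; kwargs...)   # conv kwargs: stride, pad, ...
    norm = BatchNorm(outplanes, activation; norm_kwargs...)      # norm kwargs: ϵ, momentum, ...
    return [conv, norm]
end

layers = conv_norm_sketch((3, 3), 3, 64; norm_kwargs = (; ϵ = 1.0f-3), pad = 1)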

@theabhirath
Member

However, I couldn't find a way to specify eps for BatchNorm. The default value Flux uses is 1f-5, while the torchvision version of GoogLeNet has eps set to 0.001.

We have basic_conv_bn, which already does this. It is what's currently being used for the other Inception models.

I think it would be fine to add an extra keyword arg to conv_norm for this, but maybe other maintainers have different proposals.

I'd tried this during the refactor, but it added too much clutter to the docstring and I had to handle other norm layers manually (although for eps in particular that might not be a problem).

@darsnack
Member

it does not seem like conv_norm has an option to pass any additional args through to the norm layer it creates

I think the rationale here is that the conv layer will have its keywords customized more often than the batch norm, so it gets preference for pass-through keywords. This avoids having different subsets of keywords going to different layers, which I think is confusing. The normalization layer then has to use the slightly more verbose syntax of a closure: (planes, act) -> BatchNorm(planes, act; ϵ = 1f-3).
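For concreteness, a hedged sketch of that closure being handed to conv_norm, assuming conv_norm's norm_layer argument accepts any (planes, activation) -> layer callable as the existing code suggests:

# Fix ϵ in a closure; conv_norm supplies the plane count and activation.
norm_layer = (planes, act) -> BatchNorm(planes, act; ϵ = 1.0f-3)
# Then, roughly: conv_norm((3, 3), inplanes, outplanes, relu; norm_layer)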

Though as Abhirath mentioned, using basic_conv_bn is preferred here.

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

Would doing this work?

norm_layer = batchnorm ? (args...; kwargs...) -> BatchNorm(args...; ϵ = 1.0f-3) : identity

Because basic_conv_bn would require if/else statements for every conv block.
Or a wrapper function like the one in #196 (comment)?

@theabhirath
Member

Because basic_conv_bn would require if/else statements for every conv block.

I didn't quite follow. basic_conv_bn calls conv_norm under the hood, so this should just be a matter of adding a toggle to basic_conv_bn that switches off the batch normalisation (and passes identity as the norm_layer argument to conv_norm in that case). So the wrapper function you are suggesting is, in fact, basic_conv_bn. What you've done in the PR works too, but the reason I introduced basic_conv_bn was so that the Inception family could have one function that handles setting these values, making it clear to users that the models are related.

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

So you are suggesting tweaking the basic_conv_bn function to have the toggle, right?

function basic_conv_bn(kernel_size::Dims{2}, inplanes, outplanes, activation = relu;
                       batchnorm::Bool = true, kwargs...)
    # TensorFlow uses a default epsilon of 1e-3 for BatchNorm
    norm_layer = batchnorm ? (args...; kwargs...) -> BatchNorm(args...; ϵ = 1.0f-3, kwargs...) : identity
    return conv_norm(kernel_size, inplanes, outplanes, activation; norm_layer, kwargs...)
end
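If that lands, usage inside the model builders could look roughly like this (hedged sketch; the layer sizes are illustrative, and basic_conv_bn is assumed to return the vector of layers that conv_norm produces, as the current Metalhead helpers do):

conv_block_bn   = basic_conv_bn((3, 3), 64, 192, relu)                     # Conv followed by BatchNorm(ϵ = 1e-3)
conv_block_only = basic_conv_bn((3, 3), 64, 192, relu; batchnorm = false)  # Conv only, norm layer is identity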

@theabhirath
Member

Yep, that seems perfect!

@pri1311
Contributor Author

pri1311 commented Nov 17, 2022

Thank you for all the help @theabhirath!

I had one more question though: the current implementation of GoogLeNet does not have relu activations after the Conv blocks, but to my knowledge the paper mentions relu activations after the convolutions as well, and torchvision has relu activations there. Should I make that change too?

@theabhirath
Member

I had one more question though: the current implementation of GoogLeNet does not have relu activations after the Conv blocks, but to my knowledge the paper mentions relu activations after the convolutions as well, and torchvision has relu activations there. Should I make that change too?

Sure, go ahead! The closer we are to paper parity the better 😄
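For what it's worth, a hedged sketch of what that change might look like in the GoogLeNet stem, assuming the builder is switched over to basic_conv_bn; the sizes shown follow the torchvision stem but are illustrative only:

using Flux

# relu is passed as the activation so the conv block ends in a ReLU,
# matching the paper and the torchvision implementation.
stem = Chain(basic_conv_bn((7, 7), 3, 64, relu; stride = 2, pad = 3)...,
             MaxPool((3, 3); stride = 2, pad = 1))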


@theabhirath left a comment


LGTM! Could you just run the formatter locally once and commit so that the alignment/spacing for the code matches the rest of the repository? Otherwise everything seems great, thank you so much for the contribution!
