
Unify lower level API for EfficientNet and MobileNet model families #200

Merged (8 commits) on Sep 4, 2022

Conversation

@theabhirath (Member) commented Aug 26, 2022:

This PR unifies the lower level API for the EfficientNet and MobileNet model families into a single irmodelbuilder function. This followed naturally from #198: a considerable amount of the calculations, model structure and configuration has now been abstracted into pieces that compose, so most of the models in these two families can be identified almost completely by their configuration dict and a couple of other arguments (a toy sketch of the idea follows the list below). Among other things, this meant:

  1. a simplification of the builders for the inverted residual block stages,
  2. the ability to expose a friendlier mid-level API for all models that doesn't require the user to specify unexported, constant configuration dicts (this step is still in progress but, once completed, should result in a smoother experience for advanced users), and
  3. an even greater reduction in code duplication.
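
A toy sketch of that config-driven idea (StageSpec and build_stages are hypothetical names for illustration, not the actual Metalhead internals):

    struct StageSpec
        kernel::Int
        outchannels::Int
        expansion::Int
        stride::Int
        nrepeats::Int
    end

    # Expand a list of stage specs into individual blocks; by convention only
    # the first block of a stage applies the stride.
    function build_stages(make_block, configs::Vector{StageSpec})
        blocks = []
        for cfg in configs, i in 1:cfg.nrepeats
            push!(blocks, make_block(cfg; stride = i == 1 ? cfg.stride : 1))
        end
        return blocks
    end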

Documentation TODO

  • Builders
  • Mid level API

Miscellaneous Notes

As a step closer to a uniform API, the higher level model APIs now all accept the pretrain keyword argument (usage sketch below).
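
For illustration, a hedged usage sketch (the exact exported constructor signatures may differ from this):

    using Metalhead

    # The same `pretrain` keyword is now accepted across model families;
    # `pretrain = true` loads pretrained weights where a checkpoint exists.
    model = EfficientNet(:b0; pretrain = true)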

1. Expose `pretrain` option for all models
2. Make it easier to initialise models with config options at the mid level by providing an additional dispatch
3. Some cleanup + documentation
1. Unify MobileNet and EfficientNet APIs into a single lower level builder function
2. Further unification and consolidation of the mid level API
3. Some cleanup
            conv_norm((1, 1), inplanes, squeeze_planes, activation;
                      norm_layer)...,
            conv_norm((1, 1), squeeze_planes, inplanes,
                      gate_activation; norm_layer)...), .*)
Member:

TIL that (.*) returns a Base.Broadcast.BroadcastFunction. Maybe we should create an alias for it in NNlib...
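
For reference, a quick REPL demonstration (Julia 1.6+ parses a parenthesised dotted operator as a BroadcastFunction, which is why .* works as the connection of a SkipConnection, gating the input elementwise):

    julia> bmul = (.*)
    Base.Broadcast.BroadcastFunction(*)

    julia> bmul([1, 2], [3, 4])   # equivalent to [1, 2] .* [3, 4]
    2-element Vector{Int64}:
     3
     8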

@ToucheSir (Member) left a comment:

Fewer comments than I thought for a first pass :)

Resolved review threads (outdated): src/convnets/builders/resnet.jl, src/convnets/builders/mbconv.jl
1. Rename `DropPath` to `StochasticDepth` to better reflect the paper
2. `_prob` is more accurate than `_rate`.
3. Add stochastic depth to the `mbconv` builders
@theabhirath (Member, Author):

This PR now also adds a batch mode to StochasticDepth (renamed from DropPath, both because that name is not very descriptive and to mimic torchvision) for completeness, fixes some minor issues with the linear scheduling of rates, renames all the *_rate keywords to *_prob to make it clear that these are probability values, and adds StochasticDepth with a default probability of 0.2 to the EfficientNets, finally achieving feature parity with the papers and torchvision/timm. A sketch of the linear schedule is below.
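
A minimal sketch of one common variant of the linear schedule from the stochastic depth paper (Huang et al., 2016); linear_scheduler here is an illustrative helper, not necessarily the exact Metalhead internal:

    # Drop probability ramps from 0 at the first block towards `base_prob`
    # at the last block.
    linear_scheduler(base_prob, nblocks) = [base_prob * (i - 1) / nblocks
                                            for i in 1:nblocks]

    linear_scheduler(0.2, 5)   # approximately [0.0, 0.04, 0.08, 0.12, 0.16]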

"""
DropPath(p; [rng = rng_from_array(x)])
Member:

To make this non/less breaking, adding a deprecation for DropPath would work.
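
A minimal sketch of what that could look like, assuming StochasticDepth is already defined:

    # Keep the old name callable with a deprecation warning that points
    # users at the replacement.
    Base.@deprecate DropPath(p) StochasticDepth(p)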

@theabhirath (Member, Author):

But the Layers API is unexported, so does this still count as a breaking change?

@theabhirath (Member, Author):

It really should be exported in 0.8, but for now it isn't so I suppose it might work out in our favour 😄

@ToucheSir (Member) commented Aug 29, 2022:

I assumed that was why you added the label, but that is true. The built docs list it as "public", but I think that's a false positive: JuliaHub and GH search turn up no results.

@theabhirath (Member, Author):

Oh no, the breaking change is that EfficientNet literally returns a different model 😅

@theabhirath (Member, Author):

This PR is complete, I think - any further docs improvements would be best handled in a separate PR, after landing this one and then #199.

@darsnack (Member) left a comment:

Just a few final naming things. Also, we can tackle more thorough docs including docstrings separately.

Resolved review threads (outdated): src/convnets/builders/irmodel.jl, src/convnets/builders/mbconv.jl (two threads)
Comment on lines 14 to 15
stochastic_depth_prob = nothing, norm_layer = BatchNorm,
divisor::Integer = 8, kwargs...)
Member:

Do we need to have unused keywords? As far as I can tell, they are currently silently ignored, whereas if extraneous keywords weren't declared, passing them would throw a MethodError (as it should). A quick illustration is below.
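
A quick illustration in plain Julia (f and g are generic stand-ins, not the Metalhead functions):

    f(x; a = 1) = x + a             # fixed keyword list
    g(x; a = 1, kwargs...) = x + a  # catch-all swallows unknown keywords

    f(1; b = 2)   # MethodError: got unsupported keyword argument "b"
    g(1; b = 2)   # returns 2; `b` is silently ignored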

@theabhirath (Member, Author):

These are being used, though - to pass things like se_round_fn in for MobileNetv3.

@darsnack (Member) commented Sep 4, 2022:

I specifically mean stuff like stochastic_depth_prob

@theabhirath (Member, Author) commented Sep 4, 2022:

> I specifically mean stuff like stochastic_depth_prob

This particular one needs to be there because it would cause a MethodError in dwsep_conv_norm if passed through (and the default model builder passes it through); a sketch is below. See also #200 (comment)
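
A hypothetical sketch of that forwarding problem (inner and outer stand in for dwsep_conv_norm and the builder):

    inner(x; scale = 1) = scale * x

    # Naming `stochastic_depth_prob` captures it here and keeps it out of the
    # forwarded `kwargs...`; otherwise `inner` would throw a MethodError.
    outer(x; stochastic_depth_prob = nothing, kwargs...) = inner(x; kwargs...)

    outer(2; stochastic_depth_prob = 0.2, scale = 3)   # 6, no MethodError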

@theabhirath (Member, Author):

Bump?

@darsnack darsnack merged commit 33c5257 into FluxML:master Sep 4, 2022
                 stochastic_depth_prob = nothing, norm_layer = BatchNorm,
                 divisor::Integer = 8, kwargs...)
    width_mult, depth_mult = scalings
    block_repeats = [ceil(Int, block_configs[idx][end - 1] * depth_mult)
                     for idx in eachindex(block_configs)]
Contributor:

For MobileNet the config list is

const MOBILENETV2_CONFIGS = [
    (mbconv, 3, 16, 1, 1, 1, nothing, relu6),
    (mbconv, 3, 24, 6, 2, 2, nothing, relu6),
    (mbconv, 3, 32, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 64, 6, 2, 4, nothing, relu6),
    (mbconv, 3, 96, 6, 1, 3, nothing, relu6),
    (mbconv, 3, 160, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 320, 6, 1, 1, nothing, relu6),
]

so won't block_repeats just be a list of nothing?
And, as the name suggests, isn't it meant to tell how many times a particular block of layers is repeated?
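
The observation, reproduced on a plain tuple standing in for one MOBILENETV2_CONFIGS entry (symbols replace the block function and activation):

    cfg = (:mbconv, 3, 16, 1, 1, 1, nothing, :relu6)

    cfg[end - 1]   # nothing (the second-to-last slot)
    cfg[end - 2]   # 1 (the slot that actually holds the repeat count)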
