fix ViT model output + rewrite attention layer + adapt torchvision script #230
Conversation
I'm not sure that I want the porting scripts to be generalised to all models. We used it for CNNs because it's convenient, but even there the script is not always directly usable (for example, SqueezeNets require you to remove the …).
Yeah, that's the same reason I only linked to it from the model card for the HF upload that used it (for reproducibility). I was afraid to suggest to users that it is a robust way to port weights from torchvision. Maybe the …
I agree we need not worry about having a single one-size-fits-all script. If one script per model family (or group of model families) helps simplify the porting code, that sounds good to me.
The problem with ViT is in the attention module; the weights probably have to be copied in some particular fashion, and I will have to investigate further. I really want to get ViT in because it is the most popular vision backbone these days.
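For reference, here is a hypothetical sketch of what copying the fused attention projection from torchvision might look like. torchvision's `nn.MultiheadAttention` stores the query/key/value projection as a single `in_proj_weight` of shape `(3 * planes, planes)` plus an `in_proj_bias`; the placeholder arrays and any reordering below are assumptions, and working out the right ordering is exactly the investigation mentioned above.

```julia
using Flux

planes = 768                                  # hypothetical embedding size (ViT-B)
qkv = Dense(planes => 3planes; bias = true)   # fused q/k/v projection on the Flux side

# PyTorch stores Linear-style weights as (out, in), which matches the layout of
# Flux's Dense, so a direct copy of the fused projection is the natural first attempt.
# The arrays below are placeholders for tensors loaded from the torchvision checkpoint.
torch_in_proj_weight = rand(Float32, 3planes, planes)
torch_in_proj_bias   = rand(Float32, 3planes)
copyto!(qkv.weight, torch_in_proj_weight)
copyto!(qkv.bias, torch_in_proj_bias)

# If outputs still disagree, the q/k/v blocks (or their per-head sub-blocks) may
# need to be split and reordered to match how the Flux layer slices the projection:
q_w = torch_in_proj_weight[1:planes, :]
k_w = torch_in_proj_weight[(planes + 1):(2planes), :]
v_w = torch_in_proj_weight[(2planes + 1):end, :]
```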
@@ -1,5 +1,5 @@
 """
-    MHAttention(planes::Integer, nheads::Integer = 8; qkv_bias::Bool = false,
+    MultiHeadSelfAttention(planes::Integer, nheads::Integer = 8; qkv_bias::Bool = false,
Made the name more informative.
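Since a later comment notes that the attention layer was rewritten on top of NNlib, here is a minimal, hypothetical sketch of what such a layer can look like using NNlib's `dot_product_attention`. The name `MHSASketch`, the field layout, and the absence of dropout are all assumptions for illustration, not the actual Metalhead code.

```julia
using Flux, NNlib
using MLUtils: chunk

# Illustrative self-attention layer: a fused qkv projection followed by
# NNlib's dot_product_attention and an output projection.
struct MHSASketch{Q, P}
    nheads::Int
    qkv_layer::Q
    projection::P
end
Flux.@functor MHSASketch

function MHSASketch(planes::Integer, nheads::Integer = 8; qkv_bias::Bool = false)
    @assert planes % nheads == 0 "planes must be divisible by nheads"
    return MHSASketch(nheads, Dense(planes => 3planes; bias = qkv_bias),
                      Dense(planes => planes))
end

# x has shape (planes, seq_len, batch)
function (m::MHSASketch)(x::AbstractArray{<:Any, 3})
    q, k, v = chunk(m.qkv_layer(x), 3; dims = 1)            # split the fused projection
    y, _ = NNlib.dot_product_attention(q, k, v; nheads = m.nheads)
    return m.projection(y)
end
```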
     pool === :class ? x -> x[:, 1, :] : seconddimmean),
-    Chain(LayerNorm(embedplanes), Dense(embedplanes, nclasses, tanh_fast)))
+    Chain(LayerNorm(embedplanes), Dense(embedplanes, nclasses)))
This final `tanh` had no reason to exist.
After rewriting the attention layer on top of NNlib and removing the final `tanh` from ViT, I can reproduce PyTorch's outputs, although there is still a slight discrepancy:

Flux:
acoustic guitar: 0.90519154
stage: 0.0040107034
harmonica: 0.0028614246
microphone: 0.002621256
electric guitar: 0.0025401094

PyTorch:
acoustic guitar: 0.90745604
stage: 0.0038461224
harmonica: 0.002782756
microphone: 0.0025289422
electric guitar: 0.0023941135

This could be due to differences in the implementation of layer norm; see FluxML/Flux.jl#2220.
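For reproducibility, here is a hedged sketch of how the Flux-side top-5 listing above might be produced. `model`, `img` (a preprocessed test image), and the ImageNet `labels` vector are hypothetical names assumed to be set up elsewhere; they are not part of this PR.

```julia
using Flux: softmax

probs = softmax(vec(model(img)))                  # class probabilities over ImageNet labels
for i in partialsortperm(probs, 1:5; rev = true)  # indices of the top-5 classes
    println(labels[i], ": ", probs[i])
end
```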
I'm happy with this. If I can get an approval, I'll merge and move on.
Looks like changing the implementation of LayerNorm has little effect, so I don't know why we observe these discrepancies. I added LayerNormv2 to the Layers module but didn't use it anywhere, since I'm not sure it will really be needed.
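As a point of reference, here is a minimal sketch of a PyTorch-style layer norm, assuming the difference discussed in FluxML/Flux.jl#2220 is where `eps` enters the computation (added to the standard deviation in Flux's `normalise`, versus to the variance inside the square root in PyTorch). This only illustrates the idea a `LayerNormv2` might capture, not its actual code.

```julia
using Statistics

# Normalise with eps inside the square root (PyTorch's convention), as opposed
# to adding eps to the standard deviation after the square root.
function layernorm_pytorch_style(x::AbstractArray; dims = 1, eps = 1.0f-5)
    μ  = mean(x; dims)
    σ² = var(x; dims, mean = μ, corrected = false)
    return (x .- μ) ./ sqrt.(σ² .+ eps)
end
```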
A few minor changes before merging but otherwise looks good
@@ -100,9 +102,10 @@ end
 @functor ViT
 
 function ViT(config::Symbol; imsize::Dims{2} = (256, 256), patch_size::Dims{2} = (16, 16),
-             pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)
+             pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000,
+             qkv_bias = false)
Unless it is typical to adjust this toggle, I think it should not get exposed going from `vit` to `ViT`. The logic within the codebase has been to make the uppercase exports as simple as possible.
I had to add it since the default for torchvision is `true`, while here it is `false`. The torchvision model is given by `ViT(:base, imsize = (224, 224), qkv_bias = true)`.
I think we should change the defaults here to match that before tagging the breaking release, but this can be done in another PR.
Okay, so change the default to `true` and remove the keyword? I assume you almost always want it as `true`.
Yes, I'll do it in the next PR.
The current status of this PR is that all weights are copied, but the outputs on the test image differ (and Flux's don't make sense).
Closes #231