
ResNet producing non-deterministic and wrong predictions #72

Closed
jeremiedb opened this issue Nov 13, 2020 · 12 comments · Fixed by #164
@jeremiedb

jeremiedb commented Nov 13, 2020

Predictions from the pre-trained ResNet model seem to be broken. The predicted probabilities change on every call and look essentially random. VGG19 works fine, however.

using Flux
using Metalhead

img = Metalhead.load("data/cats/cats_00001.jpg");

julia> classify(VGG19(), img)
"tiger cat"

julia> classify(VGG19(), img)
"tiger cat"

julia> classify(ResNet(), img)
"reel"

julia> classify(ResNet(), img)
"abacus"

julia> classify(ResNet(), img)
"spotlight, spot"

julia> classify(ResNet(), img)
"fountain"

I first thought it might be an issue with ResNet requiring different preprocessing, though given the randomness in the output of the model, I'd guess some layers didn't get properly defined, possibly the BatchNorm?

If using testmode!, resnet then produces the same predictions after each run. However, a different set of weights seems to be initialized on each call to `ResNet()`:

resnet = ResNet().layers
testmode!(resnet)
cat_pred_resnet = resnet(preprocess(RGB.(img)))
julia> findmax(cat_pred_resnet)
(1.0f0, CartesianIndex(900, 1))

cat_pred_resnet = resnet(preprocess(RGB.(img)))
julia> findmax(cat_pred_resnet)
(1.0f0, CartesianIndex(900, 1))

resnet = ResNet().layers
testmode!(resnet)
cat_pred_resnet = resnet(preprocess(RGB.(img)))
julia> findmax(cat_pred_resnet)
(0.99877006f0, CartesianIndex(413, 1))
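
For illustration, here is a minimal toy sketch (ordinary Flux layers, not Metalhead code) of the symptom above: every constructor call draws fresh random weights, so two separately constructed models disagree on the same input, while a single instance is self-consistent.

using Flux

# Toy stand-in for ResNet(): each call to make_model() initializes new random weights.
make_model() = Chain(Dense(10, 10, relu), Dense(10, 5), softmax)

x = randn(Float32, 10, 1)

julia> findmax(make_model()(x))   # some class
julia> findmax(make_model()(x))   # usually a different class: new random weights

m = make_model();

julia> findmax(m(x)) == findmax(m(x))   # a single instance is deterministic
true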
@jeremiedb
Author

[attached test image: cats_00001]

@ToucheSir
Member

I have a suspicion that this and related issues are due to not all weights being loaded properly. See https://julialang.zulipchat.com/#narrow/stream/237432-ml-ecosystem-coordination/topic/ResNet.20weights for details, but TL;DR: no batchnorm or bias! If it hasn't happened already, this should eventually be fixed by #70.
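
For anyone who wants to check this locally, a hedged diagnostic sketch (assuming Flux ≥ 0.12, which provides Flux.modules, and the current BatchNorm field names): if the saved weights contained no BatchNorm parameters, the running statistics of every BatchNorm layer are still at their freshly initialized values.

using Flux, Metalhead

# Collect all BatchNorm layers; running means still at their initial value
# (all zeros) suggest the pre-trained statistics were never loaded.
resnet = ResNet().layers
bns = [l for l in Flux.modules(resnet) if l isa Flux.BatchNorm]
println("BatchNorm layers: ", length(bns))
println("with untouched running mean (all zeros): ",
        count(bn -> all(iszero, bn.μ), bns))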

@maxfreu

maxfreu commented Aug 7, 2021

Has the PR brought any change?

@darsnack
Member

darsnack commented Aug 7, 2021

Nothing, other than discovering that the previous pre-trained model was wrong. I'm working on re-training the current models.

@maxfreu

maxfreu commented Aug 7, 2021

So re-training is all that's needed to fix it (hopefully)? Or is there still more to it? Outputs changing all the time sounds like the BN layers are continuously adapting.

@ToucheSir
Member

Outputs shouldn't change constantly if the normalization layers are frozen, so I assume they weren't in the example above. This definitely shouldn't be a problem with the new models, because Flux.testmode! should work OOTB.
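
As a generic Flux sketch of that pattern (not the Metalhead-specific API): once testmode! is applied, BatchNorm uses its stored running statistics instead of per-batch statistics, so repeated forward passes on the same input agree.

using Flux

# Small Chain with a BatchNorm layer; testmode! freezes the normalization
# statistics (and disables Dropout), making inference deterministic.
m = Chain(Conv((3, 3), 3 => 8, pad=1), BatchNorm(8, relu),
          Flux.flatten, Dense(8 * 32 * 32, 10), softmax)
Flux.testmode!(m)
x = rand(Float32, 32, 32, 3, 1)

julia> m(x) == m(x)
true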

@darsnack
Member

darsnack commented Aug 7, 2021

Yeah, this issue describes the old implementations. The new ones on master should not suffer from this issue. Re-training should be all that's necessary for a working model.

@maxfreu

maxfreu commented Aug 8, 2021

Do you need help with training more models? Where’s the bottleneck?

@darsnack
Member

darsnack commented Aug 8, 2021

Primarily the lack of a multi-GPU workflow for DDP (distributed data parallel) training. Hopefully once the JuliaCon code is released, I can use that. For now, I'll just train with a single GPU.

@maxfreu

maxfreu commented Aug 8, 2021

If you provide the training fixture code, I can provide hardware and electrons :)

Edit: Sounds like I have to watch the JuliaCon videos!

@trahflow
Contributor

trahflow commented Oct 9, 2021

So is this still relevant, or were the models re-trained in the meantime? (And if they were, did that fix the issue?)

@darsnack
Member

darsnack commented Oct 9, 2021

No models have been re-trained so far. I can help guide someone to set it up if they have a GPU. Pre-trained weights would be a welcome contribution.
