Seg fault after upgrading to Julia 1.5 #589
The latest tests seem to be OK: https://gitlab.com/JuliaGPU/Knet.jl/-/jobs/669021495
With Knet v1.3.9, I tried `Chain(x -> unpool(x))` instead of calling `Chain(UnPool())` and still got the same problem. It looks like there are some C libraries involved too, which is over my head...
Could be. If Knet passes tests on your setup, please open an issue with KnetLayers.
…On Mon, Aug 3, 2020 at 11:51 PM andevellicus ***@***.***> wrote:
Knet v1.3.9
KnetLayers v0.2.0
Looking at the stack trace, I suppose it might be an issue with KnetLayers?
Looks like it's something with unpool. If I replace `Chain(UnPool, myconvblock)` with `Chain(x->unpool(x))`, or even just `x->unpool(x)`, I get the same fault. From the stack trace, it looks like AutoGrad and unpool might be at fault? Maybe something in the forw args... Not sure what changed between Julia 1.4 and 1.5 to cause that; it's a little over my head.
Having said that, when I just do a plain `UnPool()(KnetArray(rand(100, 100, 100, 1, 1)))` in the REPL, it works fine, so I suppose it must be running into issues when taking the gradient.
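The forward-only call working while training fails points at the backward pass. A minimal way to separate the two, assuming a CUDA-capable setup, might look like this (a sketch using Knet's exported `unpool`, `Param`, `@diff`, `grad`, and `value`; not a confirmed reproducer from this thread):

```julia
using Knet  # also brings in AutoGrad's Param/@diff/grad/value; requires a GPU

x = Param(KnetArray(rand(Float32, 8, 8, 1, 1)))

# Forward pass alone: this reportedly works fine.
y = unpool(value(x))

# Recording the tape and taking the gradient: this is where the fault shows up.
t = @diff sum(unpool(x))
g = grad(t, x)
```

If the forward line succeeds but the `@diff` line segfaults, that would isolate the problem to the backward pass rather than the unpool kernel itself.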
So in looking into it more, the only thing I can think of that changed with Julia 1.5 is the way structs are allocated. From the release notes: Not sure what's so special about unpool that it'd mess AutoGrad up, but the line that seems to be at issue is
in AutoGrad's core.jl. I have a pretty complex network, so I'll have to see if I can reproduce it with something simpler.
Interestingly, if I remove unpool completely and replace it with a hand-written upsampling layer, I still get a similar error, only this time it's within my loss function.
Stack trace is:
Should I move this over to AutoGrad.jl?
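For reference, a hand-written 2x nearest-neighbor upsampling layer of the kind mentioned above might look like this (a sketch; `upsample2` is a hypothetical name, not the code from this thread):

```julia
# 2x nearest-neighbor upsampling of an (H, W, C, N) array: each spatial
# element is repeated twice along the first two dimensions.
function upsample2(x)
    h, w = size(x, 1), size(x, 2)
    return x[repeat(1:h, inner=2), repeat(1:w, inner=2), :, :]
end
```

Because it is implemented with plain indexing, it goes through AutoGrad's generic `getindex` gradient rather than a dedicated unpool kernel, which may be why a similar backward-pass error still appears.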
I need a small example to replicate the error. For example, none of the following gives me issues:
I think I may have an idea of where the problem is. Below is some example code that throws the error on my computer:
I think
Tried this after the new Knet 1.4 update... still the same issue.
In your MWE you call dice_loss with an Array and a KnetArray, which results in a type mismatch. Even when I fix that, x and y are (8,8,1,1) whereas model(x) is (14,14,8,1). These are not broadcastable shapes, so dice_loss throws an error. Can you: (1) not use KnetLayers for now to isolate the problem (you only use Chain, which can be defined in a few lines, e.g. here), and (2) send me an MWE that fails with Knet-1.4 and Julia-1.5.
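For step (1), a few-line stand-in for a Chain layer might look like this (a sketch; KnetLayers' real definition may differ):

```julia
# Minimal Chain: hold a tuple of callables and apply them left to right.
struct Chain
    layers::Tuple
end
Chain(layers...) = Chain(layers)
(c::Chain)(x) = foldl((acc, layer) -> layer(acc), c.layers; init = x)
```

With this in place, the model can be rebuilt without the KnetLayers dependency, e.g. `model = Chain(layer1, layer2)`.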
I tried to run my MWE from #602 with Julia 1.5 and Knet 1.4. It doesn't call KnetLayers per se, but it uses essentially identical code... still the same error. Will try to provide a version of the model without the KnetLayers-esque code.
Ok, using the following code:
I get these results:
Here are the rest of my packages:
Ok, more digging... When I change
to
the code works fine. So that leads me to believe there is an issue in
I commented out the update part, leaving just the `@diff` macro, and I was back to getting errors... so I think there's something going on in AutoGrad...
Unfortunately, I cannot get your code to fail ;( I tried it on Windows, thinking maybe it was an OS difference. It finished without any errors. My setup:
I found the solution to my error, though why it's specific to my system I have no idea. Once I changed
to
everything works fine. I have no earthly idea why that matters, but holy @#*$ am I glad it was a simple solution. I was despairing that it was something inherent to my box and OS, and there was much anguish and wringing of hands, as well as expletives. Life is good now :) Thanks for taking the time to look into it. Will close now.
I'm an idiot. I just realized that taking the model function out of `@diff` means that `@diff` doesn't get applied to the model function. Never mind...
So I found the source of the issue. It looks like, for some reason, `@diff` misbehaves on my system. Once I did
No more errors. Any suggestions on how to actually collect the loss from `ex`? Or would this require modifying the `@diff` macro?
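On collecting the loss: assuming `ex` is the tape returned by `@diff`, AutoGrad's `value` gives the recorded loss and `grad` the gradients with respect to a `Param` (a CPU-only sketch following the AutoGrad README-style API):

```julia
using AutoGrad

w = Param([1.0, 2.0])
ex = @diff sum(abs2, w)   # tape recording the loss computation
loss = value(ex)          # the scalar loss, here 5.0
g = grad(ex, w)           # gradient of the loss wrt w, here [2.0, 4.0]
```

This is the same pattern Knet's training loops use, so no change to the `@diff` macro should be needed just to read the loss back out.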
Welp, turns out that it's an Arch Linux issue, as you suggested. The default packages are messed up; when I install the bins, everything is just dandy. Apologies for the wild goose chase, and I appreciate your patience.
Code was working fine until Julia 1.5. Looks like there's something not right with AutoGrad and unpool? Stack trace below: