[BUG] eddy_cuda fails if ran with --estimate_move_by_susceptibility
#180
Might be related: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=FSL;b671062b.1910
Full log output is here in case useful:
Good find @fredrmag, can you confirm there's a NaN in the pixdim field in the header?
Are there NaNs in any of the inputs to eddy? If you're using fieldmaps there will likely be other image inputs. Have you had any successful runs with
This participant does not have a fieldmap. The only other NIfTI input is the mask, and it does not have a NaN in the header. In any case, the NaN header problem should have been fixed by Jesper in October 2019. So, if qsiprep is using the newest FSL, it shouldn't be a problem?
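For anyone wanting to repeat the header check discussed above, here is a minimal sketch that reads the pixdim field straight from a NIfTI-1 header using only the Python standard library. It assumes a single-file NIfTI-1 image (.nii or .nii.gz), where the header is 348 bytes and pixdim is stored as 8 float32 values at byte offset 76. The function names `read_pixdim` and `nan_pixdim_indices` are made up for this example and are not part of FSL or qsiprep.

```python
import gzip
import math
import struct


def read_pixdim(path):
    """Return the 8 pixdim values from a NIfTI-1 header (.nii or .nii.gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        header = f.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError("file too short to hold a NIfTI-1 header")
    # sizeof_hdr (the first int32) must equal 348; use it to detect byte order
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    # pixdim occupies bytes 76..107 as 8 float32 values
    return struct.unpack(endian + "8f", header[76:108])


def nan_pixdim_indices(path):
    """Return the indices of pixdim entries that are NaN (empty list if none)."""
    return [i for i, v in enumerate(read_pixdim(path)) if math.isnan(v)]
```

Calling `nan_pixdim_indices("dwi.nii.gz")` on each image passed to eddy (the file name here is only a placeholder) returns an empty list when the pixdim field is clean.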
We have successfully run eddy_cuda with the
When eddy ran successfully with the
@mattcieslak, we created a singularity image of the fix proposed by PR #181 using commit: 32462e9. @ethanknights tested this on a dataset, and we still get the same error message. I will attach the logs.
Very interesting. That PR had FSL 6.0.4, which is supposed to contain some big improvements for eddy. It looks like the GPU is correctly identified and a number of iterations are run successfully before this crash happens. Maybe the GPU is running out of memory? The error is definitely happening in the movement-by-susceptibility estimation when trying to allocate a new image. Is there a way to request more GPU memory on your cluster?
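One way to test the out-of-memory hypothesis is to poll GPU memory while eddy_cuda is running. The sketch below is only an illustration: it assumes `nvidia-smi` is on the PATH and reports memory as plain integers in MiB, and the helper names `parse_gpu_memory` and `gpu_memory` are hypothetical.

```python
import subprocess

# Ask nvidia-smi for per-GPU memory use as bare CSV (values in MiB)
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi once and return the parsed memory figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `gpu_memory()` in a loop from a second shell while eddy_cuda runs shows whether used memory approaches the total just before the crash.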
This must be an FSL bug then. What do you think would be the best approach? Maybe downgrade to 6.0.2 for now until FSL fixes it?
Hmm, I tried to run qsiprep with FSL 6.0.2, and I still get the same error. I find it strange that FSL 6.0.4 does not work, given that they have made a lot of improvements to eddy, especially using the To summarize: Could it be something inside the qsiprep container that makes things fail? Are we missing some dependencies? We are going to run the following tests over the next few days:
Are you using singularity or docker? Maybe there's something strange with the singularity or docker configuration? If you're willing to send an example subject I can test locally too, this is an important thing to have working.
We are using singularity with the I am not sure if we are allowed to share any data. I will test it on Have you managed to run qsiprep successfully using singularity and eddy_cuda with the Btw, thanks for your help on this!
So, the first test from the previous post is done:
Maike from our group says the following:
We are in the process of double-checking that the FSL container used works on separate data.
Hi @mattcieslak, I managed to reproduce this issue outside of qsiprep, which suggests that it is not a qsiprep issue but one related to the FSL binaries. I also found this report from the FSL mailing list, where a user reported the same problem. I have tried to compile FSL with CUDA 10.2 and run it outside of a singularity container, but this resulted in the same error. I have reported it to the FSL mailing list, and Jesper suggested it might be related to 3 things:
He will run the exact same data (ds003047_sub-01_ses-1) and commands I used to check whether it is an FSL bug. My bet is the hardware issue, as we get the same error for the binaries compiled by FSL, and that they have improved upon the eddy_cuda with the Jesper suggests that recompiling the eddy_cuda binary with the flags I will keep you updated,
Thanks for looking into this!! I will also try on a machine with a GPU and an exactly matching version of CUDA (9.1)
Jesper from FSL just confirmed that this is a bug in eddy: https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=ind2010&L=FSL&O=D&X=70686BB980F21516B4&Y=fredrik.magnussen%40psykologi.uio.no&P=190319 I also tried to compile eddy_cuda with the
Jesper from FSL has fixed the bug. His response:
So, this will be fixed in the next FSL release, which is due at the end of the year.
I am closing this issue, as it is a bug in FSL, not qsiprep.
Hi,
This bug might be more related to FSL, but I will post it here anyway.
When running qsiprep v.0.11.0 using eddy_cuda with the flag --estimate_move_by_susceptibility, our group gets the following error: