tckgen termination after 'MR::Exception' #294
When running the tckgen command on a cluster I sometimes see the following error:
terminate called after throwing an instance of 'MR::Exception'
An example command is as follows:
tckgen -seed_image aligned_mask.nii -mask aligned_mask.nii -algorithm iFOD2 -number 50000000 -maxlength 400 -minlength 2 -downsample 2 -act 5TT.nii -crop_at_gmwmi -backtrack -quiet -force aligned_HARDI_fod.nii /scratch/aligned_HARDI_fod.tck
I am running a standalone version of mrtrix3 on a cluster but will try to change to the static build in future.
I suspect this might be due to available disk space. I write the .tck files to /scratch, which is a space local to each independent node. Each of ~16 cores can then write to these local drives in parallel. Perhaps this issue occurs when the /scratch disk is reaching capacity.
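One way to test this suspicion would be to check the free space on the scratch filesystem before each job starts. The following is only a minimal sketch: the helper name `has_free_space` and the 60 GiB threshold are hypothetical choices for illustration, not part of MRtrix3.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` of free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= required_bytes


def free_space_report(path):
    """Return a short human-readable summary of the space left at `path`."""
    usage = shutil.disk_usage(path)
    return "{}: {:.1f} GiB free of {:.1f} GiB".format(
        path, usage.free / GIB, usage.total / GIB
    )


# Hypothetical threshold: assumed upper bound on the size of one .tck file.
REQUIRED_BYTES = 60 * GIB
```

A job script could call `has_free_space("/scratch", REQUIRED_BYTES)` before launching tckgen and skip or requeue the job when it returns False. Note that with several jobs writing to the same disk in parallel, a check made at start-up cannot guarantee that space will still be available later.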
As an example, here are the storage details of node12 (which I believe has a total storage capacity of 457 GB, of which 423 GB are accounted for by the .tck files):
OK, that would explain the problem. The track writer will most likely throw an Exception if it can't write - so at least the symptoms are consistent with that...
However, the program shouldn't crash out like this - at worst it should hang (not great either mind you, but at least it would be consistent with what I'd expect to happen). For completeness, we had a discussion about how to handle exceptions being thrown in a multi-threading context (see #167), and I pushed some changes to handle this on 12 March, with pull request #180. So I'm surprised to see this happen at all - unless you haven't updated your installation since then...?
Actually, having thought about it a little, I think it might be fine as-is. I was thinking the worker threads (which generate the tracks) would hang waiting for the writer thread to clear its backlog (which it clearly can't, having just thrown an exception and hence terminated). Thankfully, the queue backend that handles feeding streamlines from the worker threads to the writer thread will shut everything down if no-one's listening... It should work fine, give it a shot.
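The behaviour described above (worker threads stopping once the writer thread has failed, instead of hanging on a full queue) can be illustrated with a small sketch. This is not MRtrix3's actual thread queue, which is written in C++; it is a Python stand-in, and the names `run_pipeline`, `offer`, and `WriteError` are hypothetical.

```python
import queue
import threading


class WriteError(Exception):
    """Stands in for the exception thrown when the track file cannot be written."""


def run_pipeline(n_workers, items_per_worker, fail_after):
    q = queue.Queue(maxsize=4)          # bounded: producers block when it is full
    writer_closed = threading.Event()   # set once the writer stops listening
    errors = []                         # exceptions collected from the writer
    counts = {"written": 0, "produced": 0}
    lock = threading.Lock()

    def offer(item):
        """Try to enqueue `item`; give up as soon as the writer has gone."""
        while not writer_closed.is_set():
            try:
                q.put(item, timeout=0.05)
                return True
            except queue.Full:
                pass                    # re-check whether the writer is still alive
        return False

    def writer():
        try:
            while True:
                item = q.get()
                if item is None:        # sentinel: normal end of input
                    break
                if counts["written"] >= fail_after:
                    raise WriteError("simulated: no space left on device")
                counts["written"] += 1
        except WriteError as exc:
            errors.append(exc)          # keep it so the caller can report it
        finally:
            writer_closed.set()         # tell producers that no one is listening

    def worker(idx):
        for i in range(items_per_worker):
            if not offer((idx, i)):
                return                  # writer has stopped: shut down, do not hang
            with lock:
                counts["produced"] += 1

    writer_thread = threading.Thread(target=writer)
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    writer_thread.start()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    offer(None)                         # no-op if the writer has already failed
    writer_thread.join()
    return {"written": counts["written"], "produced": counts["produced"],
            "errors": errors}
```

Calling `run_pipeline(4, 1000, fail_after=10)` simulates a disk that fills up after ten streamlines: the writer records the error, the workers notice that it has gone and return early, and the caller can inspect `errors` rather than the process terminating or hanging.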
A quick update.
I installed the static version and everything runs well. I'm pretty certain this was an out-of-disk-space error, and I also think that if I switch to SIFT2 I won't encounter this error in future (no need for the 40-60 GB .tck files I was generating previously).
I'll close this.