Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
tckgen interface changes #921
In line with discussions in #809. All interface changes as mentioned in this and this post, and then some. The easiest diff to get an idea of the changes would be the docs of
Beyond individual changes to the interface, the overarching idea was to make the most common use cases (i.e. integrate over all users' use cases, taking into account the frequencies of said use cases, and potentially slightly weighting in favour of newer users that rely more on documentation ordering to find what they need) the most obviously documented. This led to reordering of the categories of options, as well as moving around some options (even between categories). This worked out quite well, with some lesser used options pertaining to seeding/initialising tracks ending up in the seeding category... so they're actually "out of the way" in the main tractography options category.
So erase all your preconditioned minds, and give the
I'm not sure I'll have the time to look into this any time soon - it'll take a while to re-familiarise myself with this part of the code. The other outstanding issue was the suggestion that maybe some of the header key names should changed to match. I'm guessing something like
I also note that this value only makes sense for a file generated by
I'm happy to do the coding aspect, as long as I'm convinced that I actually understand and can make work what people are asking for.
Currently the termination of
From the description for the new proposed
If this is the case, the option names / descriptions could be tweaked a bit. But I'll wait to confirm that this is in fact correct.
Also regarding option naming: An alternative to throw out there might be "output" v.s. "generate"; the description would then clarify that "output" is all generated streamlines that satisfy all criteria imposed.
FYI Generally anything that extracts a subset of tracks,
Edit: Just thought of another detail (which is kind of what spawned this whole thing): If you set
Cool, that makes sense anyway. In which case it's OK to leave the terminology as-is, since its meaning depends on how the file was derived. That's one issue we don't have to deal with then.
Sure, I'd be happy with that. Slight issue with that though is that
My understanding was that
I'm just going to try to clarify what's going in my head, and give them labels so I can reason about these (I'm not suggesting these as the terminology, just to clarify the different concepts here). I have a feeling this probably repeats a lot of what's already been said, but bear with me, I'm getting old (40 next week...
So currently, as far as I can tell, for 'infinite' seeders, attempts happen at random locations within the seed region, with random orientations (unless
So as currently implemented, we'll have a lot more attempts than seeds, even in regions where seeding should be no problem, since most of the random orientations won't be above threshold - what this post was about. This isn't a problem if we just keep going and don't care about the number of attempts/seeds, but if that number is to be meaningful, I think we'd need to fix this to make sure that the terminology was consistent and well-defined, so that a seed really involved trying a bit harder than just a single attempt. Seems a bit problematic for iFOD2 from what @Lestropie says, but I guess we can just ask it to try harder in this case.
But this still leaves the distinction between seeds and starts (as I've defined them above). If we change behaviour so that seeds genuinely are a bit more exhaustive, then I think it would make sense to have
In this case, I think we can still rely on using the receiver to terminate tracking, provided streamlines get passed down the queue even if seeding was not successful, and the statistics get recorded right - seems like a relatively trivial change as far as I can tell (?), provided we can sort out the 'more attempts per seed' problem. And in this case, it makes sense that these options would have no effect if finite seeders are used.
Sorry for the duplication, but hopefully this matches what we all have in mind here...?
I'll answer with a few separate posts, so things don't get too mixed up. Here's number 1:
Sounds like a good plan. The reordering of categories and options within/between categories I already did was with the intention of having more commonly used stuff at the top. If this split would happen, I'd list the "seed sources" group right after the "streamlines tractography options" category (which is now the first one, prominently featuring
@jdtournier , either I'm misinterpreting something, or there's an inconsistency in your explanation above... Given your first statements:
...I fail to combine these with this statement:
It seems to me that the first bit above implies that we have an equal amount of attempts and seeds (because a fresh attempt is at a different seed), but that we have a lot more attempts (and seeds) than starts.
So your goal, if I get it right, is to reduce the drastic difference in numbers between seeds and starts, by doing more attempts per seed. Or semi-formally, we currently have:
and the proposed changes would aim for:
Where my numbers of ">" symbols vaguely describe levels of "more than".
Apart from checking whether I got it right, I do fully agree with it though. This would make the seeds concept (and number) much more meaningful, because, as you say, the number of attempts isn't really all that meaningful (it's just high because of most FODs typically being very sparse).
Finally, number 3:
In line with the above, I agree with
The number of starts, to me, is quite an artificial concept again (even though it exists programatically): if a track is started, but not selected, then it basically just fulfils part of the constraints, but not all. To me, from a seed(-point, potentially with many attempts), you end up with a selected track, or with none at all. A "started" track may just as well only have done one step and then failed to proceed due to the other constraints... how much more of a track is it then, compared to the one ("of length zero") that didn't succeed to start from the seed at all?
So building on this logic, I think the "generated" terminology may not be favourable: "generated" seems to refer to the "starts" concept, rather than the "seeds" one. It's funny actually, we currently (even before this branch and pull request) already have the word "selected" used somewhere; in the command line output during a run, e.g.:
But in the context of the above proposal, that "generated" would better become "seeds". The "selected" might become "selected tracks", so it's a bit more explicit.
Finally as to
About point 2: yes, we're on the same page. I guess my attempts at defining my terms failed to clarify the argument... Oh well.
But yes, the point is that currently, with infinite seeders there are really no such thing as seeds (as defined above), only attempts, since there is no proper effort to try to start from each seed location (only a single attempt). What I meant by my contradictory comment:
is that very few of these attempts will pass the
I guess the big distinction is that currently, seed location/directions are drawn from the combined seed ROI / orientation domain by random rejection sampling, and seeds produced by this process are already guaranteed to have FOD amplitude above threshold along the corresponding direction, so are already valid start points too - means there is no distinction between a seed and a start. What we're proposing these seeders should do is a more exhaustive search (lots of attempts) for a suitable initial direction of tracking from each seed, and if successful, then that seed is also a start - otherwise it remains just a (failed) seed. To use your notation:
Note here that the exact definition of the term 'seed' differs between the two versions, which is probably what's causing the apparent contradiction here...
Yep, fully agree with this now too; for the exact same reasons. The new terminology should really help a lot to get this logic across.
Yep again, very happy with keeping it simple. In line with what Rob mentioned, I think having them state a ratio rather than absolute number is actually harder to explain (and understand, I reckon). I can see people actually calculating the absolute number from the ratio they would state in such a scenario. Absolute numbers are the way to go.
Fully agree too, if only for clarity. I've just added this to the documentation too. It's much easier to say "default: 1000" than to say "default: long conditional text goes here". For most users, I reckon this will remain an obscure option anyway (that most would also not have to change, so that's good).
I've updated some other docs, and even a functional reference to an old option (property) in
It's probably a good idea to get yet another pair of eyes to finally check that we've not overlooked any other terminology changes, and the whole beast is still/again self-consistent ( @jdtournier ?
I've updated the docs for examples of
Ok, finally (as I think all the rest is good now), while we're changing interface as well as behaviour here anyway: would it be worthwhile to "upgrade" the default number of streamlines to be selected to something bigger than 1000? My reasoning is that, since the introduction of that default long ago, the average user's hardware is probably a bit more capable nowadays. A bigger number, e.g. 5000 or maybe even 10000, better shows off the "default" capabilities of a probabilistic tractography algorithm such as iFOD2 for the whole brain scenario. Every once in a while, there still seems to be a user on the forum who is a bit unimpressed with the "sparsity of our tractograms compared to other-diffusion-package-X". Of course, in any realistic scenario, a user should use the
Well, just an idea of course; happy to hear opinions. :-)
So I've had a quick look, and updated the doc for the
Otherwise, it all looks good, and seems to work as expected - at least from my very limited testing. A couple of points though:
It's still there, in part because it's there and it'd be more work to revert it back so that these get reported as
Not against it either. But for both here and for lengthy instructions for
Same. The intent is just to "do some tracking" if neither is specified. Maybe a warning-level message in this instance would be better (e.g.
I assumed this was the case. I'm not too bothered either, but if users rely on
Good idea. A little paragraph with a brief overview might do the trick. I'll see if I can draft something this evening.
I'm not sure that's warranted. Maybe as an info-level message, but I'd reserve warnings for conditions that are dangerous and/or likely to result in incorrect output. Besides, the progressbar essentially says the same thing already - although I agree it won't say why. But given that basically the first thing they'll come across when looking at the help page will relate to this issue, any potential confusion should be short-lived.
Looks like I earlier missed the hint that left myself in #734:
So clearly someone was confused as to the streamline rejection (reporting) behaviour when setting
Sorry for not chipping in earlier, because this pull request should be about finished now. A few quick things:
As far as I understand this correctly, it also seems to make sense me to keep this behaviour. Just wondering in what scenario(s) streamlines with 1 point only occur. This would be a successful seed, so it found at the seed location an orientation of the FOD above the cutoff (otherwise it would not go on to become a "streamline", wrt our current progress output, right?). Having established that, it'll take 1 tracking step and find itself e.g. outside of the mask, or in a spatial location where the FOD within the angular threshold has no amplitudes above the cutoff. Am I correct that this second location (after the first tracking step) is hence not a part of the streamline, so it ends up having only 1 point and be rejected (regardless of
I'm certainly on the same general front here, so I just wondered if there could be a lot "against it", and if not, whether it would be ok to raise it anyway. I agree there is no perfect default that'll strike a balance between the most compact of bundles on the one hand, and a whole-brain tractogram on the other hand. But even for a compact bundle, more tracks (within a reasonable number of course) doesn't hurt, while more tracks may help for other more extended bundles -- especially those that include fanning, and even more so if said fanning may be more challenging in some orientations than others. So my logic is along the lines of: what number can we (potentially) reasonably raise it to, so we at least extend the applicability to more bundles, without making the number too high for the simple intention of "just doing some tracking" (which I agree this default is about indeed).
The example of the optic radiation is a pretty compact bundle (at least in the brain of "Tournier Donald"
Simple whole brain tractography with minimal input to
Simplest attempt (I can imagine) at the CST, seeding from the spine, excluding the midsagittal plane from about above the pons, including an axial plane around the height of the upper half of the thalamus:
My finding being: with 5000 (versus 1000), there's definitely a much better impression as to what the (in this case) iFOD2 algorithm will ultimately explore. Even for the whole brain tractogram, it gives a pretty decent idea, whereas the 1000 case is still severely lacking to provide that impression. In a case like the CST and tracking just in one direction (from the spine up), any algorithm is often faced with the much easier "trackability" of the straight path up versus the lateral projections. Also, with 5000 streamlines, I could much more easily appreciate where my naive approach still lacks the most (in this case tracks diverging in e.g. the SLF, and then heading towards unrelated lower bits of the brain again), and eventually go back and fine-tune the criteria more.
So long story short, I often find myself using something like 5000 for the purpose of "just doing some (initial) tracking", across a very wide range of scenarios, if not almost all realistic ones. The ultimate question is then if we can "safely" increase the default to that of course. Beyond 5000 definitely doesn't seem necessary for this particular purpose.
I suppose one could argue whether or not iFOD2 should actually be generating that first segment as part of the
If you were to ignore the above, and use unidirectional tracking, and the first step (second streamline point) terminated for one of the reasons flagged as 0 in this array but not be rejected, then yes, that second point would be omitted from the streamline and hence the "streamline" would consist of only one point.
Currently yes. It just goes down as a "rejected streamline" and contributes to the "streamline count" accordingly in the progress message. You could separate it, essentially logging a single-point streamline as a "rejected seed" from within the
But as I've said previously, these are really quite rare, so I'm not convinced it's worth the gymnastics.
Hmm, yes, what I overlooked is that the default is of course bidirectional tracking, in which case these almost certainly become very rare indeed. I fully agree that it ends up not being worth such big changes in the back-end at this stage just for these rare cases. Maybe just something to take note of for the future; in case anyone wants to refactor the tracking code for other (more compelling) reasons, it could be taken on board at that stage. In the end, the command line output is not meant to be much more than a way to keep track of general progress, and give a sense of insight into why progress (accepted streamlines) may be slow in some scenarios.
So this pull request should basically be finished now (right?). There's still my outstanding screenshot'ed case above to increase the default selection goal from 1000 to 5000 (or anything in between really, e.g. 2000, whatever people may find themselves comfortable with