tckgen interface changes #921

Merged
merged 41 commits into from Apr 6, 2017


thijsdhollander commented Feb 27, 2017

In line with discussions in #809. All interface changes as mentioned in this and this post, and then some. The easiest diff to get an idea of the changes would be the docs of tckgen (i.e. tckgen.rst). The way the -select option now ends up at the top of the list, and the fat chunk of explanatory text that goes with it, should IMHO guarantee that users spot it (which was an initial concern of @jdtournier upon proposing the -select name). Existing users will of course get caught out at least once if they don't check the massive changelog that will come with the tag_0.3.16... but then again, they will be in for a shock on many other fronts too. 😸

Beyond individual changes to the interface, the overarching idea was to make the most common use cases (i.e. integrate over all users' use cases, taking into account the frequencies of said use cases, and potentially slightly weighting in favour of newer users that rely more on documentation ordering to find what they need) the most obviously documented. This led to reordering of the categories of options, as well as moving around some options (even between categories). This worked out quite well, with some lesser used options pertaining to seeding/initialising tracks ending up in the seeding category... so they're actually "out of the way" in the main tractography options category.

So erase all your preconditioned minds, and give the tckgen interface a look. 🙏 (think about Tim Cook's statements about how we have to be courageous and all...)

TODO:

  • The -num_seeds option doesn't work yet; it just sets the exact_num_attempts property. I'll leave the implementation to a tckgen adept (@Lestropie, @jdtournier?), so that it matches exactly the functionality and use case you guys had in mind.
  • Changes will probably have to happen to some docs that have tckgen examples in them...?

jdtournier commented Mar 2, 2017

I'm not sure I'll have the time to look into this any time soon - it'll take a while to re-familiarise myself with this part of the code. The other outstanding issue was the suggestion that maybe some of the header key names should be changed to match. I'm guessing something like count → num_selected, total_count → num_seeds, or something similar. Just worried about breaking all manner of other commands and scripts that might rely on these labels... I think count can remain as-is, it's unambiguous as to what it means once you have the file. I can see that total_count on the other hand is fairly meaningless without an explanation, so that's the one I might be tempted to change. But I'm not sure how safe it is to rename it: it's explicitly mentioned in tckedit and in the Tractography::FileBase class, so at the very least we'd need to keep interpreting that key as a synonym for the new one, at least for the foreseeable future, if we want to maintain our ability to interpret existing track files. Gets very messy once we start doing this...
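
Editorial illustration: the "keep interpreting the old key as a synonym" idea could look roughly like the Python sketch below. This is not MRtrix3 code. The function name, the mapping table and the new key name (num_seeds, taken from the tentative suggestion above) are all assumptions made for the example.

```python
# Hypothetical sketch; none of these names come from the MRtrix3 code base.
# Maps a legacy header key to the tentatively proposed replacement.
LEGACY_SYNONYMS = {"total_count": "num_seeds"}

def normalise_header(header):
    """Return a copy of `header` with legacy keys renamed to their new names."""
    out = {}
    for key, value in header.items():
        new_key = LEGACY_SYNONYMS.get(key, key)
        # If both spellings are present, the value stored under the new key wins.
        if key != new_key and new_key in out:
            continue
        out[new_key] = value
    return out

old_file_header = {"count": "1000", "total_count": "1177"}
print(normalise_header(old_file_header))
```

Existing track files would keep working under such a scheme, at the cost of carrying the synonym table around indefinitely, which is the messiness the comment worries about.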

I also note that this value only makes sense for a file generated by tckgen directly - if it's been edited by e.g. tckedit or tcksift, it's no longer consistent with the file contents. Not sure whether this is something we need to do anything about...?


Lestropie commented Mar 3, 2017

  • I would consider splitting "Tractography seeding options" to have seed sources in one group (maybe with "must provide at least one" in the option group name?), and options / parameters affecting seeding behaviour in another.

  • I think the contrast in option name between -max_seeds and -num_seeds needs to be increased (will come back to this).

I'm not sure I'll have the time to look into this any time soon - it'll take a while to re-familiarise myself with this part of the code.

I'm happy to do the coding aspect, as long as I'm convinced that I actually understand and can make work what people are asking for.

Currently the termination of tckgen is handled primarily by the class at the receiving end of the queue, which detects either the total number of streamline "attempts" (received streamlines that are empty or not; now -maxnum, proposed to be -max_seeds) or "selected" streamlines (non-empty streamlines; now -number, proposed to be -select). The other mechanism that can cause tckgen to stop is if a "finite" seed mechanism is used (-seed_grid_per_voxel or -seed_random_per_voxel), when worker threads are denied their requests for new seeds.

From the description for the new proposed -num_seeds option, I think the intended behaviour is to use a counter in the SeedList class, that triggers tckgen termination once a certain number of seeds have been drawn. This option is therefore essentially putting a finite limit on otherwise "infinite" seeding mechanisms, and cannot be used in conjunction with "finite" seeding mechanisms. This quantity is not influenced by the frequency with which the tracking algorithm rejects the seed at the init() step: the number of seeds that get through the init() step, and therefore that the algorithm actually "tracks from" (as opposed to "attempts to track from"), should be reflected in -maxnum / -max_seeds.

If this is the case, the option names / descriptions could be tweaked a bit. But I'll wait to confirm that this is in fact correct.

Also regarding option naming: An alternative to throw out there might be "output" v.s. "generate"; the description would then clarify that "output" is all generated streamlines that satisfy all criteria imposed.

I also note that this value only makes sense for a file generated by tckgen directly - if it's been edited by e.g. tckedit or tcksift, it's no longer consistent with the file contents. Not sure whether this is something we need to do anything about...?

FYI Generally anything that extracts a subset of tracks, total_count is hacked to correspond to the number of tracks at input (by nulling the track but still sending it to the writer class). It may or may not make sense, and it may or may not be useful, it's just making use of the functionality that's there.
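
Editorial illustration: a toy Python model of the "null the track but still send it to the writer" trick described above. The class and function names are hypothetical and this is not the MRtrix3 writer; it only shows why total_count ends up equal to the number of input tracks while count reflects what was kept.

```python
class ToyTrackWriter:
    """Counts every track it receives, but only stores the non-empty ones."""

    def __init__(self):
        self.count = 0        # tracks actually written
        self.total_count = 0  # tracks received, empty or not
        self.tracks = []

    def write(self, track):
        self.total_count += 1
        if track:  # an empty ("nulled") track is counted but not stored
            self.count += 1
            self.tracks.append(track)

def extract_subset(tracks, keep, writer):
    """Send every input track to the writer, nulling the ones that are rejected."""
    for track in tracks:
        writer.write(track if keep(track) else [])

writer = ToyTrackWriter()
extract_subset([[1, 2, 3], [1, 2], [1, 2, 3, 4]], lambda t: len(t) >= 3, writer)
print(writer.count, writer.total_count)
```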

Edit: Just thought of another detail (which is kind of what spawned this whole thing): If you set -number 0 / -select 0, then tckgen will continue until the number of streamlines specified with -maxnum / -max_seeds have been generated. This should appear in the -number / -select description.
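
Editorial illustration: the stopping rule at the receiving end of the queue, as described in this comment, might be summarised by the Python sketch below. The function and argument names are hypothetical. Treating a target of 0 as "no limit" is stated in the thread only for -number / -select; applying the same convention to the seed limit is an assumption of this sketch.

```python
def should_stop(attempts, selected, select, max_seeds):
    """Toy termination test for the receiver (not the actual tckgen code).

    attempts  -- streamlines received so far, empty or not
    selected  -- non-empty streamlines received so far
    select    -- target number of selected streamlines (0 disables this criterion)
    max_seeds -- limit on attempts (0 disables this criterion; an assumption here)
    """
    hit_select = select > 0 and selected >= select
    hit_seeds = max_seeds > 0 and attempts >= max_seeds
    return hit_select or hit_seeds

# With select=0, tracking carries on until the seed limit is reached.
print(should_stop(attempts=5_000, selected=2_000, select=0, max_seeds=100_000))
print(should_stop(attempts=100_000, selected=2_000, select=0, max_seeds=100_000))
```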


jdtournier commented Mar 3, 2017

FYI Generally anything that extracts a subset of tracks, total_count is hacked to correspond to the number of tracks at input (by nulling the track but still sending it to the writer class).

Cool, that makes sense anyway. In which case it's OK to leave the terminology as-is, since its meaning depends on how the file was derived. That's one issue we don't have to deal with then.

An alternative to throw out there might be "output" v.s. "generate"; the description would then clarify that "output" is all generated streamlines that satisfy all criteria imposed.

Sure, I'd be happy with that. Slight issue with that though is that -output in this context might be misinterpreted as the output track file, but that's a minor issue. Same problem as with -select really: perfectly clear when looking at someone else's command (-output 1M / -select 200K), but maybe not so clear in isolation. I think I still prefer -select, but either will do.

Edit: Just thought of another detail (which is kind of what spawned this whole thing): If you set -number 0 / -select 0, then tckgen will continue until the number of streamlines specified with -maxnum / -max_seeds have been generated. This should appear in the -number / -select description.

My understanding was that -num_seeds would override -select and -max_seeds. But based on your other comments about the distinction between -num_seeds and -max_seeds, I think we need to clarify these issues; I'm not sure I completely follow all of them. So:

From the description for the new proposed -num_seeds option, I think the intended behaviour is to use a counter in the SeedList class, that triggers tckgen termination once a certain number of seeds have been drawn. This option is therefore essentially putting a finite limit on otherwise "infinite" seeding mechanisms, and cannot be used in conjunction with "finite" seeding mechanisms. This quantity is not influenced by the frequency with which the tracking algorithm rejects the seed at the init() step: the number of seeds that get through the init() step, and therefore that the algorithm actually "tracks from" (as opposed to "attempts to track from"), should be reflected in -maxnum / -max_seeds.

I'm just going to try to clarify what's going on in my head, and give these concepts labels so I can reason about them (I'm not suggesting these as the terminology, just trying to separate the different concepts here). I have a feeling this probably repeats a lot of what's already been said, but bear with me, I'm getting old (40 next week... 👴). So we have:

  • attempts: attempt to start from a specific seed location along some orientation
  • seeds: (exhaustive) attempt to start from a specific seed location
  • starts: successful initiation of a streamline from a seed
  • selections: a streamline generated from a seed that meets the criteria and makes it to the output

So currently, as far as I can tell, for 'infinite' seeders, attempts happen at random locations within the seed region, with random orientations (unless -initdirection is set), and if the algorithm fails to start along that orientation from that location (i.e. below threshold), then it moves on to a fresh attempt at a different seed location. There is a max_attempt member in the Seeder::Base class, which is stated to be the "maximum number of times the tracking algorithm should attempt to start from each provided seed point", but it's currently unused, as far as I can tell (judging from a quick grep -r max_attempts src/ cmd/) - I'm guessing this is a work in progress.

So as currently implemented, we'll have a lot more attempts than seeds, even in regions where seeding should be no problem, since most of the random orientations won't be above threshold - what this post was about. This isn't a problem if we just keep going and don't care about the number of attempts/seeds, but if that number is to be meaningful, I think we'd need to fix this to make sure that the terminology was consistent and well-defined, so that a seed really involved trying a bit harder than just a single attempt. Seems a bit problematic for iFOD2 from what @Lestropie says, but I guess we can just ask it to try harder in this case.

But this still leaves the distinction between seeds and starts (as I've defined them above). If we change behaviour so that seeds genuinely are a bit more exhaustive, then I think it would make sense to have -max_seeds and -num_seeds both influence the number of seeds, even if the final number of actual starts is then allowed to be lower than this. This I think would be consistent with the finite seeders, where any seeds that aren't successful just don't lead to selections, but get recorded in the statistics as NO_PROPAGATION_FROM_SEED.

In this case, I think we can still rely on using the receiver to terminate tracking, provided streamlines get passed down the queue even if seeding was not successful, and the statistics get recorded right - seems like a relatively trivial change as far as I can tell (?), provided we can sort out the 'more attempts per seed' problem. And in this case, it makes sense that these options would have no effect if finite seeders are used.

Sorry for the duplication, but hopefully this matches what we all have in mind here...?


thijsdhollander commented Mar 6, 2017

I'll answer with a few separate posts, so things don't get too mixed up. Here's number 1:

I would consider splitting "Tractography seeding options" to have seed sources in one group (maybe with "must provide at least one" in the option group name?), and options / parameters affecting seeding behaviour in another.

Sounds like a good plan. The reordering of categories and options within/between categories I already did was with the intention of having more commonly used stuff at the top. If this split would happen, I'd list the "seed sources" group right after the "streamlines tractography options" category (which is now the first one, prominently featuring -select as the first option). I reason "options/parameters affecting seeding behaviour" is definitely less used by a large chunk of users; these options go more towards specialised scenarios. Note I've also swapped the -seed_sphere and -seed_image options: I reckon -seed_image is definitely much more used (and more usable, since one can just draw/save an ROI for this purpose, rather than noting coordinates and radii).


thijsdhollander commented Mar 6, 2017

Number 2:

@jdtournier , either I'm misinterpreting something, or there's an inconsistency in your explanation above... Given your first statements:

So currently, as far as I can tell, for 'infinite' seeders, attempts happen at random locations within the seed region, with random orientations (unless -initdirection is set), and if the algorithm fails to start along that orientation from that location (i.e. below threshold), then it moves on to a fresh attempt at a different seed location.

...I fail to combine these with this statement:

So as currently implemented, we'll have a lot more attempts than seeds, ...

It seems to me that the first bit above implies that we have an equal amount of attempts and seeds (because a fresh attempt is at a different seed), but that we have a lot more attempts (and seeds) than starts.

So your goal, if I get it right, is to reduce the drastic difference in numbers between seeds and starts, by doing more attempts per seed. Or semi-formally, we currently have:

attempts == seeds >>> starts

and the proposed changes would aim for:

attempts >> seeds > starts

Where my numbers of ">" symbols vaguely describe levels of "more than".

Apart from checking whether I got it right, I do fully agree with it though. This would make the seeds concept (and number) much more meaningful, because, as you say, the number of attempts isn't really all that meaningful (it's just high because of most FODs typically being very sparse).
I also fully agree that we'd want to look at it from the point of view of seeds, and not starts. Seeders that e.g. do a fixed number of seeds per voxel, say for example 1 seed for each voxel in the centre of the voxel, also implicitly state the number of seeds beforehand (e.g. the same as the number of voxels in the mask, for the example of 1 seed per voxel), but not the starts (which can end up to be less than the number of seeds).


thijsdhollander commented Mar 6, 2017

Finally, number 3:

In line with the above, I agree with

But this still leaves the distinction between seeds and starts (as I've defined them above). If we change behaviour so that seeds genuinely are a bit more exhaustive, then I think it would make sense to have -max_seeds and -num_seeds both influence the number of seeds, even if the final number of actual starts is then allowed to be lower than this. This I think would be consistent with the finite seeders, where any seeds that aren't successful just don't lead to selections, but get recorded in the statistics as NO_PROPAGATION_FROM_SEED.

The number of starts, to me, is quite an artificial concept again (even though it exists programmatically): if a track is started, but not selected, then it basically just fulfils part of the constraints, but not all. To me, from a seed(-point, potentially with many attempts), you end up with a selected track, or with none at all. A "started" track may just as well only have done one step and then failed to proceed due to the other constraints... how much more of a track is it then, compared to the one ("of length zero") that didn't succeed to start from the seed at all?

So building on this logic, I think the "generated" terminology may not be favourable: "generated" seems to refer to the "starts" concept, rather than the "seeds" one. It's funny actually, we currently (even before this branch and pull request) already have the word "selected" used somewhere; in the command line output during a run, e.g.:

tckgen: [100%]     1177 generated,     1000 selected

But in the context of the above proposal, that "generated" would better become "seeds". The "selected" might become "selected tracks", so it's a bit more explicit.

Finally as to -output versus -select; I think I also prefer -select just slightly more. It may just be me (and @jdtournier), but I also get triggered by -output into thinking that it's an option to supply an output file name to. -select feels, to me, more in line with the fact that it survived a set of constraints. As evidenced by the above command line output, it's also slightly more in line with the terminology of "selected" we already have out there.


jdtournier commented Mar 6, 2017

About point 2: yes, we're on the same page. I guess my attempts at defining my terms failed to clarify the argument... Oh well.

But yes, the point is that currently, with infinite seeders there are really no such thing as seeds (as defined above), only attempts, since there is no proper effort to try to start from each seed location (only a single attempt). What I meant by my contradictory comment:

As currently implemented, we'll have a lot more attempts than seeds

is that very few of these attempts will pass the init() stage, and hence be considered as genuine seeds from which streamlines can be initiated - there's a difference in terminology here, which probably contributes to the confusion...

I guess the big distinction is that currently, seed location/directions are drawn from the combined seed ROI / orientation domain by random rejection sampling, and seeds produced by this process are already guaranteed to have FOD amplitude above threshold along the corresponding direction, so are already valid start points too - means there is no distinction between a seed and a start. What we're proposing these seeders should do is a more exhaustive search (lots of attempts) for a suitable initial direction of tracking from each seed, and if successful, then that seed is also a start - otherwise it remains just a (failed) seed. To use your notation:

  • Currently: attempts >>> seeds == starts > selected
  • Proposed: attempts >> seeds > starts > selected

Note here that the exact definition of the term 'seed' differs between the two versions, which is probably what's causing the apparent contradiction here...
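
Editorial illustration: the two counting schemes can be contrasted with a toy Monte Carlo model in Python. Nothing here is taken from the tckgen source. The probabilities, the function names and the attempts_per_seed parameter are made up for the example, and the "trackable" flag is a crude stand-in for seed locations where no direction is above threshold.

```python
import random

def current_scheme(n_attempts, p_dir_ok, rng):
    """Rejection sampling: one random direction per random location.

    A failed attempt simply moves on to a new location, so every accepted
    seed is also a start (seeds == starts).
    """
    starts = sum(rng.random() < p_dir_ok for _ in range(n_attempts))
    return {"attempts": n_attempts, "seeds": starts, "starts": starts}

def proposed_scheme(n_seeds, p_trackable, p_dir_ok, attempts_per_seed, rng):
    """Each seed location gets up to `attempts_per_seed` direction draws."""
    attempts = starts = 0
    for _ in range(n_seeds):
        trackable = rng.random() < p_trackable  # some locations can never start
        for _ in range(attempts_per_seed):
            attempts += 1
            if trackable and rng.random() < p_dir_ok:
                starts += 1
                break  # this seed became a start; move to the next seed
    return {"attempts": attempts, "seeds": n_seeds, "starts": starts}

rng = random.Random(0)
print(current_scheme(10_000, 0.05, rng))
print(proposed_scheme(500, 0.9, 0.05, 1000, rng))
```

In the first function the seed count is only known after the fact, whereas in the second it is an input, which is what makes a seed limit meaningful to the user.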


thijsdhollander commented Mar 19, 2017

I'm happy with this if the -maxnum gets renamed to -seeds. The issue was that the idea behind -maxnum was really to set the failure rate, rather than the number of seeds - although it eventually became clear it could be used to set the number of seeds. I think if we clarify the terminology as we've been discussing, this is no longer an issue, and makes a lot more sense.

Yep, fully agree with this now too; for the exact same reasons. The new terminology should really help a lot to get this logic across.

It's already pretty self-evident with just -select and -seeds, let's keep it simple.

Yep again, very happy with keeping it simple. In line with what Rob mentioned, I think having them state a ratio rather than absolute number is actually harder to explain (and understand, I reckon). I can see people actually calculating the absolute number from the ratio they would state in such a scenario. Absolute numbers are the way to go. 👍

set the number of attempts per seed to a blanket 1,000

Fully agree too, if only for clarity. I've just added this to the documentation too. It's much easier to say "default: 1000" than to say "default: long conditional text goes here". For most users, I reckon this will remain an obscure option anyway (that most would also not have to change, so that's good).

I've updated some other docs, and even a functional reference to an old option (property) in tckgen.cpp. I've also slightly changed the name of max_seed_attempts to max_attempts_per_seed: the former sounded very much like the equivalent of the old -maxnum that became the current -seeds; i.e. how many seeds will be attempted, rather than how many attempts will be made for each seed. The downside, as always, is that the option name is a bit longer... but I reckon this is not a problem, since the option should probably not be used/changed in most "default" scenarios. We should probably keep an eye on things ourselves though, to find out if 1000 is indeed a good default across a wide range of "typical" or "realistic" scenarios.

It's probably a good idea to get yet another pair of eyes to finally check that we've not overlooked any other terminology changes, and the whole beast is still/again self-consistent ( @jdtournier ? 😁 ). I think most of the external / UI logic sits entirely in...

cmd/tckgen.cpp
src/dwi/tractography/tracking/tractography.cpp
src/dwi/tractography/seeding/seeding.cpp

...right?


thijsdhollander commented Mar 20, 2017

I've updated the docs for examples of tckgen. It's surprising how the old -maxnum option / new -seeds option appears nowhere (except in the reference of tckgen itself of course). Well, actually not that surprising really: it shows that these options are certainly beyond the default scenarios most of the time.


thijsdhollander commented Mar 20, 2017

Ok, finally (as I think all the rest is good now), while we're changing interface as well as behaviour here anyway: would it be worthwhile to "upgrade" the default number of streamlines to be selected to something bigger than 1000? My reasoning is that, since the introduction of that default long ago, the average user's hardware is probably a bit more capable nowadays. A bigger number, e.g. 5000 or maybe even 10000, better shows off the "default" capabilities of a probabilistic tractography algorithm such as iFOD2 for the whole brain scenario. Every once in a while, there still seems to be a user on the forum who is a bit unimpressed with the "sparsity of our tractograms compared to other-diffusion-package-X". Of course, in any realistic scenario, a user should use the -select option; but then again, should the default not be a better initial showcase?

Well, just an idea of course; happy to hear opinions. :-)


jdtournier commented Mar 20, 2017

So I've had a quick look, and updated the doc for the -seeds option - I think it was describing the old intent, rather than what we'd like it to be now, see what you think.

Otherwise, it all looks good, and seems to work as expected - at least from my very limited testing. A couple of points though:

  • NO_PROPAGATION_FROM_SEED is still there - we had loosely discussed maybe labelling it as TRACK_TOO_SHORT? No big deal, but if it stays it would be good to figure out what its purpose was... (?)

  • We had discussed raising the default for TCKGEN_DEFAULT_SEED_TO_SELECT_RATIO (currently 100). I'd be happy to raise it to 1,000, it might indeed help for those hard to track structures (optic radiations are a prime example).

  • I'm not entirely convinced about raising the default for -select, but I'm not against it either. The point of 1000 is that it's a sensible default for the number of selections for your average tract when doing targeted seeding - e.g. to delineate the CST, optic radiations, SLF, etc. It will undoubtedly be too low for anything approaching whole-brain, but then it's difficult to see how to set a default that would be appropriate in all cases. It's also worth bearing in mind that by default, this will lead to a maximum of 100K streamlines being seeded (1M if we increase that default), which in cases where the tractography has a very low success rate, may result in very long runs even for that number. For example, tracking the optic radiation from LGN to V1 using a simple -seed & -include strategy, using -seeds 1M -select 1K (which would be the default if we go with the above) can take a while on my system:

    tckgen: [100%]   640256 seeds,   637366 streamlines,     1000 selected
    real	0m43.283s
    user	8m34.143s
    sys	0m0.477s
    

    And looking at the reconstruction obtained with this, it looks dense enough, certainly for display purposes:
    [image: screenshot from 2017-03-20 14-47-06]
    So I'm not altogether convinced it needs changing. Perhaps you can give us examples of use cases where it's not sufficient?
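
Editorial illustration: the arithmetic behind the defaults discussed in this comment, as a short Python sketch. The rule "default seed limit = selection target × ratio" is inferred from the 100K / 1M figures quoted above, and the function name is hypothetical. The extrapolation assumes the success rate seen in the quoted progress line stays constant over the run.

```python
def default_seeds(select, seed_to_select_ratio):
    """Default seed limit when only the selection target is given (assumed rule)."""
    return select * seed_to_select_ratio

# Figures taken from the progress line quoted above.
seeds_used, selected = 640_256, 1_000
seeds_per_selection = seeds_used / selected  # roughly 640 seeds per selected track

old_limit = default_seeds(1_000, 100)    # ratio of 100
new_limit = default_seeds(1_000, 1_000)  # ratio of 1,000

# How many selections the old limit would have allowed at the same success rate.
expected_selected_at_old_limit = old_limit / seeds_per_selection

print(old_limit, new_limit, round(expected_selected_at_old_limit))
```

Under these assumptions the optic radiation example would have stopped well short of its 1,000 target with a ratio of 100, while a ratio of 1,000 leaves enough headroom for the run shown.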


Lestropie commented Mar 20, 2017

updated the doc for the -seeds option

Happy.

NO_PROPAGATION_FROM_SEED is still there

It's still there, in part because it's there and it'd be more work to revert it back so that these get reported as TRACK_TOO_SHORT, in part because I don't recall / can't find where that functionality was requested but if it were removed there'd be at least one person out there unhappy with the decision.

We had discussed raising the default for TCKGEN_DEFAULT_SEED_TO_SELECT_RATIO (currently 100). I'd be happy to raise it to 1,000, it might indeed help for those hard to track structures (optic radiations are a prime example).

Not against it either. But for both here and for lengthy instructions for -select / -number: It might be worth adding some text to the DESCRIPTION field regarding how these behave; that'll be more likely to be spotted by new users than specific options (even though each is now at the head of their relevant option group).

I'm not entirely convinced about raising the default for -select, but I'm not against it either.

Same. The intent is just to "do some tracking" if neither is specified. Maybe a warning-level message in this instance would be better (e.g. [WARNING] Defaulting to 1,000 selected streamlines in absence of -select / -seeds options)?


jdtournier commented Mar 21, 2017

NO_PROPAGATION_FROM_SEED is still there

It's still there, in part because it's there and it'd be more work to revert it back so that these get reported as TRACK_TOO_SHORT, in part because I don't recall / can't find where that functionality was requested but if it were removed there'd be at least one person out there unhappy with the decision.

I assumed this was the case. I'm not too bothered either, but if users rely on TRACK_TOO_SHORT for whatever reason, they might need to think a bit about what to do with NO_PROPAGATION_FROM_SEED: should it be included in their total or not...? No big deal either way, but it might lead to confusion - especially if we ourselves can't figure out why it exists in the first place.

It might be worth adding some text to the DESCRIPTION field regarding how these behave;

Good idea. A little paragraph with a brief overview might do the trick. I'll see if I can draft something this evening.

Maybe a warning-level message in this instance would be better (e.g. [WARNING] Defaulting to 1,000 selected streamlines in absence of -select / -seeds options)?

I'm not sure that's warranted. Maybe as an info-level message, but I'd reserve warnings for conditions that are dangerous and/or likely to result in incorrect output. Besides, the progressbar essentially says the same thing already - although I agree it won't say why. But given that basically the first thing they'll come across when looking at the help page will relate to this issue, any potential confusion should be short-lived.


Lestropie commented on a439e82 Mar 21, 2017

MAX_TRIALS doesn't apply to seeding: it's used for the maximum number of direction samples per step in probabilistic algorithms (e.g. see here v.s. here). One could possibly argue whether or not that and MAX_ATTEMPTS_PER_SEED should be equivalent; just not sure that was your actual intent here.


jdtournier replied Mar 21, 2017

Quite right... I'll fix that ASAP.


Lestropie commented Mar 21, 2017

I assumed this was the case. I'm not too bothered either, but if users rely on TRACK_TOO_SHORT for whatever reason, they might need to think a bit about what to do with NO_PROPAGATION_FROM_SEED: should it be included in their total or not...? No big deal either way, but it might lead to confusion - especially if we ourselves can't figure out why it exists in the first place.

Looks like I earlier missed the hint that I left myself in #734:

... (since [the minimum length] may indeed be set to 0mm, but is forced to be at least two points)

So clearly someone was confused as to the streamline rejection (reporting) behaviour when setting -minlength 0 (still can't find the post though...). I reject streamlines with 1 point only (as long as it's not the seedtest algorithm), whereas regardless of user input I set the minimum number of points to be at least 2 (in order to constitute a "track"). So I think I'd rather keep as-is; could just document somewhere if necessary.
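
Editorial illustration: the "at least two points regardless of -minlength" rule can be pictured with the Python sketch below. The conversion from a length in mm to a number of points is an assumption made for the example (uniform step size, one more point than steps); only the floor of 2 comes from the comment above, and the function name is hypothetical.

```python
import math

def min_num_points(minlength_mm, step_mm):
    """Smallest number of points a streamline needs to be kept (toy rule).

    The floor of 2 means a single-point "track" is always rejected,
    even when the requested minimum length is 0 mm.
    """
    points_for_length = math.ceil(minlength_mm / step_mm) + 1  # assumed conversion
    return max(2, points_for_length)

print(min_num_points(0.0, 0.5))
print(min_num_points(5.0, 0.5))
```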


thijsdhollander commented Mar 27, 2017

Sorry for not chipping in earlier, because this pull request should be about finished now. A few quick things:

  1. Happy with the increase of seed-to-select ratio to 1000! I think I brought that up in the previous thread / pull request somewhere, but the rationale was basically to decrease the chances of (with the default settings) hitting the maximum number of seeds before the target number of tracks to select had been reached. So more users would find themselves in more scenarios actually obtaining the number of tracks requested via -select. The default -seeds number can of course still be hit, but that would indicate an "exceptionally" hard to track bundle, where it's more legitimately up to the user to take action (and increase -seeds manually).

  2. About...

So clearly someone was confused as to the streamline rejection (reporting) behaviour when setting -minlength 0 (still can't find the post though...). I reject streamlines with 1 point only (as long as it's not the seedtest algorithm), whereas regardless of user input I set the minimum number of points to be at least 2 (in order to constitute a "track"). So I think I'd rather keep as-is; could just document somewhere if necessary.

If I understand this correctly, it also makes sense to me to keep this behaviour. Just wondering in what scenario(s) streamlines with 1 point only occur. This would be a successful seed, so it found at the seed location an orientation of the FOD above the cutoff (otherwise it would not go on to become a "streamline", wrt our current progress output, right?). Having established that, it'll take 1 tracking step and find itself e.g. outside of the mask, or in a spatial location where the FOD within the angular threshold has no amplitudes above the cutoff. Am I correct that this second location (after the first tracking step) is hence not a part of the streamline, so it ends up having only 1 point and is rejected (regardless of -minlength 0)? It makes sense to reject it on grounds of "not being a streamline", but does it get added to the number of "streamlines" in the progress output? It sounds like it should not get added to the "streamlines" count, if it didn't successfully get its second point yet, and has "a priori" (i.e. even when -minlength 0) no chance to end up in the "selected" category.

  3. About the default for -select...

I'm not entirely convinced about raising the default for -select, but I'm not against it either.

I'm certainly on the same general front here, so I just wondered if there could be much "against it", and if not, whether it would be ok to raise it anyway. I agree there is no perfect default that'll strike a balance between the most compact of bundles on the one hand, and a whole-brain tractogram on the other hand. But even for a compact bundle, more tracks (within a reasonable number of course) don't hurt, while more tracks may help for other more extended bundles -- especially those that include fanning, and even more so if said fanning may be more challenging in some orientations than others. So my logic is along the lines of: what number can we (potentially) reasonably raise it to, so we at least extend the applicability to more bundles, without making the number too high for the simple intention of "just doing some tracking" (which I agree this default is indeed about).

The example of the optic radiation is a pretty compact bundle (at least in the brain of "Tournier Donald" 😜 ), I'd say. 1000 indeed works ok for that one. But my inspiration to consider a small increase to, let's say, 5000, came from examples like this:

Simple whole brain tractography with minimal input to tckgen, i.e. only the WM FODs and providing a whole brain mask to -seed_image:

-select 1000:
[Screenshot: screenshot0000]

-select 5000:
[Screenshot: screenshot0001]

Simplest attempt (I can imagine) at the CST, seeding from the spine, excluding the midsagittal plane from about above the pons, including an axial plane around the height of the upper half of the thalamus:

-select 1000:
[Screenshot: screenshot0002]

-select 5000:
[Screenshot: screenshot0003]

My finding being: with 5000 (versus 1000), there's definitely a much better impression as to what the (in this case) iFOD2 algorithm will ultimately explore. Even for the whole brain tractogram, it gives a pretty decent idea, whereas the 1000 case still falls well short of providing that impression. In a case like the CST and tracking just in one direction (from the spine up), any algorithm is often faced with the much easier "trackability" of the straight path up versus the lateral projections. Also, with 5000 streamlines, I could much more easily appreciate where my naive approach still lacks the most (in this case tracks diverging into e.g. the SLF, and then heading towards unrelated lower bits of the brain again), and eventually go back and fine-tune the criteria more.

So long story short, I often find myself using something like 5000 for the purpose of "just doing some (initial) tracking", across a very wide range of scenarios, if not almost all realistic ones. The ultimate question is then if we can "safely" increase the default to that of course. Beyond 5000 definitely doesn't seem necessary for this particular purpose.

Lestropie commented Mar 27, 2017

Just wondering in what scenario(s) streamlines with 1 point only occur.

The tracking method's init() function succeeds, but next() fails at the first iteration (in both directions if tracking is bidirectional). This is probably unlikely to occur for any algorithm other than iFOD2: the init() function only looks for a direction at the seed point that is above the FOD amplitude threshold, whereas the first call to next() needs to find an arc segment that satisfies the FOD amplitude threshold.

I suppose one could argue whether or not iFOD2 should actually be generating that first segment as part of the init() call... but even if that were changed, it's still conceivable that some future tracking algorithm could exhibit similar behaviour, potentially even for a different reason.
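As a rough illustration of how a single-point "streamline" can arise, here is a toy 1D model. Only the split into an init() check at the seed and repeated next() steps comes from the description above; `track`, `make_next` and the integer geometry are hypothetical stand-ins, and the terminating point is simply dropped rather than modelling any per-reason rules.

```python
# Toy model only: track(), make_next() and the 1D geometry are hypothetical.
def make_next(lo, hi, step=1):
    """Build a next() stand-in that fails once a step leaves [lo, hi]."""
    def next_point(pos, direction):
        new_pos = pos + direction * step
        return new_pos if lo <= new_pos <= hi else None
    return next_point


def track(seed, init, next_point, bidirectional=True):
    """Return the list of points, or None if init() rejects the seed."""
    if not init(seed):
        return None  # invalid seed: never becomes a streamline at all
    points = [seed]
    directions = (+1, -1) if bidirectional else (+1,)
    for direction in directions:
        pos = seed
        while True:
            pos = next_point(pos, direction)
            if pos is None:
                break  # termination: the failed step is not added
            if direction > 0:
                points.append(pos)
            else:
                points.insert(0, pos)
    return points


def seed_is_valid(seed):
    return True  # init() only inspects the seed location itself


# next() fails immediately in both directions: a one-point "streamline".
print(track(0, seed_is_valid, make_next(0, 0)))
# Unidirectional tracking where the only open direction is not the one tried.
print(track(0, seed_is_valid, make_next(-3, 0), bidirectional=False))
# The same seed with bidirectional tracking yields a proper streamline.
print(track(0, seed_is_valid, make_next(-3, 0)))
```

The last two calls differ only in the bidirectional flag, which shows why tracking in both directions makes the single-point case much less likely.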

Having established that, it'll take 1 tracking step and find itself e.g. outside of the mask, or in a spatial location where the FOD within the angular threshold has no amplitudes above the cutoff. Am I correct that this second location (after the first tracking step) is hence not a part of the streamline, so it ends up having only 1 point and be rejected (regardless of -minlength 0)?

If you were to ignore the above, and use unidirectional tracking, and the first step (second streamline point) terminated for one of the reasons flagged as 0 in this array but was not rejected, then yes, that second point would be omitted from the streamline and hence the "streamline" would consist of only one point.

It makes sense to reject it on grounds of "not being a streamline", but does it get added to the number of "streamlines" in the progress output?

Currently yes. It just goes down as a "rejected streamline" and contributes to the "streamline count" accordingly in the progress message. You could separate it, essentially logging a single-point streamline as a "rejected seed" from within the track_rejected() function; but if we were to do that for the command-line logging, we might as well make a similar change to the internal track rejection logging, removing "NO_PROPAGATION_FROM_SEED" and recording a zero-length streamline as "INVALID_SEED" (even though technically the seed was perfectly valid according to the method's init() function). I don't think I'd modify one and not the other.
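The two bookkeeping options can be compared with a small sketch. It assumes three progress counters named after the terms used in this thread (seeds, streamlines, selected) and ignores every rejection criterion other than the point count; `classify`, `tally` and the flag name are hypothetical and do not mirror the actual track_rejected() logic.

```python
from collections import Counter

# Hypothetical sketch of the two logging options; not the tckgen implementation.
MIN_POINTS = 2  # a "track" needs at least two points, whatever -minlength says


def classify(points, single_point_as_seed_rejection=False):
    """Label one tracking attempt; points is None when the seed was invalid."""
    if points is None:
        return "rejected seed"
    if len(points) < MIN_POINTS:
        if single_point_as_seed_rejection:
            return "rejected seed"       # the alternative discussed above
        return "rejected streamline"     # the behaviour described as current
    return "selected"


def tally(results, single_point_as_seed_rejection=False):
    """Summarise attempts into the three assumed progress counters."""
    counts = Counter(
        classify(points, single_point_as_seed_rejection) for points in results
    )
    return {
        "seeds": len(results),
        "streamlines": counts["selected"] + counts["rejected streamline"],
        "selected": counts["selected"],
    }


attempts = [None, [0], [0, 1], [0, 1, 2]]
print(tally(attempts))
print(tally(attempts, single_point_as_seed_rejection=True))
```

Only the "streamlines" counter differs between the two calls, and only by the number of single-point cases.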

But as I've said previously, these are really quite rare, so I'm not convinced it's worth the gymnastics.

thijsdhollander commented Apr 6, 2017

But as I've said previously, these are really quite rare, so I'm not convinced it's worth the gymnastics.

Hmm, yes, what I overlooked is that the default is of course bidirectional tracking, in which case these almost certainly become very rare indeed. I fully agree that it ends up not being worth such big changes in the back-end at this stage just for these rare cases. Maybe just something to take note of for the future; in case anyone wants to refactor the tracking code for other (more compelling) reasons, it could be taken on board at that stage. In the end, the command line output is not meant to be much more than a way to keep track of general progress, and give a sense of insight into why progress (accepted streamlines) may be slow in some scenarios.

So this pull request should basically be finished now (right?). There's still my outstanding screenshot'ed case above to increase the default selection goal from 1000 to 5000 (or anything in between really, e.g. 2000, whatever people may find themselves comfortable with 😉 )... but if there's still too much controversy about this (and that's ok), this is far from a priority with respect to merging the pull request. It doesn't have major repercussions on the tckgen interface anyway, which is what this pull request was essentially about.

Lestropie commented Apr 6, 2017

I'm happy to boost the default -select to 5,000 and merge.

thijsdhollander commented Apr 6, 2017

👍 Sounds good to me! @jdtournier : any objections?

thijsdhollander commented Apr 6, 2017

(just did the commit; will wait until confirmation before merging)

jdtournier commented Apr 6, 2017

Happy enough with a default -select of 5000 - go for it!

thijsdhollander merged commit 467e202 into tag_0.3.16 Apr 6, 2017

thijsdhollander commented Apr 6, 2017

👍

thijsdhollander deleted the tckgen_interface_changes branch Apr 6, 2017
