tcksample: Redesign #480

Closed
Lestropie opened this Issue Mar 21, 2016 · 7 comments

Lestropie commented Mar 21, 2016

I'm intending to redesign how this command / operation is used / performed, but want to make sure that I'm not removing any functionality that's currently in use by doing so.

Currently tcksample is a bit of a mish-mash between 'sampling' from an image underlying the streamlines, and re-'sampling' the streamline points themselves. Personally I'd rather disambiguate these two, even if it does mean writing out an intermediary track file; in its current state it's difficult to add functionality, and it's easy for things to go wrong in these kinds of operations if the intermediate results aren't checked.

So here's what I'm thinking:

  • Currently tckedit has the capability to upsample and downsample streamlines, and extract just the endpoints. This could be expanded to include resampling at a desired step size, and sampling a fixed number of points per streamline. The wacky vector / arc-based resampling probably doesn't fit there though.
  • tckedit should also get the capability to apply affine & non-linear warps to streamlines. Or this could go to a new command e.g. tckwarp to make the order of operations more explicit.
  • In addition to providing the image value underlying each point, tcksample will also have the option of generating a statistic per streamline, e.g. integral, mean, min, max, median.
  • tcksample will also have the option of using the precise streamline->voxel mapping mechanism. So e.g. instead of acquiring lots of trilinear-interpolated values at lots of points along the streamline and then taking the mean, just get the length of the streamline through each voxel in the image to be sampled and then take a weighted mean of those image values (see the sketch below).
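
To make the last two bullets a bit more concrete, here's a minimal sketch in plain C++ (not MRtrix3 code; the VoxelVisit struct, function names and values are invented for illustration) contrasting a per-streamline statistic computed from point-wise samples with the length-weighted mean obtained from a precise streamline->voxel mapping:

```cpp
#include <iostream>
#include <vector>

// One voxel visited by a streamline under the precise mapping:
// the image value in that voxel, and the length of streamline passing through it.
struct VoxelVisit {
  double value;
  double length;
};

// Per-streamline statistic from per-point (e.g. trilinear-interpolated) samples;
// here, simply the arithmetic mean.
double mean_of_point_samples (const std::vector<double>& samples)
{
  double sum = 0.0;
  for (double s : samples)
    sum += s;
  return samples.empty() ? 0.0 : sum / samples.size();
}

// Per-streamline statistic from the precise mapping:
// mean of voxel values weighted by the length of streamline in each voxel.
double length_weighted_mean (const std::vector<VoxelVisit>& visits)
{
  double weighted_sum = 0.0, total_length = 0.0;
  for (const auto& v : visits) {
    weighted_sum += v.value * v.length;
    total_length += v.length;
  }
  return total_length > 0.0 ? weighted_sum / total_length : 0.0;
}

int main()
{
  // Invented values for a single streamline
  std::vector<double> point_samples { 0.81, 0.79, 0.64, 0.90 };
  std::vector<VoxelVisit> visits { { 0.80, 1.2 }, { 0.65, 0.4 }, { 0.90, 2.0 } };
  std::cout << mean_of_point_samples (point_samples) << " "
            << length_weighted_mean (visits) << "\n";
}
```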

What I'm stuck on is the current tcksample functionality where a fixed number of points per streamline is generated based on start and end points, with an optional 'waypoint' forming a circular arc along which the resampling of each streamline should be performed. I'm considering extracting just this functionality and placing it in a command with a different name, e.g. tckresample. But this would only output a new track file; sampling the underlying image would then be handled by the new tcksample command.

Thoughts?

jdtournier commented Mar 21, 2016

Well, in the previous MRtrix 0.2 version, this command was split into two: sample_tracks and resample_tracks. The thing is, in practice, no one really needs the tracks after resampling - you can get them if you want, but that's primarily for debugging and display purposes. In truth, part of the reason I merged them into one for MRtrix3 was expediency (just needed to code it up quickly at the time), and a lack of inspiration on the naming front: whichever way you spin it, the 'wacky resampling' bit is hard to christen...

Looking at what you're proposing, I can see the value-add of getting summary per-streamline values, and to some extent the precise sampling (just not convinced it would make a huge difference in practice). I can see that track resampling to a different step size would fit neatly into tckedit. tcknormalise can already apply non-linear warps, although it would need to be updated once the registration framework is established - and possibly renamed to tckwarp.

I guess what I'm puzzled by is the suggestion that tcksample needs a redesign. You're talking about a couple of enhancements, and additional functionality in other commands, but by and large the only thing that seems to jar here is the 'wacky resampling' feature. The thing is, this feature needs to stay somewhere, and having scratched my head about it for some time, I reckon it's probably best left where it is. I appreciate it's not the cleanest in that it mashes two operations together, and so to some extent obscures what's going on, but it does fit in quite neatly in terms of discoverability - it's not unlikely to be the first command users will look to for this purpose.

Anyway, if you really feel the 'wacky resampling' aspect needs to be split off from tcksample, I don't have any major objections other than wondering what need it's addressing... The challenge though is going to be to figure out what you're going to call this - and tckwackyresample doesn't qualify...

Lestropie commented Mar 21, 2016

I agree that you almost certainly don't need the tracks; no argument there. The command as it stands works because the two functionalities can more-or-less live side-by-side. But as soon as you try to expand features, it starts to get confusing as to what options apply to what, and what order things are done in (e.g. there are already explicit instructions required for the -warp option about what order things are done in).

By splitting, the order of operations becomes explicit, and for new options / functionalities it's clear what they apply to. In the future, I'd also like to have what I'm currently calling tckresample resample based on an exemplar streamline (with another command calculating said exemplar) rather than a plane / arc, and indeed other options are possible. And tcksample will hopefully eventually sample SH amplitudes / fixel values. As the list grows longer it gets more confusing having the two functionalities bundled.

The intermediate track file could potentially be dealt with once track format handlers are in place by enabling track file piping (though not sure if tmpfs disk space will become an issue here). But given the scope of what can go wrong with processes like these, exacerbated by non-linear registrations etc., personally I'd prefer the default to be generating the intermediate track file that a user can choose to ignore, rather than not generating it (which would imply that it doesn't need checking).

tcksample isn't exactly a massive redesign: more that the bulk of the existing code was for handling the 'resampling' functionality, so by the time the per-streamline statistics were implemented and the resampling was removed, the amount of carry-over code was very minimal. I've got a preliminary commit in the tcksample branch if you want to take a look.

jdtournier commented Mar 21, 2016

> But as soon as you try to expand features, it starts to get confusing as to what options apply to what, and what order things are done in (e.g. there are already explicit instructions required for the -warp option about what order things are done in).

Ah yes, I'd forgotten about the -warp option. I agree it does complicate things somewhat, and that things might get out of hand as features get added.

> In the future, I'd also like to have what I'm currently calling tckresample resample based on an exemplar streamline (with another command calculating said exemplar) rather than a plane / arc, and indeed other options are possible.

Yes, as soon as you add other possibilities for the resampling, this makes sense. Sampling along an exemplar sounds like a neat idea.

> The intermediate track file could potentially be dealt with once track format handlers are in place by enabling track file piping (though not sure if tmpfs disk space will become an issue here).

Yes, it's becoming clear that being able to pipe streamlines would help with a lot of these issues. The tmpfs disk space is a non-issue here by the way, since in this case I'd advocate streaming the data directly through the pipe - no need to write to a temporary file. This would allow the pipeline to run on memory-constrained systems without any issues. The reason it would be possible in this case is that the data genuinely comes out as a stream - as opposed to images that require random access while either command is accessing them. This may imply that methods like SIFT would not be compatible with piping? But then it's probably best to require users to explicitly store the tracks in these cases anyway, given how large the files are going to be - anything else would probably entail higher RAM usage during execution one way or the other...
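
To illustrate why streaming works here, a minimal sketch of such a filter (plain C++ with an invented text format, not the .tck format or any existing MRtrix3 piping mechanism): each streamline is read, handled and forwarded before the next one, so no temporary file or random access is needed between pipeline stages.

```cpp
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

using Point = std::array<float, 3>;
using Streamline = std::vector<Point>;

// Read one streamline: a vertex count followed by that many x y z triplets.
bool read_streamline (std::istream& in, Streamline& tck)
{
  std::size_t count;
  if (!(in >> count))
    return false;
  tck.resize (count);
  for (auto& p : tck)
    in >> p[0] >> p[1] >> p[2];
  return static_cast<bool> (in);
}

void write_streamline (std::ostream& out, const Streamline& tck)
{
  out << tck.size() << "\n";
  for (const auto& p : tck)
    out << p[0] << " " << p[1] << " " << p[2] << "\n";
}

int main()
{
  Streamline tck;
  // Each streamline is fully read, (potentially) processed and forwarded
  // before the next is touched, so memory use stays bounded regardless of
  // how many streamlines flow through the pipe. A command needing two
  // passes over its input (like SIFT) couldn't work this way.
  while (read_streamline (std::cin, tck))
    write_streamline (std::cout, tck);
}
```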

> tcksample isn't exactly a massive redesign: more that the bulk of the existing code was for handling the 'resampling' functionality, so by the time the per-streamline statistics were implemented and the resampling was removed, the amount of carry-over code was very minimal. I've got a preliminary commit in the tcksample branch if you want to take a look.

No worries, makes more sense now anyway...

Lestropie commented Mar 21, 2016

OK cool. I do like the idea of streamed piping, as long as it works with multiple concatenated pipes, and I suppose you might need some form of progressbar suppression (though with the progress percentage on the left now, maybe you'd get a cool flickering superposition of progress messages?).

Anyone got any thoughts on command naming? tcksample / tckresample is still a little ambiguous. And I think tckwackyresample was satirical, but I'm not quite sure. :-P

> This may imply that methods like SIFT would not be compatible with piping?

Correct: the track file needs to be read twice (once to build the streamline fixel visitations, again to generate the output track file). I suppose the Tractography::Reader could have a constructor flag to say it doesn't accept streams?

jdtournier commented Mar 22, 2016

> I do like the idea of streamed piping, as long as it works with multiple concatenated pipes

What do you mean by 'multiple concatenated pipes'? I can't see any issues with multiple sequential stages, if that's what you're asking? Much trickier if you want to support multiple pipes into the same command, although that's a very rarely used feature which is unique to MRtrix anyway... However:

> you might need some form of progressbar suppression

Yes, the output will probably not look great in this case... There is always the -quiet option, so we could at least ensure that scripts don't produce ugly output. For general command line usage, I'm not sure; it depends on how ugly the output is. If the lines genuinely just overwrite each other, I'd probably be happy to live with it. If on the other hand it causes continuous output, that would be very annoying. Not sure how you'd deal with this in such a case - maybe get any command that outputs streamlines to produce simple start and end messages?

> Anyone got any thoughts on command naming? tcksample / tckresample is still a little ambiguous. And I think tckwackyresample was satirical, but I'm not quite sure. :-P

[Edited due to mobile madness]
It was. I'd be happy with tckresample though, if you were also planning on adding a resample-by-exemplar feature. However, there is still some ambiguity with changing the step size. I was thinking that if we do go with the name tckresample, we could have the change-in-step-size feature in there too...? Or maybe stick the whole lot in tckedit...? It wouldn't be out of place there...

> I suppose the Tractography::Reader could have a constructor flag to say it doesn't accept streams?

Seems a bit overkill to me, when this is probably a special case... (?) Might be simpler/cleaner just to check whether the filename is "-" in the app before invoking the constructor?

Lestropie commented Mar 23, 2016

> What do you mean by 'multiple concatenated pipes'? I can't see any issues with multiple sequential stages, if that's what you're asking?

Yeah, that's just me derping. No reason why the terminal wouldn't handle stdin/stdout correctly for each command in multiple sequential stages. Just trying to fully wrap my head around this, since track data piping might help with the command role / naming issues here, so I'm thinking about attacking #411 and including this.

E.g. currently tckedit does upsampling / downsampling, but it does upsampling before mapping to ROIs, and downsampling afterwards (to try to avoid erroneous ROI misses). Then with manual setting of the step size (a3641ec), plus the option I'm also going to add for a fixed number of points per streamline, this order becomes even more ambiguous. Add the 'wacky' resampling strategies and the tckedit interface starts to get a little clumsy.

If we had track data piping, I could remove these functionalities from tckedit, have tckresample exclusively handle re-sampling of the track data (upsample, downsample, fixed step size, fixed number of points, and the 'wacky' (path-based?) cases), and then users would make their desired order of operations ('resampling' vs. 'editing') explicit without incurring massive intermediate files.
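
For the 'fixed number of points' case in that list, a minimal sketch of the kind of resampling tckresample might perform internally (plain C++, not existing MRtrix3 code; it just places points evenly along the arc length with linear interpolation between the original vertices, and doesn't attempt the plane/arc-based schemes):

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

using Point = std::array<double, 3>;

double segment_length (const Point& a, const Point& b)
{
  const double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
  return std::sqrt (dx*dx + dy*dy + dz*dz);
}

// Resample a streamline (assumed to have at least 2 vertices) to num_points
// (assumed >= 2) points evenly spaced along its arc length.
std::vector<Point> resample_fixed_count (const std::vector<Point>& tck, std::size_t num_points)
{
  // cumulative arc length at each original vertex
  std::vector<double> cumlen (tck.size(), 0.0);
  for (std::size_t i = 1; i < tck.size(); ++i)
    cumlen[i] = cumlen[i-1] + segment_length (tck[i-1], tck[i]);

  std::vector<Point> out (num_points);
  std::size_t seg = 0;
  for (std::size_t n = 0; n < num_points; ++n) {
    const double target = cumlen.back() * n / (num_points - 1);
    // advance to the segment [seg, seg+1] containing the target arc length
    while (seg < tck.size() - 2 && cumlen[seg+1] < target)
      ++seg;
    const double seglen = cumlen[seg+1] - cumlen[seg];
    const double frac = seglen > 0.0 ? (target - cumlen[seg]) / seglen : 0.0;
    for (std::size_t d = 0; d != 3; ++d)
      out[n][d] = tck[seg][d] + frac * (tck[seg+1][d] - tck[seg][d]);
  }
  return out;
}

int main()
{
  // Invented example: an L-shaped streamline resampled to 5 points
  const std::vector<Point> tck { {0,0,0}, {1,0,0}, {1,1,0}, {2,1,0} };
  for (const auto& p : resample_fixed_count (tck, 5))
    std::cout << p[0] << " " << p[1] << " " << p[2] << "\n";
}
```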

> Seems a bit overkill to me, when this is probably a special case... (?) Might be simpler/cleaner just to check whether the filename is "-" in the app before invoking the constructor?

Yeah, or just a 2-line helper function somewhere for commands to check it and give a standardised error.
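
Something like the following, say (the function name and error wording are invented; only the filename check itself corresponds to what's being suggested):

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper: commands that need more than one pass over their input
// track file (e.g. SIFT) could call this before constructing their reader,
// giving a standardised error for piped input.
void check_not_piped (const std::string& filename)
{
  if (filename == "-")
    throw std::runtime_error ("this command needs to read its input tracks more than once, "
                              "so does not accept piped streamline input; please supply a file");
}

int main (int argc, char** argv)
{
  try {
    check_not_piped (argc > 1 ? argv[1] : "-");
    std::cout << "input accepted\n";
  } catch (const std::exception& e) {
    std::cerr << "error: " << e.what() << "\n";
  }
}
```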

Lestropie commented Jul 9, 2016

Closed by #653 and #555, but comments here are relevant to #411.
