Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixelcfestats: New option -mask #1030

Merged
merged 13 commits into from Jul 14, 2017

Conversation

Projects
None yet
3 participants
@Lestropie
Copy link
Member

commented Jun 27, 2017

Had a go since there were multiple requests. Untested, but it's easier for me to get it installed on a server to test if I push to the online repo, so might as well post here straight away in case anybody is curious.

Wasn't a single unique solution. What I've gone with is:

  • Despite using -mask option, output fixel image will still contain the same set of fixels as the input fixel template.

  • When constructing the fixel-fixel connectivity matrix, streamlines may still be assigned to a fixel that is outside the mask, if the streamline is most collinear with that fixel. However connectivity to that fixel will not contribute in any way to the analysis.

  • Only fixels within the mask are involved in data smoothing, GLM, CFE.

Benefit of keeping all fixels is that you keep fixel correspondence across different experiments from the same template. You can still see those fixels, but they should disappear as soon as any quantitative value is loaded (should; need to check). Cropping the output would have made the set of fixels within the analysis mask more immediately obvious; but you can just fixelcrop after the fact if you want.

Closes #987.

Lestropie added some commits Jun 27, 2017

fixelcfestats: New option -mask
This option restricts the statistical analysis to a subset of fixels, as defined by a boolean fixel data file.
Calculation of the fixel-fixel connectivity matrix, data smoothing, t-statistic calculation, and stiatistical enhancement, are all only performed on those fixels within the mask.
However, the output fixel directory contains all of the fixels from the input template fixel image, not just those within the mask; this retains fixel-fixel correspondence between the input template image and the output results.
For fixels outside the mask, a value of NaN is written for all fixel data file outputs.

@Lestropie Lestropie self-assigned this Jun 27, 2017

Lestropie added some commits Jun 27, 2017

fixelcfestats: Various fixels for -mask option
Also includes a bug fix where opting to not smooth the input data would likely have resulted in an error.
fixelcfestats: Change type of normalised connectivity matrix
On testing prior changes to fixelcfestats to support the -mask option, it was found to run extremely slowly, even when not using the -mask option. The suspected cause was that during the matrix normalisation, each row of the connectivity matrix (a map<uint32_t, float>) was being re-constructed from scratch (due to the possibility of fixel indices changing), and hence what was a balanced binary search tree would become essentially a sorted doubly-linked-list. Now, each row of the connectivity matrix is instead converted to essentially a vector<pair<uint32_t,float>>, since it is no longer necessary to have fast lookup for an arbitrary fixel index. This should reduce memory requirement after matrix construction has completed, and will also hopefully improve cache performance since the connectivity data for each fixel will be serialised.
There are also various little fixes and tweaks to polish off additions for supporting the -mask option.
@Lestropie

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2017

OK, looks like this is now working as it should. I'll see if I can get one of the users to test as well. Then need to decide whether to go to master or a generic dev branch (which I've been meaning to create for other things anyway).

Comes with a nice execution speedup thanks to better cache hitrate: Independently of masking, my test dataset went from 8h to 2h 😈

@Lestropie Lestropie requested a review from jdtournier Jul 5, 2017

@jdtournier jdtournier requested review from draffelt and thijsdhollander Jul 5, 2017

@jdtournier

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

These changes sound good - but I'm not sure I'm in a position to review them properly at the moment... I've added a couple of other reviewers, hopefully someone can just have a look and approve these changes more rapidly than me.

Impressed and somewhat puzzled by this:

Independently of masking, my test dataset went from 8h to 2h

Not sure I fully understand how that worked out...

@Lestropie

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2017

Not sure I fully understand how that worked out...

The connectivity matrix is built using a std::map<uint32_t, float> per fixel, so that existing connections can be found efficiently. But if std::map is implemented using a binary search tree, that's 2 64-bit pointers per 8-byte connection stored; and the memory allocations could end up scattered, depending on how STL manages memory allocation for a std::map (or ~500,000 of them at once).

With these changes, during the normalisation / thresholding step the connectivity matrix is converted to a vector<std::pair<uint32_t, float>> per fixel, making the data per fixel contiguous and reducing its total size. For the statistical enhancement you only need to loop over the data per fixel; no need for efficient lookup of a particular fixel index.

Time will tell if the speedup is reproducible, I'm only reporting a single timed run; but it makes sense.

@draffelt
Copy link
Member

left a comment

Just had a look over these changes and it all looks good.

Sadly I never got around to adding tests for fixelcfestats. Would be good to add at least one test where it tests the output when using the -notest option (i.e. only checks the t-stat, cfe, effect size etc). Actually, with the relatively new -permutations option, we could also test the pvalue output with a handful of set permutations.

Happy to add these tests, just won't be in the next week or so.

@jdtournier

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

@jdtournier

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

@draffelt - you might want to actually approve these changes if you're happy with them, otherwise Rob can't merge...

@draffelt

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

I'll approve for now. But was thinking it might be good to check we at least get the same result without the mask as current master.

@draffelt

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

Or have you done that already Rob?

That is an impressive speed up.

@jdtournier

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

@draffelt: OK, gotcha. That's a really good suggestion, we should set up some testing now to verify equivalent results pre & post. @Lestropie, any chance you can find the time to generate a few tests?

@Lestropie

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2017

I did a pre & post without mask, it wasn't bitwise identical (presumably due to random permutations) but it was only edge fixels here and there that changed for p<0.05. I could run again with pre-generated permutations if you want.

One thing I did change was this hard-coded value, to use the same threshold for the smoothing matrix as the connectivity matrix. For default parameters the change has no net effect though; it just made sense to me that if the user expands or contracts the matrix by tuning the connectivity threshold up or down, the extent of the smoothing matrix should do the same thing.

@Lestropie

This comment has been minimized.

Copy link
Member Author

commented Jul 7, 2017

With pre-calculated permutations, fwe_pvalue images vary by no more than one permutation's worth; p<0.05 masks are bitwise equivalent.

$ time /software/mrtrix3/bin/fixelcfestats ...
real    518m12.199s
user    13251m53.299s
sys     195m55.055s

$ time /software/mrtrix_testing/bin/fixelcfestats ...
real    137m26.918s
user    1562m37.928s
sys     83m32.612s

💪

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

@jdtournier

This comment has been minimized.

Copy link
Member

commented Jul 7, 2017

Good to hear. Any chance of running tests to verify operation that don't take 1½ hours...?

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

My thoughts exactly. A new 3.0_RC2 tag would be a good opportunity to add/modify things that we might otherwise have delayed for a future version. I reckon now is the time to create a 3.0_pre_RC2 branch, with a view to merging these changes along with the mtlognorm branch and some of the other pull requests into it, and we'll then push the lot out to master when ready and tag accordingly.

@jdtournier jdtournier changed the base branch from master to dev Jul 12, 2017

@Lestropie

This comment has been minimized.

Copy link
Member Author

commented Jul 13, 2017

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

As an aside: I've been working on various aspects of the stats code lately (got multiple contrasts working!), so one thing I did just today was write a script that generates random dummy data to feed to vectorstats. That's the cleanest way to test the core GLM functionality. Testing the higher-order stats commands gets a bit trickier... but I figured it'd be easier to start from the bottom and work my way up.

@Lestropie Lestropie changed the base branch from dev to tag_3.0_RC2 Jul 14, 2017

@Lestropie Lestropie merged commit fc9d9b1 into tag_3.0_RC2 Jul 14, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Lestropie Lestropie deleted the fixelcfestats_mask branch Jul 14, 2017

@Lestropie Lestropie referenced this pull request Jul 14, 2017

Merged

Tag 3.0 RC2 #1052

Lestropie added a commit that referenced this pull request Dec 14, 2017

fixel2tsf: Don't propagate non-finite values
With the introduction of the -mask option in fixelcfestats (#1030), fixel data files may contain the value NaN to indicate fixels that were outside the processing mask. However if such values were read within fixel2tsf, and hence propagated to the output track scalar file, these would be erroneously interpreted as track delimiters. This fix replaces non-finite values with 0.0 in fixel2tsf to overcome this limitation of the .tck / .tsf formats.
Some assert() calls have also been added to the relevant code in mrview to ensure that any loaded .tsf file does indeed align perfectly with the corresponding track file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.