Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecoBundles - recognition of bundles #1443

Merged
merged 39 commits into from Apr 28, 2018
Merged

RecoBundles - recognition of bundles #1443

merged 39 commits into from Apr 28, 2018

Conversation

Garyfallidis
Copy link
Contributor

@Garyfallidis Garyfallidis commented Mar 5, 2018

This PR implements the super 🔥🔥🔥🔥 paper:

Garyfallidis et al. Recognition of white matter bundles using local and global streamline-based registration and clustering NeuroImage, 2017

https://www.sciencedirect.com/science/article/pii/S1053811917305839

This PR is ready for reviews.

I would like that first we merge #1379 so I can move from using directly the ArraySequence to using the Streamlines class. #1379 only needs a small coverage boost.

Some comments on the methods of this PR.

RecoBundles extracts bundles from a given tractogram using model bundles from another tractogram.
Therefore it is important to make sure that the two tractograms are registered first. So, what we usually do is apply the remarkably robust streamline registration first (see function whole_brain_slr) and then start RecoBundles. This strategy can be very useful with clinical datasets where image-based registration is uncertain. Of course you are happy to use other methods for registration too.

https://www.ncbi.nlm.nih.gov/pubmed/25987367

After this PR is merged will be working on providing an easy to use workflow and with access to an atlas of many bundles. And build the tutorial. Nonetheless, be happy to start using this and do please cite us gracefully.

Remark; If the two tractograms are already registered well with nonrigid registration make sure you disable the local_slr parameter as it will be probably an overkill.

Oh and Welcome to the Matrix! :) This PR brings infinite possibilities in the game. More cool additions and enhancements coming soon.

@frheault after this is merged make a PR for the centroid-based metric directly to the DIPY master. I am talking about your previous work here https://github.com/Garyfallidis/dipy/pull/50/files

Copy link
Member

@skoudoro skoudoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis complain but I can not reproduce this error on windows... all tests pass


Parameters
-----------
static : array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name static does not match with input parameter stat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

-----------
static : array
Static streamlines
moving : array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name moving does not match with input parameter mov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Notes
-----
The difference with ``_bundle_minimum_distance`` is that we sum the
minimum values only for the static. Therefore, this is assymetric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: asymmetric

@codecov-io
Copy link

codecov-io commented Mar 9, 2018

Codecov Report

Merging #1443 into master will increase coverage by 0.03%.
The diff coverage is 88.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1443      +/-   ##
==========================================
+ Coverage   87.49%   87.52%   +0.03%     
==========================================
  Files         241      244       +3     
  Lines       30789    31189     +400     
  Branches     3322     3426     +104     
==========================================
+ Hits        26939    27299     +360     
- Misses       3074     3079       +5     
- Partials      776      811      +35
Impacted Files Coverage Δ
dipy/tracking/streamline.py 96.03% <ø> (+0.2%) ⬆️
dipy/tracking/utils.py 88.57% <ø> (ø) ⬆️
dipy/segment/clustering.py 94.63% <100%> (+0.97%) ⬆️
dipy/align/tests/test_streamlinear.py 97.74% <100%> (ø) ⬆️
dipy/align/streamlinear.py 86.85% <75.69%> (-9.24%) ⬇️
dipy/segment/bundles.py 90.21% <90.21%> (ø)
dipy/align/tests/test_whole_brain_slr.py 93.75% <93.75%> (ø)
dipy/segment/tests/test_qbx.py 97.12% <96.15%> (+0.19%) ⬆️
dipy/segment/tests/test_rb.py 96.87% <96.87%> (ø)
dipy/core/gradients.py 92.1% <0%> (-5.27%) ⬇️
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6af802...72fcf7f. Read the comment docs.

@Garyfallidis Garyfallidis changed the title WIP: RecoBundles - automatic recognition of bundles RecoBundles - automatic recognition of bundles Mar 11, 2018
@Garyfallidis Garyfallidis changed the title RecoBundles - automatic recognition of bundles RecoBundles - recognition of bundles Mar 11, 2018
@Garyfallidis Garyfallidis added this to Needs a review in Dipy 0.14.0 Mar 12, 2018
@Garyfallidis
Copy link
Contributor Author

@kesshijordan it would be great if you can have a look at this PR. Let me know what you think. @MarcCote please have a look at the qbx_merge function.

@kesshijordan
Copy link
Contributor

Thanks, @Garyfallidis ! I've been looking forward to this. I'm working on pyAFQ (@arokem) and think this approach would be a great way to clean up the results.

I made some atlases (calculated centroids from quickbundles and streamline registered a series of manually segmented healthy control tracks) that would probably work well for this.

@Garyfallidis
Copy link
Contributor Author

Great strategy! Cool @kesshijordan, give it a go and let me know if you have questions. Happy to help :)

@Garyfallidis
Copy link
Contributor Author

@kesshijordan and @BramshQamar please provide your review. It would be great to have this merged for this release.

Copy link
Contributor

@MarcCote MarcCote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adde minor suggestions to qbx_merge. Otherwise, looks good to me.

len_s = len(streamlines)
if select_randomly is None:
select_randomly = len_s
indices = np.random.choice(len_s, min(select_randomly, len_s),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer using a dedicated random generator instead of relying on np.random. It gives you more control for reproducibility.

I would add a new optional argument rng=None to the function and do

indices = np.arange(len(streamlines))
if select_randomly:
    rng = np.random.RandomState(None) if rng is None else rng
    rng.shuffle(indices)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note. I personally almost always replace the pattern x = y if x is None else x by x = x or y

rng = rng or np.random.RandomState(None)

I know that this could be a trap because False is not the same as None but some programmers prefer the or trick.


final_level = len(thresholds)

qbx_ordering_final = np.random.choice(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could reuse the rng from above here instead of relying on the global RandomState.


print(' Duration %0.3f sec. \n' % (time() - t1, ))

return merged_cluster_map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a newline.

@kesshijordan
Copy link
Contributor

kesshijordan commented Apr 5, 2018

@Garyfallidis I think it's worth having a basic tutorial with a quick demo/guide for parameter tuning in the same release as Recobundles. I would imagine that when it's released people will give it a quick try to see if it works on their data as well as the examples in the publication. You want them to know which knobs to turn when they do; the tuning isn't necessarily intuitive (it made a huge difference in how well it worked on my data when I got feedback from you on tuning). It would help make a good first impression.


# In essence whole_brain_slr can be thought as a combination of
# SLF and QuickBundles
whole_brain_slr = slr_with_qb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whole_brain_slr and slr_with_qb basically have same implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the same function can be used for brain tractograms and other tractograms. In the paper I called it whole_brain_slr but then realized that the function can be more generic. Adding some more documentation to explain.

len_s = len(streamlines)
if select_randomly is None:
select_randomly = len_s
indices = np.random.choice(len_s, min(select_randomly, len_s),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note. I personally almost always replace the pattern x = y if x is None else x by x = x or y

rng = rng or np.random.RandomState(None)

I know that this could be a trap because False is not the same as None but some programmers prefer the or trick.

double inf = np.finfo('f8').max
double dist=0
double * min_j
# double * min_i
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clean all the commented code in this function?



def test_rb_no_neighb():
# what if neighbors are found? No recognition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo? What if NO neighbors are found?


MAX_DIST = 1e10
LOG_MAX_DIST = np.log(MAX_DIST)

DEFAULT_BOUNDS = [(-45, 45), (-45, 45), (-45, 45),
(-30, 30), (-30, 30), (-30, 30),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's 45, 30, and not 30, 45? I ask because I have seen another version with translation (-30, 30) and rotation (-45, 45) and I believe it makes more sense that way.

EDIT: Oh, I see that you have another set of bounds below :) Can you explain why it's different from this set that you jsut defined?

clustering.
"""

self.reduction_thr = reduction_thr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been working with this codebase (a really similar one from François Rheault) in the last weeks and please, for the love of gods, don't do that! Stop putting mutable stuff in self like a madman. It's so much harder to debug! I would know, I did it! I took me days to debug instead of hours.

I'm not talking only about reduction_thr. It wouldn't be a real problem if it was only for this variable. The problem happen because you have reduction_thr, model_bundle, pruned_streamlines, nb_pruned_streamlines, transf_streamlines, transf_matrix, labels, model_clust_thr, model_cluster_map, model_centroids, nb_model_centroids, centroid_matrix, neighb_streamlines, neighb_clusters, neighb_centroids, neighb_indices, nb_neighb_streamlines, slr_bmd, slr_iterations, slr_initial_matrix, slr_final_matrix, slr_xopt, rtransf_cluster_map, rtransf_centroids, nb_rtransf_centroids, pruning_matrix, pruned_indices and labeled_streamlines that are not in self at creation and are modified at random places and times. I may or may not have missed some mutable variables, so there may be more.

Some of them are only used in one function. Some of them are used in many functions but it could be easily be avoided. Some of them are simply the len of another. Maybe some of them are never used, I didn't check.

I will not write an essay here about why a programmer should never do that, I'll simply say that it's super hard to follow and to debug this kind of code. By the way, sorry for the "tone" of this review, I swear I did it for the good of DiPy.

Copy link
Contributor

@MarcCote MarcCote Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What an old-fashioned / C++ way of thinking :P ... just kidding! I totally agree with you that would definitively clarify the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the alternative that you are proposing @nilgoyette ? Also let's all try to be polite to each other. The tone was not necessary. Please be specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing fancy: return value(s) from functions, add arguments, try to have 0 mutable member.

As a generul rule, try to only have immutable objects. self should never be modified after construction. I know it's not Python way of thinking and it's maybe "too much", so I wouldn't care if 1 or 2 members are modified but you took it to another level :)

openmp.omp_init_lock(&lock)

min_j = <double *> malloc(static_size * sizeof(double))
# min_i = <double *> malloc(moving_size * sizeof(double))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you only keep the min of the first loop, you don't need an array. A simple min and sum var will do.

Notes
-----
The difference with ``_bundle_minimum_distance`` is that we sum the
minimum values only for the static. Therefore, this is asymetric.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain more what are the consequences, if any? This function is a little less heavy than _bundle_minimum_distance, ok, but what are we losing? When can we use it?

# return recognized bundle in original streamlines, labels of
# recognized bundle and transformed recognized bundle
return self.streamlines[self.labels], self.labels, \
self.pruned_streamlines
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to have detailed documentation about difference between self.streamlines[self.labels] and self.pruned_streamlines returned by recognize function.

self.cluster_map.refdata = self.streamlines
self.centroids = self.cluster_map.centroids
self.nb_centroids = len(self.centroids)
self.indices = [cluster.indices for cluster in self.cluster_map]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update, it's incredibly better. You can go further by removing self.clust_thr and self.indices (never used), and self.nb_centroids (only used in a print, so simply use len). I think I won't complain after that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing self.clust_thr but the others are useful to stay there. I have reduced number of attributes incredibly there is no need to move directly and made _cluster_streamlines a "private" function. So, you should be very happy. Some utility attributes are useful.


def _cluster_streamlines(self, clust_thr=15, nb_pts=20):

np.random.seed(42)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a parameter? It would be nice if the user could simply choose a seed to have reproducable results. It's probably more complex than this simple line though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, needs a proper RandomState there. Fixing ...

centroid_matrix[centroid_matrix > reduction_thr] = np.inf

mins = np.min(centroid_matrix, axis=0)
close_clusters_indices = list(np.where(mins != np.inf)[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this strategy. Why not forget the infinity and simply use your reduction_thr directly in the np.where?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple ways to go to Rome.


# TODO this can be speeded up by using directly the centroids
static = select_random_set_of_streamlines(model_bundle,
select_model)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can fit in 80 characters.


mins = np.min(pruning_matrix, axis=0)
pruned_indices = [rtransf_cluster_map[i].indices
for i in np.where(mins != np.inf)[0]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem here?

pruned_streamlines = [transf_streamlines[i]
for i in pruned_indices]

pruned_indices = pruned_indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thanks.

@Garyfallidis
Copy link
Contributor Author

@kesshijordan I will have to make a tutorial on another PR. Here is why. We are currently cleaning and preparing an atlas to be used. And it takes time. I want you to look at this atlas when you visit us next week. Ideally I want to make sure that we have a tutorial that can run with multiple input atlases. But we can start with one first.

@Garyfallidis
Copy link
Contributor Author

Garyfallidis commented Apr 26, 2018

Thank you all @kesshijordan @MarcCote @BramshQamar @nilgoyette and @MarcCote for your reviews. There were all very useful. All your comments have been address. Apologies also for the initial coding style with the many unnecessary attributes. But all this is fixed now. We are going to merge this PR today so we can move with the release in preparation for the big DIPY workshop next week. During https://brainhack.sice.indiana.edu. More updates and tutorials and atlases will be available in other PRs.

@skoudoro skoudoro merged commit 26f99c5 into dipy:master Apr 28, 2018
Dipy 0.14.0 automation moved this from Needs a review to Done Apr 28, 2018
ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018
RecoBundles - recognition of bundles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Dipy 0.14.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

7 participants