
Profiling estimates for optimal transport #34

Closed
emdupre opened this issue Oct 23, 2019 · 10 comments · Fixed by #41

Comments


emdupre commented Oct 23, 2019

Hi --

Thanks for this great resource ! I'm trying to adapt this example and apply it to some data I've processed, sub-sampling only the first ten frames.

Of the four methods surveyed, the first three all run smoothly and complete in under 5 minutes. I have not, though, been able to obtain results for optimal transport -- it runs for over 12 hours on my 24GB RAM machine.

I'm including the exact script below. Do you have any estimates of how long you expect optimal transport to take, or how much RAM you would recommend ?

Thanks !

#%%
import pprint
import numpy as np
from os.path import join as opj

from nilearn.input_data import NiftiMasker
from nilearn import (datasets, image, plotting)

from fmralign._utils import voxelwise_correlation
from fmralign.pairwise_alignment import PairwiseAlignment

#%%
fmriprep = '/home/emdupre/Desktop/dissertation/fmriprep/postproc/'
anat = opj(fmriprep,
           'sub-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz')

atlas_yeo_2011 = datasets.fetch_atlas_yeo_2011()
atlas = image.load_img(atlas_yeo_2011.thick_7)

# Select visual cortex, create a mask and resample it to the right resolution
mask_visual = image.new_img_like(atlas, atlas.get_data() == 1)
resampled_mask_visual = image.resample_to_img(
    mask_visual, anat, interpolation="nearest")

#%%
test = opj(fmriprep, 'sub-01_task-movie_run-01_space-MNI152NLin2009cAsym_desc-postproc_bold.nii.gz')
roi_masker = NiftiMasker(mask_img=resampled_mask_visual).fit(test)
roi_masker.generate_report()

#%%
roi_masker = NiftiMasker(mask_img=resampled_mask_visual).fit()
n_voxels = roi_masker.mask_img_.get_data().sum()
n_pieces = int(np.round(n_voxels / 200))

# Explicitly set this for now, otherwise we're getting empty parcels
# and breaking _everything_
n_pieces = 450  

#%%
runs = ['01', '01', '02', '02']
subjs = ['01', '02', '01', '02']
files = []
keys = ['source_train', 'target_train', 'source_test', 'target_test']

for i, value in enumerate(zip(subjs, runs)):
    files.append(opj(fmriprep, 'sub-{0}_task-movie_run-{1}_'.format(value[0],
                                                                    value[1]) +
                               'space-MNI152NLin2009cAsym_desc-postproc_bold.nii.gz'))

data = dict(zip(keys, files))
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(data)

#%%
# Let's use fmralign !
# https://github.com/Parietal-INRIA/fmralign
methods = ['identity', 'scaled_orthogonal', 'ridge_cv', 'optimal_transport']

for method in methods:
    alignment_estimator = PairwiseAlignment(
        alignment_method=method, n_pieces=n_pieces, mask=roi_masker)
    alignment_estimator.fit(image.index_img(data['source_train'], index=slice(0, 10)),
                            image.index_img(data['target_train'], index=slice(0, 10)))
    target_pred = alignment_estimator.transform(image.index_img(data['source_test'], index=slice(0, 10)))
    aligned_score = voxelwise_correlation(
        image.index_img(data['target_test'], index=slice(0, 10)), target_pred, roi_masker)
    display = plotting.plot_stat_map(aligned_score, display_mode="z", cut_coords=[-15, -5],
                                    vmax=1, title=f"Correlation of prediction after {method} alignment")
    display.savefig(f'pretty_brain_{method}.png')
    display.close()
@bthirion (Contributor)

Hi,
Two suggestions:

  • take an atlas with smaller regions (e.g. Yeo 17, further divided into connected components)
  • otherwise, you may want to increase the entropy regularization, which improves convergence of the Sinkhorn algorithm.
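The effect of entropy regularization on Sinkhorn convergence can be illustrated with a from-scratch sketch. This is plain NumPy, not fmralign's or POT's implementation, and the point clouds and regularization values are made up for illustration:

```python
import numpy as np

def sinkhorn(a, b, C, reg, max_iter=5000, tol=1e-9):
    """Entropy-regularized OT via plain Sinkhorn iterations.

    a, b : marginal weights (1-D, each summing to 1)
    C    : pairwise cost matrix
    reg  : entropy regularization; larger values give a blurrier
           transport plan but faster convergence
    """
    K = np.exp(-C / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for it in range(1, max_iter + 1):
        v = b / (K.T @ u)
        u = a / (K @ v)
        # column marginals of the current plan vs. the target b
        err = np.abs(v * (K.T @ u) - b).sum()
        if err < tol:
            break
    plan = u[:, None] * K * v[None, :]    # the transport plan
    return plan, it

# Two made-up point clouds standing in for voxel signals.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))
y = rng.normal(size=(n, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
a = b = np.full(n, 1.0 / n)

plan_hi, it_hi = sinkhorn(a, b, C, reg=2.0)   # stronger regularization
plan_lo, it_lo = sinkhorn(a, b, C, reg=0.5)   # weaker regularization
print(f"iterations to converge: reg=2.0 -> {it_hi}, reg=0.5 -> {it_lo}")
```

With the larger `reg`, the Gibbs kernel is better conditioned and the fixed-point iteration contracts faster, at the price of a more diffuse plan.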


thomasbazeille commented Oct 24, 2019 via email

@thomasbazeille (Collaborator)

Hey @emdupre, did this solve your problem ?

I just thought of something else: in some cases the clustering algorithm can create unbalanced regions, so even though you have a high number of regions, one might be very large and completely block the algorithm.
If your run managed to finish, you can check your estimator's labels_ attribute.

To solve this, you should change the clustering parameter of your estimator (clustering='kmeans' by default).
To get more balanced regions, ReNA can be used directly through clustering='rena', but it's available only if you have nilearn > 0.6 installed.
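The labels_ check suggested above takes only a couple of numpy calls. A minimal sketch with made-up labels (the real array would come from a fitted estimator):

```python
import numpy as np

# Hypothetical stand-in for estimator.labels_: one parcel id per voxel.
labels = np.array([0] * 1200 + [1] * 150 + [2] * 180 + [3] * 170)

sizes = np.bincount(labels)
print(f"{sizes.size} parcels, sizes from {sizes.min()} to {sizes.max()}")

# Parcels far larger than the rest are the ones that stall optimal transport.
oversized = np.flatnonzero(sizes > 1000)
print("oversized parcels:", oversized)
```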


emdupre commented Oct 30, 2019

I wasn't able to get it to finish -- I switched to ReNA by pre-empting your PR #35 in my local copy, but that didn't solve it yet.

I'm planning to try your suggestion to increase the entropy regularization, but I probably won't have time to play around with this again until next week.

Thanks for checking in -- I'll keep y'all up to date !

@thomasbazeille (Collaborator)

I did some additional testing on various IBC data. Apparently all clustering algorithms give very unbalanced results for long raw timeseries, which we had not observed since we have mostly worked on contrasts so far.

An algorithm which gives less unbalanced results is hierarchical k-means, which I may add to fmralign soon. For now, the best way to get reasonable parcels is to feed in an atlas such as BASC (available in nilearn for various numbers of regions), after resampling it to the resolution of your data.

You can feed any 3D Niimg atlas directly in the 'clustering' argument in PairwiseAlignment or TemplateAlignment

@thomasbazeille (Collaborator)

I did some profiling for various estimators: scaled orthogonal, ridge, and OT with different regularization levels. (Time is single-CPU runtime in seconds.)

The difference is not huge up to 1000 voxels, where our OT implementation starts taking forever without any clear reason. It's based on POT, and we might want to optimize this a lot in the coming months by recoding Sinkhorn specifically.

Until then, 1000 voxels for the biggest region is the limit!

[Figure: fmralign profiling results]


emdupre commented Dec 3, 2019

Ahhh, that makes a lot of sense ! I had been trying various regularization thresholds but all still with the BASC parcellation at the finest resolution -- but that's still only 444 parcels.

Do you think it would be worth adding this to the docs ? I'm happy to open a PR !

@thomasbazeille (Collaborator)

Sorry about that. Yes, if you have a good idea of where it should be added, it'd for sure be very helpful.

Are you working at a high resolution ?

I'll try to make hierarchical k-means available as soon as possible, at least as long as it's not in nilearn, so that you still have an option for better-balanced clusters.


bthirion commented Dec 3, 2019

Thanks @thomasbazeille
And welcome to the big data era with POT !

@thomasbazeille (Collaborator)

Hello, hierarchical k-means is now available in fmralign through PairwiseAlignment(clustering='hierarchical_kmeans').

This gives much better balanced clusters, especially for timeseries.
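As a toy illustration of why a bisecting/hierarchical k-means tends to produce balanced parcels: it keeps splitting any cluster that exceeds a size cap, so no single parcel can dominate. This is a from-scratch sketch with made-up data, not fmralign's implementation:

```python
import numpy as np

def bisecting_kmeans(X, idx, max_size, rng):
    """Recursively split the rows X[idx] with 2-means until every
    cluster holds at most max_size points. Returns a list of index arrays.
    Toy sketch only -- not fmralign's hierarchical k-means."""
    if len(idx) <= max_size:
        return [idx]
    sub = X[idx]
    centers = sub[rng.choice(len(sub), size=2, replace=False)]
    for _ in range(10):
        # assign each point to the nearer of the two centers
        assign = ((sub[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        if assign.all() or not assign.any():      # degenerate one-sided split
            assign = np.arange(len(idx)) % 2      # force an even split
            break
        centers = np.stack([sub[assign == k].mean(axis=0) for k in (0, 1)])
    return (bisecting_kmeans(X, idx[assign == 0], max_size, rng)
            + bisecting_kmeans(X, idx[assign == 1], max_size, rng))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                     # 500 fake voxels, 3 features
clusters = bisecting_kmeans(X, np.arange(len(X)), max_size=100, rng=rng)
print(sorted(len(c) for c in clusters))
```

Every cluster in the output respects the cap, which is exactly the property the profiling above says optimal transport needs (largest region around 1000 voxels or fewer).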
