
Profiling estimates for optimal transport #34

Closed
emdupre opened this issue Oct 23, 2019 · 10 comments · Fixed by #41

Comments


emdupre commented Oct 23, 2019

Hi --

Thanks for this great resource ! I'm trying to adapt this example and apply it to some data I've processed, sub-sampling only the first ten frames.

Of the four methods surveyed, the first three all run smoothly and complete in under 5 minutes. I have not, though, been able to obtain results for optimal transport -- it runs for over 12 hours on my 24GB RAM machine.

I'm including the exact script below. Do you have any estimates of how long you expect optimal transport to take, or how much RAM you would recommend ?

Thanks !

#%%
import pprint
import numpy as np
from os.path import join as opj

from nilearn.input_data import NiftiMasker
from nilearn import (datasets, image, plotting)

from fmralign._utils import voxelwise_correlation
from fmralign.pairwise_alignment import PairwiseAlignment

#%%
fmriprep = '/home/emdupre/Desktop/dissertation/fmriprep/postproc/'
anat = opj(fmriprep,
           'sub-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz')

atlas_yeo_2011 = datasets.fetch_atlas_yeo_2011()
atlas = image.load_img(atlas_yeo_2011.thick_7)

# Select visual cortex, create a mask and resample it to the right resolution
mask_visual = image.new_img_like(atlas, atlas.get_data() == 1)
resampled_mask_visual = image.resample_to_img(
    mask_visual, anat, interpolation="nearest")

#%%
test = opj(fmriprep, 'sub-01_task-movie_run-01_space-MNI152NLin2009cAsym_desc-postproc_bold.nii.gz')
roi_masker = NiftiMasker(mask_img=resampled_mask_visual).fit(test)
roi_masker.generate_report()

#%%
roi_masker = NiftiMasker(mask_img=resampled_mask_visual).fit()
n_voxels = roi_masker.mask_img_.get_data().sum()
n_pieces = int(np.round(n_voxels / 200))

# Explicitly set this for now, otherwise we're getting empty parcels
# and breaking _everything_
n_pieces = 450  

#%%
runs = ['01', '01', '02', '02']
subjs = ['01', '02', '01', '02']
files = []
keys = ['source_train', 'target_train', 'source_test', 'target_test']

for i, value in enumerate(zip(subjs, runs)):
    files.append(opj(fmriprep, 'sub-{0}_task-movie_run-{1}_'.format(value[0],
                                                                    value[1]) +
                               'space-MNI152NLin2009cAsym_desc-postproc_bold.nii.gz'))

data = dict(zip(keys, files))
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(data)

#%%
# Let's use fmralign !
# https://github.com/Parietal-INRIA/fmralign
methods = ['identity', 'scaled_orthogonal', 'ridge_cv', 'optimal_transport']

for method in methods:
    alignment_estimator = PairwiseAlignment(
        alignment_method=method, n_pieces=n_pieces, mask=roi_masker)
    alignment_estimator.fit(image.index_img(data['source_train'], index=slice(0, 10)),
                            image.index_img(data['target_train'], index=slice(0, 10)))
    target_pred = alignment_estimator.transform(image.index_img(data['source_test'], index=slice(0, 10)))
    aligned_score = voxelwise_correlation(
        image.index_img(data['target_test'], index=slice(0, 10)), target_pred, roi_masker)
    display = plotting.plot_stat_map(aligned_score, display_mode="z", cut_coords=[-15, -5],
                                    vmax=1, title=f"Correlation of prediction after {method} alignment")
    display.savefig(f'pretty_brain_{method}.png')
    display.close()
@bthirion (Contributor)

Hi,
Two suggestions:

  • take an atlas with smaller regions (e.g. Yeo 17, further divided into connected components)
  • otherwise, you may want to increase the entropy regularization, which improves convergence of the Sinkhorn algorithm.
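The effect of entropy regularization on Sinkhorn convergence can be illustrated with a from-scratch sketch. This is plain NumPy, not fmralign's or POT's implementation, and the point clouds and regularization values are made up for illustration:

```python
import numpy as np

def sinkhorn(a, b, C, reg, max_iter=5000, tol=1e-9):
    """Entropy-regularized OT via plain Sinkhorn iterations.

    a, b : marginal weights (1-D, each summing to 1)
    C    : pairwise cost matrix
    reg  : entropy regularization; larger values give a blurrier
           transport plan but faster convergence
    """
    K = np.exp(-C / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for it in range(1, max_iter + 1):
        v = b / (K.T @ u)
        u = a / (K @ v)
        # column marginals of the current plan vs. the target b
        err = np.abs(v * (K.T @ u) - b).sum()
        if err < tol:
            break
    plan = u[:, None] * K * v[None, :]    # the transport plan
    return plan, it

# Two made-up point clouds standing in for voxel signals.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))
y = rng.normal(size=(n, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
a = b = np.full(n, 1.0 / n)

plan_hi, it_hi = sinkhorn(a, b, C, reg=2.0)   # stronger regularization
plan_lo, it_lo = sinkhorn(a, b, C, reg=0.5)   # weaker regularization
print(f"iterations to converge: reg=2.0 -> {it_hi}, reg=0.5 -> {it_lo}")
```

With the larger `reg`, the Gibbs kernel is better conditioned and the fixed-point iteration contracts faster, at the price of a more diffuse plan.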


thomasbazeille commented Oct 24, 2019 via email

@thomasbazeille (Collaborator)

Hey @emdupre, did this solve your problem ?

I just thought of something else: in some cases the clustering algorithm can create unbalanced regions, so even though you have a high number of regions, one might be very large and completely block the algorithm.
If your run managed to finish, you can check your estimator's labels_ attribute.

To solve this, you should change the clustering parameter of your estimator (clustering='kmeans' by default).
To get more balanced regions, ReNA can be used directly through clustering='rena', but it's available only if you have nilearn > 0.6 installed.
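The labels_ check suggested above takes only a couple of numpy calls. A minimal sketch with made-up labels (the real array would come from a fitted estimator):

```python
import numpy as np

# Hypothetical stand-in for estimator.labels_: one parcel id per voxel.
labels = np.array([0] * 1200 + [1] * 150 + [2] * 180 + [3] * 170)

sizes = np.bincount(labels)
print(f"{sizes.size} parcels, sizes from {sizes.min()} to {sizes.max()}")

# Parcels far larger than the rest are the ones that stall optimal transport.
oversized = np.flatnonzero(sizes > 1000)
print("oversized parcels:", oversized)
```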


emdupre commented Oct 30, 2019

I wasn't able to get it to finish -- I switched to ReNA by pre-empting your PR #35 in my local copy, but that didn't solve it yet.

I'm planning to try your suggestion to increase the entropy regularization, but I probably won't have time to play around with this again until next week.

Thanks for checking in -- I'll keep y'all up to date !

@thomasbazeille (Collaborator)

I did some additional testing on various IBC data. Apparently all clustering algorithms give very unbalanced results for long raw timeseries, which we had not observed since we have mostly worked on contrasts so far.

An algorithm which gives less unbalanced results is hierarchical k-means, which I may add to fmralign soon. For now, the best way to get reasonable parcels is to feed in an atlas such as BASC (available in nilearn for various numbers of regions), after resampling it to the resolution of your data.

You can feed any 3D Niimg atlas directly in the 'clustering' argument in PairwiseAlignment or TemplateAlignment

@thomasbazeille (Collaborator)

I did some profiling for various estimators: scaled orthogonal, ridge, and OT with different regularization levels. (Time is single-CPU runtime in seconds.)

The difference is not huge up to 1000 voxels, where our OT implementation starts taking forever without any clear reason. It's based on POT, and we might want to optimize this a lot in the coming months by recoding Sinkhorn specifically.

Until then, 1000 voxels for the biggest region is the limit!

[Figure: fmralign profiling results]


emdupre commented Dec 3, 2019

Ahhh, that makes a lot of sense ! I had been trying various regularization thresholds but all still with the BASC parcellation at the finest resolution -- but that's still only 444 parcels.

Do you think it would be worth adding this to the docs ? I'm happy to open a PR !

@thomasbazeille (Collaborator)

Sorry about that. Yes, if you have a good idea of where it should be added, it'd for sure be very helpful.

Are you working at a high resolution ?

I'll try to make hierarchical k-means available as soon as possible, at least as long as it's not in nilearn, so that you still have an option for better-balanced clusters.


bthirion commented Dec 3, 2019

Thanks @thomasbazeille
And welcome to the big data era with POT !

@thomasbazeille (Collaborator)

Hello, hierarchical k-means is now available in fmralign through PairwiseAlignment(clustering='hierarchical_kmeans').

This gives much better balanced clusters, especially for timeseries.
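As a toy illustration of why a bisecting/hierarchical k-means tends to produce balanced parcels: it keeps splitting any cluster that exceeds a size cap, so no single parcel can dominate. This is a from-scratch sketch with made-up data, not fmralign's implementation:

```python
import numpy as np

def bisecting_kmeans(X, idx, max_size, rng):
    """Recursively split the rows X[idx] with 2-means until every
    cluster holds at most max_size points. Returns a list of index arrays.
    Toy sketch only -- not fmralign's hierarchical k-means."""
    if len(idx) <= max_size:
        return [idx]
    sub = X[idx]
    centers = sub[rng.choice(len(sub), size=2, replace=False)]
    for _ in range(10):
        # assign each point to the nearer of the two centers
        assign = ((sub[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        if assign.all() or not assign.any():      # degenerate one-sided split
            assign = np.arange(len(idx)) % 2      # force an even split
            break
        centers = np.stack([sub[assign == k].mean(axis=0) for k in (0, 1)])
    return (bisecting_kmeans(X, idx[assign == 0], max_size, rng)
            + bisecting_kmeans(X, idx[assign == 1], max_size, rng))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                     # 500 fake voxels, 3 features
clusters = bisecting_kmeans(X, np.arange(len(X)), max_size=100, rng=rng)
print(sorted(len(c) for c in clusters))
```

Every cluster in the output respects the cap, which is exactly the property the profiling above says optimal transport needs (largest region around 1000 voxels or fewer).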
