
Implementation of reconstruction error? #37

Closed · emdupre opened this issue Dec 2, 2019 · 7 comments

@emdupre (Collaborator) commented Dec 2, 2019

Thanks again for making this toolkit!

To understand it a little better, I'm trying to reproduce Figure 4a from your recent paper introducing fmralign; you can see the implementation I have here, or launch it remotely with Binder.

I can't seem to get my distributions for ridge or scaled orthogonal alignment to match what you have in Figure 4a. I think the most likely explanation is simply that I'm not implementing the formula for reconstruction error appropriately.

Do you have an implementation of this already available?

@thomasbazeille (Collaborator) commented Dec 13, 2019

Hello Elizabeth, thanks a lot for your tutorials and for the good opportunity to reproduce the results of the paper, and sorry in advance for the long message. This answer includes:

  1. Feedback on your notebook code.
  2. A figure with the results I get with my reimplementation.
  3. The full replication code (sorry, I didn't have time to make it a proper tutorial).

Don't hesitate to ask if you have any questions, or if you have ideas for improving the documentation based on what seemed unclear to you.

Feedback after running your code:

  • The important difference is the experimental set-up. For one pair of subjects, we use all "ap" conditions to learn one common alignment that we then apply to every "pa" condition, whereas you tried to learn one alignment per condition. As a result, ridge fails (its cross-validation scheme requires at least as many images as folds), and the results you get with all methods are likely spurious.

  • The metrics are voxelwise and should be calculated across several samples (here, conditions). Your implementation of the reconstruction error and ratio matches mine at the average level, but importantly, both should be computed per voxel, across several samples.

  • For plotting, you should therefore get roughly 40,000 values for each subject. The violin plots we made are then over all the concatenated voxels of the target subjects, each point representing one of the 20 × 40,000 voxels (see the short sketch after this list).
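
To make the shapes concrete, here is a toy sketch of that scoring logic, using random data and made-up dimensions (these are illustrative stand-ins, not outputs from fmralign):

import numpy as np

# Made-up dimensions standing in for the real ones:
# 20 pairs, one sample per "pa" condition, ~40000 masked voxels.
n_pairs, n_conditions, n_voxels = 20, 5, 40000

per_pair_scores = []
for _ in range(n_pairs):
    y_true = np.random.rand(n_conditions, n_voxels)  # masked target data
    y_pred = np.random.rand(n_conditions, n_voxels)  # aligned prediction
    # one score per voxel, computed across the condition axis
    err = ((y_true - y_pred) ** 2).sum(axis=0) / (y_true ** 2).sum(axis=0)
    per_pair_scores.append(1 - err)  # shape: (n_voxels,)

# the violin plot is drawn over all concatenated voxels: 20 * 40000 values
all_scores = np.concatenate(per_pair_scores)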

Side notes:

  • Before using the BASC atlas to cluster, you forgot to resample it to the data, so maybe that's what caused you trouble (a resampling sketch follows these notes). For my implementation I stuck with automatic k-means clustering (even though the paper used hierarchical k-means).

  • If you provide the PairwiseAlignment class with a masker built from the mask that comes with the data, fmralign will avoid recalculating it.
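
For the resampling step, here is a minimal sketch, assuming the BASC atlas fetched through nilearn and an arbitrary functional image to define the target grid (the file path and atlas scale below are placeholders, not values from the notebook):

from nilearn.datasets import fetch_atlas_basc_multiscale_2015
from nilearn.image import load_img, resample_to_img

# placeholder: any functional image from the dataset defines the target grid
func_img = load_img("sub-01_ap.nii.gz")
# placeholder scale: the returned Bunch also exposes scale036, scale064, etc.
atlas_img = load_img(fetch_atlas_basc_multiscale_2015().scale122)

# labels must be resampled with nearest-neighbour interpolation so parcel
# ids stay integers instead of being blurred across parcel boundaries
atlas_resampled = resample_to_img(atlas_img, func_img, interpolation="nearest")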

With these changes I obtain results that are fairly comparable with the paper (code below).

[Screenshot of replication results, 2019-12-13]

The full implementation includes:

  • a reconstruction error function with a scikit-learn API;
  • the same parameters as the paper where possible.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score
from nilearn.input_data import NiftiMasker
from fmralign.pairwise_alignment import PairwiseAlignment
from fmralign.fetch_example_data import fetch_ibc_subjects_contrasts


def score_table(loss, X_gt, X_pred, multioutput='raw_values'):
    if loss == "r2":
        score = r2_score(X_gt, X_pred, multioutput=multioutput)
    elif loss == "n_reconstruction_err":
        score = normalized_reconstruction_error(
            X_gt, X_pred, multioutput=multioutput)
    elif loss == "corr":
        score = np.array([pearsonr(X_gt[:, vox], X_pred[:, vox])[0]
                          for vox in range(X_pred.shape[1])])
    else:
        raise NameError(
            "Unknown loss, it should be 'r2', 'n_reconstruction_err' or 'corr'")
    # clip very poor scores at -1 so a few bad voxels do not dominate the plots
    return np.maximum(score, -1)


def normalized_reconstruction_error(y_true, y_pred, sample_weight=None, multioutput="uniform_average"):
    """Like r2_score, but normalized by the uncentered sum of squares
    of y_true rather than by its variance."""
    if y_true.ndim == 1:
        y_true = y_true.reshape((-1, 1))

    if y_pred.ndim == 1:
        y_pred = y_pred.reshape((-1, 1))

    if sample_weight is not None:
        weight = sample_weight[:, np.newaxis]
    else:
        weight = 1.

    numerator = (weight * (y_true - y_pred) ** 2).sum(axis=0, dtype=np.float64)
    denominator = (weight * (y_true) ** 2).sum(axis=0, dtype=np.float64)
    nonzero_denominator = denominator != 0
    nonzero_numerator = numerator != 0
    valid_score = nonzero_denominator & nonzero_numerator
    output_scores = np.ones([y_true.shape[1]])
    output_scores[valid_score] = 1 - (numerator[valid_score] /
                                      denominator[valid_score])
    output_scores[nonzero_numerator & ~nonzero_denominator] = 0

    if multioutput == 'raw_values':
        # return scores individually
        return output_scores
    elif multioutput == 'uniform_average':
        # passing None as weights results in a uniform mean
        avg_weights = None
    elif multioutput == 'variance_weighted':
        avg_weights = (weight * (y_true - np.average(y_true, axis=0,
                                                     weights=sample_weight)) ** 2).sum(axis=0, dtype=np.float64)
        # avoid fail on constant y or one-element arrays
        if not np.any(nonzero_denominator):
            if not np.any(nonzero_numerator):
                return 1.0
            else:
                return 0.0
    else:
        avg_weights = multioutput

    return np.average(output_scores, weights=avg_weights)


def reconstruction_ratio(aligned_error, identity_error):
    """
    Calculates the reconstruction ratio as
    defined by Bazeille and colleagues (2019):
    1 - (1 - aligned_error) / (1 - identity_error).

    A value greater than 0 indicates that
    voxels are predicted better by aligned data
    than by raw data.

    Parameters
    ----------
    aligned_error : float64
        The reconstruction error from a given
        functional alignment method
    identity_error : float64
        The reconstruction error from predicting
        the target subject as the source subject
    """
    num = 1 - aligned_error
    den = 1 - identity_error
    return 1 - (num / den)
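
# Quick sanity check with made-up numbers (not values from the paper):
# a normalized reconstruction score of 0.8 after alignment vs 0.6 for
# identity means the error term drops from 0.4 to 0.2, so the ratio is
# 1 - 0.2 / 0.4 = 0.5 > 0, i.e. alignment helps.
assert np.isclose(reconstruction_ratio(0.8, 0.6), 0.5)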


def align_score_one_pair(df, source_sub, target_sub, method, n_pieces, masker, n_jobs):
    # load the ap and pa conditions for this pair of subjects
    source_train = df[(df.subject == source_sub) &
                      (df.acquisition == 'ap')].path.values
    target_train = df[(df.subject == target_sub) &
                      (df.acquisition == 'ap')].path.values
    source_test = df[(df.subject == source_sub) &
                     (df.acquisition == 'pa')].path.values
    target_test = df[(df.subject == target_sub) &
                     (df.acquisition == 'pa')].path.values

    # define alignment
    alignment_estimator = PairwiseAlignment(
        alignment_method=method, n_pieces=n_pieces, mask=masker, n_jobs=n_jobs)
    # fit estimator and predict missing data
    alignment_estimator.fit(source_train, target_train)
    target_pred = alignment_estimator.transform(source_test)
    # score the prediction and the raw source data against the target
    reconstruction_err = score_table(
        "n_reconstruction_err", masker.transform(target_test),
        masker.transform(target_pred), multioutput='raw_values')
    id_err = score_table(
        "n_reconstruction_err", masker.transform(target_test),
        masker.transform(source_test), multioutput='raw_values')
    # calculate the reconstruction ratio
    ratio = reconstruction_ratio(reconstruction_err, id_err)
    return reconstruction_err, ratio

files, df, mask = fetch_ibc_subjects_contrasts(subjects="all")

# Pairs used for the paper
pairs_used = [(12, 8), (2, 5), (2, 8), (3, 11), (3, 1), (4, 6), (7, 10),
              (9, 1), (6, 3), (7, 9), (10, 1), (5, 12), (7, 4), (8, 6),
              (2, 11), (3, 5), (11, 7), (4, 12), (9, 10)]
subjects = df.subject.unique()
n_jobs = 10

# In the paper, data were clustered automatically with hierarchical k-means;
# here let's use kmeans, the closest algorithm available in fmralign, with 200 pieces

n_pieces = 200
clustering = "kmeans"
methods = ["scaled_orthogonal", "ridge_cv", "optimal_transport"]

# define masker
masker = NiftiMasker(mask_img=mask)
masker.fit()

# for each method, calculate the reconstruction error and reconstruction
# ratio after alignment for all pairs, and store them in scores
scores = {}
for method in methods:
    method_err, method_ratio = [], []
    for pair in pairs_used:
        source_sub, target_sub = subjects[pair[0]], subjects[pair[1]]
        reconstruction_err, ratio = align_score_one_pair(
            df, source_sub, target_sub, method, n_pieces, masker, n_jobs)
        method_err.append(reconstruction_err)
        method_ratio.append(ratio)
    scores["{}_reconstruction_err".format(method)] = method_err
    scores["{}_reconstruction_ratio".format(method)] = method_ratio

# recover all reconstruction ratios and flatten their values across subjects

reconstruction_ratios = []
for method in methods:
    reconstruction_ratios.append(np.asarray(
        scores["{}_reconstruction_ratio".format(method)]).flatten())

# plot the distribution of scores for each method
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.violinplot(reconstruction_ratios, vert=False, showextrema=False,
               showmeans=False, showmedians=True, points=800, widths=0.8)
plt.tick_params(axis='y', which='both', labelleft='on',
                labelright='off', labelsize=20)
plt.xlim(left=-1, right=1)
plt.xlabel(r'$R_{\eta^2}$', fontsize=24, rotation="horizontal")
plt.axvline(x=0, color='dimgrey')
plt.tight_layout()
plt.yticks(range(1, len(methods) + 1), methods)

@emdupre (Collaborator, Author) commented Dec 19, 2019

This has been super helpful, thank you!! I've updated my code based on your feedback 😸 Very much appreciated!

I do still have one small problem I was hoping you could help me track down. I ran my adaptation of your code (as well as the exact code you included above), and I'm getting slightly different results for ridge_cv and very different results for optimal_transport:

[Figure: replication attempt with diverging ridge_cv and optimal_transport distributions]

I think these differences are likely coming from one of two places: either (1) my running without subjects 3 and 10 (since these aren't distributed on OSF), or (2) a difference in dependency versions. Here's what I'm currently running the replication with:

astroid==2.3.1
atomicwrites==1.3.0
attrs==19.3.0
autopep8==1.4.4
backcall==0.1.0
bleach==3.1.0
bokeh==1.4.0
brainiak==0.9.1
bz2file==0.98
certifi==2019.9.11
cycler==0.10.0
Cython==0.29.13
decorator==4.4.0
defusedxml==0.6.0
entrypoints==0.3
-e git+https://github.com/emdupre/fmralign@9d4884a31908e727caf15c8236c8bb84f880d327#egg=fmralign
h5py==2.10.0
importlib-metadata==0.23
ipdb==0.12.2
ipykernel==5.1.3
ipython==7.8.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.21
jedi==0.15.1
Jinja2==2.10.3
joblib==0.13.2
jsonschema==3.1.1
jupyter==1.0.0
jupyter-client==5.3.3
jupyter-console==6.0.0
jupyter-core==4.5.0
jupytext==1.2.4
kiwisolver==1.1.0
lazy-object-proxy==1.4.2
Mako==1.1.0
MarkupSafe==1.1.1
matplotlib==3.1.1
mccabe==0.6.1
mistune==0.8.4
more-itertools==7.2.0
mpi4py==3.0.2
nb-conda==2.2.1
nb-conda-kernels==2.2.2
nbconvert==5.6.0
nbformat==4.4.0
networkx==2.4
nibabel==2.5.1
nilearn==0.6.0a0
nitime==0.8.1
notebook==6.0.1
numpy==1.16.4
packaging==19.2
pandas==0.25.2
pandocfilters==1.4.2
parso==0.5.1
patsy==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.2.1
pluggy==0.13.0
POT==0.6.0
prometheus-client==0.7.1
prompt-toolkit==2.0.10
psutil==5.6.3
ptvsd==4.3.2
ptyprocess==0.6.0
py==1.8.0
pybind11==2.4.2
pycodestyle==2.5.0
pydicom==1.3.0
Pygments==2.4.2
pygpu==0.7.6
pylint==2.4.2
pymanopt==0.2.4
pyparsing==2.4.2
pyrsistent==0.15.4
pytest==5.2.1
python-dateutil==2.8.0
pytz==2019.3
PyYAML==5.1.2
pyzmq==18.1.0
qtconsole==4.5.5
scikit-learn==0.21.3
scipy==1.2.1
seaborn==0.9.0
Send2Trash==1.5.0
six==1.12.0
statsmodels==0.10.1
terminado==0.8.2
testpath==0.4.2
Theano==1.0.4
tornado==6.0.3
traitlets==4.3.3
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0
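
If it helps to compare environments, here is a small sketch (nothing fmralign-specific, just standard importlib) that prints only the versions most likely to matter for this comparison; note that POT, which provides optimal transport, imports as ot:

import importlib

# packages most likely to change the alignment results
for pkg in ["numpy", "scipy", "sklearn", "nilearn", "ot", "fmralign"]:
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")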

Does anything stand out to you?

EDIT: Or, option (3), I'm doing something else wrong entirely! Any and all feedback is welcome here.

Thanks so much again!

@thomasbazeille (Collaborator) commented Dec 19, 2019

This figure comes from the code above?! 😱

@emdupre (Collaborator, Author) commented Dec 19, 2019

This figure comes from the code above?! 😱

My running of it 😅

I did add two missing imports so it would run (from nilearn.input_data import NiftiMasker and from fmralign.pairwise_alignment import PairwiseAlignment); otherwise, I believe it's been run exactly as written.

emdupre mentioned this issue Jan 6, 2020
@emdupre (Collaborator, Author) commented Jan 6, 2020

OK! I can now get the correct figure:

[Figure: rerun of the replication, now matching the paper]

It's clearly something in my dependencies. I'll keep looking into this and report back, sorry!

@thomasbazeille (Collaborator) commented

Glad you found it, I was about to look into this!

It's a bit worrying that these results can change so much based on dependencies alone.

And it's mostly, but not only, optimal transport: all methods seem to yield different results, even though all the front-end libraries seemed to be up to date. I'm a bit out of ideas here. @bthirion, maybe?

@thomasbazeille (Collaborator) commented

OK, #40 was merged as a consequence of this issue. The dependency question stays open, but I'm not sure we'll debug that quickly; I'll get back to it when I have time. I'll close this in the meantime.

Thanks a lot for the PR!
