Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ISC and ISFC #183

Merged
merged 13 commits into from Feb 24, 2017
Merged

Adding ISC and ISFC #183

merged 13 commits into from Feb 24, 2017

Conversation

@cbaldassano
Copy link
Collaborator

@cbaldassano cbaldassano commented Feb 21, 2017

I've added some simple functions for computing ISC and ISFC, taking advantage of the optimized correlation engine from FCMA. I also wrote a utility function for loading a set of masked subject data for use with these functions. Per the contributing doc, I've put these functions in a single file rather than a submodule.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Feb 21, 2017

The code looks great! I will make some suggestions to improve readability.

group = np.mean(D[:, :, np.arange(n_subj) != loo_subj], axis=2)
subj = D[:, :, loo_subj]
for v in range(n_vox):
ISC[v, loo_subj] = stats.pearsonr(group[v, :], subj[v, :])[0]

This comment has been minimized.

@yidawang

yidawang Feb 21, 2017
Member

Why don't you use fcma.util.compute_correlation here as well? If it is because of the data layouts, you can simply convert them accordingly. I believe even adding the layout conversion time up, the overall running time should outperform scipy.stats.pearsonr.

This comment has been minimized.

@mihaic

mihaic Feb 22, 2017
Contributor

@yidawang, is this what you had in mind? I though you meant it would be better to make single call to compute_correlation per subject.

This comment has been minimized.

@cbaldassano

cbaldassano Feb 22, 2017
Author Collaborator

I don't think that is possible (@yidawang, correct me if I'm wrong) - compute_correlation computes correlations between all pairs of vectors, while for ISC we only need correlations between corresponding pairs (i.e. the diagonal of what compute_correlation returns).

This comment has been minimized.

@yidawang

yidawang Feb 23, 2017
Member

If this is the case, calling scipy.stats.pearsonr is fine with me.

This comment has been minimized.

@mihaic

mihaic Feb 23, 2017
Contributor

Then it is probably best to revert to the original code, for clarity.

This comment has been minimized.

@yidawang

yidawang Feb 23, 2017
Member

Yes, reverting it would be better in terms of performance as well. Sorry for the confusion, I didn't fully get what you are doing in the beginning. @cbaldassano

This comment has been minimized.

@cbaldassano

cbaldassano Feb 24, 2017
Author Collaborator

I've just tested this, and yes it is much faster to use pearsonr in this case, so I've reverted the change.

Parameters
----------
data_files : list of filenames of subject nii files

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Please document the required data shape.

This comment has been minimized.

@cbaldassano

cbaldassano Feb 22, 2017
Author Collaborator

This is just a base python list, so it is 1D

This comment has been minimized.

@mihaic

mihaic Feb 22, 2017
Contributor

That is clear, but what is the expected shape of the data inside the nii files?

This comment has been minimized.

@cbaldassano

cbaldassano Feb 22, 2017
Author Collaborator

Ah sorry, I've clarified in the comments now

-------
(D, coords) : tuple of voxel by time by subject ndarray, and
x,y,x (as provided by nibabel) coordinates of voxel locations (as a
tuple of ndarrays

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

I think the coords type and meaning could be clearer. Please document D and coords separately, using the expected style, i.e., one line for the type and separate text for the meaning.

n_vox = D.shape[0]
n_subj = D.shape[2]
ISFC = np.zeros((n_vox, n_vox, n_subj))
for loo_subj in range(D.shape[2]):

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Is loo an abbreviation for index? If so, could you please replace it with a more common abbreviation, e.g., subj_idx?

This comment has been minimized.

@cbaldassano

cbaldassano Feb 22, 2017
Author Collaborator

loo is short for leave-one-out, which I've now clarified in a comment

subj = D[:, :, loo_subj]
ISFC[:, :, loo_subj] = compute_correlation(group, subj)
ISFC[:, :, loo_subj] = (ISFC[:, :, loo_subj] +
ISFC[:, :, loo_subj].T) / 2

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Could you please add a comment describing this operation?

import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import fcluster, linkage
import sys, os

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Imports should be on separate lines.

ISC_vol = np.zeros(brain_nii.shape)
ISC_vol[coords] = ISC
ISC_nifti = nib.Nifti1Image(ISC_vol,
brain_nii.get_affine(), brain_nii.header)

This comment has been minimized.

print('Calculating ISFC on ', np.sum(ISC_mask), ' voxels...')
D_masked = D[ISC_mask, :, :]
ISFC = brainiak.isfc.isfc(D_masked)
coords_ISFC = [x[ISC_mask] for x in coords]

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

coords_ISFC is not used.

z = fcluster(Z, 2, criterion='maxclust')
clust_inds = np.argsort(z)

plt.imshow(ISFC[np.ix_(clust_inds,clust_inds)])

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Could you please add a comment explaining briefly what the image shows?

Functions for computing intersubject correlation (ISC) and intersubject
functional correlation (ISFC), and utility functions for loading subject data
files and masks
"""

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

Please add references to papers.


ISFC = brainiak.isfc.isfc(D)

gt = \

This comment has been minimized.

@mihaic

mihaic Feb 21, 2017
Contributor

ground_truth?

@cbaldassano
Copy link
Collaborator Author

@cbaldassano cbaldassano commented Feb 22, 2017

Hmm, the macos build is failing but it is working fine on my laptop. If I click Details, I get a permissions error "chrisb is missing the Overall/Read permission" so I'm not able to see what is going wrong.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Feb 22, 2017

As the commiter, you should have received an email from Jenkins with the output. (The Princeton Jenkins administrator does not have a solution for allowing the public to view the output yet.)

The problem is your usage of compute_correlation. You pass 1D ndarrays and it expects 2D ones:

brainiak/isfc.py:68: in isc
    ISC[v, loo_subj] = compute_correlation(group[v, :], subj[v, :])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

matrix1 = array([-0.29038674, -0.5220086 ,  0.23681971,  0.08854008, -0.41586313], dtype=float32)
matrix2 = array([-0.36225432, -0.43482456,  0.26723158,  0.16461712, -0.37991464], dtype=float32)

    def compute_correlation(matrix1, matrix2):
        """compute correlation between two sets of variables
    
        Correlate the rows of matrix1 with the rows of matrix2.
        If matrix1 == matrix2, it is auto-correlation computation
        resulting in a symmetric correlation matrix.
        The number of columns MUST agree between set1 and set2.
        The correlation being computed here is
        the Pearson's correlation coefficient, which can be expressed as
    
        .. math:: corr(X, Y) = \\frac{cov(X, Y)}{\\sigma_X\\sigma_Y}
    
        where cov(X, Y) is the covariance of variable X and Y, and
    
        .. math:: \\sigma_X
    
        is the standard deviation of variable X
    
        Reducing the correlation computation to matrix multiplication
        and using BLAS GEMM API wrapped by Scipy can speedup the numpy built-in
        correlation computation (numpy.corrcoef) by one order of magnitude
    
        .. math::
            corr(X, Y)
            &= \\frac{\\sum\\limits_{i=1}^n (x_i-\\bar{x})(y_i-\\bar{y})}{(n-1)
            \\sqrt{\\frac{\\sum\\limits_{j=1}^n x_j^2-n\\bar{x}}{n-1}}
            \\sqrt{\\frac{\\sum\\limits_{j=1}^{n} y_j^2-n\\bar{y}}{n-1}}}\\\\
            &= \\sum\\limits_{i=1}^n(\\frac{(x_i-\\bar{x})}
            {\\sqrt{\\sum\\limits_{j=1}^n x_j^2-n\\bar{x}}}
            \\frac{(y_i-\\bar{y})}{\\sqrt{\\sum\\limits_{j=1}^n y_j^2-n\\bar{y}}})
    
        Parameters
        ----------
        matrix1: 2D array in shape [r1, c]
            MUST be continuous and row-major
    
        matrix2: 2D array in shape [r2, c]
            MUST be continuous and row-major
    
        Returns
        -------
        corr_data: 2D array in shape [r1, r2]
            continuous and row-major in np.float32
        """
        matrix1 = matrix1.astype(np.float32)
        matrix2 = matrix2.astype(np.float32)
>       [r1, d1] = matrix1.shape
E       ValueError: not enough values to unpack (expected 2, got 1)
@cbaldassano
Copy link
Collaborator Author

@cbaldassano cbaldassano commented Feb 22, 2017

I fixed the compute_correlation issue in the most recent commit c811e7c but the builds are still failing. I've never gotten email from Jenkins, is there a way to see what email it is sending to? My local git email might not be set correctly?

@mihaic
Copy link
Contributor

@mihaic mihaic commented Feb 22, 2017

The email address is extracted from the commit, so you should make sure it is valid.

The error now is:

flake8 --config setup.cfg brainiak
brainiak/isfc.py:70:58: E231 missing whitespace after ','
@cbaldassano
Copy link
Collaborator Author

@cbaldassano cbaldassano commented Feb 22, 2017

Ok, fixed - forgot that run-tests didn't also run the static code checker. I have my local email set up correctly now, so hopefully Jenkins will send me emails now.

@cbaldassano
Copy link
Collaborator Author

@cbaldassano cbaldassano commented Feb 24, 2017

During the last commit one of the Travis builds ended in an error due to a timeout on the server - hopefully it goes through successfully this time, otherwise it might need to manually re-run in some way.

@mihaic
mihaic approved these changes Feb 24, 2017
@mihaic
Copy link
Contributor

@mihaic mihaic commented Feb 24, 2017

@cbaldassano, I am ready to merge this. However, we need to update it with the latest commits from master. Usually, I would just use the "Update branch" button on GitHub to update it myself, but I noticed you are using your master branch instead of a branch specifically created for this PR. Can you please do the update?

Bringing fork up to date with master
@cbaldassano
Copy link
Collaborator Author

@cbaldassano cbaldassano commented Feb 24, 2017

OK I think my fork is now up to date and should be mergeable by @mihaic

@mihaic mihaic merged commit 7940c62 into brainiak:master Feb 24, 2017
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@buildbot-princeton
linux Build finished.
Details
@buildbot-princeton
macos Build finished.
Details
danielsuo pushed a commit that referenced this pull request Nov 16, 2017
Search PATH in addition to current directory for executables
danielsuo pushed a commit that referenced this pull request Nov 16, 2017
* Make numpy arrays immutable in numbuf

* Move break statement outside of brackets

* Simplify test case

* Simplify test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants