Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Revamped ISC analyses with additional statistical tests #384

Merged
merged 46 commits into from Dec 7, 2018

Conversation

@snastase
Copy link
Contributor

commented Oct 4, 2018

I'm re-working and expanding the ISC (and ISFC) code to accommodate several new features. We can now compute either pairwise or leave-one-out ISCs (rather than just leave-one-out). All analyses now also accept 'mean' or 'median' as summary statistics. I've increased the flexibility of inputs to the core ISC functions, and tried to standardize output geometries (time points by voxels). I've moved statistical tests out of the core ISC and ISFC functions—these are now separate functions. In addition to phase randomization, I've added a circular time-shifting test (Kauppi et al., 2014), and group-level permutation and bootstrap tests (Chen et al., 2016). Both the phase randomization and circular time-shift randomization tests operate on the data themselves and call the core ISC function internally to recompute ISCs on the randomized data. Two changes to the phase randomization code: no more randomization across voxels (as this disrupts the spatial autocorrelation of fMRI data), and in the leave-one-out approach, only the left-out subject is phase randomized per iteration. If all subjects are randomized, then N-1 are averaged, the Nth subject will always be correlated with what is effectively a flat time series, yielding an overly tight null distribution and inflated false positive rates (FPRs). Group-level bootstrap and permutation tests operate directly on the ISC values and should better control FPRs (Chen et al., 2016). For pairwise ISCs, bootstrap resampling of subjects and group relabeling operate at the row/column level, not at the level of individual pairs (therefore respecting the correlation structure among pairs). Updated and expanded tests. No statistical tests for ISFC yet, but working on it. I would also recommend changing the module/filename from "isfc.py" to "isc.py", despite the ISC values being a subset of the ISC values, I think ISC is conceptually superordinate and we may want to add things like spatial ISC in the future (which does not fit under the ISFC heading). NB: This is my first major PR, so apologies if I'm missing anything obvious.

@buildbot-princeton

This comment has been minimized.

Copy link
Collaborator

commented Oct 4, 2018

Can one of the admins verify this patch?

1 similar comment
@buildbot-princeton

This comment has been minimized.

Copy link
Collaborator

commented Oct 4, 2018

Can one of the admins verify this patch?

@snastase snastase requested review from cbaldassano, manojneuro and qihongl Oct 4, 2018

@snastase snastase assigned snastase and unassigned snastase Oct 4, 2018

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

Jenkins, add to whitelist.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

@snastase, please ignore the Travis status for the moment. The failure is unrelated. I'm working on a fix.

@qihongl

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

Hi, folks @snastase @mihaic sorry about the delayed response! And thank you very much @snastase
! I'm very excited about this pull request! Though my schedule is pretty crazy recently. If this is not time sensitive, I'd love to help with anything!

@qihongl

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

@mihaic @snastase I probably should discuss this point somewhere else:

This pull request raised a great point - standardized data geometry. I guess we probably should consider standardizing the input data geometry to be consistent with sklearn. For example, SRM currently takes n_features (n_voxels) x n_exampels (n_TRs), whereas sklearn, as well as many standard ML textbooks, use the transposed input format n_exampels x n_features. This have been causing some confusion in the past.

Moreover, I think we should encourage all future packages to be consistent with some default data format.

@manojneuro

This comment has been minimized.

Copy link

commented Oct 5, 2018

@snastase thanks for working on this. Meir and David Turner are doing some tests on memory utilization of ISC with Searchlight. I'll request them to use this fix as a part of their tests. This will provide an extra validation on many of these updates.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2018

@qihongl, I agree about the geometry comment. See issue #356 for a different idea.

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Oct 5, 2018

@qihongl @mihaic I definitely agree about having a standard input geometry, and any transposes, slicing, etc happening internally. I would advocate for n_samples (typically time points, conditions) by n_features (typically voxels, electrodes, surface vertices), as this seems to be the most widespread convention and makes the most sense for, e.g., brain decoding analyses. I think standardizing this throughout will be critical for overall usability of BrainIAK software. I'm not very familiar with xarrays, but they look very cool. Seems like the ultimate solution would be to generally assume a standardized 2D geometry for neural data (with checks) but also allow for the increased flexibility / explicitness of xarrays.

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Oct 5, 2018

@manojneuro I haven't put a lot of effort into optimizing for speed / computing resources yet—was going for flexibility and readability—so I suspect there are several points where we could speed things up! Current version may run slower than the previous implementation until then, but I haven't thoroughly tested this.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Dec 6, 2018

I think removing (not deprecating) the old module brainiak.isfc is actually the way to go considering we are not keeping any function signature unchanged.

In any case, the current approach of duplicating the code is not what we want.

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2018

Okay, will remove the brainiak.isfc duplicate (and tests) entirely.

I'm a bit confused about how to handle the current test error regarding linked references in the docstring. I'm getting the following message:

Warning, treated as error:
/home/travis/build/brainiak/brainiak/brainiak/isc.py:docstring of brainiak.isc.permutation_isc:30:Duplicate explicit target name: "chen2016".
make: *** [html] Error 2
/home/travis/build/brainiak/brainiak
make docs failed
The command "./pr-check.sh" exited with 1.

Is this because I have multiple pointers to the same reference? I.e., the Chen2016 reference is used for both bootstrap_isc and permutation_isc. What's the solution to this?

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2018

@mihaic also pointed out in an offline conversation that we'll need to adjust the ISC example (https://github.com/brainiak/brainiak/blob/master/examples/isfc/isfc.py) to match the new functionality.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Dec 6, 2018

The error is because of multiple definitions of chen2016. You should remove the second of the definitions and replace it with a reference, e.g.:
The implementation is based on the following publication: [Chen2016]_.
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#citations

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2018

Hey @mihaic, some tests are failing for linux during the conda build, apparently due to problems with pymanopt(?)—any idea what that's about?

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2018

It's not related to your PR (I can see the same error on my machine on the master branch). I'm looking into it, but if you're done before I figure it out, we can merge your PR anyway.

@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Dec 7, 2018

@mihaic I think all the base functionality @qihongl reviewed is basically ready to go. There are other related topics (e.g., updating the examples)... not sure if those are best handled here or in another PR.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2018

The master branch is supposed to be functional, so I don't think we can postpone updating the examples.

@mihaic

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2018

Also, why do you think this test failed on Linux on Jenkins?

=================================== FAILURES ===================================
_______________________________ test_isc_output ________________________________

    def test_isc_output():
    
        logger.info("Testing ISC outputs")
    
        data = correlated_timeseries(20, 60, noise=0,
                                     random_state=42)
        iscs = isc(data, pairwise=False)
        assert np.all(iscs[:, :2] == 1.)
        assert np.all(iscs[:, -1] < 1.)
    
        iscs = isc(data, pairwise=True)
>       assert np.all(iscs[:, :2] == 1.)
E       assert False
E        +  where False = <function all at 0x7f6aef557e18>(array([[1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],...       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.]]) == 1.0)
E        +    where <function all at 0x7f6aef557e18> = np.all
@snastase

This comment has been minimized.

Copy link
Contributor Author

commented Dec 7, 2018

@mihaic Okay, the ISC/ISFC example is now functional and updated to use the new code (including new MaskMultiSubjectData shape, etc). I'm not sure why that test would fail. The point was to ensure that the correlation between identical time series was exactly 1. I haven't been able to recreate a situation where the test failed locally. I guess it's possible that the np.corrcoef computation could introduce some tiny deviation and thus break the assertion? I switched the assertion from testing for strictly 1 to np.allclose to introduce a tiny bit of tolerance.

@snastase snastase merged commit e09b345 into brainiak:master Dec 7, 2018

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
linux Build finished.
Details
macos Build finished.
Details

@snastase snastase deleted the snastase:isc-stats branch Dec 7, 2018

snastase added a commit to snastase/brainiak that referenced this pull request Mar 27, 2019

snastase added a commit that referenced this pull request Mar 27, 2019

Handling NaNs in ISC/ISFC (and ISFC output shape) (#405)
* Tolerance for NaNs for ISC/ISFC, and tests

* Messing with NaN options in isfc

* Option to return NaNs in compute_correlation

* Moved NaN handling into _normalize_for_correlation

* NaN handling for ISFC and tests

* Reshaped ISFC output to n_subjects x n_voxel_pairs

* Fixed whitespace errors etc

* tolerate_nans in phaseshift_isc and timeshift_isc tests

* Added new function to squareform ISFCs and keep diagonal

* Replaced old p_from_null with new version

* Removed ISC test NIfTI data because we're simulating

* Moved phase randomization in phaseshift_isc to utils

* Removed vestigial ecdf from utils and tests

* Moved _check_timeseries_input to utils to fix dependencies

* Fixing duplicate references

* isfc can now take 'targets' as input (asymmetric)

* Fixed up 'targets' for ISFCs, added tests

* Increasing some bits of code coverage

* Modified exception tests to use shmancy pytest.raises

* Added news items for issues and PR

* Updated NEWS directly for outdated PR #384

mihaic added a commit to mihaic/brainiak that referenced this pull request Apr 8, 2019

docs: Move PR brainiak#384 NEWS entry to newsfragments
This will add it to the next release, not v0.8, as it should be.
In PR brainiak#403, I mistakenly suggested editing NEWS directly.

mihaic added a commit that referenced this pull request Apr 10, 2019

docs: Move PR #384 NEWS entry to newsfragments (#417)
It will be added to the next release where it belongs, not v0.8.
In PR #405, I mistakenly suggested editing NEWS directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.