Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Revamped ISC analyses with additional statistical tests #384

Merged
merged 46 commits into from Dec 7, 2018

Conversation

@snastase
Copy link
Contributor

@snastase snastase commented Oct 4, 2018

I'm re-working and expanding the ISC (and ISFC) code to accommodate several new features. We can now compute either pairwise or leave-one-out ISCs (rather than just leave-one-out). All analyses now also accept 'mean' or 'median' as summary statistics. I've increased the flexibility of inputs to the core ISC functions, and tried to standardize output geometries (time points by voxels). I've moved statistical tests out of the core ISC and ISFC functions—these are now separate functions. In addition to phase randomization, I've added a circular time-shifting test (Kauppi et al., 2014), and group-level permutation and bootstrap tests (Chen et al., 2016). Both the phase randomization and circular time-shift randomization tests operate on the data themselves and call the core ISC function internally to recompute ISCs on the randomized data. Two changes to the phase randomization code: no more randomization across voxels (as this disrupts the spatial autocorrelation of fMRI data), and in the leave-one-out approach, only the left-out subject is phase randomized per iteration. If all subjects are randomized, then N-1 are averaged, the Nth subject will always be correlated with what is effectively a flat time series, yielding an overly tight null distribution and inflated false positive rates (FPRs). Group-level bootstrap and permutation tests operate directly on the ISC values and should better control FPRs (Chen et al., 2016). For pairwise ISCs, bootstrap resampling of subjects and group relabeling operate at the row/column level, not at the level of individual pairs (therefore respecting the correlation structure among pairs). Updated and expanded tests. No statistical tests for ISFC yet, but working on it. I would also recommend changing the module/filename from "isfc.py" to "isc.py", despite the ISC values being a subset of the ISC values, I think ISC is conceptually superordinate and we may want to add things like spatial ISC in the future (which does not fit under the ISFC heading). NB: This is my first major PR, so apologies if I'm missing anything obvious.

@buildbot-princeton
Copy link
Collaborator

@buildbot-princeton buildbot-princeton commented Oct 4, 2018

Can one of the admins verify this patch?

1 similar comment
@buildbot-princeton
Copy link
Collaborator

@buildbot-princeton buildbot-princeton commented Oct 4, 2018

Can one of the admins verify this patch?

@snastase snastase assigned snastase and unassigned snastase Oct 4, 2018
@mihaic
Copy link
Contributor

@mihaic mihaic commented Oct 4, 2018

Jenkins, add to whitelist.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Oct 4, 2018

@snastase, please ignore the Travis status for the moment. The failure is unrelated. I'm working on a fix.

@qihongl
Copy link
Member

@qihongl qihongl commented Oct 5, 2018

Hi, folks @snastase @mihaic sorry about the delayed response! And thank you very much @snastase
! I'm very excited about this pull request! Though my schedule is pretty crazy recently. If this is not time sensitive, I'd love to help with anything!

@qihongl
Copy link
Member

@qihongl qihongl commented Oct 5, 2018

@mihaic @snastase I probably should discuss this point somewhere else:

This pull request raised a great point - standardized data geometry. I guess we probably should consider standardizing the input data geometry to be consistent with sklearn. For example, SRM currently takes n_features (n_voxels) x n_exampels (n_TRs), whereas sklearn, as well as many standard ML textbooks, use the transposed input format n_exampels x n_features. This have been causing some confusion in the past.

Moreover, I think we should encourage all future packages to be consistent with some default data format.

@manojneuro
Copy link

@manojneuro manojneuro commented Oct 5, 2018

@snastase thanks for working on this. Meir and David Turner are doing some tests on memory utilization of ISC with Searchlight. I'll request them to use this fix as a part of their tests. This will provide an extra validation on many of these updates.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Oct 5, 2018

@qihongl, I agree about the geometry comment. See issue #356 for a different idea.

@snastase
Copy link
Contributor Author

@snastase snastase commented Oct 5, 2018

@qihongl @mihaic I definitely agree about having a standard input geometry, and any transposes, slicing, etc happening internally. I would advocate for n_samples (typically time points, conditions) by n_features (typically voxels, electrodes, surface vertices), as this seems to be the most widespread convention and makes the most sense for, e.g., brain decoding analyses. I think standardizing this throughout will be critical for overall usability of BrainIAK software. I'm not very familiar with xarrays, but they look very cool. Seems like the ultimate solution would be to generally assume a standardized 2D geometry for neural data (with checks) but also allow for the increased flexibility / explicitness of xarrays.

@snastase
Copy link
Contributor Author

@snastase snastase commented Oct 5, 2018

@manojneuro I haven't put a lot of effort into optimizing for speed / computing resources yet—was going for flexibility and readability—so I suspect there are several points where we could speed things up! Current version may run slower than the previous implementation until then, but I haven't thoroughly tested this.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 5, 2018

@snastase, regarding your f-string commit, I created issue #393 about the minimum version of dependencies we should support.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 6, 2018

I think removing (not deprecating) the old module brainiak.isfc is actually the way to go considering we are not keeping any function signature unchanged.

In any case, the current approach of duplicating the code is not what we want.

@snastase
Copy link
Contributor Author

@snastase snastase commented Dec 6, 2018

Okay, will remove the brainiak.isfc duplicate (and tests) entirely.

I'm a bit confused about how to handle the current test error regarding linked references in the docstring. I'm getting the following message:

Warning, treated as error:
/home/travis/build/brainiak/brainiak/brainiak/isc.py:docstring of brainiak.isc.permutation_isc:30:Duplicate explicit target name: "chen2016".
make: *** [html] Error 2
/home/travis/build/brainiak/brainiak
make docs failed
The command "./pr-check.sh" exited with 1.

Is this because I have multiple pointers to the same reference? I.e., the Chen2016 reference is used for both bootstrap_isc and permutation_isc. What's the solution to this?

@snastase
Copy link
Contributor Author

@snastase snastase commented Dec 6, 2018

@mihaic also pointed out in an offline conversation that we'll need to adjust the ISC example (https://github.com/brainiak/brainiak/blob/master/examples/isfc/isfc.py) to match the new functionality.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 6, 2018

The error is because of multiple definitions of chen2016. You should remove the second of the definitions and replace it with a reference, e.g.:
The implementation is based on the following publication: [Chen2016]_.
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#citations

@snastase
Copy link
Contributor Author

@snastase snastase commented Dec 6, 2018

Hey @mihaic, some tests are failing for linux during the conda build, apparently due to problems with pymanopt(?)—any idea what that's about?

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 7, 2018

It's not related to your PR (I can see the same error on my machine on the master branch). I'm looking into it, but if you're done before I figure it out, we can merge your PR anyway.

@snastase
Copy link
Contributor Author

@snastase snastase commented Dec 7, 2018

@mihaic I think all the base functionality @qihongl reviewed is basically ready to go. There are other related topics (e.g., updating the examples)... not sure if those are best handled here or in another PR.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 7, 2018

The master branch is supposed to be functional, so I don't think we can postpone updating the examples.

@mihaic
Copy link
Contributor

@mihaic mihaic commented Dec 7, 2018

Also, why do you think this test failed on Linux on Jenkins?

=================================== FAILURES ===================================
_______________________________ test_isc_output ________________________________

    def test_isc_output():
    
        logger.info("Testing ISC outputs")
    
        data = correlated_timeseries(20, 60, noise=0,
                                     random_state=42)
        iscs = isc(data, pairwise=False)
        assert np.all(iscs[:, :2] == 1.)
        assert np.all(iscs[:, -1] < 1.)
    
        iscs = isc(data, pairwise=True)
>       assert np.all(iscs[:, :2] == 1.)
E       assert False
E        +  where False = <function all at 0x7f6aef557e18>(array([[1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],...       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.],\n       [1., 1.]]) == 1.0)
E        +    where <function all at 0x7f6aef557e18> = np.all

@snastase
Copy link
Contributor Author

@snastase snastase commented Dec 7, 2018

@mihaic Okay, the ISC/ISFC example is now functional and updated to use the new code (including new MaskMultiSubjectData shape, etc). I'm not sure why that test would fail. The point was to ensure that the correlation between identical time series was exactly 1. I haven't been able to recreate a situation where the test failed locally. I guess it's possible that the np.corrcoef computation could introduce some tiny deviation and thus break the assertion? I switched the assertion from testing for strictly 1 to np.allclose to introduce a tiny bit of tolerance.

@snastase snastase merged commit e09b345 into brainiak:master Dec 7, 2018
2 of 3 checks passed
@snastase snastase deleted the isc-stats branch Dec 7, 2018
snastase added a commit to snastase/brainiak that referenced this issue Mar 27, 2019
snastase added a commit that referenced this issue Mar 27, 2019
* Tolerance for NaNs for ISC/ISFC, and tests

* Messing with NaN options in isfc

* Option to return NaNs in compute_correlation

* Moved NaN handling into _normalize_for_correlation

* NaN handling for ISFC and tests

* Reshaped ISFC output to n_subjects x n_voxel_pairs

* Fixed whitespace errors etc

* tolerate_nans in phaseshift_isc and timeshift_isc tests

* Added new function to squareform ISFCs and keep diagonal

* Replaced old p_from_null with new version

* Removed ISC test NIfTI data because we're simulating

* Moved phase randomization in phaseshift_isc to utils

* Removed vestigial ecdf from utils and tests

* Moved _check_timeseries_input to utils to fix dependencies

* Fixing duplicate references

* isfc can now take 'targets' as input (asymmetric)

* Fixed up 'targets' for ISFCs, added tests

* Increasing some bits of code coverage

* Modified exception tests to use shmancy pytest.raises

* Added news items for issues and PR

* Updated NEWS directly for outdated PR #384
mihaic added a commit to mihaic/brainiak that referenced this issue Apr 8, 2019
This will add it to the next release, not v0.8, as it should be.
In PR brainiak#403, I mistakenly suggested editing NEWS directly.
mihaic added a commit that referenced this issue Apr 10, 2019
It will be added to the next release where it belongs, not v0.8.
In PR #405, I mistakenly suggested editing NEWS directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants