Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed srm #220

Merged
merged 14 commits into from May 11, 2017
Merged

Distributed srm #220

merged 14 commits into from May 11, 2017

Conversation

@mjanderson09
Copy link
Contributor

@mjanderson09 mjanderson09 commented Apr 28, 2017

No description provided.

@@ -132,6 +144,8 @@ class SRM(BaseEstimator, TransformerMixin):
rho2_ : array, shape=[subjects]
The estimated noise variance :math:`\\rho_i^2` for each subject
comm_ : mpi4py.MPI.Intracomm
The MPI communicator containing the data

This comment has been minimized.

@mihaic

mihaic Apr 28, 2017
Contributor

comm should be used instead of comm_ everywhere, because it is given as a parameter, not computed from data:
http://scikit-learn.org/stable/developers/contributing.html#instantiation
http://scikit-learn.org/stable/developers/contributing.html#estimated-attributes

for subject in range(subjects):
mu.append(np.mean(data[subject], 1))
np.random.seed(subject)

This comment has been minimized.

@mihaic

mihaic Apr 28, 2017
Contributor

How does this interact with self.rand_seed?

This comment has been minimized.

@mihaic

mihaic Apr 28, 2017
Contributor

Not addressed.

This comment has been minimized.

@TuKo

TuKo Apr 28, 2017
Contributor

Check issue #114
Is it right to initialize the distributed SRM randomly?

This comment has been minimized.

@mjanderson09

mjanderson09 May 1, 2017
Author Contributor

I removed this random seed per subject code. Instead, each SRM class instantiation can set a random seed (as before). This means for multiple ranks, each rank can potentially set a different seed. I don't think we can guarantee exactly deterministic results for the distributed implementation due to differences in parallel scheduling per run.

@@ -191,3 +223,4 @@ def test_det_srm():
print("Test: different number of samples per subject")


test_can_instantiate()

This comment has been minimized.

@mihaic

mihaic Apr 28, 2017
Contributor

Wrap this call in if __name__ == "__main__", please.

@mihaic mihaic requested a review from TuKo Apr 28, 2017
mjanderson09 added 2 commits May 1, 2017
@mihaic
Copy link
Contributor

@mihaic mihaic commented May 1, 2017

Jenkins, retest this please.

1 similar comment
@mihaic
Copy link
Contributor

@mihaic mihaic commented May 2, 2017

Jenkins, retest this please.

@mihaic
mihaic approved these changes May 2, 2017
@TuKo
Copy link
Contributor

@TuKo TuKo commented May 2, 2017

@mjanderson09 great job!
Please, consider adding the TS-SVD and TS-QR code as well. We should allow for reproducibility of the IEEE BigData algorithm. Also, all SRM flavors will benefit from it, and some other methods as well.

Don't need to do it now. You can do it in a different PR.

Thank you!

mjanderson09 added 2 commits May 3, 2017
@@ -62,6 +63,9 @@ def test_can_instantiate():

# Check that runs with 2 subject
s.fit(X)
sr_v0_4 = np.load('sr_v0_4.npy')
diff_v0_4 = np.linalg.norm(sr_v0_4 - s.s_)
assert(diff_v0_4 < 1e-5)

This comment has been minimized.

@mihaic

mihaic May 8, 2017
Contributor

Shouldn't the results be element-wise equal, i.e., np.allclose(sr_v0_4, s.s_), maybe with an atol value?

@@ -62,6 +63,9 @@ def test_can_instantiate():

# Check that runs with 2 subject
s.fit(X)
sr_v0_4 = np.load('sr_v0_4.npy')

This comment has been minimized.

@mihaic

mihaic May 8, 2017
Contributor

Add this to imports:
from pathlib import Path

Then:
Path(__file__).parent / "sr_v0_4.npy"

@mihaic
mihaic approved these changes May 10, 2017
@mihaic
Copy link
Contributor

@mihaic mihaic commented May 10, 2017

@TuKo, what say you?

@TuKo
Copy link
Contributor

@TuKo TuKo commented May 10, 2017

@TuKo says NIPS, sorry...

@@ -190,4 +194,4 @@ def test_det_srm():
model.fit(X)
print("Test: different number of samples per subject")


test_can_instantiate()

This comment has been minimized.

@mihaic

mihaic May 11, 2017
Contributor

Please guard this call with if __name__ == "__main__".

@mihaic
mihaic approved these changes May 11, 2017
@TuKo
Copy link
Contributor

@TuKo TuKo commented May 11, 2017

👍 Great work!

@TuKo
TuKo approved these changes May 11, 2017
@mihaic mihaic merged commit 3e039bd into brainiak:master May 11, 2017
3 checks passed
3 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
linux Build finished.
Details
macos Build finished.
Details
@mihaic
Copy link
Contributor

@mihaic mihaic commented May 11, 2017

For the record, here is the effect of the changes on the accuracy in the image prediction example:

v0.4:
SRM: The average accuracy among all subjects is 0.657143 +/- 0.058576
Det. SRM: The average accuracy among all subjects is 0.657143 +/- 0.070891

master (i.e., per-subject random number generators):
SRM: The average accuracy among all subjects is 0.664286 +/- 0.076097
Det. SRM: The average accuracy among all subjects is 0.673214 +/- 0.068721

distributed_srm:
SRM: The average accuracy among all subjects is 0.664286 +/- 0.076097
Det. SRM: The average accuracy among all subjects is 0.673214 +/- 0.068721
danielsuo pushed a commit that referenced this pull request Nov 16, 2017
#220)

* Added test for retriving variables from an optimizer

* Added comments to test

* Addressed comments

* Fixed travis bug

* Added fix to circular controls

* Added set for explored operations and duplicate prefix stripping

* Removed embeded ipython

* Removed prefix, use seperate graph for each network

* Removed redundant imports

* Addressed comments and added separate graph to initializer

* fix typos

* get rid of prefix in documentation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants