Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array indexing in create_multi_node_evaluator #8568

Merged
merged 1 commit into from
Jul 20, 2020

Conversation

emcastillo
Copy link
Member

@emcastillo emcastillo commented Jun 17, 2020

Closes #8546

In some cases local_mean_dict.values() is empty. For example in the above issue, it is empty for the rank 1 since the example explicitly creates an empty dataset for this rank.

if data_axis == 1:
train = chainermn.datasets.create_empty_dataset(train)
test = chainermn.datasets.create_empty_dataset(test)

Also metrics are not reported for rank 1.

if data_axis == 0:
model = L.Classifier(MLP0(model_comm, args.unit))
elif data_axis == 1:
model = MLP1(model_comm, args.unit, 10)

@emcastillo emcastillo changed the title Fix array indexing in create_multi_node_evaluator Fix array indexing in create_multi_node_evaluator Jun 17, 2020
@kmaehashi kmaehashi added the ChainerMN Related to ChainerMN. label Jun 18, 2020
@kmaehashi kmaehashi added this to the v7.6.0 milestone Jun 18, 2020
@keisukefukuda keisukefukuda self-requested a review June 18, 2020 09:25
@keisukefukuda
Copy link
Member

pfnci, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 813a518, target branch master) succeeded!

Copy link
Member

@keisukefukuda keisukefukuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test are failing. This does not seem to be related to this PR, but could you take a look at it?

@emcastillo
Copy link
Member Author

Yes, tests has been failing in PfnCI for a while due to out of memory errors,
We have to fix this 🙇‍♂️.
Jenkins worked fine and the errors are not related to the PR so I think it is safe to merge.

Sorry and thank you again.

@emcastillo
Copy link
Member Author

All the CI issues were fixed, the current failing job is due to having the env variables set on job creation so it can't be overridden with the new values unless the CIs are kicked again ☹️

@kmaehashi
Copy link
Member

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 813a518, target branch master) failed with status FAILURE.

@emcastillo
Copy link
Member Author

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 813a518, target branch master) succeeded!

@emcastillo
Copy link
Member Author

@keisukefukuda we fixed the CIs and now all test passed :D

@keisukefukuda keisukefukuda merged commit 5ca6e5d into chainer:master Jul 20, 2020
@kmaehashi kmaehashi added the cat:bug Bug report or fix. label Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:bug Bug report or fix. ChainerMN Related to ChainerMN.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to run mnist dual parallel example
4 participants