
Add FP16 support to send/recv #6552

Merged 4 commits on Apr 5, 2019

Conversation

@belldandyxtq (Member) commented Mar 18, 2019

This PR adds FP16 support to the send/recv function pair in ChainerMN for MPI.
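The PR's diff is not shown on this page, but the general technique can be sketched: many MPI bindings historically lacked a native half-precision datatype, so one common workaround is to transmit float16 buffers as raw bytes and reinterpret them on the receiving side. A minimal NumPy-only sketch with hypothetical helper names, not the PR's actual code:

```python
# Hypothetical sketch (not the PR's code): serialize a float16 array to
# bytes for transport, then reconstruct it on the receiving side.
import numpy as np

def pack_fp16(array):
    """Serialize a float16 array into a byte buffer plus its shape."""
    assert array.dtype == np.float16
    return array.tobytes(), array.shape

def unpack_fp16(buf, shape):
    """Reconstruct the float16 array from the raw bytes."""
    return np.frombuffer(buf, dtype=np.float16).reshape(shape)

x = np.arange(6, dtype=np.float16).reshape(2, 3)
buf, shape = pack_fp16(x)
y = unpack_fp16(buf, shape)
assert (x == y).all() and y.dtype == np.float16
```

In an MPI setting the byte buffer would be what actually goes over the wire; the round trip above only demonstrates that no precision is lost in the reinterpretation.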


@belldandyxtq belldandyxtq added cat:feature Implementation that introduces new interfaces. ChainerMN Related to ChainerMN. labels Mar 18, 2019
@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to functions [WIP] Add FP16 support to send/recv Mar 18, 2019
@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to send/recv Add FP16 support to send/recv Mar 18, 2019
@kuenishi (Member) left a comment

I would also like you to squash all the small commits into one or two larger commits when you rebase and force-push the update. I have added several comments as well.
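The squash the reviewer asks for is usually done with `git rebase -i master` (marking all but the first commit as `squash`) followed by a force-push. A non-interactive, self-contained sketch of the same operation in a throwaway repository, using `git reset --soft` instead of the interactive rebase; all names and messages here are illustrative:

```shell
#!/bin/sh
# Sketch of squashing several small commits into one, in a temp repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email ci@example.com
git config user.name ci
echo a > f; git add f; git commit -qm "base"
echo b > f; git commit -qam "fix typo"
echo c > f; git commit -qam "fix pep8"
echo d > f; git commit -qam "fix indent"
# Squash the last three commits into a single commit.
git reset --soft HEAD~3
git commit -qm "Add FP16 support to send/recv"
git rev-list --count HEAD   # now 2: base + the squashed commit
```

On a real feature branch the follow-up would be `git push --force-with-lease` to update the PR.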


def create_communicator(gpu, param):
    if gpu:
        communicator = chainermn.create_communicator('hierarchical')
Member:

'hierarchical' is to be deprecated in the next release. Use 'flat' instead as the default communicator.
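A minimal illustration of the suggested change; the communicator names are those accepted by `chainermn.create_communicator`, but the selection helper itself is hypothetical:

```python
# Sketch of the suggested fix: prefer 'flat' over the soon-to-be-deprecated
# 'hierarchical' communicator for GPU runs; 'naive' is the usual CPU
# choice. The helper name is an illustration, not part of the PR.
def communicator_name(gpu):
    return 'flat' if gpu else 'naive'

# e.g. chainermn.create_communicator(communicator_name(gpu))
assert communicator_name(True) == 'flat'
assert communicator_name(False) == 'naive'
```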

communicator = create_communicator(gpu, param)
rank_send = (communicator.rank + 1) % communicator.size
rank_recv = (communicator.rank - 1) % communicator.size
train = TrainEnv(gpu, param, communicator)
Member:

I don't think TrainEnv is intuitive; I think we concluded a create_model-ish name was more intuitive.
Also, f and evaluation are not variables, so they don't necessarily have to be held in the object.

Member:

Say, like

from chainer.functions import sigmoid
from chainer.functions import mean_squared_error as mse


@pytest.mark.parametrize('param', params)
def test_tuple_communication1_cpu(param):
    check_tuple_communication(1, False, param)
Member:

Why don't you just parametrize the lengths 1 and 2 here?
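The reviewer's suggestion could look roughly like the following; the `params` list here is a hypothetical stand-in for the test module's real one, and `check_tuple_communication` is the helper referenced in the quoted test:

```python
# Sketch: fold the tuple length into the pytest parameter matrix instead
# of writing separate length-1 and length-2 tests.
import itertools

params = [{'dtype': 'float16'}, {'dtype': 'float32'}]  # hypothetical
matrix = list(itertools.product([1, 2], params))

# With pytest this becomes:
# @pytest.mark.parametrize('length,param', matrix)
# def test_tuple_communication_cpu(length, param):
#     check_tuple_communication(length, False, param)
assert len(matrix) == 4 and matrix[0][0] == 1
```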

@kuenishi kuenishi dismissed shu65’s stale review March 28, 2019 07:39

I took over from him as reviewer, and his comment was already resolved.

@kuenishi (Member) left a comment

Sigmoid is a well-known function for NNs and MSE is a well-known evaluation function, so I don't think we have to replace them with more abstract names. These changes may alter the intention of the original code, but they would be more intuitive.

Also, Travis is failing, in my opinion due to a bug in the test code. Please fix it.

Also, the master branch this PR is based on is broken. Please rebase against the latest master to pick up the fix from #6679.

# Input process.
y = self.f(self.model(self.x))
y = function(model(x))
Member:

Isn't sigmoid(model(x)) just enough?

Member (Author):

I want to avoid hard-coding. Using 'function' is more convenient when we want to switch to a different activation, say ReLU. The same goes for 'evaluation'.
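The indirection the author describes can be illustrated in plain Python; these are stand-in callables, not the PR's code. Swapping the activation is then a one-argument change:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def forward(x, model, function):
    # 'function' is injected, so switching sigmoid -> relu needs no
    # edit to this call site.
    return function(model(x))

model = lambda x: 2.0 * x
assert forward(1.0, model, relu) == 2.0
assert abs(forward(0.0, model, sigmoid) - 0.5) < 1e-12
```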

err_ = self.evaluation(x_, self.x)
x_ = x
for l in range(communicator.size):
    x_ = function(entire_model[l](x_))
Member:

ditto

err = self.evaluation(x, t)
x_ = chainermn.functions.recv(communicator, rank_recv,
                              delegate_variable=dlg)
err = evaluation(x_, t)
Member:

ditto; mse(x_, t)

@belldandyxtq belldandyxtq changed the title Add FP16 support to send/recv [WIP] Add FP16 support to send/recv Mar 29, 2019
split the `Variable` class to improve readability.

fix for pep8

reflect uenishi's comments

fix bug

fix indent
@belldandyxtq (Member, Author):

Jenkins, test this please

@chainer-ci (Member):

Jenkins CI test (for commit 2a9caa8, target branch master) succeeded!

@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to send/recv Add FP16 support to send/recv Mar 29, 2019
@shu65 shu65 modified the milestones: v6.0.0rc1, v6 Apr 4, 2019
@kuenishi kuenishi modified the milestones: v6, Future Task, v7.0.0a1 Apr 4, 2019
@kuenishi kuenishi merged commit 56cdf0e into chainer:master Apr 5, 2019
Labels: cat:feature (Implementation that introduces new interfaces), ChainerMN (Related to ChainerMN)
5 participants