-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FP16 support to send/recv #6552
Add FP16 support to send/recv #6552
Conversation
tests/chainermn_tests/functions_tests/test_point_to_point_communication.py
Outdated
Show resolved
Hide resolved
b93ec1c
to
8a618af
Compare
a0812fb
to
7529b6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wanted you to squash all small commits to one or two large commits when you rebased and force-pushed the update. Several comments added as well.
|
||
def create_communicator(gpu, param): | ||
if gpu: | ||
communicator = chainermn.create_communicator('hierarchical') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'hierarchical' is to be deprecated in next release. Use 'flat' instead for default communicator.
communicator = create_communicator(gpu, param) | ||
rank_send = (communicator.rank + 1) % communicator.size | ||
rank_recv = (communicator.rank - 1) % communicator.size | ||
train = TrainEnv(gpu, param, communicator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think TrainEnv
is intuitive. I think we concluded create_model
-ish name was more intuitive.
f
and evaluation
are not a variable so they don't necessarily have to be held in the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say, like
from chainer.functions import sigmoid
import chainer.functions.mean_squared_error as mse
|
||
@pytest.mark.parametrize('param', params) | ||
def test_tuple_communication1_cpu(param): | ||
check_tuple_communication(1, False, param) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you just parameterize the length 1 and 2 here?
I took over him as reviewer and his comment was already resolved.
7529b6e
to
3034275
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigmoid is a well known function for NN and MSE is also a well known function for evaluation. I don't think we don't have to replace them with a more abstract name. These changes may change the intention of original code but would be more intuitive so far.
Also, Travis is failing, IMO due to test code bug. Please fix it.
Also, the mainstream this PR is based on is broken. Please rebase against the latest master to introduce the fix by #6679 .
# Input process. | ||
y = self.f(self.model(self.x)) | ||
y = function(model(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't sigmoid(model(x))
just enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid hard coding. Using 'function' can be more convenient when we want to change to a different one, say ReLU. The same goes to 'evaluation'.
err_ = self.evaluation(x_, self.x) | ||
x_ = x | ||
for l in range(communicator.size): | ||
x_ = function(entire_model[l](x_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
err = self.evaluation(x, t) | ||
x_ = chainermn.functions.recv(communicator, rank_recv, | ||
delegate_variable=dlg) | ||
err = evaluation(x_, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto; mse(x_, t)
Add the tests first. done for send & recv fix for pep8
3034275
to
0f5ef80
Compare
split the `Variable` class to improve readabilty. fix for pep8 reflact uenishi's comments fix bug fix indent
0f5ef80
to
2a9caa8
Compare
Jenkins, test this please |
Jenkins CI test (for commit 2a9caa8, target branch master) succeeded! |
This PR add FP16 to send/recv function pair in ChainerMN for mpi.
Thank you for creating a pull request!
Please double-check the following.