
Add FP16 support to send/recv #6552

Merged 4 commits on Apr 5, 2019

Conversation

@belldandyxtq (Member) commented Mar 18, 2019

This PR adds FP16 support to the send/recv function pair in ChainerMN for MPI.
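The PR's diff is not shown on this page, but the general technique can be sketched: many MPI bindings historically lacked a native half-precision datatype, so one common workaround is to transmit float16 buffers as raw bytes and reinterpret them on the receiving side. A minimal NumPy-only sketch with hypothetical helper names, not the PR's actual code:

```python
# Hypothetical sketch (not the PR's code): serialize a float16 array to
# bytes for transport, then reconstruct it on the receiving side.
import numpy as np

def pack_fp16(array):
    """Serialize a float16 array into a byte buffer plus its shape."""
    assert array.dtype == np.float16
    return array.tobytes(), array.shape

def unpack_fp16(buf, shape):
    """Reconstruct the float16 array from the raw bytes."""
    return np.frombuffer(buf, dtype=np.float16).reshape(shape)

x = np.arange(6, dtype=np.float16).reshape(2, 3)
buf, shape = pack_fp16(x)
y = unpack_fp16(buf, shape)
assert (x == y).all() and y.dtype == np.float16
```

In an MPI setting the byte buffer would be what actually goes over the wire; the round trip above only demonstrates that no precision is lost in the reinterpretation.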


@belldandyxtq belldandyxtq added cat:feature Implementation that introduces new interfaces. ChainerMN Related to ChainerMN. labels Mar 18, 2019
@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to functions [WIP] Add FP16 support to send/recv Mar 18, 2019
@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to send/recv Add FP16 support to send/recv Mar 18, 2019
@kuenishi (Member) left a comment

I would also like you to squash all the small commits into one or two larger commits when you rebase and force-push the update. I have added several comments as well.
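The squash the reviewer asks for is usually done with `git rebase -i master` (marking all but the first commit as `squash`) followed by a force-push. A non-interactive, self-contained sketch of the same operation in a throwaway repository, using `git reset --soft` instead of the interactive rebase; all names and messages here are illustrative:

```shell
#!/bin/sh
# Sketch of squashing several small commits into one, in a temp repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email ci@example.com
git config user.name ci
echo a > f; git add f; git commit -qm "base"
echo b > f; git commit -qam "fix typo"
echo c > f; git commit -qam "fix pep8"
echo d > f; git commit -qam "fix indent"
# Squash the last three commits into a single commit.
git reset --soft HEAD~3
git commit -qm "Add FP16 support to send/recv"
git rev-list --count HEAD   # now 2: base + the squashed commit
```

On a real feature branch the follow-up would be `git push --force-with-lease` to update the PR.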


def create_communicator(gpu, param):
    if gpu:
        communicator = chainermn.create_communicator('hierarchical')
Member:

'hierarchical' is to be deprecated in the next release. Use 'flat' instead as the default communicator.
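A minimal illustration of the suggested change; the communicator names are those accepted by `chainermn.create_communicator`, but the selection helper itself is hypothetical:

```python
# Sketch of the suggested fix: prefer 'flat' over the soon-to-be-deprecated
# 'hierarchical' communicator for GPU runs; 'naive' is the usual CPU
# choice. The helper name is an illustration, not part of the PR.
def communicator_name(gpu):
    return 'flat' if gpu else 'naive'

# e.g. chainermn.create_communicator(communicator_name(gpu))
assert communicator_name(True) == 'flat'
assert communicator_name(False) == 'naive'
```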

communicator = create_communicator(gpu, param)
rank_send = (communicator.rank + 1) % communicator.size
rank_recv = (communicator.rank - 1) % communicator.size
train = TrainEnv(gpu, param, communicator)
Member:

I don't think TrainEnv is intuitive; I think we concluded a create_model-ish name was more intuitive.
Also, f and evaluation are not variables, so they don't necessarily have to be held in the object.

Member:

Say, like

from chainer.functions import sigmoid
from chainer.functions import mean_squared_error as mse


@pytest.mark.parametrize('param', params)
def test_tuple_communication1_cpu(param):
    check_tuple_communication(1, False, param)
Member:

Why don't you just parametrize the lengths 1 and 2 here?
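The reviewer's suggestion could look roughly like the following; the `params` list here is a hypothetical stand-in for the test module's real one, and `check_tuple_communication` is the helper referenced in the quoted test:

```python
# Sketch: fold the tuple length into the pytest parameter matrix instead
# of writing separate length-1 and length-2 tests.
import itertools

params = [{'dtype': 'float16'}, {'dtype': 'float32'}]  # hypothetical
matrix = list(itertools.product([1, 2], params))

# With pytest this becomes:
# @pytest.mark.parametrize('length,param', matrix)
# def test_tuple_communication_cpu(length, param):
#     check_tuple_communication(length, False, param)
assert len(matrix) == 4 and matrix[0][0] == 1
```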

@kuenishi kuenishi dismissed shu65’s stale review March 28, 2019 07:39

I took over from him as reviewer, and his comment was already resolved.

@kuenishi (Member) left a comment

Sigmoid is a well-known function for NNs and MSE is a well-known evaluation function, so I don't think we have to replace them with more abstract names. These changes may alter the intention of the original code, but they would be more intuitive.

Also, Travis is failing, in my opinion due to a bug in the test code. Please fix it.

Also, the master branch this PR is based on is broken. Please rebase against the latest master to pick up the fix from #6679.

# Input process.
y = self.f(self.model(self.x))
y = function(model(x))
Member:

Isn't sigmoid(model(x)) just enough?

Member (Author):

I want to avoid hard-coding. Using 'function' is more convenient when we want to switch to a different activation, say ReLU. The same goes for 'evaluation'.
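The indirection the author describes can be illustrated in plain Python; these are stand-in callables, not the PR's code. Swapping the activation is then a one-argument change:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def forward(x, model, function):
    # 'function' is injected, so switching sigmoid -> relu needs no
    # edit to this call site.
    return function(model(x))

model = lambda x: 2.0 * x
assert forward(1.0, model, relu) == 2.0
assert abs(forward(0.0, model, sigmoid) - 0.5) < 1e-12
```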

err_ = self.evaluation(x_, self.x)
x_ = x
for l in range(communicator.size):
    x_ = function(entire_model[l](x_))
Member:

ditto

err = self.evaluation(x, t)
x_ = chainermn.functions.recv(communicator, rank_recv,
                              delegate_variable=dlg)
err = evaluation(x_, t)
Member:

ditto; mse(x_, t)

@belldandyxtq belldandyxtq changed the title Add FP16 support to send/recv [WIP] Add FP16 support to send/recv Mar 29, 2019
split the `Variable` class to improve readability.

fix for pep8

reflect uenishi's comments

fix bug

fix indent
@belldandyxtq (Member, Author):

Jenkins, test this please

@chainer-ci (Member):

Jenkins CI test (for commit 2a9caa8, target branch master) succeeded!

@belldandyxtq belldandyxtq changed the title [WIP] Add FP16 support to send/recv Add FP16 support to send/recv Mar 29, 2019
@shu65 shu65 modified the milestones: v6.0.0rc1, v6 Apr 4, 2019
@kuenishi kuenishi modified the milestones: v6, Future Task, v7.0.0a1 Apr 4, 2019
@kuenishi kuenishi merged commit 56cdf0e into chainer:master Apr 5, 2019
Labels: cat:feature (Implementation that introduces new interfaces), ChainerMN (Related to ChainerMN)
5 participants