Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix backend.copyto for mismatched dtypes to CuPy ndarray #8043

Merged
merged 3 commits into from
Apr 3, 2020

Conversation

hvy
Copy link
Member

@hvy hvy commented Aug 27, 2019

Fixes a parameter initialization bug. See #7965 (comment).

@hvy hvy added the cat:bug Bug report or fix. label Aug 27, 2019
@toslunar toslunar self-assigned this Aug 27, 2019
@@ -65,7 +65,8 @@ def copyto(dst, src):
elif isinstance(dst, cuda.ndarray):
if isinstance(src, chainer.get_cpu_array_types()):
src = numpy.asarray(src)
if dst.flags.c_contiguous or dst.flags.f_contiguous:
if (src.dtype == dst.dtype
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set does not accept an array of different dtype, so we choose to fall through if dtypes don't match.

Copy link
Member

@toslunar toslunar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyto LGTM. You may split the PR for TestUninitializedParameterWithDevices.

x.initialize(a.shape)
assert isinstance(x.data, xp.ndarray)
assert x._has_chainerx_array is (xp is chainerx)
xp.testing.assert_array_equal(x.data, xp.asarray(a))
xp.testing.assert_array_equal(
x.data.astype(expected_dtype, copy=False), xp.asarray(a))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose

assert x.dtype == expected_dtype
xp.testing.assert_array_equal(x.data, xp.asarray(a, expected_dtype))

x.initialize(a.shape)
assert isinstance(x.data, xp.ndarray)
assert x._has_chainerx_array is (xp is chainerx)
xp.testing.assert_array_equal(x.data, xp.asarray(a))
xp.testing.assert_array_equal(
x.data.astype(expected_dtype, copy=False), xp.asarray(a))
xp.testing.assert_array_equal(x.grad, np.float32('nan'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you keep np.float32? It could be np.nan or float('nan').

@gwtnb gwtnb added the to-be-backported Pull request that should be backported. label Aug 27, 2019
@hvy
Copy link
Member Author

hvy commented Aug 28, 2019

Thanks for the comments. I'll address them in a separate PR. Removed test_variable.py from this one. PTAL.

@niboshi
Copy link
Member

niboshi commented Aug 28, 2019

Should copyto support dtype casting?
I think copyto is a low-level function which usual users don't use.
I feel it's better to fix the caller to handle dtypes rather than adding convenient features into copyto.

@toslunar
Copy link
Member

Random ideas:

  • Automatic casting is error-prone, assuming that backend.copyto is used in model definitions more often than in training loops.
  • backend.copyto could always check dtypes.
  • strides (or just contiguity) might also matter.
  • numpy.copyto supports casting.

@hvy
Copy link
Member Author

hvy commented Aug 28, 2019

I'm okay with making it strict, not allowing different dtypes. This functions is by no means a hot spot so we can afford some checking. We could also take inspiration from numpy.copyto and allow the caller to control the casting behavior, but that's a bit overkill considering the current use cases of this function.

@toslunar do you have concrete cases where stride checking would be beneficial?

@toslunar
Copy link
Member

@toslunar do you have concrete cases where stride checking would be beneficial?

Not really. Loading NHWC-layout parameters, maybe. A low-level copy-to sounds like memcpy.

@toslunar
Copy link
Member

The PR is no longer a PR-blocker because #8046 is closed.

@toslunar toslunar removed the pr-blocker Blocking other PRs. label Sep 12, 2019
@toslunar toslunar self-requested a review October 28, 2019 05:26
@toslunar toslunar added the st:needs-discussion State indicating that discussions are needed before proceeding. label Oct 28, 2019
@stale
Copy link

stale bot commented Jan 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Not updated for a longer period of time. label Jan 26, 2020
@toslunar toslunar added this to the v7.3.0 milestone Feb 4, 2020
@stale stale bot removed the stale Not updated for a longer period of time. label Feb 4, 2020
@toslunar
Copy link
Member

We discussed the status of the PR internally.

  • The issue is classified as a bug because the behavior depends on the contiguity of the arrays.
  • Since the master branch is stable now, it's better to align the behavior to the permissive side, i.e. allowing casting always (rather than denying it).

@toslunar toslunar removed st:needs-discussion State indicating that discussions are needed before proceeding. to-be-backported Pull request that should be backported. labels Feb 21, 2020
@toslunar
Copy link
Member

Jenkins, test this please.

@toslunar toslunar added the st:test-and-merge State indicating that pull request is approved by a reviewer and can be merged after CI passes. label Feb 21, 2020
@chainer-ci
Copy link
Member

Jenkins CI test (for commit 0aa2ad7, target branch master) succeeded!

@chainer-ci
Copy link
Member

@toslunar This pull-request is marked as st:test-and-merge, but there were no activities for the last 3 days. Could you check?

@toslunar
Copy link
Member

Jenkins, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5bbb615, target branch master) succeeded!

@toslunar toslunar removed the st:test-and-merge State indicating that pull request is approved by a reviewer and can be merged after CI passes. label Feb 25, 2020
@toslunar
Copy link
Member

/test

@kmaehashi kmaehashi assigned emcastillo and unassigned toslunar Mar 17, 2020
@toslunar
Copy link
Member

/test

@emcastillo
Copy link
Member

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5bbb615, target branch master) failed with status FAILURE.

@emcastillo
Copy link
Member

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5bbb615, target branch master) succeeded!

@toslunar
Copy link
Member

FlexCI failures: timeout of chainer/tests/chainer_tests/functions_tests/rnn_tests/test_function_slstm.py

@emcastillo
Copy link
Member

emcastillo commented Mar 30, 2020

Re-running pfCI still fails on the same tests

[FAILED] /chainer/tests/chainer_tests/functions_tests/normalization_tests/test_l2_normalization.py (60 failed, 486 passed, 102 skipped, 648 deselected in 36.70 seconds)
--
E   cupy.cuda.driver.CUDADriverError: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: too many resources requested for launch |  
2020-03-30 18:56:34.864857 STDOUT 2433] |   |  
2020-03-30 18:56:34.864858 STDOUT 2433] | cupy/cuda/driver.pyx:124: CUDADriverError |  
2020-03-30 18:56:34.864859 STDOUT 2433]

Both of them added some extra errors.
Will clone and verify locally 😢

@emcastillo
Copy link
Member

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5bbb615, target branch master) succeeded!

@emcastillo
Copy link
Member

Jenkins, test this please

@emcastillo emcastillo added the st:test-and-merge State indicating that pull request is approved by a reviewer and can be merged after CI passes. label Apr 3, 2020
@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5bbb615, target branch master) succeeded!

@emcastillo emcastillo merged commit 0e23958 into chainer:master Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:bug Bug report or fix. st:test-and-merge State indicating that pull request is approved by a reviewer and can be merged after CI passes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants