Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add different dtype input test in histogram #3618

Merged
merged 6 commits into from
Sep 17, 2020

Conversation

okuta
Copy link
Member

@okuta okuta commented Jul 15, 2020

cub.device_histogram assumes following dtype combination.

  • same float dptye x and bin_edge
  • x is integer dtype, binedge is double dtype

@okuta okuta changed the title Fix diff dtype histogram Fix different dtype input case in histogram Jul 15, 2020
@leofang
Copy link
Member

leofang commented Jul 15, 2020

cub.device_histogram assumes following dtype combination.

  • same float dptye x and bin_edge
  • x is integer dtype, binedge is double dtype

It's not CUB's requirement, but actually NumPy's:
https://github.com/numpy/numpy/blob/2f3b82638a24204a13c9b16a3a36a59f199df41d/numpy/lib/histograms.py#L442-L443
So I simply used SFINAE to specialize kernels that satisfy this requirement:

cupy/cupy/cuda/cupy_cub.cu

Lines 585 to 589 in 0af4d97

struct _cub_histogram_range {
template <typename sampleT,
typename binT = typename If<std::is_integral<sampleT>::value, double, sampleT>::Type>
void operator()(void* workspace, size_t& workspace_size, void* input, void* output,
int n_bins, void* bins, size_t n_samples, cudaStream_t s) const

Comment on lines 224 to 225
else:
x = x.astype(bin_type, copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this. It's usually cheaper to change the bins than the input array, as the latter is often much larger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will fix!

bins = testing.shaped_arange((10,), xp, dtype_a)

if xp is numpy:
return xp.histogram(x, bins)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create another test, say test_histogram_with_bins2, to test xp.histogram(x, bins)[1]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added.

bin_type = numpy.float
else:
x = x.astype(bin_type, copy=False)
acc_bin_edge = bin_edges.astype(bin_type, copy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly you wanna hide the cost of the second bin shift by creating a copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines 220 to 219
assert isinstance(bin_edges, cupy.ndarray)
bin_type = numpy.result_type(bin_edges.dtype, x.dtype)
if numpy.issubdtype(x.dtype, numpy.integer):
bin_type = numpy.float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cleaner 👍 But, when _get_bin_edges() generates bin_edges, if bins is int this type requirement is already satisfied, so it's why we first test for if isinstance(bins, cupy.ndarray):. Would be nice to keep the if branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, it seems to be faster.
The point is that bins needs type conversion even when numpy.array or list is an input.
How about if not is instance(bins, int):?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's prepare for #3542.

tests/cupy_tests/statics_tests/test_histogram.py Outdated Show resolved Hide resolved
@takagi
Copy link
Member

takagi commented Jul 16, 2020

pfnCI, test this please.

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 4ea497f:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 4ea497f, target branch master) failed with status FAILURE.

@takagi
Copy link
Member

takagi commented Jul 18, 2020

The CI also gets wrong with float32?

>>> import cupy
>>> x = cupy.testing.shaped_arange((10,), cupy, 'f')
>>> bins = cupy.arange(1, 10.1, 0.9)
>>> bins[-1] = 10.1
>>> y = cupy.empty(10, 'l')
>>> cupy.cuda.cub.device_histogram(x, bins, y)
array([1, 1, 0, 0, 0, 0, 0, 8, 0, 0])
>>> bins = bins.astype('f')
>>> cupy.cuda.cub.device_histogram(x, bins, y)
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

@okuta
Copy link
Member Author

okuta commented Jul 19, 2020

The CI also gets wrong with float32?

We have to use double type bins, so bins = bins.astype('d') is correct. @takagi

@leofang
Copy link
Member

leofang commented Jul 19, 2020

In @takagi's snippet, x is fp32, so bins should also be fp32.

@kmaehashi kmaehashi added cat:bug Bugs st:blocked-by-another-pr Blocked by another pull-request labels Aug 4, 2020
@kmaehashi
Copy link
Member

Blocked by #3617

@takagi takagi removed the st:blocked-by-another-pr Blocked by another pull-request label Aug 25, 2020
@takagi
Copy link
Member

takagi commented Aug 25, 2020

As #3617 is merged to the master now, would you resolve the conflict?

@okuta okuta changed the title Fix different dtype input case in histogram Add different dtype input test in histogram Aug 26, 2020
@okuta
Copy link
Member Author

okuta commented Aug 26, 2020

I fixed conflicts. So, this PR only adds testcase

@takagi
Copy link
Member

takagi commented Aug 28, 2020

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 229e52e, target branch master) failed with status FAILURE.

@okuta
Copy link
Member Author

okuta commented Aug 31, 2020

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 229e52e, target branch master) failed with status FAILURE.

@okuta
Copy link
Member Author

okuta commented Aug 31, 2020

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 229e52e, target branch master) failed with status FAILURE.

@takagi
Copy link
Member

takagi commented Aug 31, 2020

Waiting for CI problem fixed.

@takagi
Copy link
Member

takagi commented Sep 8, 2020

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 229e52e, target branch master) failed with status FAILURE.

@takagi takagi added the to-be-backported Pull-requests to be backported to stable branch label Sep 15, 2020
@okuta
Copy link
Member Author

okuta commented Sep 16, 2020

pfnCI, test this please.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 306ed6f, target branch master) succeeded!

@takagi takagi added this to the v9.0.0a1 milestone Sep 17, 2020
@takagi
Copy link
Member

takagi commented Sep 17, 2020

LGTM!

@takagi takagi merged commit 2351f05 into cupy:master Sep 17, 2020
takagi added a commit to takagi/cupy that referenced this pull request Sep 17, 2020
Add different dtype input test in histogram
@takagi takagi added cat:test Test code / CI and removed cat:bug Bugs to-be-backported Pull-requests to be backported to stable branch labels Sep 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:test Test code / CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants