
add cupyx.scipy.stats.entropy #4369

Merged
merged 6 commits into cupy:master on Dec 8, 2020

Conversation

@grlee77 (Contributor) commented Nov 30, 2020

This PR adds the scipy.stats function entropy. The implementation relies on the existing cupyx.scipy.special.entr and cupyx.scipy.special.rel_entr functions, so it should be simple to review.
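For reference, the approach can be sketched on the CPU with the SciPy equivalents of those special functions (a minimal sketch mirroring scipy.stats.entropy's documented behavior, not the PR's actual code; `entropy_sketch` is a hypothetical name):

```python
import numpy as np
from scipy.special import entr, rel_entr

def entropy_sketch(pk, qk=None, base=None, axis=0):
    """Shannon entropy of pk, or KL divergence of pk from qk, in nats by default."""
    pk = np.asarray(pk, dtype=np.float64)
    pk = pk / pk.sum(axis=axis, keepdims=True)   # normalize to a distribution
    if qk is None:
        vec = entr(pk)                           # elementwise -p * log(p)
    else:
        qk = np.asarray(qk, dtype=np.float64)
        qk = qk / qk.sum(axis=axis, keepdims=True)
        vec = rel_entr(pk, qk)                   # elementwise p * log(p / q)
    s = vec.sum(axis=axis)
    if base is not None:
        s /= np.log(base)                        # convert nats to the requested base
    return s
```

For example, `entropy_sketch([0.5, 0.5], base=2)` gives 1.0 (one bit).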

There is not currently a cupyx.scipy.stats module, so this PR adds it. However, I do not currently have plans to implement other functions in the stats module. This function is used by a scikit-image function we created a CUDA equivalent for in cupyimg. If there is no interest in having stats functions within CuPy itself, we can just keep this implementation there.

Benchmark with pk input only

| shape | GPU time (ms) | CPU time (ms) | acceleration |
|-------|---------------|---------------|--------------|
| (256,) | 0.045 +/- 0.002 | 0.019 +/- 0.001 | 0.420 |
| (512, 512) | 0.117 +/- 0.002 | 5.334 +/- 0.067 | 45.745 |
| (2048, 4096) | 3.266 +/- 0.213 | 181.113 +/- 1.445 | 55.447 |
| (192, 192, 192) | 1.597 +/- 0.153 | 162.830 +/- 1.174 | 101.973 |

Benchmark with pk and qk inputs

| shape | GPU time (ms) | CPU time (ms) | acceleration |
|-------|---------------|---------------|--------------|
| (256,) | 0.075 +/- 0.009 | 0.022 +/- 0.003 | 0.295 |
| (512, 512) | 0.162 +/- 0.003 | 2.807 +/- 0.100 | 17.349 |
| (2048, 4096) | 4.317 +/- 0.266 | 100.531 +/- 1.208 | 23.288 |
| (192, 192, 192) | 1.985 +/- 0.002 | 96.379 +/- 0.544 | 48.560 |
Benchmark script:

```python
import numpy
import scipy.stats

import cupy
import cupyx.scipy.stats
from cupyx.time import repeat

print("## Benchmark with pk input only")
print("shape | GPU time (ms) | CPU time (ms) | acceleration")
print("------|---------------|---------------|-------------")
for shape in [(256,), (512, 512), (2048, 4096), (192, 192, 192)]:
    pg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    perf_gpu = repeat(cupyx.scipy.stats.entropy, (pg,), n_warmup=20, n_repeat=100)
    gpu_times = perf_gpu.gpu_times * 1000

    p = cupy.asnumpy(pg)
    perf_cpu = repeat(scipy.stats.entropy, (p,), n_warmup=1, n_repeat=20)
    cpu_times = perf_cpu.cpu_times * 1000  # host-side timings for the CPU-only call
    accel = cpu_times.mean() / gpu_times.mean()
    print(f"{shape} | {gpu_times.mean():0.3f} +/- {gpu_times.std():0.3f} | "
          f"{cpu_times.mean():0.3f} +/- {cpu_times.std():0.3f} | {accel:0.3f}")

print("## Benchmark with pk and qk inputs")
print("shape | GPU time (ms) | CPU time (ms) | acceleration")
print("------|---------------|---------------|-------------")
for shape in [(256,), (512, 512), (2048, 4096), (192, 192, 192)]:
    pg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    qg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    perf_gpu = repeat(cupyx.scipy.stats.entropy, (pg, qg), n_warmup=20, n_repeat=100)
    gpu_times = perf_gpu.gpu_times * 1000

    p = cupy.asnumpy(pg)
    q = cupy.asnumpy(qg)
    perf_cpu = repeat(scipy.stats.entropy, (p, q), n_warmup=1, n_repeat=20)
    cpu_times = perf_cpu.cpu_times * 1000  # host-side timings for the CPU-only call
    accel = cpu_times.mean() / gpu_times.mean()
    print(f"{shape} | {gpu_times.mean():0.3f} +/- {gpu_times.std():0.3f} | "
          f"{cpu_times.mean():0.3f} +/- {cpu_times.std():0.3f} | {accel:0.3f}")
```

@grlee77 (Contributor, Author) commented Nov 30, 2020

Let me know if the basic tests in TestEntropyBasic should be removed. These are just copies of simple test cases from SciPy's test suite. The TestEntropy class I added follows the more typical CuPy style, where outputs are compared directly against SciPy's.

@takagi (Member) commented Dec 4, 2020

pfnCI, test this please.

@takagi (Member) commented Dec 4, 2020

Thanks for this PR! Yes, we'd like to have cupyx.scipy.stats 👍

@takagi (Member) commented Dec 4, 2020

Would you change the submodule name cupyx.scipy.stats.distribution to cupyx.scipy.stats._distribution, to indicate that it is not intended to be exposed as a public module?

@takagi (Member) commented Dec 4, 2020

Also nice to have TestEntropyBasic!

Review comment on:

```python
                             self.base, self.axis, self.normalize)

    @testing.numpy_cupy_allclose(atol=1e-3, rtol=5e-3, scipy_name='scp')
    def test_entropy_float16(self, xp, scp):
```

Member:

Now testing.numpy_cupy_allclose accepts per-dtype tolerance #4269. Would you try that?

grlee77 (Contributor, Author):

That looks like a nice feature, but I ran into problems with it in this case. I tried removing the float16 test case and using an rtol dictionary on the other one, like:

```python
    @testing.numpy_cupy_allclose(rtol={cupy.float16: 1e-3,
                                       cupy.float32: 1e-6,
                                       'default': 1e-12},
                                 scipy_name='scp')
```

However, the cupyx.scipy.special functions used are implemented for float32 and float64 only, so a float16 input produces a float32 output and the 1e-6 tolerance gets used. This suggests I should upcast to float32 earlier in the function, so that the normalization computations using cupy.sum() have equivalent accuracy.

grlee77 (Contributor, Author):

Also, the input with the larger axis size of 128 does not pass at 1e-12 in double precision. This might be due to lower accuracy of sum in CuPy vs. NumPy (in some cases, NumPy uses pairwise summation to reduce accumulated error in sums).
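The accuracy gap can be illustrated in NumPy alone by comparing a naive left-to-right float32 accumulation against NumPy's pairwise sum, with math.fsum as a high-precision reference (illustrative only, not the CuPy reduction kernel):

```python
import math
import numpy as np

x = np.full(1_000_000, 0.1, dtype=np.float32)

# Naive sequential accumulation in float32: rounding error accumulates.
naive = np.float32(0.0)
for v in x:
    naive = np.float32(naive + v)

pairwise = x.sum()                        # NumPy uses pairwise summation here
exact = math.fsum(float(v) for v in x)    # exact reference in extended precision

naive_err = abs(float(naive) - exact)
pairwise_err = abs(float(pairwise) - exact)
```

For this input, the pairwise sum lands much closer to the exact value than the sequential one.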

Member:

Okay. I also checked that scipy.stats.entropy returns a float32 value for float16 input.
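The underlying upcast can be spot-checked on the CPU: the SciPy special-function ufuncs only have float32/float64 loops, so a float16 input is promoted:

```python
import numpy as np
from scipy.special import entr

p16 = np.asarray([0.5, 0.5], dtype=np.float16)
out = entr(p16)   # only 'f->f' and 'd->d' loops exist, so float16 -> float32
```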

@grlee77 (Contributor, Author) commented Dec 4, 2020

I also noticed that I have cupy.asarray calls on qk/pk, but we probably don't want these, to avoid unintentional host/device transfers. I will remove them.

cast input to at least float32
@chainer-ci (Member) commented:

Jenkins CI test (for commit f7836e7, target branch master) succeeded!

@takagi (Member) commented Dec 4, 2020

pfnCI, test this please.

@chainer-ci (Member) commented:

Jenkins CI test (for commit cec146d, target branch master) succeeded!

remove unused numpy import
@takagi (Member) commented Dec 8, 2020

pfnCI, test this please.

@takagi (Member) commented Dec 8, 2020

LGTM! Waiting for CI pass.

@chainer-ci (Member) commented:

Jenkins CI test (for commit 6608b60, target branch master) succeeded!

@takagi takagi merged commit f2e6f83 into cupy:master Dec 8, 2020
@takagi (Member) commented Dec 8, 2020

Thanks!

@takagi takagi added this to the v9.0.0b1 milestone Dec 8, 2020
@grlee77 grlee77 deleted the scipy_entropy branch December 18, 2020 00:15