add cupyx.scipy.stats.entropy #4369
Conversation
Let me know if the basic tests in
pfnCI, test this please.
Thanks for this PR! Yes, we'd like to have
Would you change the submodule name
Also nice to have
self.base, self.axis, self.normalize)

@testing.numpy_cupy_allclose(atol=1e-3, rtol=5e-3, scipy_name='scp')
def test_entropy_float16(self, xp, scp):
Now testing.numpy_cupy_allclose accepts a per-dtype tolerance (#4269). Would you try that?
That looks like a nice feature, but I have problems with it in this case. I tried removing the float16 test case and using an rtol dictionary on the other one, like:

@testing.numpy_cupy_allclose(rtol={cupy.float16: 1e-3,
                                   cupy.float32: 1e-6,
                                   'default': 1e-12},
                             scipy_name='scp')

However, the cupyx.scipy.special functions used support only float32 and float64, so a float16 input gives a float32 output and tol=1e-6 gets used. This indicates that I should probably upcast to float32 earlier in the function so that the normalization computations using cupy.sum() are of equivalent accuracy.
Also, input with a larger axis size of (128,) does not pass at 1e-12 in double precision. This might be due to lower accuracy of sum in CuPy vs. NumPy (in some cases NumPy uses pairwise summation to reduce accumulated error in sums).
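The pairwise-summation effect is easy to demonstrate on the CPU (a NumPy-only illustration; `cumsum` forces sequential float32 accumulation, while `sum` uses NumPy's pairwise reduction):

```python
import numpy as np

# 100,000 copies of float32(0.1); the float64 sum serves as reference
x = np.full(100_000, 0.1, dtype=np.float32)
exact = x.astype(np.float64).sum()

naive = float(x.cumsum()[-1])  # sequential float32 accumulation
pairwise = float(x.sum())      # pairwise summation in float32

# the sequential result drifts much further from the reference
print(abs(naive - exact), abs(pairwise - exact))
```

The same kind of drift would explain why a longer reduction axis needs a looser tolerance against SciPy's result.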
Okay. I also checked that scipy.stats.entropy returns a float32 value for float16 input.
I also noticed that I have
cast input to at least float32
Jenkins CI test (for commit f7836e7, target branch master) succeeded!
pfnCI, test this please.
Jenkins CI test (for commit cec146d, target branch master) succeeded!
tests/cupyx_tests/scipy_tests/stats_tests/test_distributions.py
remove unused numpy import
e902b02 to 322906d
pfnCI, test this please.
LGTM! Waiting for CI pass.
Jenkins CI test (for commit 6608b60, target branch master) succeeded!
Thanks!
This PR adds the scipy.stats function entropy. This implementation relies on the already existing cupyx.scipy.special.entr and cupyx.scipy.special.rel_entr functions and so should be simple to review.

There is not currently a cupyx.scipy.stats module, so this PR has added it. However, I do not currently have plans to implement other functions within the stats module. This function is used within a scikit-image function we created a CUDA equivalent for in cupyimg. If there is not an interest in having stats functions within CuPy itself, we can just keep this implementation there.

Benchmark with pk input only

Benchmark with pk and qk inputs

benchmark script
import time

import numpy
import scipy.stats
import cupy
import cupyx.scipy.stats
from cupyx.time import repeat


def cpu_time_ms(func, args, n_repeat=10):
    # average wall-clock time of the SciPy (CPU) call, in milliseconds
    # (CPU timing and row printing reconstructed to match the table headers)
    tic = time.perf_counter()
    for _ in range(n_repeat):
        func(*args)
    return (time.perf_counter() - tic) / n_repeat * 1000


shapes = [(256,), (512, 512), (2048, 4096), (192, 192, 192)]

print("## Benchmark with pk input only")
print("shape | GPU time (ms) | CPU time (ms) | acceleration")
print("------|---------------|---------------|-------------")
for shape in shapes:
    pg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    perf_gpu = repeat(cupyx.scipy.stats.entropy, (pg,), n_warmup=20, n_repeat=100)
    t_gpu = perf_gpu.gpu_times.mean() * 1000
    t_cpu = cpu_time_ms(scipy.stats.entropy, (cupy.asnumpy(pg),))
    print(f"{shape} | {t_gpu:0.4f} | {t_cpu:0.4f} | {t_cpu / t_gpu:0.1f}x")

print("## Benchmark with pk and qk inputs")
print("shape | GPU time (ms) | CPU time (ms) | acceleration")
print("------|---------------|---------------|-------------")
for shape in shapes:
    pg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    qg = cupy.testing.shaped_random(shape, dtype=numpy.float32)
    perf_gpu = repeat(cupyx.scipy.stats.entropy, (pg, qg), n_warmup=20, n_repeat=100)
    t_gpu = perf_gpu.gpu_times.mean() * 1000
    t_cpu = cpu_time_ms(scipy.stats.entropy, (cupy.asnumpy(pg), cupy.asnumpy(qg)))
    print(f"{shape} | {t_gpu:0.4f} | {t_cpu:0.4f} | {t_cpu / t_gpu:0.1f}x")
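For reference, the computation being benchmarked follows SciPy's definition of entropy, which can be sketched in plain NumPy (a sketch only; the PR itself builds on the cupyx.scipy.special.entr and rel_entr kernels rather than writing the formulas out):

```python
import numpy as np

def entropy_sketch(pk, qk=None, base=None, axis=0):
    # Normalize pk (and qk) to probability distributions along `axis`.
    pk = np.asarray(pk, dtype=np.float64)
    pk = pk / pk.sum(axis=axis, keepdims=True)
    safe = np.where(pk > 0, pk, 1.0)  # avoid log(0); entr(0) is 0
    if qk is None:
        # Shannon entropy: sum of entr(p) = -p * log(p)
        vec = -pk * np.log(safe)
    else:
        qk = np.asarray(qk, dtype=np.float64)
        qk = qk / qk.sum(axis=axis, keepdims=True)
        # KL divergence: sum of rel_entr(p, q) = p * log(p / q) for p > 0
        vec = np.where(pk > 0, pk * np.log(safe / qk), 0.0)
    s = vec.sum(axis=axis)
    if base is not None:
        s = s / np.log(base)  # convert from nats to the requested base
    return s
```

For a uniform two-element distribution this returns log(2) in nats, or 1.0 with base=2, matching scipy.stats.entropy.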