v7.0.0
This is the release note for v7.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers the differences from v7.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases.
See the Upgrade Guide if you are upgrading from a previous version. Also note that CuPy v7 drops support for Python 2.7 and 3.4.
Highlights
- Added experimental support for cuTENSOR 1.0.0. cuTENSOR is a library for high-performance tensor operations and is available for CUDA GPUs with compute capability 7.0 or higher. See the cuTENSOR examples for details.
Changes without compatibility
- Stopped raising some errors in CuPy linalg functions by default to improve performance. NumPy-compatible behavior can be restored by setting `cupyx.seterr(linalg=True)`, but this sometimes decreases performance because checking cuSOLVER `devInfo` and cuBLAS `infoArray` requires device synchronization.
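
The trade-off behind this change can be modelled in plain Python. The sketch below is not CuPy code: `errstate`, `fake_solver`, `solve`, and the use of `ArithmeticError` are hypothetical stand-ins chosen for illustration. It only shows the pattern of skipping a status check by default and enabling it within a scope.

```python
import contextlib
import threading

# Hypothetical stand-in for a library-wide error-state configuration.
_config = threading.local()


def get_linalg_check():
    # Checks are off by default, mirroring the new default described above.
    return getattr(_config, "linalg", False)


@contextlib.contextmanager
def errstate(linalg):
    # Temporarily switch the status check on or off, then restore it.
    old = get_linalg_check()
    _config.linalg = linalg
    try:
        yield
    finally:
        _config.linalg = old


def fake_solver(singular):
    # Pretend solver: returns a result plus a status code (non-zero = failure).
    # On a GPU, reading such a status back on the host is what would force
    # a device synchronization.
    info = 1 if singular else 0
    return "result", info


def solve(singular):
    result, info = fake_solver(singular)
    # The status is inspected only when the check is enabled.
    if get_linalg_check() and info != 0:
        raise ArithmeticError("solver reported failure (info=%d)" % info)
    return result
```

With the check disabled, `solve(True)` returns without inspecting the status; inside `with errstate(linalg=True):` the same call raises.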
New Features
- Add `scipy.fft` to `cupyx` (#2355, thanks @peterbell10!)
- Support separate compilation in `RawKernel` (#2426, thanks @leofang!)
- Introduce `errstate` configuration to control cuSOLVER `devInfo` and cuBLAS `infoArray` checks (#2437)
- Inverse for Hermitian matrix (#2495)
- Introduce `errstate` and related functions (#2535)
- Implement `tobytes` for CuPy arrays (#2617, thanks @jakirkham!)
- Add `fromfile` (#2626, thanks @jakirkham!)
- Add `using_allocator` context manager (#2627)
- Add cuDNN new batch normalization (#2651)
- Support CUB reduction for F-contiguous arrays (#2682, thanks @leofang!)
- Support cuTENSOR 1.0.0 (#2709)
- Add `searchsorted` (#2726)
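
For readers unfamiliar with `searchsorted`, the following pure-Python sketch illustrates the insertion-index semantics documented for `numpy.searchsorted`, which the CuPy function is assumed here to follow. It uses the standard-library `bisect` module rather than CuPy, so it runs without a GPU; the helper name and sample data are illustrative only.

```python
import bisect


def searchsorted(sorted_values, queries, side="left"):
    """Return the indices at which each query keeps sorted_values sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' inserts before equal elements, 'right' inserts after them.
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    return [find(sorted_values, q) for q in queries]


bins = [1, 3, 5, 7]
left = searchsorted(bins, [0, 3, 6, 8])                 # [0, 1, 3, 4]
right = searchsorted(bins, [0, 3, 6, 8], side="right")  # [0, 2, 3, 4]
```

The two results differ only for the query `3`, which is already present in `bins`.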
Enhancements
- Support cuComplex.h in `cupy.RawKernel` and `cupy.RawModule` (#2551, thanks @leofang and @grlee77!)
- Reduce compile warnings (#2556)
- Normalize strides of cuDNN descriptors (#2564)
- Display versions of CUDA libraries (#2578)
- Improve ROCm error handling (#2639)
- Add support of complex dtypes for `sinc` (#2646)
- Support Thrust with ROCm (#2666)
- Improve `ndarray.reduced_view` (#2694)
- Make `set_allocator` and `get_allocator` symmetric (#2707)
- Remove `cupy.cupyx` (#2722)
- Add a few missing stubs for ROCm/HIP build (#2737, thanks @leofang!)
- Support Python 3.8 on Windows (#2738)
- Implement `ParameterInfo.__repr__` (#2747)
Performance Improvements
- Refactor CUB to support an explicit `axis` argument; fix alignments for Thrust's complex types (#2562, thanks @leofang!)
- Add CUB support for `argmax()` and `argmin()` (#2596, thanks @leofang!)
- Avoid `__init__` function call overhead in memory allocation (#2671)
- Avoid `with` overhead in memory allocation (#2672)
- Avoid use of slow `numpy.find_common_type` (#2683, thanks @grlee77!)
- Cache `dtype` object for speed in `_scalar` (#2684)
- Cache `ElementwiseKernel` object (#2685)
- Avoid `threading.local()` object overhead (#2687)
- Add `prod_sequence` to avoid creating `vector` (#2689)
- Remove memory allocation in `set_contiguous_strides` (#2690)
- Avoid `__init__` call when creating CArray object (#2691)
- Reduce memory allocation in `get_reduced_dims` (#2692)
- Improve `_reduce_dims` (#2693)
- Improve small issues (#2695)
- Improve `broadcast` (#2696)
- CUB-based CSR sparse matrix vector multiply (#2698, thanks @grlee77!)
- Avoid module level lookup in `_dtype` (#2700)
- Add `_ndarray_init` to reduce ndarray creation cost (#2701)
Bug Fixes
- Cache `ElementwiseKernel` kernel globally instead of per instance (#2474)
- Use `!=` instead of `is not` for literal (#2561, thanks @Dobatymo!)
- Remove cuSPARSE APIs dropped in CUDA 10.1 Update 2 (#2573)
- Support 0-sized arrays for `linalg.qr` (#2586, thanks @IvanYashchuk!)
- Fix `__cuda_array_interface__` data pointer for 0-size arrays (#2611, thanks @leofang!)
- Fix ROCm build error (#2632)
- Fix bugs in CUB (#2636, thanks @leofang!)
- Avoid using `__align__` in ROCm (#2638)
- Remove stubs for APIs dropped in CUDA 10.1 Update 2 (#2641)
- Do not allow `reshape` on empty arrays (#2648)
- Fix `pinv` for complex datatypes (#2657, thanks @YoujinShin!)
- Fix `det` and `slogdet` on singular inputs (#2660)
- Handle tuple with value 0 and return empty array (#2662, thanks @quasiben!)
- Fix `AttributeError` of `stride_tricks` (#2679)
Code Fixes
- Remove redundant definitions in `cupy_cufft.h` (#2560, thanks @leofang!)
- Type `dumps` return value as `bytes` (#2619, thanks @jakirkham!)
- Remove `std::map` for simple implementation (#2670)
- Improve reduction core (#2697)
- Remove insignificant assertion (#2714)
- Avoid tricky initialization of block stride (#2729)
- Remove `cupy/internal.py` (#2739, thanks @leofang!)
Documentation
- Add CUDA runtime API list (#2557)
- Document more environment variables (#2593, thanks @leofang!)
- Fix typo in `CODE_OF_CONDUCT` (#2609)
- Expand TOC to improve document index page (#2642)
- Fix document format of `as_strided` (#2680)
- Update requirements (#2756)
Installation
- Package tests in sdist (#2563, thanks @jakirkham!)
- Fix URL to use the home page address (#2580)
- Add software description to `setup.py` (#2582)
- Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2592)
- Fix invalid requirements (#2630)
Examples
Tests
- Add CI configuration for ROCm (#2408)
- Add backward compatibility test for `__cuda_array_interface__` (#2536, thanks @leofang!)
- Include `.git` in ChainerCV compatibility CI (#2577)
- Update `testing.parameterize` using the latest version from Chainer (#2633, thanks @grlee77!)
- Add FlexCI configurations (#2649)
- Add test for `get_c_contiguity` (#2686)
- Skip tests that segfault when using SciPy 1.3.x (#2712, thanks @grlee77!)
- Fix broken version specification in FlexCI dockerfiles (#2728)