v7.0.0

@asi1024 asi1024 released this 05 Dec 05:35
89a64be

This is the release note of v7.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the differences from v7.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases.

See the Upgrade Guide if you are upgrading from a previous version. Also note that CuPy v7 drops support for Python 2.7 and 3.4.

Highlights

  • Added experimental support for cuTENSOR 1.0.0. cuTENSOR is a library for high-performance tensor operations and is available for CUDA GPUs with compute capability 7.0 or higher. See the cuTENSOR examples for details.

Changes without compatibility

  • Stopped raising some errors in CuPy linalg functions by default to improve performance. NumPy-compatible behavior can be restored by setting cupyx.seterr(linalg=True), but this sometimes decreases performance because checking cuSOLVER devInfo and cuBLAS infoArray requires device synchronization.
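
As a rough illustration of the trade-off described above, the following sketch compares NumPy's default behavior with CuPy's opt-in check on a matrix that is not positive definite. It assumes the string value 'raise' for the linalg option, as described in later CuPy documentation, whereas the note above writes linalg=True; check which form your installed version accepts. The helper names and the NumPy-only fallback when no CUDA device is present are illustrative and not part of CuPy.

```python
import numpy as np

try:
    import cupy as cp
    import cupyx
    # getDeviceCount() raises if no CUDA driver or device is usable.
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = cupyx = None
    HAVE_GPU = False

# Symmetric but not positive definite (eigenvalues 3 and -1),
# so a Cholesky factorization cannot succeed.
bad = np.array([[1.0, 2.0], [2.0, 1.0]])


def numpy_raises(a):
    """Return True if NumPy reports the failure as LinAlgError."""
    try:
        np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        return True
    return False


def cupy_raises_when_checked(a):
    """Return True if CuPy raises once the linalg check is enabled.

    Assumption: 'raise' is an accepted value for the linalg option.
    Enabling the check forces a device synchronization.
    """
    with cupyx.errstate(linalg='raise'):
        try:
            cp.linalg.cholesky(cp.asarray(a))
        except np.linalg.LinAlgError:
            return True
    return False


numpy_result = numpy_raises(bad)
# Only attempted when a CUDA device is available; otherwise left as None.
cupy_result = cupy_raises_when_checked(bad) if HAVE_GPU else None
print(numpy_result, cupy_result)
```

Using the errstate context manager rather than a global seterr call keeps the synchronization cost confined to the block that actually needs the check.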

New Features

  • Add scipy.fft to cupyx (#2355, thanks @peterbell10!)
  • Support separate compilation in RawKernel (#2426, thanks @leofang!)
  • Introduce errstate configuration to control cuSOLVER devInfo and cuBLAS infoArray checks (#2437)
  • Inverse for Hermitian matrix (#2495)
  • Introduce errstate and related functions (#2535)
  • Implement tobytes for CuPy arrays (#2617, thanks @jakirkham!)
  • Add fromfile (#2626, thanks @jakirkham!)
  • Add using_allocator context manager (#2627)
  • Add cuDNN new batch normalization (#2651)
  • Support CUB reduction for F-contiguous arrays (#2682, thanks @leofang!)
  • Support cuTENSOR 1.0.0 (#2709)
  • Add searchsorted (#2726)
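
A few of the features listed above can be sketched as follows. This is a minimal example under assumptions: it falls back to NumPy when CuPy or a CUDA device is unavailable, and the to_host helper and variable names are hypothetical. The using_allocator part runs only on a GPU and assumes a private cupy.cuda.MemoryPool as the allocator.

```python
import numpy as np

try:
    import cupy as cp
    # getDeviceCount() raises if no CUDA driver or device is usable.
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    HAVE_GPU = False

# Array module used below: CuPy on a GPU machine, NumPy otherwise.
xp = cp if HAVE_GPU else np


def to_host(a):
    """Return a NumPy copy of the array (hypothetical helper)."""
    return cp.asnumpy(a) if HAVE_GPU else np.asarray(a)


# searchsorted: insertion indices that keep the array sorted.
sorted_vals = xp.array([1, 3, 5, 7])
positions = to_host(xp.searchsorted(sorted_vals, xp.array([0, 4, 7])))

# tobytes: raw bytes of the array contents, returned as a Python bytes object.
raw = xp.arange(4, dtype=xp.int32).tobytes()
restored = np.frombuffer(raw, dtype=np.int32)

# using_allocator: temporarily route device allocations through another
# allocator, here a private memory pool (GPU only).
pool_used = None
if HAVE_GPU:
    pool = cp.cuda.MemoryPool()
    with cp.cuda.using_allocator(pool.malloc):
        scratch = cp.zeros(1024, dtype=cp.float32)
        pool_used = pool.used_bytes()

print(positions.tolist(), restored.tolist(), pool_used)
```

Because tobytes copies data from device to host, it is better suited to serialization or debugging than to inner loops.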

Enhancements

  • Support cuComplex.h in cupy.RawKernel and cupy.RawModule (#2551, thanks @leofang and @grlee77!)
  • Reduce compile warnings (#2556)
  • Normalize strides of cuDNN descriptors (#2564)
  • Display versions of CUDA libraries (#2578)
  • Improve ROCm error handling (#2639)
  • Add support of complex dtypes for sinc (#2646)
  • Support thrust with ROCm (#2666)
  • Improve ndarray.reduced_view (#2694)
  • Make set_allocator and get_allocator symmetric (#2707)
  • Remove cupy.cupyx (#2722)
  • Add a few missing stubs for ROCm/HIP build (#2737, thanks @leofang!)
  • Support Python 3.8 on Windows (#2738)
  • Implement ParameterInfo.__repr__ (#2747)

Performance Improvements

  • Refactor CUB to support an explicit axis argument; Fix alignments for Thrust's complex types (#2562, thanks @leofang!)
  • Add CUB support for argmax() and argmin() (#2596, thanks @leofang!)
  • Avoid __init__ function call overhead in memory allocation (#2671)
  • Avoid with overhead in memory allocation (#2672)
  • Avoid use of slow numpy.find_common_type (#2683, thanks @grlee77!)
  • Cache dtype object for speed in _scalar (#2684)
  • Cache ElementwiseKernel object (#2685)
  • Avoid threading.local() object overhead (#2687)
  • Add prod_sequence to avoid creating vector (#2689)
  • Remove memory allocation in set_contiguous_strides (#2690)
  • Avoid __init__ call when creating CArray object (#2691)
  • Reduce memory allocation in get_reduced_dims (#2692)
  • Improve _reduce_dims (#2693)
  • Improve small issues (#2695)
  • Improve broadcast (#2696)
  • CUB-based CSR sparse matrix vector multiply (#2698, thanks @grlee77!)
  • Avoid module level lookup in _dtype (#2700)
  • Add _ndarray_init to reduce ndarray creation cost (#2701)

Bug Fixes

  • Cache ElementwiseKernel kernel globally instead of per instance (#2474)
  • Use != instead of is not for literal (#2561, thanks @Dobatymo!)
  • Remove cuSPARSE APIs dropped in CUDA 10.1 Update 2 (#2573)
  • Support 0-sized arrays for linalg.qr (#2586, thanks @IvanYashchuk!)
  • Fix __cuda_array_interface__ data pointer for 0-size arrays (#2611, thanks @leofang!)
  • Fix ROCm build error (#2632)
  • Fix bugs in CUB (#2636, thanks @leofang!)
  • Avoid using __align__ in ROCm (#2638)
  • Remove stubs for APIs dropped in CUDA 10.1 Update 2 (#2641)
  • Do not allow reshape on empty arrays (#2648)
  • Fix pinv for complex datatypes (#2657, thanks @YoujinShin!)
  • Fix det and slogdet on singular inputs (#2660)
  • Handle tuple with value 0 and return empty array (#2662, thanks @quasiben!)
  • Fix AttributeError of stride_tricks (#2679)

Code Fixes

  • Remove redundant definitions in cupy_cufft.h (#2560, thanks @leofang!)
  • Type dumps return value as bytes (#2619, thanks @jakirkham!)
  • Remove std::map for simple implementation (#2670)
  • Improve reduction core (#2697)
  • Remove insignificant assertion (#2714)
  • Avoid tricky initialization of block stride (#2729)
  • Remove cupy/internal.py (#2739, thanks @leofang!)

Documentation

  • Add CUDA runtime API list (#2557)
  • Document more environment variables (#2593, thanks @leofang!)
  • Update CODE_OF_CONDUCT typo (#2609)
  • Expand TOC to improve document index page (#2642)
  • Fix document format of as_strided (#2680)
  • Update requirements (#2756)

Installation

  • Package tests in sdist (#2563, thanks @jakirkham!)
  • Fix url to use the home page address (#2580)
  • Add software description to setup.py (#2582)
  • Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2592)
  • Fix invalid requirements (#2630)

Examples

  • Show better performance improvement in examples/stream/map_reduce.py (#2588, thanks @leofang!)

Tests

  • Add CI configuration for ROCm (#2408)
  • Add backward compatibility test for __cuda_array_interface__ (#2536, thanks @leofang!)
  • Include .git in ChainerCV compatibility CI (#2577)
  • Update testing.parameterize using the latest version from Chainer (#2633, thanks @grlee77!)
  • Add FlexCI configurations (#2649)
  • Add test for get_c_contiguity (#2686)
  • Skip tests that segfault when using SciPy 1.3.x (#2712, thanks @grlee77!)
  • Fix broken version specification in FlexCI dockerfiles (#2728)