v7.0.0

@asi1024 released this
89a64be

These are the release notes for v7.0.0. See here for the complete list of solved issues and merged PRs.

These release notes cover only the differences from v7.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

See the Upgrade Guide if you are upgrading from previous versions. Also note that CuPy v7 drops support for Python 2.7 and 3.4.

Highlights

  • Added experimental support for cuTENSOR 1.0.0. cuTENSOR is a library for high-performance tensor operations and is available for CUDA GPUs with compute capability 7.0 or higher. See the cuTENSOR examples for details.

Changes without compatibility

  • CuPy linalg functions no longer raise some errors by default, which improves performance. NumPy-compatible error reporting can be restored by setting cupyx.seterr(linalg='raise'), but this may decrease performance because checking cuSOLVER devInfo and cuBLAS infoArray requires device synchronization.

New Features

  • Add scipy.fft to cupyx (#2355, thanks @peterbell10!)
  • Support separate compilation in RawKernel (#2426, thanks @leofang!)
  • Introduce errstate configuration to control cuSOLVER devInfo and cuBLAS infoArray checks (#2437)
  • Inverse for Hermitian matrix (#2495)
  • Introduce errstate and related functions (#2535)
  • Implement tobytes for CuPy arrays (#2617, thanks @jakirkham!)
  • Add fromfile (#2626, thanks @jakirkham!)
  • Add using_allocator context manager (#2627)
  • Add cuDNN new batch normalization (#2651)
  • Support CUB reduction for F-contiguous arrays (#2682, thanks @leofang!)
  • Support cuTENSOR 1.0.0 (#2709)
  • Add searchsorted (#2726)
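The new searchsorted (#2726) is intended to follow numpy.searchsorted. As a rough CPU reference for what the returned indices mean, the sketch below models the same semantics for a sorted 1-D sequence using the standard library's bisect module. It is not CuPy's GPU implementation, handles only plain 1-D lists, and does not model NumPy's optional sorter argument.

```python
from bisect import bisect_left, bisect_right


def searchsorted(a, v, side='left'):
    """Reference model of NumPy-style searchsorted on a sorted 1-D list.

    For each value in v, return the index at which it could be inserted
    into a while keeping a sorted. side='left' gives the first such
    index, side='right' gives the last.
    """
    if side not in ('left', 'right'):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == 'left' else bisect_right
    return [find(a, x) for x in v]
```

For example, searchsorted([1, 2, 2, 3], [2, 0, 4]) returns [1, 0, 4], and with side='right' it returns [3, 0, 4], matching numpy.searchsorted on the same inputs.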

Enhancements

  • Support cuComplex.h in cupy.RawKernel and cupy.RawModule (#2551, thanks @leofang and @grlee77!)
  • Reduce compile warnings (#2556)
  • Normalize strides of cuDNN descriptors (#2564)
  • Display versions of CUDA libraries (#2578)
  • Improve ROCm error handling (#2639)
  • Add support of complex dtypes for sinc (#2646)
  • Support thrust with ROCm (#2666)
  • Improve ndarray.reduced_view (#2694)
  • Make set_allocator and get_allocator symmetric (#2707)
  • Remove cupy.cupyx (#2722)
  • Add a few missing stubs for ROCm/HIP build (#2737, thanks @leofang!)
  • Support Python 3.8 on Windows (#2738)
  • Implement ParameterInfo.__repr__ (#2747)

Performance Improvements

  • Refactor CUB to support an explicit axis argument; Fix alignments for Thrust's complex types (#2562, thanks @leofang!)
  • Add CUB support for argmax() and argmin() (#2596, thanks @leofang!)
  • Avoid __init__ function call overhead in memory allocation (#2671)
  • Avoid with-statement overhead in memory allocation (#2672)
  • Avoid use of slow numpy.find_common_type (#2683, thanks @grlee77!)
  • Cache dtype object for speed in _scalar (#2684)
  • Cache ElementwiseKernel object (#2685)
  • Avoid threading.local() object overhead (#2687)
  • Add prod_sequence to avoid creating vector (#2689)
  • Remove memory allocation in set_contiguous_strides (#2690)
  • Avoid __init__ call when creating CArray object (#2691)
  • Reduce memory allocation in get_reduced_dims (#2692)
  • Improve _reduce_dims (#2693)
  • Improve small issues (#2695)
  • Improve broadcast (#2696)
  • CUB-based CSR sparse matrix vector multiply (#2698, thanks @grlee77!)
  • Avoid module level lookup in _dtype (#2700)
  • Add _ndarray_init to reduce ndarray creation cost (#2701)

Bug Fixes

  • Cache ElementwiseKernel kernel globally instead of per instance (#2474)
  • Use != instead of is not for literal (#2561, thanks @Dobatymo!)
  • Remove cuSPARSE APIs dropped in CUDA 10.1 Update 2 (#2573)
  • Support 0-sized arrays for linalg.qr (#2586, thanks @IvanYashchuk!)
  • Fix __cuda_array_interface__ data pointer for 0-size arrays (#2611, thanks @leofang!)
  • Fix ROCm build error (#2632)
  • Fix bugs in CUB (#2636, thanks @leofang!)
  • Avoid using __align__ in ROCm (#2638)
  • Remove stubs for APIs dropped in CUDA 10.1 Update 2 (#2641)
  • Do not allow reshape on empty arrays (#2648)
  • Fix pinv for complex datatypes (#2657, thanks @YoujinShin!)
  • Fix det and slogdet on singular inputs (#2660)
  • Handle tuple with value 0 and return empty array (#2662, thanks @quasiben!)
  • Fix AttributeError of stride_tricks (#2679)

Code Fixes

  • Remove redundant definitions in cupy_cufft.h (#2560, thanks @leofang!)
  • Type dumps return value as bytes (#2619, thanks @jakirkham!)
  • Remove std::map for simple implementation (#2670)
  • Improve reduction core (#2697)
  • Remove insignificant assertion (#2714)
  • Avoid tricky initialization of block stride (#2729)
  • Remove cupy/internal.py (#2739, thanks @leofang!)

Documentation

  • Add CUDA Runtime API list (#2557)
  • Document more environment variables (#2593, thanks @leofang!)
  • Fix typo in CODE_OF_CONDUCT (#2609)
  • Expand TOC to improve document index page (#2642)
  • Fix document format of as_strided (#2680)
  • Update requirements (#2756)

Installation

  • Package tests in sdist (#2563, thanks @jakirkham!)
  • Fix URL to use the home page address (#2580)
  • Add software description to setup.py (#2582)
  • Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2592)
  • Fix invalid requirements (#2630)

Examples

  • Show better performance improvement in examples/stream/map_reduce.py (#2588, thanks @leofang!)

Tests

  • Add CI configuration for ROCm (#2408)
  • Add backward compatibility test for __cuda_array_interface__ (#2536, thanks @leofang!)
  • Include .git in ChainerCV compatibility CI (#2577)
  • Update testing.parameterize using the latest version from Chainer (#2633, thanks @grlee77!)
  • Add FlexCI configurations (#2649)
  • Add test for get_c_contiguity (#2686)
  • Skip tests that segfault when using SciPy 1.3.x (#2712, thanks @grlee77!)
  • Fix broken version specification in FlexCI dockerfiles (#2728)