  • v8.0.0
  • 9685664

@kmaehashi released this Oct 1, 2020 · 65 commits to v8 since this release

Highlights

The CuPy v8.0.0 release includes a number of new features, as well as enhanced NumPy/SciPy functionality coverage.

  • TensorFloat-32 (TF32) Support

    • CuPy now supports TensorFloat-32, a new feature available on NVIDIA Ampere GPUs with CUDA 11. Set the CUPY_TF32=1 environment variable to boost the performance of matrix multiplications in routines such as cupy.matmul or cupy.tensordot.
  • Official support for NVIDIA cuTENSOR and CUB libraries

    • Several routines in CuPy can now use the cuTENSOR and CUB libraries to further improve performance. Set the CUPY_ACCELERATORS=cub,cutensor environment variable to benefit from these libraries.
  • Enhanced kernel fusion

    • Previously, when combining multiple kernels into one with cupy.fuse, only a single reduction operation (cupy.sum, etc.) was allowed, and only at the end. The new kernel fusion mechanism in CuPy v8 can combine multiple element-wise operations with interleaved reductions.
  • Automatic tuning of kernel launch parameters

    • CuPy now supports discovering the optimal CUDA kernel launch parameters depending on the data and device properties for better performance. See the API reference (cupyx.optimizing.optimize) for details.
  • Memory pool sharing with external libraries

    • With the new PythonFunctionAllocator API, you can let CuPy use arbitrary Python functions instead of a built-in memory pool when managing GPU memory. This improves interoperability with external libraries; for example, you can flexibly use CuPy to preprocess data or use its custom CUDA kernel features inside PyTorch. With the allocator bundled in pytorch-pfn-extras, CuPy can easily use the PyTorch memory pool.
  • Improved NumPy/SciPy function coverage

    • Many functions were added, including the NumPy Polynomials package (results of Google Summer of Code 2020, thanks @Dahlia-Chehata!), the SciPy image processing package, and extended support for the SciPy sparse matrices package.
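
The two environment variables named in the highlights above (CUPY_TF32 and CUPY_ACCELERATORS) can also be set from Python. The sketch below assumes that setting them before the first `import cupy` is the safest ordering; these notes do not say exactly when CuPy reads each variable. The helper name `configure_cupy_env` is hypothetical and not part of CuPy.

```python
import os


def configure_cupy_env(tf32=True, accelerators=("cub", "cutensor")):
    """Hypothetical helper (not part of CuPy): set the variables named above."""
    if tf32:
        # Opt in to TensorFloat-32 for matrix multiplications.
        os.environ["CUPY_TF32"] = "1"
    if accelerators:
        # Comma-separated list, as in CUPY_ACCELERATORS=cub,cutensor.
        os.environ["CUPY_ACCELERATORS"] = ",".join(accelerators)
    names = ("CUPY_TF32", "CUPY_ACCELERATORS")
    return {name: os.environ[name] for name in names if name in os.environ}


# Call this before the first `import cupy` in the process.
settings = configure_cupy_env()
print(settings)
```

Setting the variables in the shell, as the notes describe, has the same effect and avoids any question of import ordering.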

For the list of all backward-incompatible changes in v8, please refer to the Upgrade Guide.

Notes on Wheel Packages

  • CuPy for CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X. It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or to install it manually and set the LD_LIBRARY_PATH environment variable.
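
The install command quoted above can be assembled programmatically, for example in a provisioning script. This is a minimal sketch: the helper `cudnn_install_command` is hypothetical, and the CUDA version string passed in ("10.2") is only an example value for the X.X placeholder.

```python
import shlex
import sys


def cudnn_install_command(cuda_version):
    """Build the install_library command from the note above for one CUDA version."""
    return [
        sys.executable, "-m", "cupyx.tools.install_library",
        "--library", "cudnn",
        "--cuda", cuda_version,
    ]


cmd = cudnn_install_command("10.2")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually download the library, pass the list to subprocess.run(cmd, check=True).
```

Using `sys.executable` keeps the download tied to the same interpreter (and therefore the same CuPy installation) that runs the script.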

Changes since v8.0.0rc1

See here for the complete list of merged PRs after v8.0.0rc1 release. For all changes since v7 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, beta4, beta5, rc1).

Highlights

  • Add a cache to reuse FFT plans, which greatly reduces CPU time. (thanks @leofang!)
  • Support for cuTENSOR 1.2 and acceleration of cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean by means of CUPY_ACCELERATORS
  • Sparse matrix support is greatly improved, with new operators and the ability to set items.
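
To illustrate why a plan cache saves CPU time, the sketch below reuses an expensive-to-build object keyed by the transform parameters. It is not CuPy's implementation: `get_plan` and its tuple "plan" are stand-ins, and the cache size of 16 is an arbitrary assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=16)
def get_plan(shape, dtype_name):
    """Stand-in for expensive plan creation; NOT CuPy's actual plan cache."""
    return ("plan", shape, dtype_name)


# Repeated transforms with the same shape and dtype reuse one plan.
for _ in range(3):
    get_plan((1024,), "complex64")

# A different shape needs its own plan.
get_plan((2048,), "complex64")

info = get_plan.cache_info()
print(info.hits, info.misses)  # 2 2
```

Only the two distinct parameter sets trigger plan construction; the other two calls are served from the cache.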

New Features

  • Support sparse matrix pointwise maximum and minimum (#3943)
  • Support sparse matrix pointwise division by vectors or matrices (#3964)
  • Add cupy.testing.shaped_sparse_random (#3976)
  • Add compressed sparse __setitem__ (#3998)
  • Add sparse.linalg.norm (#4040)
  • Add cuTENSOR 1.2 support (#3970)
  • Add a cuFFT plan cache (#4010)

Enhancements

  • Update FP16 header to CUDA 11.0 Update 1 (11.0.3) (#3986)
  • Bump cuDNN version to v8.0.3 (#3996)

Performance Improvements

  • Use _csr_row_index for CSR matrix major-axis slicing with step (#3898)
  • Improve CSR matrix column fancy indexing (#3960)
  • Improve cupyx.scipy.sparse int x int indexing (#4003)
  • Avoid using CUlinkState unless absolutely necessary (#4016)
  • Use cuTENSOR in cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#4046)

Bug Fixes

  • Fix dtype in CSR matrix division (#3924)
  • Fix _compressed_sparse_matrix._minor_slice for step > 1 case (#3952)
  • Fix csr_matrix._get_intXslice for step < 0 case (#3957)
  • Handle transfer to cupy view (#3962)
  • Fix sparse.__getitem__ not to return view of input (#3993)
  • Fix managed memory leak (#4032)
  • Use __dealloc__ instead of __del__ for cdef class (#4037)

Code Fixes

  • Rename cupyx.scatter submodule (#3921)
  • Hide private names in cupyx/scipy/__init__.py (#3923)
  • Rename submodule under cupyx.scipy.fftpack (#3926)
  • Refactor CSR sparse matrix row fancy indexing (#3930)
  • Rename cupyx.runtime submodule (#3937)
  • Rename cupy.util submodule to cupy._util (#3938)
  • Rename cupy.statistics submodule to cupy._statistics (#3939)
  • Rename submodule under cupy.prof package (#3940)
  • Hide private names in cupyx.time (#3990)
  • Hide private names in cupy.cusparse (#4005)
  • Rename cupy.math submodule to cupy._math (#4028)
  • Hide private names in cupy.cudnn (#4029)
  • Rename cupy.logic submodule to cupy._logic (#4030)
  • Hide private names in cupy/__init__.py (#4039)

Documentation

  • Add cupy.searchsorted to doc (#3925)
  • Update cupyx.scipy API documentation (#3997)

Tests

  • Fix test fail when cuDNN is unavailable (#3910)
  • Fix 32-bit boundary test to run on Windows (#3913)
  • Add v8 to list of known branch in FlexCI script (#3914)
  • Fix side effects in some tests (#3953)
  • Fix some test to check compatibility with SciPy's behavior (#3956)
  • Refactor sparse indexing tests (#3977)
  • Fix cupy.ndim test style (#4034)
  • Fix bugs and test suites to make ROCm/HIP happy - Part 1 (#3929)

Others

  • Disable GitHub checks annotations of Codecov (#4022)
  • Bump version to v8.0.0 (#4049)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @cjnolet @grlee77 @kalvdans @leofang @saswatpp


@kmaehashi released this Sep 11, 2020

Fixed the following errors when building v7.8.0 source published on PyPI:

  • RuntimeError: Missing file: cupy/cuda/cub.cpp (when CUB is configured via the environment variable or using CUDA 11.0)
  • RuntimeError: Missing file: cupy/cuda/cutensor.cpp (when cuTENSOR is configured via the environment variable)

This release contains only a packaging fix; there is no code difference since v7.8.0.

Pre-release

@kmaehashi released this Aug 27, 2020 · 842 commits to master since this release

This is the release note of v8.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are planning to release the final v8.0.0 on October 1st. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.

Highlights

  • This release adds support for CUDA 11, NumPy 1.19, and SciPy 1.5.
  • Several performance improvements for cuTENSOR, sparse matrix indexing, and matrix multiplication using TF32 on CUDA 11.
  • Compatibility with numpy.poly is being increased thanks to our GSoC student @Dahlia-Chehata!
  • Added an interface (#3126) to support using external memory allocators such as the PyTorch one (pytorch/pytorch#33860).

Notes on Wheel Packages

  • Update on 2020-09-23: cupy-cuda110 package is now available on PyPI! CuPy for CUDA 11.0 (cupy-cuda110) wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. (Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v8.0.0rc1). Those wheels will be removed once we publish the package on PyPI.)
  • CuPy for CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X.
    It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or to install it manually and set the LD_LIBRARY_PATH environment variable.

Changes without compatibility

Deprecate cupy.sparse package (#3839, #3856)

CuPy's sparse matrix support was initially implemented in the cupy.sparse package. It was moved to the cupyx.scipy.sparse namespace in CuPy v5, while keeping the cupy.sparse one for backward compatibility.
Since NumPy has no equivalent package, cupy.sparse is now deprecated and will eventually be removed.
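
Migrating away from the deprecated namespace amounts to importing from cupyx.scipy.sparse instead of cupy.sparse. The sketch below shows the new import path with a guard, so that it also runs on machines without CuPy; the helper `load_sparse_module` is hypothetical.

```python
import importlib


def load_sparse_module():
    """Import cupyx.scipy.sparse; return None when CuPy is not installed."""
    try:
        return importlib.import_module("cupyx.scipy.sparse")
    except ImportError:
        return None


sparse = load_sparse_module()
if sparse is None:
    print("CuPy is not installed; nothing to migrate here.")
else:
    print("Use", sparse.__name__, "instead of cupy.sparse")
```

In application code the guard is usually unnecessary; replacing `cupy.sparse` with `cupyx.scipy.sparse` in the import statements is the whole change.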

Deprecate *_enabled flags under cupy.cuda (#3732)

Previously, you could use cupy.cuda.nccl_enabled or similar flags to detect whether NCCL, cuTENSOR, or other optional CUDA libraries were available. This pull request introduces per-module flags (cupy.cuda.nccl.available, cupy.cuda.cutensor.available) that provide the same information.
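
A small wrapper can read the new per-module flag defensively. This is a sketch only: `library_available` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for cupy.cuda.nccl and cupy.cuda.cutensor so the example runs without a GPU.

```python
from types import SimpleNamespace


def library_available(module):
    """Read the per-module `available` flag, treating a missing flag as False."""
    return bool(getattr(module, "available", False))


# Stand-ins for the real modules (assumption: they expose `available`).
fake_nccl = SimpleNamespace(available=True)
fake_cutensor = SimpleNamespace()  # no flag at all

print(library_available(fake_nccl))      # True
print(library_available(fake_cutensor))  # False
```

With CuPy installed, the same helper could be called as `library_available(cupy.cuda.nccl)` in place of the deprecated `cupy.cuda.nccl_enabled`.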

Bump version in Docker images (#3733)

The current base Docker images have been updated from Ubuntu 16.04, CUDA 9.2, and Python 3.5 to Ubuntu 18.04, CUDA 10.2, and Python 3.6.

New Features

  • Add cupy.ndim (#3060)
  • Add PythonFunctionAllocator (#3126)
  • Compressed Sparse Inner Indexing (#3486)
  • Add cupy.polyadd (#3548)
  • Add cupy.polymul (#3590)
  • Add cupy.polysub (#3593)
  • Add most of scipy.linalg.special_matrices (#3641)
  • Add scipy.signal functions that are simple wrappers of ndimage functions (#3645)
  • Add cupyx.scipy.ndimage.fourier_shift, fourier_gaussian, fourier_uniform (#3654)
  • Add 2D Sparse Slicing (#3657)
  • Add 2D Sparse Slicing + Row Indexing (#3658)
  • Add 2D Sparse Slicing + Row & Column Indexing (#3659)
  • Add cupy.roots for Hermitian or symmetric matrix (#3703)
  • Add cupy.polyval (#3725)
  • Support __cuda_array_interface__ in cupy.poly1d (#3729)
  • Implement library preloading for wheels (#3731)
  • Add cupy.poly1d.__pow__ (#3734)
  • Add scipy.signal.convolve and correlate functions (#3748)
  • Add trimcoef (#3793)
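
The polynomial routines listed above were added for NumPy coverage, so this sketch assumes they follow NumPy's convention of coefficient sequences ordered from highest degree to lowest. The pure-Python reference functions below (the `ref_*` names are hypothetical) only illustrate the arithmetic and run without a GPU.

```python
def ref_polyadd(a, b):
    """Add coefficient lists (highest degree first), left-padding the shorter one."""
    n = max(len(a), len(b))
    a = [0] * (n - len(a)) + list(a)
    b = [0] * (n - len(b)) + list(b)
    return [x + y for x, y in zip(a, b)]


def ref_polymul(a, b):
    """Multiply two polynomials by convolving their coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def ref_polyval(p, x):
    """Evaluate a polynomial at x using Horner's rule."""
    result = 0
    for coeff in p:
        result = result * x + coeff
    return result


print(ref_polyadd([1, 2], [9, 5, 4]))     # [9, 6, 6]
print(ref_polymul([1, 2, 3], [9, 5, 1]))  # [9, 23, 38, 17, 3]
print(ref_polyval([3, 0, 1], 5))          # 76
```

On a machine with CuPy, calls such as `cupy.polyadd(cupy.array([1, 2]), cupy.array([9, 5, 4]))` should produce the same coefficients as a GPU array, under the convention assumed here.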

Enhancements

  • Avoid disk I/O in compiler (#3164)
  • Add check for method in Randomstate seed (#3282)
  • Support negative axis in sparse min/max/argmin/argmax (#3497)
  • Mark nonzero parameters experimental in sparse min/max (#3583)
  • Add a compile method for RawKernel and RawModule (#3644)
  • Handle __cuda_array_interface__ in asnumpy (#3718)
  • Use cublasGemmEx in tensordot_core when CUDA11 (#3719)
  • Deprecate *_enabled flags under cupy.cuda (#3732)
  • Fix handle types to intptr_t (#3746)
  • Support TF32 (#3810)
  • Deprecate cupy.sparse package (#3839)
  • Add path and readonly options to cupyx.optimizing.optimize (#3845)
  • Adding a workaround for even-length inputs to scipy.signal.sepfir2d (#3750)
  • Add multi-axis support to cupy.flip (#3742)

Performance Improvements

  • Speed up cupy.vdot (#3678)
  • Improve cupy.cutensor (#3700)
  • More improvement of cupy.cutensor (#3744)
  • Improve 2D sparse row slicing (#3782)
  • Improve median_filter, rank_filter and percentile_filter (#3813)
  • Improve CSR matrix getrow, getcol and some slicing (#3851)

Bug Fixes

  • Fix float16 ndarray input in histogram with CUB (#3617)
  • Support order argument in cupy.ones, cupy.full and cupy.eye (#3655)
  • Work around a known CUB SpMV bug (#3679)
  • Fix broken message format (#3691)
  • Fix can_use_device_segmented_reduce() for incompatible axes (#3740)
  • Fix circular imports (#3743)
  • Skip FFT input checks for some CUDA >= 10.1 cases (#3763)
  • Fix CUDA 11 multi-GPU FFT bug (#3775)
  • Temporary fixes for cudnn v8 (#3790)
  • Fix cupy.correlate (#3801)
  • Copy input by default for C2R transform (#3848)
  • Fix cupy.sparse.* deprecation (#3856)
  • Fix cub not bundled in wheels (#3879)
  • Fix wheel not loading bundled cuDNN on Windows (#3880)
  • Add option to include wheel metadata (#3881)
  • Fix not to use cupy.cuda.* from CuPy codebase (#3883)

Code Fixes

  • Add cupy_backends/cuda/libs/cutensor.pxd (#3595)
  • Refactor _make_decorator in helper.py (#3697)
  • Refactor cupy.poly1d tests (#3704)
  • Remove unnecessary imports in cupy._sorting (#3706)
  • Rename cupy.binary submodule to cupy._binary (#3707)
  • Rename cupy.creation submodule to cupy._creation (#3708)
  • Rename cupy.functional submodule to cupy._functional (#3710)
  • Rename cupy.indexing submodule to cupy._indexing (#3711)
  • Remove unnecessary imports of cupy.linalg (#3714)
  • Rename cupy.misc submodule to cupy._misc (#3726)
  • Rename cupy.padding submodule to cupy._padding (#3727)
  • Rename submodules under cupy.random package (#3772)
  • Refactor logical routines from core.pyx (#3804)
  • Refactor binary-op routines from core.pyx (#3816)
  • Fix typo (#3850)
  • Resolve circular imports between cupy and cupyx.scipy (#3854)

Documentation

  • Correct format of docstrings in creation routines (#3752)
  • Update docs for v8 (#3802)
  • Fix a broken document (#3807)
  • Add cupy-cuda110 package to README (#3817)
  • Fix documents to reflect CUPY_ACCELERATORS (#3818)
  • Support Optuna v2 (install docs) (#3842)
  • Add upgrade guide for v8 (#3863)
  • Fix broken link in the installation guide (#3864)

Installation

  • Bump version in Docker images (#3733)
  • Update classifiers in setup.py (#3814)
  • Install SciPy and Optuna to Docker image (#3844)

Tests

  • Fix wrong test file name (#3722)
  • Fix test to run without NCCL (#3735)
  • Avoid mutation of os.environ (#3749)
  • Relax tolerance in TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3758)
  • More on using unittest.mock (#3791)
  • Fix test to run without cuDNN (#3846)

Others

  • Bump version to v8.0.0rc1 (#3882)
  • Make nvrtc getPTX use bytes instead of unicode (#3237)
  • Add hiprtc support (#3238)
  • Fix build and import errors for ROCm (#3786)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse, @cjnolet, @coderforlife, @Dahlia-Chehata, @jakirkham, @leofang, @niteya-shah, @pentschev

  • v7.8.0
  • a08841a

@emcastillo released this Aug 19, 2020 · 2 commits to v7 since this release

This is the release note of v7.8.0. See here for the complete list of solved issues and merged PRs.

Highlights

  • This release adds support for CUDA 11, NumPy 1.19, and SciPy 1.5.
  • We expect this version to be the final release for v7.x series. Please start testing your workloads with the latest v8.x pre-release.

Notes on CUDA 11.0 support

  • Update on 2020-09-23: cupy-cuda110 package is now available on PyPI! cupy-cuda110 wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. (update on 2020-08-21: Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v7.8.0). Those wheels will be removed once we publish the package on PyPI.)
  • cupy-cuda110 packages are built with cuDNN support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or install it manually and set the LD_LIBRARY_PATH (Linux) or PATH (Windows) environment variable.
  • When building CuPy from source with CUDA 11.0, g++-6 or later is required. See the installation guide for the detailed instructions.

New Features

  • Support CUDA 11.0 (#3720)
  • Support cuSPARSE generic API (#3721)

Enhancements

  • Update CUDA 11.0 FP16 header to production release version (11.0.2) (#3799)

Performance Improvements

  • Improve cuDNN performance when using deterministic mode (#3798)

Bug Fixes

  • Fix broken message format (#3698)
  • Support order argument in cupy.ones, cupy.full and cupy.eye (#3699, thanks @grlee77!)
  • Fix sparse matrix related test failures on CUDA11 (#3761)
  • Allow MatDescriptor to be pickle-able (#3771)
  • Skip FFT input checks for some CUDA >= 10.1 cases (#3792)
  • Add temporary fixes for cuDNN v8 (#3794)
  • Fix error message broken (#3800)
  • Fix cuSparse build failure on Windows (#3809)

Documentation

  • Fix format of docstrings in creation routines (#3767)
  • Update requirements (#3803)
  • Update install doc: source devtoolset needed in CentOS (#3806)

Tests

  • Fix wrong test file name (#3754)
  • Relax tolerance in TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3762)
  • Skip tests failing due to exception type changes in NumPy 1.19 (#3787)
  • Avoid testing exception type match on NumPy 1.19 (#3797)
  • Skip TestDiaMatrixScipyComparison failing with scipy>=1.5.0 (#3805)

Others

  • Bump version to v7.8.0 (#3812)
Pre-release
  • v8.0.0b5
  • 258bbf8

@emcastillo released this Jul 30, 2020 · 1449 commits to master since this release

This is the release note of v8.0.0b5. See here for the complete list of solved issues and merged PRs.

Highlights

CUB is now bundled with CuPy so that everyone can use it out-of-the-box (thanks @leofang!). This release also introduces the CUPY_ACCELERATORS environment variable, a mechanism to enable acceleration using different libraries. You can enable CUB and cuTENSOR by setting export CUPY_ACCELERATORS=cub,cutensor.

The new features include an implementation of the SciPy ndimage filters contributed by @coderforlife and the introduction of the cupy_backends library, used to decouple the CUDA ecosystem APIs from CuPy itself.
Currently, cupy_backends is considered an undocumented API and it is subject to further refactoring. In the meantime, you can still continue to use cupy.cuda.* APIs.

Changes without compatibility

Supported Platform (#3670)

As announced previously, we dropped support for CUDA 8.0 and 9.1. We are also going to drop support for NumPy 1.15 and SciPy 1.2 or earlier in the upcoming release.

CUB (#2584, #3461, #3562)

CUB is now bundled in the source tree. As a consequence, gcc-6 or later is required for the CuPy v8 build. If you are building CuPy from source on systems with legacy gcc, follow the instructions below. These steps are not necessary for general users using wheel packages.

### Ubuntu 16
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"

### CentOS 6 and 7:
$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable

CUB-related environment variables (CUB_PATH, CUB_DISABLED) are no longer effective. You need to enable CUB by setting the CUPY_ACCELERATORS=cub environment variable to boost reduction kernels and several functions such as min, max, sum, and scan.
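
Because the legacy variables are now silently ignored, it can help to check an environment for leftovers when upgrading. The helper below is a hypothetical sketch, not part of CuPy. It reports which legacy CUB variables are still set and whether `cub` appears in CUPY_ACCELERATORS.

```python
import os

LEGACY_CUB_VARS = ("CUB_PATH", "CUB_DISABLED")


def check_cub_env(environ=None):
    """Return (stale legacy variable names, whether cub is in CUPY_ACCELERATORS)."""
    environ = os.environ if environ is None else environ
    stale = [name for name in LEGACY_CUB_VARS if name in environ]
    accelerators = [
        item.strip()
        for item in environ.get("CUPY_ACCELERATORS", "").split(",")
        if item.strip()
    ]
    return stale, "cub" in accelerators


# Example with an explicit mapping instead of the real process environment.
stale, cub_enabled = check_cub_env({"CUB_DISABLED": "1"})
print(stale, cub_enabled)  # ['CUB_DISABLED'] False
```

Calling `check_cub_env()` with no argument inspects the real process environment.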

cuTENSOR (#3592)

With the introduction of CUPY_ACCELERATORS, you now need to set CUPY_ACCELERATORS=cutensor explicitly to enable cuTENSOR.

Others

  • Avoid early compilation when initializing a RawModule instance (#3534)
  • Remove CHAINER_SEED (#3674)
  • Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)

New Features

Enhancements

  • Build the cupy.cuda.cub module by default (#2584)
  • Expose cuda IPC runtime calls (#3290)
  • Merge CUPY_CUB_BLOCK_REDUCTION_DISABLED and CUB_DISABLED (#3461)
  • Support CUB histogram (#3473)
  • Support cuTENSOR 1.1 (#3477)
  • Added functionality to print nvcc and nvrtc output (#3485, thanks @mnicely!)
  • Support axis=None in sparse min/max (#3515)
  • Small fixes for CUB block reduction kernels (#3520)
  • Avoid early compilation when initializing a RawModule instance (#3534)
  • Improve _prepare_mask_indexing_single (#3539)
  • Support batched slogdet with complex numbers (#3551, thanks @yoshipon!)
  • Fix hip header files (#3566)
  • Remove compute_30 when CUDA 11 (#3578)
  • Change einsum not to use cuTENSOR when accelerator is not set (#3592)
  • Update CUDA 11.0 FP16 header to production release version (11.0.2) (#3668)
  • Drop support for CUDA 8.0 and 9.1 (#3670)
  • Remove CHAINER_SEED (#3674)

Performance Improvements

  • Use cuTENSOR in cupy.sum (#2939)
  • Reduce numpy.ndarray creation in cuTENSOR operation preparation (#3393)
  • Improve scan operation (#3540)
  • Improve _ArgInfo init (#3549)
  • Fix small performance issue (#3550)
  • Improve _fft_convolve (#3560)
  • Reduce device synchronization in poly1d instantiation (#3563, thanks @Dahlia-Chehata!)
  • Reuse FFT plan for convolve/correlate (#3587)
  • Improve efficiency of cupy.fft.fftfreq and cupy.fft.rfftfreq (#3653, thanks @grlee77!)

Bug Fixes

  • Fix cupyx.scipy.ndimage.sum taking zero-dimensional input (#3425)
  • Use CUSPARSE_VERSION instead of CUDA_VERSION (#3491)
  • Fix sparse min/max to return sparse matrix (#3536)
  • Fix boolean indexing (#3538)
  • Support 0-size ndarray and fix possible error in __del__ at fft (#3543)
  • Fix cupy.percentile type assignment in asarray (#3570)
  • Fix array creation for ndarray list of arrays of different dtypes (#3605)
  • Change sorting order of COO sparse matrix for cuSPARSE (#3620)
  • Add __name__ to custom kernels (#3626)
  • Fix sparse argmin/argmax return shape (#3639)
  • Fix missing imports and cupy.show_config (#3642)
  • Fix sparse matrix related test failures on CUDA 11 (#3649)
  • Fix error message broken (#3669)
  • Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)
  • Fix broken imports for cupy.cuda.* (#3685)
  • Fix Windows build failure of cuSparse generic API (#3690)
  • Fix compile option on HIP environment (#3604)

Code Fixes

  • Use .data() for std::vector (#3022)
  • Add short comments for the internals (#3475)
  • Use absolute import (#3496)
  • Make type dispatcher from cupy.cuda.cub reusable (#3546)
  • Clean up CUB-related stuff (#3562)
  • Suppress compile warnings (#3573)
  • Remove unused descriptor definition (#3594)

Documentation

  • Add sample code for image resizing (#3559, thanks @pmixer!)
  • Update documentation of CUPY_ACCELERATORS (#3596)
  • Update url and email (#3608)
  • Add a warning for sum_duplicates (#3624)
  • Remove Chainer related docs (#3673)

Installation

  • Add missing cupy_cub.cu in package data (#3572)
  • Fix rpath for wheel build (#3689)

Tests

  • Test against scipy.fft when available (#3032)
  • Add tests for _cub_reduction (#3462)
  • Add mock tests to ensure cupy.cuda.cub is used (#3467)
  • Fix to set testing.slow correctly (#3501)
  • Check NumPy compatibility in flatiter tests (#3514)
  • Fix slogdet tests to check dtypes of return values (#3577)
  • Fix negative value test in test_helper (#3579)
  • Deprecate numpy_cupy_array_list_equal (#3582)
  • Use numpy_cupy_array_equal instead of numpy_cupy_array_list_equal (#3599)
  • Checks return types in testing.numpy_cupy_* (#3621)
  • Add tests for sparse max with axis=None (#3638)
  • Parameterize sparse min/max/argmin/argmax tests (#3656)
  • Expose accelerator internal API to one level up (#3664)

Others

  • Fix to raise ValueError for invalid order (#3498)
  • Fix to raise ValueError for invalid clipmode (#3499)
  • Fix to raise TypeError for invalid subscripts in einsum (#3502)
  • Use builtins directly (#3651, thanks @larsoner!)
  • Add link to Twitter account (#3529)
  • Update style checker version for Python 3.7 (#3585)
  • Bump version to v8.0.0b5 (#3687)
  • v7.7.0
  • 2a20cc6

@emcastillo released this Jul 30, 2020 · 50 commits to v7 since this release

This is the release note of v7.7.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Support cusparse<t>csrgeam2 and cusparse<t>csrgemm2 (#3666)

Bug Fixes

  • Fix for cupy.cuda.thrust (#3422)
  • Fix sorting order of COO sparse matrix for cuSPARSE (#3623)
  • Fix array creation for ndarray list of arrays of different dtypes (#3663)

Code Fixes

  • Suppress compile warnings (#3580)

Documentation

  • Update url and email (#3635)
  • Add a warning for sum_duplicates (#3636)
  • Update Installation Guide (#3660)

Tests

  • Fix negative value test in test_helper (#3622)
  • Skip csc and erf tests for scipy>1.2 (#3628)

Others

  • Update style checker version for Python 3.7 (#3589)
  • Add link to Twitter account (#3634)
  • Bump version to v7.7.0 (#3688)
  • Use builtins directly (#3667, thanks @larsoner!)
Pre-release
  • v8.0.0b4
  • 0347516

@asi1024 released this Jun 25, 2020 · 2058 commits to master since this release

This is the release note of v8.0.0b4. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy v8.0.0b4 focuses on performance improvements by adding a general CUB based reduction kernel contributed by @leofang (#3244). We also introduce support for the upcoming CUDA 11 (#3405) although we don’t provide wheels for it yet. Last but not least, several new routines are added to improve the NumPy and SciPy functions coverage.

Changes without compatibility

Change the behavior of dia_matrix.diagonal to follow SciPy 1.5.0 specification. It does not raise ValueError for invalid values anymore. Now an empty array is returned instead. (#3469)

New Features

Enhancements

  • Refactor cuTENSOR handle initialization (#2772)
  • Deprecate testing.numpy_cupy_raises (#3098)
  • Align vector access with #3020 #3022 (#3228)
  • Get arch per device and support CUDA 9.2+ (#3366, thanks @leofang!)
  • Fix cuTENSOR routines to raise ValueError for invalid arguments (#3374)
  • Support ignore_error in kernel optimization (#3410)
  • Support boolean in cupyx.scipy.ndimage stats functions (#3419)
  • Raise TypeError in cupy.ndarray.__array__ (#3421)
  • Make Optuna optional to allow import (#3427)
  • Implement flatiter.copy() (#3442)

Performance Improvements

  • Speed up CSR SpMV by orders of magnitude (#3430, thanks @leofang!)
  • Index CArray using 32-bit indexes (#3448)

Bug Fixes

  • Assert that all the pointers are in the same device in concatenate (#3285)
  • Fix _count_non_nan datatype for windows (#3350)
  • Fix cupyx.time.repeat to accumulate duration after GPU synchronization (#3375)
  • Fix PerfCaseResult changing _ts (#3400)
  • Fix intermediate dtypes for float16 inputs in cupyx.scipy.ndimage stats functions (#3402)
  • Properly reset current stream in case null stream is destroyed (#3423)
  • Fix cupy.power(0j, 0j) (#3449)
  • Fix TypeError in parameterize test catching CUDADriverError (#3451)
  • Fix scipy.dia_matrix.diagonal for scipy==1.5.0 (#3469)

Code Fixes

  • Fix array() for readability (#2935)
  • Remove unnecessary comparison in cupy.linalg.svd (#3373)
  • Fix initial values in cupy._environment (#3413, thanks @leofang!)
  • Use find_packages in setup.py (#3424)
  • Refactor CUB-backed _SimpleReductionKernel (#3443)

Documentation

  • Add documentation for cupyx.optimizing.optimize (#3397)
  • Fix sphinx version for travis (#3416)
  • Document cupy.fromfile (#3439, thanks @jakirkham!)
  • Fix typos in cupy.linalg.det docstring (#3456, thanks @grlee77!)
  • Fix docstring of tofile() (#3460, thanks @leofang!)

Installation

  • Add optuna and remove theano for doctest requirement (#3446)

Tests

  • Add tests for cupy.cuda.cub (#2598, thanks @leofang!)
  • Remove chainercv CI configs (#3055)
  • Add a test to cover accepting large-size arrays via __cuda_array_interface__ (#3297, thanks @leofang!)
  • Add __init__.py to allow importing test packages (#3395)
  • Fix ChainerCV tests failing in master branch (#3411)
  • Test CUB SpMV (#3428, thanks @leofang!)
  • Deprecate testing.empty (#3438)
  • Skip some RawModule tests for wrong condition (#3453)
  • Use unittest.mock (#3468)

Others

  • Bump version to v8.0.0b4 (#3481)
  • v7.6.0
  • 6463bdc

@emcastillo released this Jun 25, 2020 · 81 commits to v7 since this release

This is the release note of v7.6.0. See here for the complete list of solved issues and merged PRs.

New Features

  • Support all dtypes in every sorting function in cupy.cuda.thrust (#3415, thanks @leofang!)

Enhancements

  • Get arch per device and support CUDA 9.2+ (#3396, thanks @leofang!)

Bug Fixes

  • Fix _count_non_nan datatype for windows (#3391)
  • Properly reset current stream in case null stream is destroyed (#3437)
  • Fix TypeError in parameterize test catching CUDADriverError (#3459)
  • Assert that all the pointers are in the same device in concatenate (#3472)

Code Fixes

  • Use find_packages in setup.py (#3436)

Documentation

Installation

  • Remove theano for doctest requirement (#3463)

Tests

  • Add __init__.py to allow importing test packages (#3409)

Others

  • Bump version to v7.6.0 (#3480)
Pre-release
  • v8.0.0b3
  • 57235e7

@kmaehashi released this May 29, 2020 · 2454 commits to master since this release

This is the release note of v8.0.0b3. See here for the complete list of solved issues and merged PRs.

As announced in the previous release, we are dropping support for CUDA 8.0 / 9.1 in v8 releases (#3301). Based on the feedback from users, we will continue to provide cuDNN support (#3303).

Highlights

CuPy v8.0.0b3 introduces a mechanism that uses Optuna to optimize internal parameters when launching reduction kernels. Depending on your GPU and the kernels you execute, you can use this feature to improve the performance of your code by letting Optuna automatically find the best parameters for your GPU.
To take advantage of this, call functions that perform reductions inside the following context manager:

import cupy
import cupyx.optimizing
x = cupy.arange(1000000, dtype=cupy.float32)  # example input
with cupyx.optimizing.optimize(key=None):
    # cupy reduction function
    y = cupy.sum(x)

CuPy is also taking part in GSoC 2020 and we keep adding new functions to improve our compatibility with NumPy.

New Features

  • Optimize kernel launch parameters using Optuna (#2731)
  • Support cuSPARSE generic API (#3242)
  • Implement flatiter.base property (#3250)
  • Implement flatiter.__len__() special method (#3251)
  • Implement flatiter.__next__() special method (#3252)
  • Implement putmask function (#3261, thanks @rushabh-v!)
  • Show versions of CUB and cuTENSOR on cupy.show_config (#3271)
  • Enable getting R2C/C2R FFT plans from get_fft_plan() (#3293, thanks @leofang!)
  • Support surface memory in RawKernel (#3294, thanks @leofang!)
  • Add cupy.bartlett (#3307, thanks @niteya-shah!)
  • Add mean for sparse matrices (#3333)
  • Support max_duration argument in cupyx.time.repeat (#3357)
  • Support OptimizeContext serialization (#3367)

Enhancements

  • Support primitive complex scalar in RawKernel (#2606)
  • Fix the internal streams in multi-GPU Plan1d (#3260, thanks @leofang!)
  • Support additional dtypes and axis sequences in cupy.median (#3280, thanks @grlee77!)
  • Support multiple architectures in CUPY_NVCC_GENERATE_CODE (#3330, thanks @leofang!)
  • Fix too small max_total_time_per_trial (#3365)

Performance Improvements

  • Rewrite cupyx.scipy.ndimage.interpolation using ElementwiseKernel (#3166, thanks @grlee77!)
  • Improve ElementwiseKernel cpu time (#3298)
  • Performance improvements to blackman, hanning and hamming methods (#3312, thanks @niteya-shah!)
  • Use local cache in cupy.RawKernel (#3341, thanks @leofang!)
  • Reduce memory usage of cupy.linalg.svd (#3347)

Bug Fixes

  • Fix SciPy version check in cupyx.scipy.fft (#3311, thanks @grlee77!)
  • Ensure runtime context on a per-device basis (#3321, thanks @leofang!)
  • Fix put when using scalars (#3328)
  • Assign a work space to ormqr functions in _solve (#3331)
  • Fix linalg.svd for 0-sized matrices (#3354)
  • Fix wrong parameter names in kernel launch optimizers (#3364)
  • cupy.around behaves differently from NumPy for EVEN_NUMBER+0.5 (#3335)

Code Fixes

  • Add alias of shape type (#3310)
  • Use shape_t instead of tuple (#3315)

Documentation

  • Add PFN to the README (#3276)
  • Remove upper restrictions for numpy and scipy in doc build (#3337)

Tests

  • Add tests for optimizer for kernel launch parameters (#3363)

Others

  • Bump version to v8.0.0b3 (#3376)
  • v7.5.0
  • 4884918

@kmaehashi released this May 29, 2020 · 112 commits to v7 since this release

This is the release note of v7.5.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Show versions of CUB and cuTENSOR on cupy.show_config (#3353)
  • Support sorting complex arrays (#3336, thanks @leofang!)

Bug Fixes

  • Fix byte buffer handling to support PyPy (#3227)
  • Fix put when using scalars (#3332)
  • Remove some xfails in sorting tests (#3345)
  • Fix linalg.svd for 0-sized matrices (#3355)
  • Assign a workspace to ormqr functions in _solve (#3356)
  • Fix windows build issue with CUDA 8.0 (#3379)

Documentation

  • Remove upper restrictions for numpy and scipy in doc build (#3338)
  • Add PFN to the README (#3352)

Others

  • Bump version to v7.5.0 (#3377)