@kmaehashi kmaehashi released this Sep 4, 2018

This is the release note of v4.4.1. See here for the complete list of solved issues and merged PRs.

This is a hot-fix release for v4.4.0 to address the issue reported in #1579 (thanks @BobLiu20 for reporting this!). Users calling CuPy functions on non-main threads may have been affected by this issue.
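As a rough illustration of the affected usage pattern, the sketch below calls array functions from a worker thread instead of the main thread. The NumPy fallback, the `worker` function, and the `results` dictionary are assumptions made for illustration (so the snippet also runs on a machine without a GPU); they are not taken from the fix itself.

```python
import threading

import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # force CUDA initialization; raises if no usable GPU
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch still runs

results = {}  # hypothetical container for passing the result back


def worker():
    # Array functions are called here, on a thread that did not import the module.
    a = xp.arange(5)
    results["sum"] = int((a * 2).sum())


t = threading.Thread(target=worker)
t.start()
t.join()
print(results["sum"])
```

With v4.4.0 and a real GPU, this kind of call from a non-main thread is the case that issue #1579 reported as affected.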

Bug Fixes

  • Fix CuPy not working in threads other than the one that imported cupy (#1591)

Tests

  • Add test for thread use case of fusion (#1596)
Pre-release

@beam2d beam2d released this Aug 23, 2018 · 268 commits to master since this release

This is the release note of v5.0.0b4. See here for the complete list of solved issues and merged PRs.

Highlights

  • CuPy now supports __cuda_array_interface__, the CUDA array interchange interface compatible with Numba>=0.39.0. This means you can now pass CuPy arrays to kernels JITed with Numba. The following simple example is borrowed from numba/numba#2860:
import cupy
from numba import cuda

@cuda.jit
def add(x, y, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = x[i] + y[i]

a = cupy.arange(10)
b = a * 2
out = cupy.zeros_like(a)

print(out)  # => [0 0 0 0 0 0 0 0 0 0]

add[1, 32](a, b, out)

print(out)  # => [ 0  3  6  9 12 15 18 21 24 27]
  • Improved performance of ElementwiseKernel launch, memory allocation, and ndarray creation.
  • Added cumsum and cumprod methods to ndarray.
  • Implemented cupy.allclose.
  • Enhanced cuDNN RNN functionality including FP16 support.
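The new ndarray methods and cupy.allclose mentioned above might be used roughly as follows. This is a minimal sketch: the NumPy fallback is an assumption added so the snippet runs without a GPU, and the variable names are illustrative only.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # force CUDA initialization; raises if no usable GPU
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch still runs

a = xp.arange(1, 5)           # elements 1, 2, 3, 4
running_sum = a.cumsum()      # cumulative sum as an ndarray method
running_prod = a.cumprod()    # cumulative product as an ndarray method

# allclose compares two arrays within a tolerance; both are float64 here.
b = a.astype(xp.float64)
close = bool(xp.allclose(xp.sqrt(b) ** 2, b))

print(running_sum, running_prod, close)
```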

New Features

  • Implement __cuda_array_interface__ (#1144, thanks @seibert!)
  • Support FP16 and FP64 in cuDNN RNN related functions (#1471)
  • Implement <t>tpttr and <t>trttp of cuBLAS (#1492)
  • Add cumsum and cumprod to ndarray (#1500)
  • Add cupyx.scipy.get_array_module (#1513)
  • Implement cupy.allclose (#1522, thanks @tsurumeso!)
  • Add mem_info to Device (#1538, thanks @larsoner!)

Enhancements

  • Avoid keeping Device object in Memory and MemoryPointer (#946)
  • Speed up ElementwiseKernel launch (#1318)
  • Improve memory allocation performance (#1343)
  • Fix styles for latest autopep8 (#1352)
  • Use CScalar in elementwise and reduction (#1447)
  • Define ndarray.__iter__ (#1449)
  • Move cupy.sparse to cupyx.scipy.sparse (#1451)
  • Support negative indices in array_split (#1454)
  • Avoid collections.sequence (#1456)
  • Avoid variable name l to follow pep8 (#1460)
  • Simplify nonzero function (#1487)
  • Use TensorCore for matmul with fp32 matrixes (#1493)
  • Support nonzero for complex types (#1501)
  • Change type checking rules of Fusion ufuncs (#1507)
  • Fix minor issues on coding style (#1509)
  • Fix errors in NumPy 1.15 (#1514)
  • Use collections.abc to avoid DeprecationWarning in Python 3.7 (#1515)
  • Support loop_prep in ufunc (#1537)
  • Add _has_memory_hooks to avoid thread local dictionary operation (#1540)
  • Add specialized CUDA kernel for fill function (#1541)
  • Improve ndarray creation performance (#1542)
  • Use xorshift128 to reduce global memory access (#1546)
  • Add cdef and cpdef for better cythonize core.pyx (#1548)

Bug Fixes

  • Fix errors on 0-sized inputs (#1459)
  • Use cython.no_gc to avoid memory leak (#1463)
  • Fix cupy.random.dirichlet to behave same as numpy.random.dirichlet (#1468)
  • Fix indexing behavior when input is zero-sized array (#1503)
  • Fix cupy.real and cupy.imag (#1504)
  • Avoid compile error in old GCC (CentOS 6) (#1506)
  • Fix thrust memory allocation problem (#1511)
  • Fix dtype order of create_comparison (#1551)
  • Raise error when trying to broadcast out_params in ElementwiseKernel (#1552)

Documentation

  • Add upgrade guide for cupyx namespaces (#1467)
  • Fix docstring about free_all_free (#1519)
  • Update agnostic code tutorial (#1521, thanks @w-m!)

Tests

  • Fix sparse test (#1452)
  • Avoid hacking: Use the same test settings as Chainer (#1477)

@kmaehashi kmaehashi released this Aug 23, 2018 · 16 commits to v4 since this release

This is the release note of v4.4.0. See here for the complete list of solved issues and merged PRs.

New Features

  • Add divmod function to cupy namespace (#1480)
  • Allow more natural fusion notation (#1481)

Enhancements

  • Avoid collections.sequence (#1472)
  • Support negative indices in array_split (#1475)
  • Avoid variable name l to follow pep8 (#1479)
  • Reduce Python function call in ElementwiseKernel (#1482)
  • Speed up ElementwiseKernel launch (#1488)
  • Improve memory allocation performance (#1490)
  • Fix type of return value of fused function (#1491)
  • Support composition of fused functions (#1494)
  • Add compilation methods in Fusion class (#1497)
  • Allow cupy.get_array_module take fusion parameters (#1498)
  • Use CScalar in elementwise and reduction (#1508)
  • Use collections.abc to avoid DeprecationWarning in Python 3.7 (#1517)
  • Fix unit test errors in NumPy 1.15 (#1549)

Bug Fixes

  • Use cython.no_gc to avoid memory leak (#1474)
  • Fix errors on 0-sized inputs (#1485)
  • Fix type of reduction (#1502)
  • Avoid compile error in old GCC (CentOS6) (#1516)
  • Fix thrust memory allocation problem (#1525)
  • Fix dtype order of create_comparison (#1553)
  • Raise error when trying to broadcast out_params in ElementwiseKernel (#1556)

Documentation

  • Add upgrade guide for cupyx namespaces (#1496)
  • Update agnostic code tutorial (#1528, thanks @w-m!)

Tests

  • Add .pytest_cache/ to .gitignore (#1530)
  • Avoid hacking: Use the same test settings as Chainer (#1536)
Pre-release

@niboshi niboshi released this Jul 19, 2018 · 423 commits to master since this release

This is the release note of v5.0.0b3. See here for the complete list of solved issues and merged PRs.

Highlights

  • The cupyx.scipy namespace has been introduced to provide SciPy-compatible APIs for CuPy ndarrays. The cupy.sparse module has been renamed to cupyx.scipy.sparse; cupy.sparse is kept for backward compatibility.

  • A new user-defined kernel class, cupy.RawKernel, has been added. With raw kernels, you can define kernels directly from raw CUDA source. See the documentation for details.
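A raw kernel could look roughly like the sketch below. The kernel name `scale`, its CUDA source, the Python wrapper, and the NumPy fallback path are all hypothetical and exist only so the snippet stays runnable on a machine without a GPU; on a GPU, the wrapper compiles the source with cupy.RawKernel and launches it with explicit grid and block sizes.

```python
import numpy as np

try:
    import cupy
    cupy.zeros(1)  # force CUDA initialization; raises if no usable GPU
    HAVE_GPU = True
except Exception:
    HAVE_GPU = False

# Hypothetical CUDA source: multiply each element of x by factor.
SOURCE = r'''
extern "C" __global__
void scale(const float* x, float factor, float* y, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        y[i] = x[i] * factor;
    }
}
'''


def scale(values, factor):
    if not HAVE_GPU:
        # Assumption: CPU fallback so the sketch still produces a result.
        return (np.asarray(values, dtype=np.float32) * np.float32(factor)).tolist()
    kernel = cupy.RawKernel(SOURCE, 'scale')
    x = cupy.asarray(values, dtype=cupy.float32)
    y = cupy.empty_like(x)
    n = x.size
    threads = 128
    blocks = (n + threads - 1) // threads  # enough blocks to cover n elements
    # Scalars are passed as NumPy scalars so their C types match the signature.
    kernel((blocks,), (threads,), (x, np.float32(factor), y, np.int32(n)))
    return y.tolist()


scaled = scale([1.0, 2.0, 3.0], 2.0)
print(scaled)
```

Unlike ElementwiseKernel, a raw kernel leaves indexing and bounds checking to the CUDA source, which is why the sketch guards the write with `i < n`.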

New Features

  • Introduce SciPy namespace (#1079)
  • Logarithmic gamma and related functions (#1232)
  • Binomial distribution (#1356)
  • Implement cupyx.scipy.linalg.solve_triangular (#1383)
  • Implement RawKernel (#1398)
  • Beta distribution (#1413)
  • Dirichlet distribution (#1415)

Enhancements

  • Use fmin and fmax for HIP environment (#1116)
  • Improve reduce_dims for speed up (#1324)
  • Remove overhead in creation/basic.py (#1342)
  • Improve performance of host to device memory copy (#1367)
  • Fix cupy.cov for degrees of freedom <= 0 (#1370, thanks @tsurumeso!)
  • Add compilation methods in Fusion class (#1382)
  • Make the method cupy.random.RandomState.interval private (#1430)
  • Use get_cublas_handle to reduce creation of Device object (#1440)
  • Remove overhead in generator and distribution (#1442)
  • Remove stream option from RawKernel and add missing docs of arguments in ReductionKernel (#1444)
  • Allow cupy.get_array_module to take fusion parameters (#1446)
  • Use internal.clp2 in reduction (#1448)

Bug Fixes

  • Fix issue of cuDNN convolution math_type setting (#1428)
  • Fix Module and LinkState not freed (#1439)
  • Fix fromDlpack memory management (#1445, thanks @t-vi!)
  • Fix error in PooledMemory in Python 3.7 (#1457)

Documentation

  • Fix documentation of the option arg of ElementwiseKernel (#1437)
  • Convert cupy.sparse to cupyx.scipy.sparse in docstrings (#1450)

Installation

  • Change required Cython version to 0.28 or later (#1407)

Tests

  • Fix requirements of numpy in test_einsum.py (#1400)
  • Refactor TestOrder (#1405)

@beam2d beam2d released this Jul 19, 2018 · 75 commits to v4 since this release

This is the release note of v4.3.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Improve reduce_dims for speed up (#1424)
  • Remove overhead in creation/basic.py (#1425)
  • Improve performance of host to device memory copy (#1426)

Bug Fixes

  • Fix to accept longer order names (#1395)
  • Fix Module and LinkState not freed (#1441)
  • Fix error in PooledMemory in Python 3.7 (#1462)

Documentation

  • Fix documentation of the option arg of ElementwiseKernel (#1438)

Installation

  • Support bundling dependent DLLs for Windows wheel support (#1410)
  • Change required Cython version to 0.28 or later (#1412)

Tests

  • Refactor TestOrder (#1408)
Pre-release

@niboshi niboshi released this Jun 21, 2018 · 585 commits to master since this release

This is the release note of v5.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

  • CuPy now supports DLPack to improve interoperability between frameworks. You can convert between cupy.ndarray and DLPack tensor using array.toDlpack() and cupy.fromDlpack(tensor). See the documentation for details.
  • CuPy ndarray now implements the __array_ufunc__ protocol to improve interoperability with NumPy. This makes NumPy ufuncs directly applicable to CuPy ndarrays (for example, numpy.exp(cupy.ones(3)) will call cupy.exp to compute the exponential and return a CuPy ndarray).
  • CuPy now supports CUDA 9.2 and NumPy 1.14.
  • More NumPy/SciPy compatible methods have been implemented: cupy.linalg.matrix_power, cupy.random.laplace, cupy.corrcoef, cupy.cov, cupy.i0, cupy.sinc, cupyx.scipy.special.* and more.
  • cupy.einsum has been rewritten to use cuBLAS. This significantly reduces the memory usage and also improves the performance.
  • Allocation strategy of pinned memory has been improved to reduce host memory usage.
  • Fixed bugs in multiple functions with arrays with complex dtypes.
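The __array_ufunc__ behavior described above can be checked with a short sketch like the following. The NumPy fallback is an assumption added so the snippet runs without a GPU; in that case the check is trivially true, since a NumPy ufunc applied to a NumPy array returns a NumPy array.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # force CUDA initialization; raises if no usable GPU
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch still runs

x = xp.ones(3)

# A NumPy ufunc applied to x; when x is a cupy.ndarray, the call is
# dispatched through __array_ufunc__ and the result stays on the GPU.
y = np.exp(x)

same_type = type(y) is type(x)
print(type(y).__module__, same_type)
```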

New Features

  • Support DLPack (#1082)
  • Implement cupy.corrcoef and cupy.cov (#1110, thanks @tsurumeso!)
  • Allow more natural fusion notation (#1167)
  • Implement special functions (#1233)
  • Implement __array_ufunc__ (#1247, thanks @martindurant!)
  • Improve performance of batch normalization (#1260)
  • Add complex dtype to sparse matrix (#1277, thanks @chengts95!)
  • Add cuDNN API for tensor operations and reduction (#1319, thanks @kashif!)
  • Add distribution laplace (#1321)
  • Implement cupy.linalg.matrix_power (#1374, thanks @ericmjl!)

Enhancements

  • Reduce Python function call in ElementwiseKernel (#725)
  • Check cuDNN convolution algorithm (#890)
  • Remove memory copy in diag function (#1129)
  • Fix linalg.matrix_rank casting for Windows (#1217)
  • Use cuBLAS in cupy.einsum (#1218)
  • Add divmod function to cupy namespace (#1286)
  • Improve ndarray initializing performance (#1341)
  • Fix type of return value of fused function (#1349)
  • Support composition of fused functions (#1350)
  • Support weak reference to CuPy array (#1355)
  • Remove cupy_stdint.h (#1361)
  • Round-up pooled memory allocation size with clp2 (#1372)
  • Reduce GPU memory usage in (de)convolution (#1381)
  • Support 'f' and 'c' in the order option of ndarray (#1385)

Bug Fixes

  • Fix OutOfMemoryError raised even when there are sufficiently large freeable chunks (#1256, thanks @hyabe!)
  • Fix astype for complex dtypes (#1279)
  • Fix real and imag for zero-dim arrays (#1280)
  • Support rounding for complex types (#1282)
  • Support expm1, log1p, log2 for complex type (#1283)
  • Fix unary functions in misc for complex (#1284)
  • Fix binary functions in misc to support complex types (#1285)
  • Fix view of zerodim ndarray (#1287)
  • Catch C++ exceptions from Thrust (#1289)
  • Fix real, imag of non-contiguous complex arrays (#1303)
  • Fix rint syntax error (#1311)
  • Skip einsum test for NumPy versions with broken einsum (#1334)
  • Fix type of reduction (#1354)
  • Fix to accept longer order names (#1393)

Documentation

  • Add NumPy 1.14 to supported versions (#1139)
  • Update README to encourage use of wheels (#1208)
  • Improve sparse docs to show conversion from/to SciPy (#1213)
  • Force displaying known methods which are mis-recognized as attributes by Sphinx (#1250)
  • Expand reference on differences in zero-dimensional arrays (#1254)
  • Fix typo in sparse matrix docs (#1307)
  • Split LICENSE file (#1325)
  • Fix typo in profiler docs (#1327)
  • Reorganize license file (#1330)
  • Fix typos in zeros and zeros_like (#1357)
  • Update requirements for v5.0.0b2 / v4.2 release (#1369)

Installation

  • Use define macro in setup.py (#1121)
  • Support bundling dependent DLLs for Windows wheel support (#1253)
  • Remove deprecated imp.load_source in setup.py (#1329, thanks @vilyaair!)
  • Add license file to wheel (#1333)

Tests

  • Sparse complex ufunc (#1312)
  • Add scipy_name to testing helper functions (#1339)

@hvy hvy released this Jun 21, 2018 · 99 commits to v4 since this release

This is the release note of v4.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

  • Allocation strategy of pinned memory has been improved to reduce host memory usage.
  • Fixed bugs in multiple functions with arrays with complex dtypes.

Enhancements

  • Use cuDNN v7 APIs to get conv algos for TensorCore (#1134)
  • Fix cupy.diag failures for array-like objects other than CuPy arrays (#1235, thanks @hyabe!)
  • Remove memory copy in the cupy.diag function (#1337)
  • Support weak reference to CuPy array (#1359)
  • Fix to preserve dtype of an input array in cupy.linalg.norm (#1376)
  • Improve performance of ndarray initialization (#1377)
  • Round-up pooled memory allocation size with clp2 (#1386)
  • Reduce GPU memory usage in (de)convolution (#1387)
  • Support 'f' and 'c' in the order option of ndarray (#1390)

Bug Fixes

  • Fix cupy.matmul when inputs contain zero-sized array(s) (#1238)
  • Fix default dtype of cupy.full (#1257)
  • Fix dtype option of cupy.sum and cupy.prod (#1259)
  • sort, lexsort, and argsort catch C++ exceptions from Thrust (#1290)
  • Fix view of zero-dim ndarray (#1291)
  • Fix real and imag of zero-dim ndarray (#1292)
  • Support cupy.expm1, cupy.log1p, cupy.log2 for complex type (#1293)
  • Fix unary functions in cupy.math.misc to support complex types (#1297)
  • Fix binary functions in cupy.math.misc to support complex types (#1298)
  • Fix OutOfMemoryError raised even when there are sufficiently large freeable chunks (#1301, thanks @hyabe!)
  • Fix astype for complex dtypes (#1302)
  • Fix real, imag of non-contiguous complex ndarray (#1306)
  • Rounding functions support complex ndarrays (#1308)

Documentation

  • Update README to encourage use of wheels (#1296)
  • Add NumPy 1.14 to supported versions (#1305)
  • Fix typo in sparse matrix docs (#1310)
  • Force displaying known methods which are mis-recognized as attributes by Sphinx (#1314)
  • Expand reference on differences in zero-dimensional arrays (#1315)
  • Split LICENSE file (#1326)
  • Fix typo in profiler docs (#1328)
  • Reorganize license files (#1335)
  • Fix typos in cupy.zeros and cupy.zeros_like (#1360)
  • Update requirements for v5.0.0b2 / v4.2 release (#1373)

Installation

  • Remove deprecated imp.load_source in setup.py (#1332, thanks @vilyaair!)
  • Add a license file to wheel (#1348)

Tests

  • Fix .coveragerc (#1212)
  • Remove _multiprocess_can_split_ (#1267)
Pre-release

@beam2d beam2d released this May 24, 2018 · 1026 commits to master since this release

This is the release note of v5.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

We have started providing wheels for Python 3.6 on Windows. This is currently considered experimental, and we'd love to hear your feedback. See the Installation Guide for details.

New Features

  • Fix incompatibility between cupy.random.permutation and numpy.random.permutation. (#1138)
  • Implement unique (#1140)
  • Add cupy.average (#1180)
  • Implement triangular array creation routines (#1195, thanks @tsurumeso!)

Enhancements

  • Fix to preserve dtype of input array in cupy.linalg.norm (#875)
  • Support complex constants and functions in fuse (#1090)
  • Fix cupy.diag() failing for array-likes other than CuPy arrays (#1124, thanks @hyabe!)
  • Add NumPy-compatibility constants (#1163, thanks @keisuke-umezawa!)
  • Free pooled memory when cufftMakePlan1d cannot allocate memory (#1219)
  • Rename CuFftError to CuFFTError (#1234)

Bug Fixes

  • Fix memory leak: mempool tried to find out-of-bounds bin when freeing chunk (#1165)
  • Fix scalar casting rule to support Windows (#1169)
  • Fix regex in einsum to match empty input subscript (#1181)
  • Fix default dtype of full (#1209)
  • Fix matmul when inputs contain zero-sized array (#1231)
  • Fix dtype option of sum and prod (#1239)
  • Fix conversion from float16 to complex (#1241)
  • Fix file permissions (#1249)

Documentation

  • Fix missing documents (#1148)

Installation

  • Separate NVTX module for better Windows support (#1211)

Tests

  • Skip int8.max test on Windows due to NumPy bug (#1171)
  • Fix example test to pass on Windows (#1172)
  • Fix real and imag test for bool to pass on Windows (#1173)
  • Separate tests for cupy.power against complex dtypes (#1174)
  • Fix .coveragerc (#1210)
  • Fix 32-bit boundary test to support Windows (#1216)
  • Remove _multiprocess_can_split_ (#1220)
  • Fix hacking version (#1228)

Others

  • Fix .gitignore to exclude .pyd files (#1215)

@niboshi niboshi released this May 24, 2018 · 177 commits to v4 since this release

This is the release note of v4.1.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Add NumPy-compatibility constants (#1205, thanks @keisuke-umezawa!)
  • Support complex constants and functions in fuse (#1207)
  • Free pooled memory when cufftMakePlan1d cannot allocate memory (#1236)
  • Rename CuFftError to CuFFTError (#1244)

Bug Fixes

  • Fix regex in einsum to match empty input subscript (#1186)
  • Fix memory leak: mempool tried to find out-of-bounds bin when freeing chunk (#1189)
  • Fix scalar casting rule to support Windows (#1194)
  • Fix conversion from float16 to complex (#1252)

Installation

  • Separate NVTX module for better Windows support (#1237)

Tests

  • Separate tests for cupy.power against complex dtype (#1187)
  • Fix example test to pass on Windows (#1188)
  • Fix real and imag test for bool to pass on Windows (#1192)
  • Skip int8.max test on Windows due to NumPy bug (#1193)
  • Fix hacking version (#1229)
  • Fix 32-bit boundary test to support Windows (#1255)

Others

  • Fix .gitignore to exclude .pyd files (#1227)
Pre-release

@niboshi niboshi released this Apr 17, 2018 · 1147 commits to master since this release

This is the release note of v5.0.0a1. See here for the complete list of solved issues and merged PRs.

New Features

  • Expose context management API in driver (#977)
  • Add 'edge' and 'reflect' mode to cupy.pad (#1040, thanks @wkentaro!)
  • Implement histogram (#1049, thanks @IshitaTakeshi!)
  • Implement multi-dimensional image processing (#1066)
  • Implement cupy.show_config and cupyx.get_runtime_info (#1067)

Enhancements

  • Expose all supported dtypes from numpy (#1070)
  • Support double precision atomicAdd on Maxwell or older GPUs (#1071, thanks @anaruse!)
  • Use cuDNN v7 APIs to get convolution algorithms for TensorCore (#1095, thanks @anaruse!)
  • Handle errors in cupy.show_config() (#1132)
  • Fix to capture CuDNNError in cupyx.runtime (#1136)

Bug Fixes

  • Fix moveaxis bug (#1023, thanks @fukatani!)
  • Fix diagflat to fail if argument is not cupy.ndarray (#1036)
  • Limit arch to the maximum value allowed in each NVRTC version (#1055)
  • Fix ndarray.real and ndarray.imag to return view (#1089)
  • Fix cupy.concatenate to support arrays with >= 2**31 elements (#1101)
  • Use streams when calling libraries (#1107)
  • Fix duplicate declaration of EigMode in cuSPARSE (#1108)
  • Fix duplicate declaration of cudaError_t (#1112)
  • Fix cupy.linalg.inv() breaking its argument (#1123, thanks @hyabe!)
  • Use cusolverSpSetStream for cuSolverSP library calls (#1152)
  • Do not use platform-specific CC (#1157)

Documentation

  • Fix typos (#1046, #1077)
  • Update documentation for chainer.backends.cuda (#1047)
  • Rewrite installation guide (#1064)
  • Remove invalid argument description in cupy.tensordot (#1069)
  • Fix document of for_unsigned_dtypes (#1076)
  • Fix wrong references of document (#1078)
  • Fix document of ndimage (#1131)
  • Enable flake8 in cupy/indexing/generate.py (#1141)
  • Fix document of r_ and c_ (#1142)
  • Fix document of MemoryHook (#1143)

Installation

  • Use --no-cache-dir in Dockerfile (#1060)
  • Avoid embedding CUDA_PATH to RPATH in wheels (#1065)

Examples

  • Avoid importing matplotlib to set its backend to Agg (#976)

Tests

  • Remove platform-dependent dtype (#1091)
  • Remove nose dependency (#1125)