v4.2.0
CUDA v4.2.0
Closed issues:
- NVTX: consider using Start/End for ranges (#1485)
- Limitations of CuIterator (#1768)
- Testing fails on unsupported devices. (#1815)
- Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
- Passing tests using Github CI workflow errors with libcuda not defined (#1867)
- Cannot precompile GPU code with SnoopPrecompile (#1870)
- Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
- Fake CUDA library (#1879)
- Error thrown when launching Julia with Nsight systems or compute. (#1886)
- Cannot construct CuDeviceArray (#1887)
- Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)
Merged pull requests:
- Use adapt symmetrically in CuIterator (#1769) (@mcabbott)
- Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
- Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
- Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
- Allow precompiling CUDA code. (#1865) (@maleadt)
- Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
- Update manifest (#1871) (@github-actions[bot])
- Don't collect AbstractQ objects in tests (#1872) (@dkarrasch)
- Add compatibility entry for Lovelace (#1873) (@xaellison)
- remove some type-piracy from cusparse (#1876) (@vtjnash)
- Remove more unneeded ndims methods. (#1878) (@maleadt)
- Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
- Update manifest (#1882) (@github-actions[bot])
- Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
- Use atomics for allocation statistics. (#1884) (@maleadt)
- Fix atomic increment of alloc stats. (#1885) (@maleadt)
- Update manifest (#1889) (@github-actions[bot])