Release v4.2.0 · JuliaGPU/CUDA.jl

CUDA v4.2.0

Closed issues:

NVTX: consider using Start/End for ranges (#1485)
Limitations of CuIterator (#1768)
Testing fails on unsupported devices. (#1815)
Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
Passing tests using GitHub CI workflow errors with libcuda not defined (#1867)
Cannot precompile GPU code with SnoopPrecompile (#1870)
Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
Fake CUDA library (#1879)
Error thrown when launching Julia with Nsight Systems or Nsight Compute. (#1886)
Cannot construct CuDeviceArray (#1887)
Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)

Merged pull requests:

Use adapt symmetrically in CuIterator (#1769) (@mcabbott)
Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
Allow precompiling CUDA code. (#1865) (@maleadt)
Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
Update manifest (#1871) (@github-actions[bot])
Don't collect AbstractQ objects in tests (#1872) (@dkarrasch)
Add compatibility entry for Lovelace (#1873) (@xaellison)
Remove some type piracy from CUSPARSE (#1876) (@vtjnash)
Remove more unneeded ndims methods. (#1878) (@maleadt)
Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
Update manifest (#1882) (@github-actions[bot])
Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
Use atomics for allocation statistics. (#1884) (@maleadt)
Fix atomic increment of alloc stats. (#1885) (@maleadt)
Update manifest (#1889) (@github-actions[bot])