CUDA v3.6.0
Closed issues:
- Conversion issue (#157)
- Extend new RNG to Complex numbers & normal distributions (#726)
- Fatal errors during sorting tests (#916)
- `deepcopy` failing (#1202)
- Kernel compilation fails when specifying shared memory array size as a tuple consisting of block dimension and kernel argument (#1205)
- ERROR: LoadError: The artifact at C:\Users\name.julia\artifacts\58bd87695e9ccdb508cb38be1ab717315ecc9152 is empty. (#1209)
- InvalidIRError when displaying a model which is on the GPU (#1212)
- CUDA.jl tries to load CUDA compat loaded via jll even though system package is installed (#1216)
- Synchronizing over blocks (#1220)
- assignment changes random seed (#1226)
- `accumulate` gives wrong answer when `init != 0` (#1227)
- Generic dot kernel: use multiple kernels instead of atomics (#1244)
- integer division error creating CuVector of `missing` and `nothing` (#1251)
- unsupported dynamic function invocation with union type of more than 2 elements (#1252)
- three CUDA.@atomic in a row result in out-of-bounds error (#1254)
- Float16 CAS cannot use atom.cas.b16.global on sm_61 (#1258)
- `cu(::SVector)` gives `SVector`, `cu(::MVector)` gives `CuArray` (#1262)
- Get back `unsafe_copyto!` methods for unified<->unified and unified<->device (#1263)
- Passing and using a FFT plan in a CUDA kernel seems impossible (#1266)
- Inplace Complex FFT and Threads (#1268)
- `sort` returns nothing (#1270)
- Release a new version (#1276)
- `__init_driver__` not called in 3.5 (#1280)
- Shared memory does not support isbits unions. (#1281)
- NVIDIA Nsight Systems and `CUDA.@profile` error (#1282)
- nvprof with `using CUDA` crashes julia (#1283)
Merged pull requests:
- Addition over CuSparseMatrix (#1195) (@yuehhua)
- [CUSOLVER] Add ordering functions (#1198) (@amontoison)
- Correctly handle multi-GPU instances with NVML. (#1199) (@maleadt)
- CI improvements. (#1200) (@maleadt)
- fix FFT workarea typo leading to memory corruption (#1204) (@marius311)
- Update manifest (#1206) (@github-actions[bot])
- Minor improvements for library wrappers (#1207) (@maleadt)
- Various small improvements (#1210) (@maleadt)
- Extend CuDeviceArray ctors for mixed-int indices. (#1211) (@maleadt)
- Deprecate non-blocking sync, and always call the synchronization API. (#1213) (@maleadt)
- Generic CUSPARSE: use the index arguments. (#1214) (@maleadt)
- Add bitonic sort implementation (#1217) (@xaellison)
- Update manifest (#1218) (@github-actions[bot])
- Reverted deepcopy, added test (#1221) (@birkmichael)
- Use broadcast instead of copies to initialize mapreduce buffers. (#1223) (@maleadt)
- Remove some unneeded Base module prefixes. (#1224) (@maleadt)
- Update manifest (#1225) (@github-actions[bot])
- Cherry-picked improvements (#1228) (@maleadt)
- Update introduction.jl (#1232) (@aramirezreyes)
- Update manifest (#1233) (@github-actions[bot])
- Fix SpMV for CUDA 11.5 (#1234) (@amontoison)
- Add support for randn and randexp. (#1236) (@maleadt)
- Avoid double-initializing partial accumulate results. (#1237) (@maleadt)
- Fix cuTENSOR contractions not working for FP16 inputs (#1238) (@thomasfaingnaert)
- Bump CUTENSOR and fix on CUDA 11.5 (#1239) (@maleadt)
- Support dot product on GPU between CuArrays with inconsistent eltypes (#1240) (@findmyway)
- Update manifest (#1241) (@github-actions[bot])
- Optimize CUTENSOR contraction. (#1243) (@maleadt)
- Don't use nondeterministic atomics in dot when requested. (#1245) (@maleadt)
- Remove CUBLAS decomposition tests without pivoting. (#1246) (@maleadt)
- Update manifest (#1247) (@github-actions[bot])
- wrap CUBLAS spmv and spr (#1248) (@bjarthur)
- CompatHelper: bump compat for "SpecialFunctions" to "2" (#1249) (@github-actions[bot])
- Update manifest (#1250) (@github-actions[bot])
- Store array offset as elements to fix all-singleton case. (#1255) (@maleadt)
- Update CUDA to 11.5 Update 1. (#1256) (@maleadt)
- Use Base functionality for iteration Union type components. (#1257) (@maleadt)
- Bump CI to Julia 1.7. (#1260) (@maleadt)
- Update manifest (#1261) (@github-actions[bot])
- Use CUDA APIs for unoptimized copies. (#1265) (@maleadt)
- Bump CUDNN to 8.3.1, enable CUDA 11.5 by default. (#1267) (@maleadt)
- Adding stream update for inplace complex FFT (#1269) (@ovanvincq)
- Fix sort! return type. (#1272) (@maleadt)
- Add const keyword to type aliases declarations. (#1273) (@eliascarv)
- Update manifest (#1274) (@github-actions[bot])
- Avoid eager expansion of CUDA_compat artifact string. (#1275) (@maleadt)
- Allow copies between unified arrays in different contexts. (#1277) (@maleadt)
- fix zeros and ones for user defined types (#1278) (@GiggleLiu)
- Make CUDNN depend on CUBLAS. (#1279) (@maleadt)
- Update manifest (#1286) (@github-actions[bot])
- Restore call to init_driver. (#1287) (@maleadt)
- Improvements for isbits union shared memory (#1288) (@maleadt)