CUDA v3.6.0
Closed issues:
- Conversion issue (#157)
- Extend new RNG to Complex numbers & normal distributions (#726)
- Fatal errors during sorting tests (#916)
- `deepcopy` failing (#1202)
- Kernel compilation fails when specifying shared memory array size as a tuple consisting of block dimension and kernel argument (#1205)
- ERROR: LoadError: The artifact at C:\Users\name.julia\artifacts\58bd87695e9ccdb508cb38be1ab717315ecc9152 is empty. (#1209)
- InvalidIRError when displaying a model which is on the GPU (#1212)
- CUDA.jl tries to load CUDA compat loaded via jll even though system package is installed (#1216)
- Synchronizing over blocks (#1220)
- assignment changes random seed (#1226)
- `accumulate` gives wrong answer when `init != 0` (#1227)
- Generic dot kernel: use multiple kernels instead of atomics (#1244)
- integer division error creating CuVector of `missing` and `nothing` (#1251)
- unsupported dynamic function invocation with union type of more than 2 elements (#1252)
- three CUDA.@atomic in a row result in out-of-bounds error (#1254)
- Float16 CAS cannot use atom.cas.b16.global on sm_61 (#1258)
- `cu(::SVector)` gives `SVector`, `cu(::MVector)` gives `CuArray` (#1262)
- Get back `unsafe_copyto!` methods for unified<->unified and unified<->device (#1263)
- Passing and using a FFT plan in a CUDA kernel seems impossible (#1266)
- Inplace Complex FFT and Threads (#1268)
- `sort` returns nothing (#1270)
- Release a new version (#1276)
- `__init_driver__` not called in 3.5 (#1280)
- Shared memory does not support isbits unions. (#1281)
- NVIDIA Nsight Systems and `CUDA.@profile` error (#1282)
- nvprof with `using CUDA` crashes julia (#1283)
Merged pull requests:
- Addition over CuSparseMatrix (#1195) (@yuehhua)
- [CUSOLVER] Add ordering functions (#1198) (@amontoison)
- Correctly handle multi-GPU instances with NVML. (#1199) (@maleadt)
- CI improvements. (#1200) (@maleadt)
- fix FFT workarea typo leading to memory corruption (#1204) (@marius311)
- Update manifest (#1206) (@github-actions[bot])
- Minor improvements for library wrappers (#1207) (@maleadt)
- Various small improvements (#1210) (@maleadt)
- Extend CuDeviceArray ctors for mixed-int indices. (#1211) (@maleadt)
- Deprecate non-blocking sync, and always call the synchronization API. (#1213) (@maleadt)
- Generic CUSPARSE: use the index arguments. (#1214) (@maleadt)
- Add bitonic sort implementation (#1217) (@xaellison)
- Update manifest (#1218) (@github-actions[bot])
- Reverted deepcopy, added test (#1221) (@birkmichael)
- Use broadcast instead of copies to initialize mapreduce buffers. (#1223) (@maleadt)
- Remove some unneeded Base module prefixes. (#1224) (@maleadt)
- Update manifest (#1225) (@github-actions[bot])
- Cherry-picked improvements (#1228) (@maleadt)
- Update introduction.jl (#1232) (@aramirezreyes)
- Update manifest (#1233) (@github-actions[bot])
- Fix SpMV for CUDA 11.5 (#1234) (@amontoison)
- Add support for randn and randexp. (#1236) (@maleadt)
- Avoid double-initializing partial accumulate results. (#1237) (@maleadt)
- Fix cuTENSOR contractions not working for FP16 inputs (#1238) (@thomasfaingnaert)
- Bump CUTENSOR and fix on CUDA 11.5 (#1239) (@maleadt)
- Support dot product on GPU between CuArrays with inconsistent eltypes (#1240) (@findmyway)
- Update manifest (#1241) (@github-actions[bot])
- Optimize CUTENSOR contraction. (#1243) (@maleadt)
- Don't use nondeterministic atomics in dot when requested. (#1245) (@maleadt)
- Remove CUBLAS decomposition tests without pivoting. (#1246) (@maleadt)
- Update manifest (#1247) (@github-actions[bot])
- wrap CUBLAS spmv and spr (#1248) (@bjarthur)
- CompatHelper: bump compat for "SpecialFunctions" to "2" (#1249) (@github-actions[bot])
- Update manifest (#1250) (@github-actions[bot])
- Store array offset as elements to fix all-singleton case. (#1255) (@maleadt)
- Update CUDA to 11.5 Update 1. (#1256) (@maleadt)
- Use Base functionality for iteration Union type components. (#1257) (@maleadt)
- Bump CI to Julia 1.7. (#1260) (@maleadt)
- Update manifest (#1261) (@github-actions[bot])
- Use CUDA APIs for unoptimized copies. (#1265) (@maleadt)
- Bump CUDNN to 8.3.1, enable CUDA 11.5 by default. (#1267) (@maleadt)
- Adding stream update for inplace complex FFT (#1269) (@ovanvincq)
- Fix sort! return type. (#1272) (@maleadt)
- Add const keyword to type aliases declarations. (#1273) (@eliascarv)
- Update manifest (#1274) (@github-actions[bot])
- Avoid eager expansion of CUDA_compat artifact string. (#1275) (@maleadt)
- Allow copies between unified arrays in different contexts. (#1277) (@maleadt)
- fix zeros and ones for user defined types (#1278) (@GiggleLiu)
- Make CUDNN depend on CUBLAS. (#1279) (@maleadt)
- Update manifest (#1286) (@github-actions[bot])
- Restore call to init_driver. (#1287) (@maleadt)
- Improvements for isbits union shared memory (#1288) (@maleadt)