CUDA v2.6.2
Closed issues:
- High allocations and getindex (#150)
- ResNet spending much time in CuArrays GC (#149)
- Broadcast inference failure results in scalar iteration (#145)
- Allocator very slow to reclaim memory after running for sufficiently long (#137)
- Assignment using logical indexing (#131)
- CUDNN convolution allocates outside of the memory pool (#111)
- Logical indexing per-dim (#106)
- Threading-related assertion failure in split allocator (#97)
- dims support for softmax (#226)
- Memory pinning needs more features (#242)
- External allocations fail under high memory pressure (#340)
- Incomplete CUDNN wrappers (#343)
- softmax(x) and logsoftmax(x) update their arguments (#592)
- Freeing large buffers takes a while (#594)
- softmax has a problem with its dim parameter (#599)
- gemmEx on sm_52 results in CUBLAS_STATUS_ARCH_MISMATCH (#609)
- Documentation of conditional use (#689)
- GPU runs out of memory if 2 workers use the same GPU (#692)
- CURAND handles are collected early (#699)
- cudnnConvolutionForward fails memory checking (#702)
- Deadlock during OOM (#706)
- Segfault during trampoline allocation when querying occupancy from multiple threads (#707)
- Ballot intrinsics should use .sync variety (#711)
- cfunction $shmem_cint use after free (#713)
- OOM when evaluating a small resnet (with both Flux and Knet) (#714)
- Support CUDA 11.2 Update 1 (#715)
- Base.mapreducedim returns wrong answer with non-zero target array (#720)
- CUBLAS_STATUS_ARCH_MISMATCH (#722)
- Test failures on linux (#727)
- Switching devices causes GC errors (#731)
- Pin CPU buffers when doing memory copies (#735)
- Memory free error with CUDA 11.2 and multi threads/GPUs (#737)
- Per-device memory pool (#742)
- Could not load library cudnn_ops_infer64_8.dll (#757)
- CUDA.lgamma(x) crashes Julia (#758)
Merged pull requests:
- New high level interface for cuDNN (#523) (@denizyuret)
- bilinear upsampling (#636) (@maxfreu)
- Use CUDA 11.2's stream-ordered allocator (#679) (@maleadt)
- Support an additional nvdisasm version. (#680) (@maleadt)
- Add fast getri_strided_batch (#682) (@cfranken)
- Fix race during multi-threaded init. (#687) (@maleadt)
- Update manifest (#690) (@github-actions[bot])
- Change to Buildkite v1 plugins. (#691) (@maleadt)
- CUPTI improvements for multithreading (#693) (@maleadt)
- Fix exception flag linkage for linking. (#694) (@maleadt)
- Update manifest (#695) (@github-actions[bot])
- Use simpler try/catch in show(CuError). (#696) (@maleadt)
- fix bug in CURAND.jl's set_stream function. (#698) (@norci)
- Update CUDNN to 8.1. (#701) (@maleadt)
- Remove special-cased algorithm selection for CUDNN convolution (#703) (@denizyuret)
- Keep track of active handles to avoid early collection. (#704) (@maleadt)
- Backports for Julia 1.5 (#705) (@maleadt)
- Support for cushow-ing multiple values, including LLVMPtrs. (#709) (@maleadt)
- Make CUDNN tests eagerly invoke at-test for better error reporting. (#710) (@maleadt)
- Report JIT error log with linker errors. (#712) (@maleadt)
- Update manifest (#716) (@github-actions[bot])
- Flip exception_flag filter! predicate (#717) (@S-D-R)
- Keep some memory reserved for external allocations. (#718) (@maleadt)
- Upgrade CUDA 11.2 to Update 1. (#719) (@maleadt)
- Add Abstract FFT compat (#721) (@DhairyaLGandhi)
- Add support for and switch test to warp-synchronous vote intrinsics. (#723) (@maleadt)
- Specialize Base.to_index for AnyCuArray{Bool} (#724) (@pabloferz)
- Update manifest (#728) (@github-actions[bot])
- Eagerly dlopen cublasLt to prevent a system library getting picked up. (#729) (@maleadt)
- Switch tests over to compute-sanitizer. (#730) (@maleadt)
- Perform pool operations in the correct context. (#732) (@maleadt)
- Streamline use of retry_reclaim. (#733) (@maleadt)
- Threading fixes (#734) (@maleadt)
- Copy the old rnn.jl to rnncompat.jl for Flux compatibility (#738) (@denizyuret)
- fix testmode batchnorm back (#739) (@CarloLucibello)
- Don't error out if failing to parse the local CUDA version. (#740) (@maleadt)
- Backport #739 (#741) (@maleadt)
- Add back an older artifact for CUDNN on PPC with CUDA 10.2. (#743) (@maleadt)
- Use the default memory pool. (#745) (@maleadt)
- Use a memory pool per device. (#746) (@maleadt)
- Test sort with at-test at the toplevel, for better reporting. (#749) (@maleadt)
- remove NNlib (#753) (@CarloLucibello)
- Update manifest (#754) (@github-actions[bot])
- Fix cuda-memcheck, don't use memory pools. (#756) (@maleadt)
- Update generated wrappers (#759) (@maleadt)
- Rework memory pinning and speed up async ops on unpinned memory (#760) (@maleadt)
- Improve context switching (#761) (@maleadt)
- Update manifest (#765) (@github-actions[bot])
- Docs on multitasking (#766) (@maleadt)
- Update to CUDA 11.2 Update 2. (#768) (@maleadt)
- Small backports for CUDA 2.4 / Julia 1.5 (#770) (@maleadt)
- Backports for CUDA 2.6 / Julia 1.6 (#771) (@maleadt)