CUDA v2.6.2
Closed issues:
- High allocations and getindex (#150)
- ResNet spending much time in CuArrays GC (#149)
- Broadcast inference failure results in scalar iteration (#145)
- Allocator very slow to reclaim memory after running for sufficiently long (#137)
- Assignment using logical indexing (#131)
- CUDNN convolution allocates outside of the memory pool (#111)
- Logical indexing per-dim (#106)
- Threading-related assertion failure in split allocator (#97)
- dims support for softmax (#226)
- Memory pinning needs more features (#242)
- External allocations fail under high memory pressure (#340)
- Incomplete CUDNN wrappers (#343)
- softmax(x) and logsoftmax(x) update their arguments (#592)
- Freeing large buffers takes a while (#594)
- softmax has a problem with its dim parameter (#599)
- gemmEx on sm_52 results in CUBLAS_STATUS_ARCH_MISMATCH (#609)
- Documentation of conditional use (#689)
- GPU runs out of memory if 2 workers use the same GPU (#692)
- CURAND handles are collected early (#699)
- cudnnConvolutionForward fails memory checking (#702)
- Deadlock during OOM (#706)
- Segfault during trampoline allocation when querying occupancy from multiple threads (#707)
- Ballot intrinsics should use .sync variety (#711)
- cfunction $shmem_cint use after free (#713)
- OOM when evaluating a small resnet (with both Flux and Knet) (#714)
- Support CUDA 11.2 Update 1 (#715)
- Base.mapreducedim returns wrong answer with non-zero target array (#720)
- CUBLAS_STATUS_ARCH_MISMATCH (#722)
- Test failures on linux (#727)
- Switching devices causes GC errors (#731)
- Pin CPU buffers when doing memory copies (#735)
- Memory free error with CUDA 11.2 and multi threads/GPUs (#737)
- Per-device memory pool (#742)
- Could not load library cudnn_ops_infer64_8.dll (#757)
- CUDA.lgamma(x) crashes Julia (#758)
Merged pull requests:
- New high level interface for cuDNN (#523) (@denizyuret)
- bilinear upsampling (#636) (@maxfreu)
- Use CUDA 11.2's stream-ordered allocator (#679) (@maleadt)
- Support an additional nvdisasm version. (#680) (@maleadt)
- Add fast getri_strided_batch (#682) (@cfranken)
- Fix race during multi-threaded init. (#687) (@maleadt)
- Update manifest (#690) (@github-actions[bot])
- Change to Buildkite v1 plugins. (#691) (@maleadt)
- CUPTI improvements for multithreading (#693) (@maleadt)
- Fix exception flag linkage for linking. (#694) (@maleadt)
- Update manifest (#695) (@github-actions[bot])
- Use simpler try/catch in show(CuError). (#696) (@maleadt)
- fix bug in CURAND.jl's set_stream function. (#698) (@norci)
- Update CUDNN to 8.1. (#701) (@maleadt)
- Remove special-cased algorithm selection for CUDNN convolution (#703) (@denizyuret)
- Keep track of active handles to avoid early collection. (#704) (@maleadt)
- Backports for Julia 1.5 (#705) (@maleadt)
- Support for cushow-ing multiple values, including LLVMPtrs. (#709) (@maleadt)
- Make CUDNN tests eagerly invoke at-test for better error reporting. (#710) (@maleadt)
- Report JIT error log with linker errors. (#712) (@maleadt)
- Update manifest (#716) (@github-actions[bot])
- Flip exception_flag filter! predicate (#717) (@S-D-R)
- Keep some memory reserved for external allocations. (#718) (@maleadt)
- Upgrade CUDA 11.2 to Update 1. (#719) (@maleadt)
- Add Abstract FFT compat (#721) (@DhairyaLGandhi)
- Add support for and switch test to warp-synchronous vote intrinsics. (#723) (@maleadt)
- Specialize Base.to_index for AnyCuArray{Bool} (#724) (@pabloferz)
- Update manifest (#728) (@github-actions[bot])
- Eagerly dlopen cublasLt to prevent a system library getting picked up. (#729) (@maleadt)
- Switch tests over to compute-sanitizer. (#730) (@maleadt)
- Perform pool operations in the correct context. (#732) (@maleadt)
- Streamline use of retry_reclaim. (#733) (@maleadt)
- Threading fixes (#734) (@maleadt)
- Copy the old rnn.jl to rnncompat.jl for Flux compatibility (#738) (@denizyuret)
- fix testmode batchnorm back (#739) (@CarloLucibello)
- Don't error out if failing to parse the local CUDA version. (#740) (@maleadt)
- Backport #739 (#741) (@maleadt)
- Add back an older artifact for CUDNN on PPC with CUDA 10.2. (#743) (@maleadt)
- Use the default memory pool. (#745) (@maleadt)
- Use a memory pool per device. (#746) (@maleadt)
- Test sort with at-test at the toplevel, for better reporting. (#749) (@maleadt)
- remove NNlib (#753) (@CarloLucibello)
- Update manifest (#754) (@github-actions[bot])
- Fix cuda-memcheck, don't use memory pools. (#756) (@maleadt)
- Update generated wrappers (#759) (@maleadt)
- Rework memory pinning and speed up async ops on unpinned memory (#760) (@maleadt)
- Improve context switching (#761) (@maleadt)
- Update manifest (#765) (@github-actions[bot])
- Docs on multitasking (#766) (@maleadt)
- Update to CUDA 11.2 Update 2. (#768) (@maleadt)
- Small backports for CUDA 2.4 / Julia 1.5 (#770) (@maleadt)
- Backports for CUDA 2.6 / Julia 1.6 (#771) (@maleadt)