CUDA v3.5.0
Closed issues:
- Illegal memory access on 3.3 (#975)
- Forward compatibility (#1071)
- ambiguous `sparse` constructor (#1088)
- Map reduce with float 16 (#1124)
- Allow invalid GPU pointers not allowed in unsafe_wrap (#1125)
- Scalar Indexing error in the Introduction docs (#1127)
- stackoverflow when printing a custom subtype of AbstractCuSparseMatrix (#1128)
- missing `rand` methods (#1138)
- Error mapreducing over a 0 dimensional array (#1141)
- seed! is not thread safe (#1158)
- Simplify Int32-based indices (#1160)
- Concatenating a scalar to a CuArray gives an Array (#1162)
- Calling `byte_perm` with `Int32` values inserts sign checks (#1165)
- `sum!` does not compile for large arrays (#1169)
- Same random sequence on GPU and CPU? (#1170)
- Specifying eltype and buffer type when adapting to `CuArray`? (#1171)
- Inefficient `lop3.lut` instructions generated (#1172)
- Writing temporary PTX files can fail (#1173)
- Switching devices doesn't switch the REPL's output task (#1175)
- GC is not working for CuSparseMatrixCSR (#1178)
- sparse*dense operations shouldn't drop sparseness (#1188)
- Raises illegal memory access error randomly (#1189)
Merged pull requests:
- CI fixes (#950) (@maleadt)
- implement sparse (#1093) (@CarloLucibello)
- Use the kernel state object to pass the exception flag location. (#1110) (@maleadt)
- Update manifest (#1123) (@github-actions[bot])
- Improve show methods in sparse GPU arrays. (#1129) (@maleadt)
- Use warp intrinsics for a wider range of reductions. (#1130) (@maleadt)
- Support wrapping a host buffer with a CuArray (#1131) (@maleadt)
- support transpose CSC to CUDA CSR (#1132) (@Roger-luo)
- Small improvements to discovery of local toolkits. (#1134) (@maleadt)
- Rework device and context getters. (#1135) (@maleadt)
- Avoid memory operations during graph capture. (#1137) (@maleadt)
- Streamline the random number interface. (#1146) (@maleadt)
- Native device synchronization (#1147) (@maleadt)
- support interpret(reshape) (#1149) (@Roger-luo)
- add a gitignore (#1150) (@Roger-luo)
- Fix normalize on complex number (#1151) (@maleadt)
- Addition and multiplication over cuarray and cusparse (#1152) (@maleadt)
- Preserve Int32 hardware indices (#1153) (@maleadt)
- remove mutable to make device sparse type bitstype (#1154) (@Roger-luo)
- Update manifest (#1155) (@github-actions[bot])
- CompatHelper: bump compat for "BFloat16s" to "0.2" (#1156) (@github-actions[bot])
- Perform actual synchronization API calls when we need the memory (#1157) (@maleadt)
- Binary dependency changes (#1159) (@maleadt)
- Bump dependencies. (#1161) (@maleadt)
- Generalize Sparse Array Indices Type in Struct Def (#1163) (@Roger-luo)
- Use unchecked type conversions for `byte_perm` arguments (#1166) (@eschnett)
- Fix performance regressions (#1167) (@maleadt)
- Fix big mapreduce kernel for inputs without neutral element. (#1174) (@maleadt)
- Switch contexts before performing memory operations on arrays (#1176) (@maleadt)
- Improvements to stream-ordered memory management (#1177) (@maleadt)
- Update manifest (#1180) (@github-actions[bot])
- Consistently use chars instead of raw enums in CUSPARSE/CUSOLVER functions. (#1181) (@maleadt)
- Implement forward compatibility (#1182) (@maleadt)
- Bump GPUCompiler for 1.8 compat. (#1183) (@maleadt)
- Bump GPUArrays. (#1186) (@maleadt)
- Update documentation (#1187) (@maleadt)