Describe the bug
I have been working on a model with 5 CNN layers and a Dense layer, but after updating CUDA the same model that worked before started giving this error:
Out of GPU memory trying to allocate 653.794 MiB
Effective GPU memory usage: 99.32% (7.734 GiB/7.787 GiB)
Memory pool usage: 5.630 GiB (6.344 GiB reserved)
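(For reference, the numbers above come straight from the out-of-memory error; as far as I know the same statistics can be printed at any point with CUDA.jl's memory_status, e.g.:)

```julia
using CUDA

# Print current memory statistics (effective GPU memory usage and
# memory pool usage), the same numbers shown in the OOM error above.
CUDA.memory_status()
```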
Reverting back to CUDA@3.8.0, which I luckily still had in my git log, fixed the problem.
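For completeness, this is roughly how I pin the working version (just a sketch using Pkg; the version is the one mentioned above):

```julia
using Pkg

# Go back to the last CUDA.jl version that works for this model
# and pin it so a later `Pkg.update()` does not move it again.
Pkg.add(name = "CUDA", version = "3.8.0")
Pkg.pin("CUDA")
```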
To reproduce
I used https://github.com/FluxML/model-zoo/tree/master/vision/conv_mnist
Updated CUDA and Flux to the latest versions and changed the model a bit to make it stupidly bigger, so that the number of parameters is more similar to my model's. See the attached diff; a rough sketch of the kind of change is included after the attachments.
LaNet.txt
Manifest.toml
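The actual change is in the attached LaNet.txt; the sketch below only illustrates the kind of enlarged LeNet-style model I mean (layer and channel sizes here are made up, not the ones from the diff):

```julia
using Flux, CUDA

# Illustrative enlarged LeNet-style network for 28x28x1 MNIST images;
# the real layer sizes are in the attached LaNet.txt diff.
model = Chain(
    Conv((5, 5), 1 => 64, relu, pad = 2),
    MaxPool((2, 2)),                        # 28x28 -> 14x14
    Conv((5, 5), 64 => 128, relu, pad = 2),
    MaxPool((2, 2)),                        # 14x14 -> 7x7
    Flux.flatten,
    Dense(7 * 7 * 128, 512, relu),
    Dense(512, 10),
) |> gpu
```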
I've tested in the following order:
CUDA@3.8.5  #1 out of memory
CUDA@3.8.4  #2 out of memory, #6 out of memory again
CUDA@3.8.0  #3 works
CUDA@3.8.2  #4 works
CUDA@3.8.3  #5 works
So something bad happens between 3.8.3 and 3.8.4.
Expected behavior
I expected the same code that worked before to still work with CUDA@3.8.4+
Version info
Details on Julia:
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 6
JULIA_EDITOR = atom -a
Details on CUDA:
CUDA toolkit 11.6, artifact installation
NVIDIA driver 470.103.1, for CUDA 11.4
CUDA driver 11.4
Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 11.0.0+470.103.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.7.2
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: NVIDIA GeForce RTX 2070 with Max-Q Design (sm_75, 4.775 GiB / 7.787 GiB available)
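(The two blocks above are the output of Julia's versioninfo() and of CUDA.versioninfo():)

```julia
using InteractiveUtils, CUDA

versioninfo()        # Julia, platform and environment details
CUDA.versioninfo()   # toolkit, driver, library versions and devices
```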
Additional context
With the broken CUDA@3.8.4, adding
GC.gc(true);
CUDA.reclaim();
at the end of each mini-batch also helped to keep memory usage down to 21%. When this is not done, nvtop shows 100% memory usage in all cases, but CUDA 3.8.0->3.8.3 does not crash with an out-of-memory error, while 3.8.4 and 3.8.5 just crash.
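Concretely, the workaround looks roughly like this in the training loop (a sketch only; batches, model and train_batch! are placeholders for my actual code):

```julia
using CUDA

# Sketch of the workaround: manually release GPU memory after every mini-batch.
# `batches`, `model` and `train_batch!` stand in for the real training code.
for (x, y) in batches
    train_batch!(model, cu(x), cu(y))

    GC.gc(true)       # full Julia GC, so unreferenced CuArrays get finalized
    CUDA.reclaim()    # hand freed pool memory back to the driver
end
```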