Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep some memory reserved for external allocations. #718

Merged
merged 1 commit into from
Feb 19, 2021

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Feb 16, 2021

This PR splits the memory limit into a 'soft' and 'hard' limit, where the soft limit tries to keep a chunk of memory available for external libraries like CUDNN or CUBLAS. This should reduce the change of running into bad error situations that are often caused by an OOM but not reported as such. If it isn't possible to stick to the soft memory limit (i.e, after calling the GC a couple of times) we'll use the hard limit which is allowed to eat into the memory reserve up to any user-imposed memory limit.

@codecov
Copy link

codecov bot commented Feb 16, 2021

Codecov Report

Merging #718 (8ec0a33) into master (8f38be5) will decrease coverage by 3.16%.
The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #718      +/-   ##
==========================================
- Coverage   79.88%   76.71%   -3.17%     
==========================================
  Files         122      122              
  Lines        7388     7714     +326     
==========================================
+ Hits         5902     5918      +16     
- Misses       1486     1796     +310     
Impacted Files Coverage Δ
src/pool/none.jl 0.00% <0.00%> (ø)
src/pool/simple.jl 0.00% <0.00%> (ø)
src/pool.jl 72.45% <85.71%> (+0.02%) ⬆️
src/pool/binned.jl 95.65% <100.00%> (ø)
src/pool/split.jl 96.08% <100.00%> (ø)
examples/vadd.jl 25.00% <0.00%> (-75.00%) ⬇️
lib/cufft/CUFFT.jl 50.00% <0.00%> (-50.00%) ⬇️
src/nnlib.jl 30.10% <0.00%> (-39.90%) ⬇️
src/indexing.jl 65.06% <0.00%> (-33.13%) ⬇️
examples/peakflops.jl 68.57% <0.00%> (-31.43%) ⬇️
... and 57 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f38be5...8ec0a33. Read the comment docs.

@maleadt
Copy link
Member Author

maleadt commented Feb 19, 2021

cc @denizyuret @DhairyaLGandhi @jonathan-laurent; these changes should make CUDNN less likely to crash when close to OOM, at the expense of some additional memory pressure. Please report back if you'd see excessive GC memory times on master or once this lands in a release.

Another part of the puzzle is #679, and I hope the combination of these two changes to make a serious dent in the memory management issue. What remains then is making the Julia GC free memory a little more often, but that's a much bigger change.

@maleadt maleadt merged commit a34e69e into master Feb 19, 2021
@maleadt maleadt deleted the tb/reserve_memory branch February 19, 2021 15:02
@DhairyaLGandhi
Copy link
Member

Anecdotally, it seems to have helped, but I don't have enough "before" numbers to say empirically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda array Stuff about CuArray.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants