
Page based heap size heuristics #50144

Merged · 13 commits · Jul 23, 2023

Conversation

@gbaraldi (Member) commented Jun 12, 2023

This PR implements GC heuristics based on the number of pages allocated instead of on live objects, as was done before.
The heuristic for the new heap target is based on https://dl.acm.org/doi/10.1145/3563323 (in summary, it argues that the heap target should have square-root behaviour).
From my testing, this fixes #49545 and #49761.
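
For readers following along, here is a minimal sketch of the square-root rule described in that paper, written in C to match src/gc.c. The function and parameter names (live_bytes, alloc_rate, gc_rate, tuning_factor) are illustrative assumptions, not the identifiers the PR actually uses:

```c
#include <math.h>
#include <stdint.h>

// Sketch of a MemBalancer-style heap target: the headroom above the live
// heap grows with the square root of (live bytes * allocation rate / GC rate),
// so the extra memory granted between collections grows sublinearly with heap size.
static uint64_t heap_target_sketch(uint64_t live_bytes,
                                   double alloc_rate,     // bytes allocated per second
                                   double gc_rate,        // bytes collected per second of GC work
                                   double tuning_factor)  // trades memory use for GC time
{
    double headroom = sqrt(((double)live_bytes * alloc_rate) / (gc_rate * tuning_factor));
    return live_bytes + (uint64_t)headroom;
}
```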

@giordano added the GC (Garbage collector) label Jun 12, 2023
@gbaraldi marked this pull request as ready for review June 14, 2023 22:05
src/gc.c (outdated review thread, resolved)
@vchuravy (Sponsor Member)

Could you add documentation to https://docs.julialang.org/en/v1/devdocs/gc/? That would also help other people understand the trade-offs in the design.

@vchuravy added the needs docs (Documentation for this change is required) and needs news (A NEWS entry is required for this change) labels Jun 15, 2023
@gbaraldi (Member, Author)

It does seem like 32-bit is just very near OOM; this PR shouldn't really affect that.

@gbaraldi (Member, Author)

Wtf is up with Windows?

@gbaraldi removed the needs docs (Documentation for this change is required) and needs news (A NEWS entry is required for this change) labels Jun 22, 2023
@oscardssmith (Member)

Am I reading correctly that this PR sets the max heap for 32-bit systems to 1MB?

sysimage.mk (outdated review thread, resolved)
@gbaraldi (Member, Author)

It does; I'm losing my mind with the 32-bit stuff.

@oscardssmith (Member)

IMO playing with the number more is unlikely to help. If it's running out of address space with a 1GB cap, that pretty strongly suggests to me that the algorithm is the problem, not the number.

@gbaraldi (Member, Author)

So the issue here is that the old algorithm didn't actually respect the target you set, so 2GB for it didn't mean much, while the new algorithm will use that much memory if needed. And it turns out that finding that number is annoying. (You can check the maxrss of the test processes; I'm basically trying to match what we got before.)

@vchuravy changed the title from "First implementation of proper heap size heuristics" to "Page based heap size heuristics" Jun 26, 2023
src/gc-pages.c (outdated review thread, resolved)
src/gc-pages.c (outdated review thread, resolved)
src/gc.c (outdated review thread, resolved)
@vtjnash (Sponsor Member) commented Jul 14, 2023

Each GC may thrash the pagefile, so I would think any extra GC is too much.

@gbaraldi (Member, Author) commented Jul 14, 2023

Not sure what you mean by extra GC? This is for the essentially hung condition of using more memory than we allowed. For now, if we detect that, we increase the heap using the default behaviour, because the rate-based heuristics are probably nonsense in this condition.

@vtjnash (Sponsor Member) commented Jul 14, 2023

I meant that uv_get_total_memory should not be involved in the computation, since that is a lower bound on the amount of usable memory, not an upper bound.

@gbaraldi (Member, Author)

Oh, I see. The way I implemented it, GC thrashing is measured as GC time vs. mutator time; the heap size doesn't actually influence it.
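
A rough illustration of that idea (an assumption about the general shape, not the code this PR adds): thrashing is flagged when the time spent in GC becomes a large fraction of the time the mutator ran since the last collection. The names and the 50% threshold below are made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

// Sketch: flag thrashing when the last GC took more than `max_ratio`
// of the time the mutator ran since the previous collection.
static bool is_thrashing_sketch(uint64_t gc_time_ns, uint64_t mutator_time_ns,
                                double max_ratio /* e.g. 0.5 */)
{
    if (mutator_time_ns == 0)
        return true; // collections back-to-back with no mutator progress
    return (double)gc_time_ns / (double)mutator_time_ns > max_ratio;
}
```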

src/gc.c (outdated)

@@ -3575,7 +3588,7 @@ void jl_gc_init(void)
     if (total_mem < 128e9)
         percent = total_mem * 2.34375e-12 + 0.6; // 60% at 0 gigs and 90% at 128 to not
     else // overcommit too much on memory contrained devices
-        percent = 0.9;
+        percent = 0.8;
Sponsor Member

I think these thresholds are now provably wrong. For uv_get_total_memory, we want to allow allocating 10x that amount before we start worrying too much about the OOM killer; otherwise we are just trashing performance by thrashing swap for no reason. For uv_get_constrained_memory, the number should be something like 0.8x that value minus 256MB, so that we account for the fixed cost of the system image (roughly) and otherwise try to push right up against the limit (leaving only about 20% for other parts of the program, as a guess).
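
For concreteness, a sketch of the policy suggested in this comment, using the libuv calls named above. The constants (10x, 0.8, 256MB) are the reviewer's guesses and the helper itself is hypothetical, not part of the PR:

```c
#include <stdint.h>
#include <uv.h>

// Sketch of the suggested limits: when a constrained (e.g. cgroup) limit
// exists, push close to it while reserving ~256MB for the system image and
// ~20% for the rest of the program; otherwise be very permissive relative
// to physical memory before worrying about the OOM killer.
static uint64_t suggested_heap_cap_sketch(void)
{
    uint64_t constrained = uv_get_constrained_memory(); // 0 when no limit is set
    if (constrained != 0) {
        uint64_t cap = (uint64_t)(0.8 * (double)constrained);
        uint64_t sysimg_reserve = 256ull * 1024 * 1024;
        return cap > sysimg_reserve ? cap - sysimg_reserve : cap;
    }
    return 10 * uv_get_total_memory(); // ~10x physical memory before fearing the OOM killer
}
```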

Comment on lines +1080 to +1081
jl_atomic_store_relaxed(&gc_heap_stats.heap_size,
    jl_atomic_load_relaxed(&gc_heap_stats.heap_size) - (v->sz&~3));
Sponsor Member

This seems wrong, you want an atomic decrement, probably?

Member Author

Big sweep is single threaded for now.
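
To spell out the concern and the reply with a generic C11 illustration (not the PR's code): the relaxed load-then-store pair stays correct only while a single thread performs the subtraction; concurrent sweepers would need a read-modify-write.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t heap_size_sketch;

// Safe only while exactly one thread sweeps: a concurrent decrement between
// the load and the store would be overwritten and lost.
static void dec_heap_size_single_threaded(uint64_t bytes)
{
    uint64_t cur = atomic_load_explicit(&heap_size_sketch, memory_order_relaxed);
    atomic_store_explicit(&heap_size_sketch, cur - bytes, memory_order_relaxed);
}

// What a multi-threaded sweep would need instead: an atomic decrement.
static void dec_heap_size_multi_threaded(uint64_t bytes)
{
    atomic_fetch_sub_explicit(&heap_size_sketch, bytes, memory_order_relaxed);
}
```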

Comment on lines +1141 to +1143
uint64_t alloc_acc = jl_atomic_load_relaxed(&ptls->gc_num.alloc_acc);
if (alloc_acc + sz < 16*1024)
    jl_atomic_store_relaxed(&ptls->gc_num.alloc_acc, alloc_acc + sz);
Sponsor Member

Similarly here, you could lose updates if you don't do an atomic_inc.

Member Author

alloc_acc is a per-thread value; the global update is a couple of lines below with the full fetch_add.
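
A sketch of that pattern in generic C11 (illustrative names; the 16KiB threshold is copied from the snippet above): small allocations accumulate in a per-thread counter and are only merged into the shared statistics with a fetch_add once the batch is large enough.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FLUSH_THRESHOLD (16 * 1024)

static _Atomic uint64_t global_heap_size_sketch;
static _Thread_local uint64_t thread_alloc_acc_sketch;

// Per-thread fast path: only this thread touches its accumulator, so no
// read-modify-write is needed there; the shared counter is updated with an
// atomic fetch_add only when the accumulated batch crosses the threshold.
static void note_alloc_sketch(uint64_t sz)
{
    uint64_t acc = thread_alloc_acc_sketch;
    if (acc + sz < FLUSH_THRESHOLD) {
        thread_alloc_acc_sketch = acc + sz;
    }
    else {
        atomic_fetch_add_explicit(&global_heap_size_sketch, acc + sz,
                                  memory_order_relaxed);
        thread_alloc_acc_sketch = 0;
    }
}
```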

@oscardssmith merged commit 32aa29f into JuliaLang:master Jul 23, 2023
4 of 6 checks passed
KristofferC pushed a commit that referenced this pull request Jul 24, 2023

(cherry picked from commit 32aa29f)
KristofferC added a commit that referenced this pull request Jul 24, 2023
Backported PRs:
- [x] #50411 <!-- Fix weird dispatch of * with zero arguments -->
- [x] #50202 <!-- Remove dynamic dispatch from _wait/wait2 -->
- [x] #50064 <!-- Fix numbered prompt with input only with comment -->
- [x] #50026 <!-- Store heapsnapshot files in tempdir() instead of current directory -->
- [x] #50402 <!-- Add CPU feature helper function -->
- [x] #50387 <!-- update newpages pointer after actually sweeping pages -->
- [x] #50424 <!-- avoid potential type-instability in _replace_(str, ...) -->
- [x] #50444 <!-- Optimize getfield lowering to avoid boxing in some cases -->
- [x] #50474 <!-- docs: Fix a `!!! note` which was miscapitalized -->
- [x] #50466 <!-- relax assertion involving pg->nold to reflect that it may be a bit in… -->
- [x] #50490 <!-- Fix compat annotation for italic printstyled -->
- [x] #50488 <!-- fix typo in `Base.isassigned` with `Tridiagonal` -->
- [x] #50476 <!-- Profile: Add specifying dir for `take_heap_snapshot` and handling if current dir is unwritable -->
- [x] #50461 <!-- fix typo in the --gcthreads argument description -->
- [x] #50528 <!-- ssair: Correctly handle stmt insertion at end of basic block -->
- [x] #50533 <!-- ensure internal_obj_base_ptr checks whether objects past freelist pointer are in freelist -->
- [x] #49322 <!-- improve cat design / performance -->
- [x] #50540 <!-- gc: remove over-eager assertion -->
- [x] #50542 <!-- gf: remove unnecessary assert cycle==depth -->
- [x] #50559 <!-- Expand kwcall lowering positional default check to vararg -->
- [x] #50058 <!-- Add unwrapping mechanism for triangular mul and solves -->
- [x] #50551 <!-- typeintersect: also record chained `innervars` -->
- [x] #50552 <!-- read(io, Char): fix read with too many leading ones -->
- [x] #50541 <!-- precompile: ensure globals are not accidentally created where disallowed -->
- [x] #50576 <!-- use atomic compare exchange when setting the GC mark-bit -->
- [x] #50578 <!-- gf: make method overwrite/delete an error during precompile -->
- [x] #50516 <!-- Fix visibility of assert on GCC12/13 -->
- [x] #50597 <!-- Fix memory corruption if task is launched inside finalizer -->
- [x] #50591 <!-- build: fix various makefile bugs -->
- [x] #50599 <!-- faster invalid object lookup in conservative gc -->
- [x] #50634 <!-- 🤖 [master] Bump the SparseArrays stdlib from b4b0e72 to 99c99b4 -->
- [x] #50639 <!-- Backport LLVM patches to fix various issues. -->
- [x] #50546 <!-- Revert storage of method instance in LineInfoNode -->
- [x] #50631 <!-- Shift DCE pass to optimize imaging mode code better -->
- [x] #50525 <!-- only check that values are finite in `generic_lufact` when `check=true` -->
- [x] #50587 <!-- isassigned for ranges with BigInt indices -->
- [x] #50144 <!-- Page based heap size heuristics -->

Need manual backport:
- [ ] #50595 <!-- Rename ENV variable `JULIA_USE_NEW_PARSER` -> `JULIA_USE_FLISP_PARSER` -->

Non-merged PRs with backport label:
- [ ] #50637 <!-- Remove SparseArrays legacy code -->
- [ ] #50618 <!-- inference: continue const-prop' when concrete-eval returns non-inlineable -->
- [ ] #50598 <!-- only limit types in stack traces in the REPL -->
- [ ] #50594 <!-- Disallow non-index Integer types in isassigned -->
- [ ] #50568 <!-- `Array(::AbstractRange)` should return an `Array` -->
- [ ] #50523 <!-- Avoid generic call in most cases for getproperty -->
- [ ] #50172 <!-- print feature flags used for matching pkgimage -->
@KristofferC removed the backport 1.10 (Change should be backported to the 1.10 release) label Jul 24, 2023
@gbaraldi deleted the new-heuristics branch August 14, 2023 16:15
if (target_allocs == 0.0 || thrashing) // If we are thrashing go back to default
    target_allocs = 2*sqrt((double)heap_size/min_interval);

uint64_t target_heap = (uint64_t)target_allocs*min_interval + heap_size;


this will be equivalent to sqrt(heap_size * alloc_rate / (gc_rate * tuning_factor)) * sqrt(min_interval) + heap_size.
I am a bit confused: shouldn't this be independent of min_interval?

Member Author

I used it as a scaling factor basically. It made it easier for me to reason about what taking the sqrt meant.
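
Spelling out the algebra behind this exchange (a reconstruction from the quoted snippet and the reviewer's expansion, with illustrative names, not a statement of the PR's intent): since target_heap = target_allocs * min_interval + heap_size, the rate-based branch implied above amounts to the following, and min_interval only enters as a constant scaling factor under the square root.

```c
#include <math.h>
#include <stdint.h>

// target_allocs = sqrt(heap_size * alloc_rate / (gc_rate * tuning_factor * min_interval))
// => target_heap = target_allocs * min_interval + heap_size
//                = sqrt(heap_size * alloc_rate / (gc_rate * tuning_factor))
//                  * sqrt(min_interval) + heap_size
static uint64_t target_heap_sketch(double heap_size, double alloc_rate,
                                   double gc_rate, double tuning_factor,
                                   double min_interval)
{
    double target_allocs =
        sqrt(heap_size * alloc_rate / (gc_rate * tuning_factor * min_interval));
    return (uint64_t)(target_allocs * min_interval + heap_size);
}
```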

d-netto pushed a commit to RelationalAI/julia that referenced this pull request Oct 4, 2023
d-netto added a commit that referenced this pull request Oct 20, 2023
The 1.10 GC heuristics introduced in #50144 have been a source of concerning issues such as #50705 and #51601. The PR also doesn't correctly implement the paper on which it's based, as discussed in #51498.

Test whether the 1.8 GC heuristics are a viable option.
Labels: GC (Garbage collector)
Projects: None yet
Development: Successfully merging this pull request may close these issues: IOBuffer/fileIO Memory leak with Threads.@spawn
8 participants