Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent page sweeping #48969

Merged
merged 1 commit into from Jun 28, 2023
Merged

Concurrent page sweeping #48969

merged 1 commit into from Jun 28, 2023

Conversation

d-netto
Copy link
Member

@d-netto d-netto commented Mar 11, 2023

Extends #48600 by making sweeping of object pools concurrent.

@d-netto d-netto added the GC Garbage collector label Mar 11, 2023
@d-netto d-netto marked this pull request as draft March 11, 2023 23:38
@vchuravy
Copy link
Sponsor Member

Can you rebase on top of #48600?

@d-netto d-netto force-pushed the dcn/psweep branch 4 times, most recently from c2c1855 to 1b22c5d Compare May 9, 2023 02:53
@d-netto d-netto marked this pull request as ready for review May 10, 2023 19:04
@d-netto d-netto changed the title WIP: Parallel sweeping Parallel/concurrent sweeping May 10, 2023
@oscardssmith
Copy link
Member

Is this intentionally on top of #49644?

@d-netto d-netto force-pushed the dcn/psweep branch 9 times, most recently from cf8d9fb to a9ad110 Compare May 14, 2023 15:33
@chriselrod

This comment was marked as outdated.

Copy link
Sponsor Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you disentangle this from #49644 so that we can study
#48969 (comment) independently?

@d-netto
Copy link
Member Author

d-netto commented May 15, 2023

Should be independent of #49644 now.

src/gc.c Outdated Show resolved Hide resolved
src/gc.c Outdated Show resolved Hide resolved
@d-netto
Copy link
Member Author

d-netto commented May 16, 2023

Latest commit should allow part of sweeping of object pools to run concurrently with mutator threads independently of whether we have GC threads or not (e.g. a program running with --threads=4 --gcthreads=1 could in theory benefit from it).

The cost if, of course, more contention on gc_perm_lock. This needs more careful analysis to confirm that we are not making GC pauses shorter at the cost of throughput.

@gbaraldi
Copy link
Member

gbaraldi commented Jun 6, 2023

I think the solution is to do away with that perm_lock. It doesn't seem too complicated to do that and switch to doing Compare and Swap.

@vchuravy vchuravy self-requested a review June 6, 2023 15:27
@PallHaraldsson
Copy link
Contributor

it seems like this PR can be detrimental to throughput even though it reduces GC pauses in a few cases.

Both are good properties, so if there's (necessarily) a trade-off, can it still me merged with it off by default, and an ENV var to enable for low GC pauses? While you want to avoid allocations, and the GC entirely, for real-time, it's hard to do fully, and shorter pauses very valuable for soft real-time. It's just a question what to call this ENV var, CONCURRENT_SWEEP_GC (or e.g. SOFT_REAL-TIME_GC)?

@vchuravy vchuravy self-requested a review June 25, 2023 16:50
@vchuravy vchuravy changed the title Concurrent sweeping Concurrent page sweeping Jun 25, 2023
Copy link
Sponsor Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR brings real and tangible benefits for multi-threaded code that is allocating, by significantly shortening the STW phase, therefore improving scalabity (Amdahl's law says hi).

I believe we should add an environment and runtime flag for this feature.

On systems vulnerable to Meltdown&Spector KPTI can cause iTLB flushes. With concurrent page-sweeping instead of paying this cost "once"
we will concurrently invalidate the iTLB leading to runtime performance loss.

In particular for the GCBenchmark tree_multable I saw an increase in cpu-time being spent in __madvise and cpu-time being spent in asm_sysvec_call_function on the threads that are not running concurrent GC.

@kpamnany also voiced discomfort with the system being oversubscribed.

I also found it counter-intuitive that --gcthreads=1 would disable concurrent page sweeping.

In the long-term open questions for me are:

  • Could we implement this with the tasking system, e.g. schedule a task that will some cleanup work?
  • We could try out io_uring for batching the madvise calls, but that would be significant work.
  • If concurrent sweeping is disabled, we could run this after the STW phase ended, but before the finalizers. This would alleviate some of @kpamnany oversubscription concerns, while still moving the cost out of the STW phase.

@d-netto
Copy link
Member Author

d-netto commented Jun 25, 2023

To keep things consistent with --threads, I was thinking about something like --gcthreads=X,Y, with X being the number of threads that may run parallel marking and Y being the number of threads that may run concurrent page sweeping (0 or 1, chosen to be 0 by default).

Open to suggestions on that.

@d-netto d-netto force-pushed the dcn/psweep branch 3 times, most recently from cee0701 to 4fedfda Compare June 26, 2023 20:27
@d-netto
Copy link
Member Author

d-netto commented Jun 27, 2023

Bump.

@vchuravy vchuravy added the status:merge me PR is reviewed. Merge when all tests are passing label Jun 27, 2023
@d-netto d-netto merged commit 9dc2991 into JuliaLang:master Jun 28, 2023
5 of 6 checks passed
@d-netto d-netto removed the status:merge me PR is reviewed. Merge when all tests are passing label Jun 28, 2023
d-netto added a commit to RelationalAI/julia that referenced this pull request Aug 6, 2023
Implements concurrent sweeping of fully empty pages.

Concurrent sweeping is disabled by default and may be enabled through the --gcthreads flag.

Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants