Intern repository mapping entries #19269

Closed
fmeum wants to merge 2 commits

@fmeum fmeum commented Aug 17, 2023

All repositories generated by a module extension have the same repository mapping entries. Without interning, if a module extension generates N repositories, each such repository would have its own copy with (more than) N entries, resulting in overall memory usage quadratic in N.

We intern the entries, not the `RepositoryMapping` itself, since the latter also includes the owner repo, which differs between extension repos.
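To illustrate the idea of interning only the entries, here is a minimal, self-contained sketch. It is not the code from this PR: `InternSketch`, `Mapping`, `intern`, and `create` are hypothetical names, and the pool is a plain `ConcurrentHashMap` rather than whatever interner Bazel actually uses. It only shows that two mappings with different owners can share one canonical entries map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InternSketch {
    // Hypothetical stand-in for a repository mapping: the owner differs per
    // repo, while the entries are identical for all repos of one extension.
    static final class Mapping {
        final String ownerRepo;
        final Map<String, String> entries;

        Mapping(String ownerRepo, Map<String, String> entries) {
            this.ownerRepo = ownerRepo;
            this.entries = entries;
        }
    }

    // Minimal interner (an assumption, for illustration only): keeps one
    // canonical instance per distinct entries map, keyed by content equality.
    static final Map<Map<String, String>, Map<String, String>> POOL =
        new ConcurrentHashMap<>();

    static Map<String, String> intern(Map<String, String> entries) {
        Map<String, String> immutable = Map.copyOf(entries);
        Map<String, String> previous = POOL.putIfAbsent(immutable, immutable);
        return previous != null ? previous : immutable;
    }

    // Interns the entries only; the owner stays specific to each mapping.
    static Mapping create(String ownerRepo, Map<String, String> entries) {
        return new Mapping(ownerRepo, intern(entries));
    }

    public static void main(String[] args) {
        Mapping a = create("repo_a", Map.of("foo", "canonical_foo", "bar", "canonical_bar"));
        Mapping b = create("repo_b", Map.of("foo", "canonical_foo", "bar", "canonical_bar"));
        // Both mappings reference the same entries object, so N mappings
        // cost one shared copy of the entries instead of N copies.
        System.out.println(a.entries == b.entries);
    }
}
```

Interning the whole mapping would not help in this sketch, because `ownerRepo` differs and no two `Mapping` objects would ever be equal.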

Output of `bazel info used-heap-size-after-gc` after running a build with a synthetic module extension generating N + 1 repos and requesting all of them:

before
N=100:     32MB
N=1000:    77MB
N=3000:   371MB
N=5000:   961MB
N=10000: 3614MB

after
N=100:     32MB
N=1000:    44MB
N=3000:    71MB
N=5000:    91MB
N=10000:  158MB
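
As a rough sanity check of the "quadratic in N" claim, the growth exponent can be estimated from two rows of the figures above. The sketch below assumes heap usage behaves like N^p and ignores the fixed baseline heap, so it is only an approximation; `GrowthCheck` and `exponent` are hypothetical names introduced for illustration.

```java
class GrowthCheck {
    // Estimates p in "heap ~ N^p" from two measurements:
    // p = log(heap2 / heap1) / log(n2 / n1).
    static double exponent(double n1, double heap1, double n2, double heap2) {
        return Math.log(heap2 / heap1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Figures copied from the measurements above (N=5000 and N=10000, in MB).
        System.out.printf("before: %.2f%n", exponent(5000, 961, 10000, 3614));
        System.out.printf("after:  %.2f%n", exponent(5000, 91, 10000, 158));
    }
}
```

Doubling N from 5000 to 10000 multiplies the heap by about 3.8 before the change (exponent roughly 1.9, close to quadratic) and by about 1.7 after it (exponent roughly 0.8, consistent with linear growth on top of a fixed baseline).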

@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Aug 17, 2023

fmeum commented Aug 17, 2023

@Wyverald What do you think, should this go into a patch release together with #19245?


fmeum commented Aug 17, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 17, 2023

@Wyverald Wyverald left a comment

nice, thank you for the investigation and prompt fix!

iancha1992 commented

@bazel-io fork 6.4.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 17, 2023
@iancha1992 iancha1992 added awaiting-review PR is awaiting review from an assigned reviewer and removed awaiting-review PR is awaiting review from an assigned reviewer labels Aug 18, 2023

iancha1992 commented Aug 18, 2023

@fmeum I accidentally clicked the "Update branch" button, which created a merge commit. Can you please reset to the original commit? My apologies.

cc: @bazelbuild/triage

meteorcloudy commented

@iancha1992 I think you can still import the PR and fix the GIT_AUTHORS manually. I can help review ;)

@iancha1992 iancha1992 added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Aug 21, 2023
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Aug 22, 2023
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Aug 22, 2023
Closes bazelbuild#19269.

PiperOrigin-RevId: 558940840
Change-Id: I07402f203b5f11bf448a1ae9e9ee4637ad4c536d
@fmeum fmeum deleted the intern-entries branch August 22, 2023 08:11
iancha1992 pushed a commit that referenced this pull request Aug 22, 2023
Closes #19269.

Commit 74aadb2

PiperOrigin-RevId: 558940840
Change-Id: I07402f203b5f11bf448a1ae9e9ee4637ad4c536d

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
iancha1992 commented

The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc.
Thanks!

Labels
team-ExternalDeps: External dependency handling, remote repositories, WORKSPACE file.