FR: Share extracted external libraries across workspaces #12227

Open
dan-cohn opened this issue Oct 7, 2020 · 10 comments
Labels

  • area-Bzlmod: Bzlmod-specific PRs, issues, and feature requests
  • P2: We'll consider working on this in future. (Assignee optional)
  • team-ExternalDeps: External dependency handling, remote repositories, WORKSPACE file.
  • type: feature request

Comments

@dan-cohn

dan-cohn commented Oct 7, 2020

ATTENTION! Please read and follow:

  • if this is a question about how to build / test / query / deploy using Bazel, or a discussion starter, send it to bazel-discuss@googlegroups.com
  • if this is a bug or feature request, fill the form below as best as you can.

Description of the problem / feature request:

I would like Bazel to be able to share extracted repos across workspaces. In other words, the "repository cache" should go deeper than simply making downloaded archives available in a centralized location. It would help save time and space if the unpacked contents were also shared.

Feature requests: what underlying problem are you trying to solve with this feature?

Bazel's repository cache is shared across workspaces, which avoids downloading the same external libraries multiple times. This saves a lot of time and network utilization, especially in CI environments where builds occur across many different repos and versions. The problem is that only the original downloaded artifacts are shared and not the extracted files. This means that Bazel has to repeatedly unzip/unpack archives. This takes a lot of time, consumes CPU, and fills the filesystem with multiple copies of the exact same extracted libraries.
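The idea requested above can be sketched roughly as follows. This is a minimal illustration, not Bazel's implementation: the function names `shared_extract` and `link_into_workspace`, the cache layout (one directory per archive SHA-256), and the use of symlinks are all assumptions made for the example. It also assumes a POSIX-like filesystem where creating symlinks is permitted.

```python
import hashlib
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the archive contents so identical archives share one cache entry."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_extract(archive: Path, cache_root: Path) -> Path:
    """Extract `archive` at most once into cache_root/<sha256>/ (hypothetical layout)."""
    dest = cache_root / sha256_of(archive)
    if dest.is_dir():
        return dest  # cache hit: nothing is unpacked again
    cache_root.mkdir(parents=True, exist_ok=True)
    # Unpack into a staging directory first, then rename into place, so a
    # half-extracted tree is never visible under the final name.
    staging = Path(tempfile.mkdtemp(dir=cache_root, prefix="tmp-"))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)
    try:
        os.rename(staging, dest)
    except OSError:
        # Another process populated the entry first; discard our copy.
        shutil.rmtree(staging, ignore_errors=True)
    return dest


def link_into_workspace(extracted: Path, external_dir: Path, name: str) -> Path:
    """Expose the shared tree inside one workspace's external directory."""
    external_dir.mkdir(parents=True, exist_ok=True)
    link = external_dir / name
    if not link.is_symlink():
        link.symlink_to(extracted, target_is_directory=True)
    return link
```

With a layout like this, two workspaces that depend on the same archive would each get a symlink to a single extracted tree, so the archive is unpacked once and stored once. A real design would also have to account for repo rules that patch or otherwise modify the extracted files, which this sketch ignores.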

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

N/A

What operating system are you running Bazel on?

RHEL 7/8 and CentOS 7/8

What's the output of bazel info release?

release 3.4.1

If bazel info release returns "development version" or "(@Non-Git)", tell us how you built Bazel.

N/A

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

No.

Any other information, logs, or outputs that you want to share?

N/A

@lberki lberki added the team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. label Oct 8, 2020
@sventiffe sventiffe added team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website untriaged labels Oct 8, 2020
@philwo philwo added P2 We'll consider working on this in future. (Assignee optional) type: feature request and removed untriaged labels Oct 8, 2020
@philwo philwo assigned Wyverald and meteorcloudy and unassigned philwo Oct 8, 2020
@Wyverald
Member

Hi dan-cohn, we're working on a new design for Bazel's external dependencies, and this idea is being incorporated into the new design. Please consider joining the external-deps@bazel.build mailing list or following along on the #external-deps channel on Slack.

@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Nov 29, 2021
@tim-chaplin-dd

> Hi dan-cohn, we're working on a new design for Bazel's external dependencies, and this idea is being incorporated into the new design. Please consider joining the external-deps@bazel.build mailing list or following along on the #external-deps channel on Slack.

Hello from the future. It sounds like this new design ended up being Bzlmod, but from what I can tell it doesn't address the problem described here since the unpacked repos still live in the external subdir alongside the ones declared in WORKSPACE. Please let me know if I'm wrong about that.

@Wyverald Wyverald added the area-Bzlmod Bzlmod-specific PRs, issues, and feature requests label May 8, 2023
@Wyverald Wyverald added this to Bzlmod Jul 12, 2023
@Wyverald Wyverald added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Jul 12, 2023
@Wyverald Wyverald added this to the 7.0.0 branch cut milestone Jul 13, 2023
@meteorcloudy meteorcloudy removed their assignment Jul 21, 2023
@Wyverald Wyverald moved this to In Progress in Bzlmod Jul 26, 2023
@Wyverald
Member

I'm currently working on a design for this. It's trickier than expected so it's taking a bit more time than I'd like, but hopefully I'll be able to share it this week.

@Wyverald
Member

Update: unfortunately the doc is still not quite ready, and I'll be on leave next week. So this will need to wait until September.

@Wyverald
Member

Wyverald commented Sep 7, 2023

The design is ready for review here: https://docs.google.com/document/d/1ZScqiIQi9l7_8eikbsGI-rjupdbCI7wgm1RYD76FJfM/edit

It's, uh, a bit longer than I expected.

@meteorcloudy meteorcloudy removed this from the 7.0.0 branch cut milestone Sep 19, 2023
@LittleCuteBug

Hi, is there any update on this feature?

@Wyverald Wyverald added this to the 7.1.0 release blockers milestone Nov 16, 2023
@iancha1992 iancha1992 removed this from the 7.1.0 release blockers milestone Nov 29, 2023
@iancha1992
Member

@bazel-io fork 7.1.0

@keertk keertk added this to the 7.1.0 release blockers milestone Nov 29, 2023
Wyverald added a commit that referenced this issue Feb 2, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded input is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.
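The marker-file layout described in this commit message (one hash line for predeclared inputs, then one `TYPE:KEY VALUE` line per recorded input) can be illustrated with a small sketch. This is not Bazel's Java code: the Python class names, the `ENV` type tag as used here, the `marker_is_fresh` helper, and the assumption that keys contain no spaces are all simplifications chosen for the example.

```python
from typing import Dict, List, Mapping, Tuple, Type


class RecordedInput:
    """Base class: each input type can stringify, parse, and verify itself."""

    TYPE = ""  # type tag written before the colon; set by subclasses

    def __init__(self, key: str, value: str) -> None:
        self.key = key
        self.value = value

    def to_line(self) -> str:
        return f"{self.TYPE}:{self.key} {self.value}"

    def is_up_to_date(self, env: Mapping[str, str]) -> bool:
        raise NotImplementedError


class EnvVarInput(RecordedInput):
    """Hypothetical input type: an environment variable and its recorded value."""

    TYPE = "ENV"

    def is_up_to_date(self, env: Mapping[str, str]) -> bool:
        return env.get(self.key, "") == self.value


# Registry mapping a type tag to the subclass that parses it.
PARSERS: Dict[str, Type[RecordedInput]] = {EnvVarInput.TYPE: EnvVarInput}


def parse_line(line: str) -> RecordedInput:
    # Assumes the key has no spaces, so the first space separates key and value.
    head, value = line.split(" ", 1)
    type_tag, key = head.split(":", 1)
    return PARSERS[type_tag](key, value)


def parse_marker(text: str) -> Tuple[str, List[RecordedInput]]:
    """Return (predeclared-inputs hash, recorded inputs) from marker file text."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty marker file")
    return lines[0], [parse_line(line) for line in lines[1:] if line]


def marker_is_fresh(text: str, current_hash: str, env: Mapping[str, str]) -> bool:
    """A repo is reusable only if the hash and every recorded input still match."""
    stored_hash, inputs = parse_marker(text)
    if stored_hash != current_hash:
        return False
    return all(recorded.is_up_to_date(env) for recorded in inputs)
```

Adding a new kind of recorded input in this sketch means adding one subclass and one registry entry, which mirrors the motivation given in the commit message for moving the logic out of large shared methods.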
copybara-service bot pushed a commit that referenced this issue Feb 6, 2024
Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
@Wyverald Wyverald removed this from the Mainline issues targeted for 7.2.0 milestone Apr 30, 2024
@Wyverald
Member

Too late for 7.2.0; punting to later.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Aug 27, 2024
@antspy

antspy commented Feb 12, 2025

Hi,

Any new updates on this feature? Would be really cool

@Wyverald
Member

Hi there -- I'm planning to work on this in Q1/Q2 of this year.

Projects
Status: In Progress