FR: Share extracted external libraries across workspaces #12227
Comments
Hi dan-cohn, we're working on a new design for Bazel's external dependencies, and this idea is being incorporated into the new design. Please consider joining the external-deps@bazel.build mailing list or following along on the #external-deps channel on Slack.
Hello from the future. It sounds like this new design ended up being Bzlmod, but from what I can tell it doesn't address the problem described here since the unpacked repos still live in the
I'm currently working on a design for this. It's trickier than expected so it's taking a bit more time than I'd like, but hopefully I'll be able to share it this week.
Update: unfortunately the doc is still not quite ready, and I'll be on leave next week. So this will need to wait until September.
The design is ready for review here: https://docs.google.com/document/d/1ZScqiIQi9l7_8eikbsGI-rjupdbCI7wgm1RYD76FJfM/edit
It's, uh, a bit longer than I expected.
Hi, is there any update on this feature?
@bazel-io fork 7.1.0
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc.) and more places to write recorded input data (for the true repo cache).
- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies the logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded input is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and parse itself from a string.
- We also try to collect as many of the SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair number of Skyframe restarts (unlikely to have a big impact on performance, but a nice thing to do).
Work towards #20952 and #12227.
Closes #21182.
PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
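To make the marker file layout described in the commit message concrete, here is a minimal Python sketch of a reader for it. It is not Bazel's implementation (that lives in Java, in `RepoRecordedInput` and its subclasses). The names `RecordedInput` and `parse_marker` are hypothetical, the sample type names and values in the usage note are made up, and the sketch assumes that the `TYPE:KEY` part contains no spaces, which ignores whatever escaping the real format applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedInput:
    """One `TYPE:KEY VALUE` line from a marker file (hypothetical model)."""
    type: str
    key: str
    value: str


def parse_marker(text: str) -> tuple[str, list[RecordedInput]]:
    """Split marker file text into the predeclared-inputs hash and recorded inputs.

    Assumption: the first space on a line separates `TYPE:KEY` from `VALUE`,
    and the first colon separates `TYPE` from `KEY`.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty marker file")

    # First line: a single hash covering all predeclared inputs.
    predeclared_hash = lines[0]

    recorded = []
    for line in lines[1:]:
        if not line:
            continue  # tolerate blank lines
        head, space, value = line.partition(" ")
        type_, colon, key = head.partition(":")
        if not space or not colon:
            raise ValueError(f"malformed recorded input line: {line!r}")
        recorded.append(RecordedInput(type_, key, value))
    return predeclared_hash, recorded
```

For example, `parse_marker("abc123\nENV:PATH /usr/bin\n")` yields the hash `"abc123"` and one recorded input with type `ENV`, key `PATH`, and value `/usr/bin`. A repo would then count as up to date only if the hash matches the current predeclared inputs and every recorded input still has its recorded value.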
Too late for 7.2.0; punting to later.
Hi, any new updates on this feature? It would be really cool.
Hi there -- I'm planning to work on this in Q1/Q2 of this year.
Description of the problem / feature request:
I would like Bazel to be able to share extracted repos across workspaces. In other words, the "repository cache" should go deeper than simply making downloaded archives available in a centralized location. It would help save time and space if the unpacked contents were also shared.
Feature requests: what underlying problem are you trying to solve with this feature?
Bazel's repository cache is shared across workspaces, which avoids downloading the same external libraries multiple times. This saves a lot of time and network utilization, especially in CI environments where builds occur across many different repos and versions. The problem is that only the original downloaded artifacts are shared and not the extracted files. This means that Bazel has to repeatedly unzip/unpack archives. This takes a lot of time, consumes CPU, and fills the filesystem with multiple copies of the exact same extracted libraries.
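As a rough illustration of the request, the following Python sketch unpacks an archive once into a directory keyed by the archive's SHA-256 and reuses that directory on later calls. This is not how Bazel works, nor the design eventually proposed. The function names and cache layout are hypothetical, and the sketch ignores everything beyond plain extraction that a real repo rule may do (patches, environment-dependent steps, generated files).

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_extracted_dir(archive: Path, cache_root: Path) -> Path:
    """Return a directory holding the unpacked archive, extracting at most once.

    Hypothetical layout: <cache_root>/<sha256 of archive>/...
    """
    target = cache_root / sha256_of(archive)
    if target.is_dir():
        return target  # already unpacked by an earlier call or another workspace

    cache_root.mkdir(parents=True, exist_ok=True)
    # Unpack into a staging directory first so readers never see a partial tree.
    staging = Path(tempfile.mkdtemp(dir=cache_root, prefix="tmp-"))
    try:
        shutil.unpack_archive(str(archive), str(staging))
        os.rename(staging, target)  # publish the finished tree in one step
    except OSError:
        shutil.rmtree(staging, ignore_errors=True)
        if not target.is_dir():
            raise  # a real failure, not just losing a race to another process
    return target
```

Two workspaces that call `shared_extracted_dir` with the same archive and the same `cache_root` get the same directory back, so the archive is unpacked once and stored once on disk.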
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
N/A
What operating system are you running Bazel on?
RHEL 7/8 and CentOS 7/8
What's the output of `bazel info release`?
release 3.4.1
If `bazel info release` returns "development version" or "(@Non-Git)", tell us how you built Bazel.
N/A
What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD`?
N/A
Have you found anything relevant by searching the web?
No.
Any other information, logs, or outputs that you want to share?
N/A