Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow only downloading important outputs, e.g. with a --remote_download_important #12656

Closed
gibfahn opened this issue Dec 8, 2020 · 6 comments
Assignees
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@gibfahn
Copy link
Contributor

gibfahn commented Dec 8, 2020

tl;dr could we add a --remote_download_important flag that only downloads "important outputs".

Background

We maintain a Remote Execution cluster internally, and want to make it easy for users to get the benefits of remote execution without the pain-points.
When moving locally → remotely, all artifacts are now built on remote executors, and thus not not necessarily downloaded locally. Bazel has a number of flags to control the behaviour, and we need to set good defaults while making it easy for users to override them as needed.

Today we set -—remote_download_minimal by default in our shared bazelrc.

Bazel currently supports the following flags:

  • --remote_download_outputs=all → download everything
  • --remote_download_outputs=toplevel → download outputs of top level targets
  • --remote_download_outputs=minimal → download only what’s needed for local actions

There are also some flags that do the same but add some more in-memory options:

  • --remote_download_minimal--remote_download_output=minimal + write less to disk
  • --remote_download_toplevel--remote_download_output=toplevel + write less to disk

Requirements

Ideal

  1. Everything users need is available locally.
  2. No time is spent downloading things users don’t need.

Realistic

  1. Things users need are mostly automatically downloaded, and it’s easy to turn on downloading as needed.
  2. Minimal time is spent downloading things users don’t need, and it’s configurable.

The core user expectation is probably:

If I run bazel build //... I’m just checking things build, and if I run bazel test //... I’m just checking things test. If I build a specific target, e.g. bazel build src/foo:bar, I actually want to use that thing, so download it.
I might also want to combine these, e.g. bazel test //... src/foo:bar to only download src/foo:bar but test everything.

Either way give me defaults I can easily override on a one-off or per-project level.

Proposal

Default

So the default behaviour we want is that the output that Bazel prints to the screen is what will be downloaded:

User Runs Bazel Downloads
bazel build //... Nothing
bazel test //... Test output xml
bazel build //foo:bar //foo:bar main output

This is the same as what --remote_download_outputs=toplevel does, except when you pass a wildcard like src/....

We implement this by creating a new option:

--remote_download_important

Which would expand to:

Suggested name "important" comes from the fact that these are apparently called "important outputs" in Bazel (see #8739 (comment)).

Overrides

If I don’t want the default behaviour, I have the following options:

  • Fastest builds, download nothing: --remote_download_output=minimal
  • Fully populate local caches: --remote_download_output=all
  • Download all my artifacts: --remote_download_output=toplevel

This should satisfy all the common use-cases (and can be put in a local bazelrc to change the default).

@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request labels Dec 9, 2020
@coeuvre coeuvre self-assigned this Dec 9, 2020
@coeuvre coeuvre added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Dec 9, 2020
@coeuvre
Copy link
Member

coeuvre commented Dec 9, 2020

Hi Gibson, thanks for the proposal!

I am working on separating downloads from action execution #12665. It seems to me that the requirements of this proposal can be fulfilled with that change. Please take a look and share your ideas.

@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 28, 2023
@coeuvre
Copy link
Member

coeuvre commented Apr 28, 2023

I think currently toplevel mode already only downloads important outputs. But I would like to keep this open until I have time to verify your use case is supported. It would be nice if you can check whether the rolling release fixes your issue.

@coeuvre coeuvre added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Apr 28, 2023
@gibfahn
Copy link
Contributor Author

gibfahn commented May 7, 2023

I think currently toplevel mode already only downloads important outputs. But I would like to keep this open until I have time to verify your use case is supported. It would be nice if you can check whether the rolling release fixes your issue.

When I filed this, bazel build --remote_download_outputs=toplevel //... would download all outputs, @coeuvre are you saying this behaviour has changed in the latest pre-release builds?

@coeuvre
Copy link
Member

coeuvre commented May 8, 2023

To be more specific, with command bazel build --remote_download_outputs=toplevel //..., all targets are toplevel targets, so Bazel will download all important outputs of all the targets.

@coeuvre
Copy link
Member

coeuvre commented Sep 7, 2023

I think with our recent changes, --remote_download_toplevel now works correctly as you have described. Closing.

@coeuvre coeuvre closed this as completed Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request
Projects
None yet
Development

No branches or pull requests

2 participants