Separate downloads from action execution to provide more download options #12665

Closed
17 tasks done
coeuvre opened this issue Dec 9, 2020 · 4 comments
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@coeuvre
Member

coeuvre commented Dec 9, 2020

In the current implementation of remote execution, outputs can only be downloaded when an action executes. We should
separate downloads from action execution so that we can provide more download options to users. For example:

  1. Provide a download API so that other code can download outputs of previously executed actions when necessary. Use cases:
    • Currently, after executing an action remotely, we download its outputs and wait for the downloads to complete before starting execution of other actions. We can instead download the outputs of the current action and start executing the next action at the same time. This will improve the performance of full-downloads mode a lot.
    • Some files downloaded by the minimal/toplevel mode are symlinks pointing to files we didn't download. Several issues are root-caused by this behavior. We can provide options to download the actual files when reading the symlink. This will fix most of the bugs related to symlinks produced by remote execution.
    • Dynamic execution can download outputs of previously remotely executed actions in the local branch.
  2. Add a post-build step for downloads so that we can download files which are only known after the build completes. Use cases:
    • We can combine this with aspects and BEP to download files required by the IDE after the build.
    • Tests require some of the outputs to run. There are some issues related to running tests with top-level mode which should be fixed by this change.
  3. Add a command line flag to only download arbitrary output files. Use cases:
    • Allow downloading anything in the output tree without going through the build steps.
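
To make the first use case under option 1 concrete, here is a minimal sketch of downloads running in the background while later actions execute. Everything in it is hypothetical: `OutputDownloader`, `OverlappedBuild`, and the string-based action names are illustrative stand-ins, not Bazel's actual classes, and the "download" is a placeholder that only records the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Bazel's API: downloads are requested separately
// from action execution and run on a background thread pool.
final class OutputDownloader {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final List<String> downloaded = new ArrayList<>();

  // Starts downloading one output and returns immediately.
  CompletableFuture<Void> downloadAsync(String outputPath) {
    return CompletableFuture.runAsync(() -> {
      // Placeholder for fetching the bytes from the remote cache.
      synchronized (downloaded) {
        downloaded.add(outputPath);
      }
    }, pool);
  }

  // Returns a snapshot of the outputs downloaded so far.
  List<String> downloaded() {
    synchronized (downloaded) {
      return new ArrayList<>(downloaded);
    }
  }

  void shutdown() {
    pool.shutdown();
  }
}

final class OverlappedBuild {
  // Executes actions in order. Each action's download is only started, not
  // awaited, so it overlaps with the actions that follow. The method returns
  // once every download has finished.
  static List<String> run(List<String> actions, OutputDownloader downloader) {
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    List<String> executed = new ArrayList<>();
    for (String action : actions) {
      executed.add(action); // stand-in for remote execution of the action
      pending.add(downloader.downloadAsync(action + ".out"));
    }
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    return executed;
  }
}
```

A caller would invoke `OverlappedBuild.run(actions, downloader)` and then `downloader.shutdown()`; the pool uses non-daemon threads, so skipping the shutdown keeps the JVM alive.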

We need a way to specify which files to download after the build (option 2) or with the download flag (option 3). For example:

  • Reuse the flags --remote_download_toplevel to only download output files of top level targets and --remote_download_minimal to only download necessary files.
  • Add a new flag which accepts a list of targets whose output files should be downloaded.
  • I am not sure how to specify downloads at the file level. Would it be useful (and feasible) to support regex search over the output tree? (like rg over a directory)
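
As a rough illustration of the regex idea in the last bullet, the sketch below filters a list of output paths with a regular expression, similar to running rg over a directory listing. `OutputSelector` and the example paths are assumptions made for illustration; no such class or flag exists in Bazel as far as this thread states.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: picks the outputs to download by matching a regex
// against each path in the output tree.
final class OutputSelector {
  // Returns the paths that contain a match for the regex, in input order.
  static List<String> select(List<String> outputPaths, String regex) {
    Pattern pattern = Pattern.compile(regex);
    return outputPaths.stream()
        .filter(path -> pattern.matcher(path).find())
        .collect(Collectors.toList());
  }
}
```

For example, `OutputSelector.select(paths, "\\.jar$")` keeps only the paths ending in `.jar`. Using `find()` rather than `matches()` gives search semantics, so a pattern does not have to cover the whole path.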

Options for downloading the content of a symlink file:

  • Provide a flag to control whether the download should follow symlinks. This applies to all 3 download options above.
  • When the code is reading the content of a symlink file (e.g. in the RemoteActionFS), we can follow the symlink using option 1
  • When other tools are reading the content of a symlink (e.g. IDE), the tool can use option 3 to download the file.
  • When users want to read the content of a symlink, they can use option 3 to download the file manually.

Related Issues

@coeuvre coeuvre self-assigned this Dec 9, 2020
@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request labels Dec 9, 2020
@coeuvre
Member Author

coeuvre commented Dec 9, 2020

Gibson proposed adding a --remote_download_important flag to download files that are important. #12656

bazel-io pushed a commit that referenced this issue Mar 16, 2021
Before this change, when remote build without bytes is enabled, intermediate outputs which are also inputs to local actions are downloaded for local execution. After the build finishes, these downloaded files are deleted to keep Bazel's view of the output base identical with the output base itself, i.e. files that Bazel thinks exist only remotely actually do. However, these intermediate outputs may be used later, in which case Bazel needs to download them again, which is a performance issue. #12855

This change fixes the issue by removing the code used to delete downloaded files. Bazel should be able to take the local file as the source of truth if it exists (otherwise, it is a bug).

This is also an essential step to implement separation of downloads. #12665

PiperOrigin-RevId: 363131672
bazel-io pushed a commit that referenced this issue Aug 4, 2021
…ing action cache

This PR extends `ActionCache.Entry` to store output metadata in a map of <Path, Metadata>. This map is updated after action execution, when we update the action cache, so that the metadata of all outputs of the action is saved. Before checking the action cache (when executing actions), we load the output metadata into the output store if it is remote and the corresponding local file is missing.

With this change, remote output metadata is saved to disk so build without bytes can use it across server restarts. We can also download outputs after action execution since remote output metadata can be accessed outside.
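
The loading rule described above can be sketched roughly as follows. The types are simplified, hypothetical stand-ins (`OutputMetadata`, `CacheEntry`, `OutputStore`), paths are plain strings, and on-disk persistence is left out; Bazel's real `ActionCache.Entry` is more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified metadata record for one action output.
final class OutputMetadata {
  final String digest;
  final boolean remote;

  OutputMetadata(String digest, boolean remote) {
    this.digest = digest;
    this.remote = remote;
  }
}

// Stand-in for an action cache entry that carries a map of path -> metadata.
final class CacheEntry {
  private final Map<String, OutputMetadata> outputs = new HashMap<>();

  // Called after action execution, when the action cache is updated.
  void addOutput(String path, OutputMetadata metadata) {
    outputs.put(path, metadata);
  }

  Map<String, OutputMetadata> outputs() {
    return outputs;
  }
}

// Stand-in for the output store consulted before the action cache check.
final class OutputStore {
  private final Map<String, OutputMetadata> remoteOutputs = new HashMap<>();

  // Loads metadata only for outputs that are remote and have no local file.
  void loadFrom(CacheEntry entry, Set<String> localFiles) {
    for (Map.Entry<String, OutputMetadata> e : entry.outputs().entrySet()) {
      if (e.getValue().remote && !localFiles.contains(e.getKey())) {
        remoteOutputs.put(e.getKey(), e.getValue());
      }
    }
  }

  boolean hasRemote(String path) {
    return remoteOutputs.containsKey(path);
  }
}
```

Skipping outputs that already exist locally means the local file stays the source of truth whenever it is present.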

Part of #12665.

Fixes #8248.

Closes #13604.

PiperOrigin-RevId: 388586691
glukasiknuro pushed a commit to glukasiknuro/bazel that referenced this issue Sep 28, 2021
@purkhusid

Has there been any recent work on this? It is very unfortunate that it's not possible to use --remote_download_minimal for the build/test part and then --remote_download_outputs=all or --remote_download_toplevel for running executable targets (e.g. deployment targets) at the end of the CI pipeline.

@coeuvre
Member Author

coeuvre commented Jun 1, 2022

Yes, I am working on similar features for the internal remote executor. Most of the changes that are necessary for this feature to work are already open sourced, but there are still some changes to be made for the remote module. I will continue this work after the work for the internal one is done.

@purkhusid

Awesome! Looking forward to the improvements in this area!

copybara-service bot pushed a commit that referenced this issue Oct 24, 2022
Previously, with `--remote_download_toplevel`, Bazel only downloaded toplevel outputs during spawn execution. The drawback is that if the toplevel targets change in a following invocation but the generating actions are not executed again because of Skyframe or action cache hits, the outputs of the new toplevel targets are not downloaded.

`ToplevelArtifactsDownloader` fixes that issue by listening to the `TargetCompleteEvent` (which is fired every time a toplevel target is built, even if it hits the cache) and downloading outputs in the background. Additionally, it can listen to more events during the build, which makes it more flexible for defining additional outputs to be downloaded as toplevel outputs.
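
The event-driven approach can be pictured with the minimal sketch below. It is not the real `ToplevelArtifactsDownloader`: `TargetCompleteSketchEvent` and `ToplevelDownloaderSketch` are hypothetical names, the event is a plain data holder rather than Bazel's `TargetCompleteEvent`, and the "download" only records the output path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the event fired when a toplevel target completes.
final class TargetCompleteSketchEvent {
  final String label;
  final List<String> outputs;

  TargetCompleteSketchEvent(String label, List<String> outputs) {
    this.label = label;
    this.outputs = outputs;
  }
}

// Hypothetical listener that downloads toplevel outputs in the background.
final class ToplevelDownloaderSketch {
  private final List<String> downloaded = new CopyOnWriteArrayList<>();
  private final List<CompletableFuture<Void>> pending = new CopyOnWriteArrayList<>();

  // Called for every completed toplevel target, whether or not its actions
  // actually ran, so cache hits still trigger downloads.
  void onTargetComplete(TargetCompleteSketchEvent event) {
    for (String output : event.outputs) {
      // Placeholder download: records the path on a background thread.
      pending.add(CompletableFuture.runAsync(() -> downloaded.add(output)));
    }
  }

  // Blocks until all background downloads started so far have finished.
  List<String> awaitDownloads() {
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    return new ArrayList<>(downloaded);
  }
}
```

Because the trigger is target completion rather than spawn execution, changing the set of toplevel targets between invocations still leads to the new targets' outputs being fetched.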

Working towards #12665.

Fixes #13625.
Fixes #11834.
Fixes #10525.

Closes #16524.

PiperOrigin-RevId: 483368093
Change-Id: I2184cbbb1d54548498eaa6caa07055a9336fd97e
@coeuvre coeuvre closed this as completed Sep 7, 2023