Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that downloads are cancelled on repo rule restart #22748

Closed
wants to merge 1 commit into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Jun 14, 2024

When a repository rule is restarted due to memory pressure, all its downloads must be interrupted. While this happened for repository_ctx.download, the download_and_extract method did not register its PendingTask. Since downloads happen on a separate executor, they would continue even through restarts, leading to warnings (see below) or even left over download temp directories.

WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)

@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Jun 14, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

I could reproduce this reliably on the bazel-examples frontend project on my Mac with:

rm -rf repo_cache && bazel-dev --host_jvm_args=-Xmx256m --nosystem_rc --nohome_rc clean --expunge && bazel-dev --host_jvm_args=-Xmx160m --nosystem_rc --nohome_rc build --repository_cache=repo_cache //...

@fmeum fmeum requested review from meteorcloudy and Wyverald and removed request for meteorcloudy June 14, 2024 11:37
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

This might solve #21773, not sure though. Should we cherry-pick this into 7.2.1?

When a repository rule is restarted due to memory pressure, all its downloads must be interrupted. While this happened for `repository_ctx.download`, the `download_and_extract` method did not register its `PendingTask`. Since downloads happen on a separate executor, they would continue even through restarts, leading to warnings (see below) or even left over download temp directories.

```
WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)
```
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 14, 2024
Copy link
Member

@meteorcloudy meteorcloudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@meteorcloudy meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jun 14, 2024
@moroten
Copy link
Contributor

moroten commented Jun 14, 2024

Running with the reproducing example above on my code base, I still get the same java.io.FileNotFoundException as in the description. The profile also shows parallel downloads of the same repository as described in #21773.

I agree that missing registering some cancellations is most probably the root cause. Thank you for looking into it @fmeum. My workaround is just to use --experimental_worker_for_repo_fetching=off.

@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Jun 14, 2024
@fmeum fmeum deleted the fix-repo-races branch June 14, 2024 18:04
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

@bazel-io fork 7.3.0

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

@bazel-io fork 7.2.1

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 14, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Jun 14, 2024
When a repository rule is restarted due to memory pressure, all its downloads must be interrupted. While this happened for `repository_ctx.download`, the `download_and_extract` method did not register its `PendingTask`. Since downloads happen on a separate executor, they would continue even through restarts, leading to warnings (see below) or even left over download temp directories.

```
WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)
```

Closes bazelbuild#22748.

PiperOrigin-RevId: 643389876
Change-Id: Ia2cab4b4d73340379c17d4a1e5c327ff43505050
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Jun 14, 2024
When a repository rule is restarted due to memory pressure, all its downloads must be interrupted. While this happened for `repository_ctx.download`, the `download_and_extract` method did not register its `PendingTask`. Since downloads happen on a separate executor, they would continue even through restarts, leading to warnings (see below) or even left over download temp directories.

```
WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)
```

Closes bazelbuild#22748.

PiperOrigin-RevId: 643389876
Change-Id: Ia2cab4b4d73340379c17d4a1e5c327ff43505050
@moroten
Copy link
Contributor

moroten commented Jun 14, 2024

Just to clarify, the problem is still there after this PR.

github-merge-queue bot pushed a commit that referenced this pull request Jun 17, 2024
)

When a repository rule is restarted due to memory pressure, all its
downloads must be interrupted. While this happened for
`repository_ctx.download`, the `download_and_extract` method did not
register its `PendingTask`. Since downloads happen on a separate
executor, they would continue even through restarts, leading to warnings
(see below) or even left over download temp directories.

```
WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)
```

Closes #22748.

PiperOrigin-RevId: 643389876
Change-Id: Ia2cab4b4d73340379c17d4a1e5c327ff43505050

Commit
2fb187f

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
github-merge-queue bot pushed a commit that referenced this pull request Jun 17, 2024
)

When a repository rule is restarted due to memory pressure, all its
downloads must be interrupted. While this happened for
`repository_ctx.download`, the `download_and_extract` method did not
register its `PendingTask`. Since downloads happen on a separate
executor, they would continue even through restarts, leading to warnings
(see below) or even left over download temp directories.

```
WARNING: Download from https://github.com/uutils/coreutils/releases/download/0.0.26/coreutils-0.0.26-aarch64-apple-darwin.tar.gz failed: class java.io.FileNotFoundException /private/var/tmp/_bazel_fmeum/0465a0857e26a35c072f48ab751a1794/external/aspect_bazel_lib~~toolchains~coreutils_darwin_arm64/temp2576567839043404337/coreutils-0.0.26-aarch64-apple-darwin.tar.gz (No such file or directory)
```

Closes #22748.

PiperOrigin-RevId: 643389876
Change-Id: Ia2cab4b4d73340379c17d4a1e5c327ff43505050

Commit
2fb187f

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
copybara-service bot pushed a commit that referenced this pull request Jun 19, 2024
`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in `close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1 and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in `DownloadManager` with virtual threads, thereby resolving a TODO about not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards #22680
Work towards #22748

Closes #22772.

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624
fmeum added a commit to fmeum/bazel that referenced this pull request Jun 20, 2024
`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in `close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1 and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in `DownloadManager` with virtual threads, thereby resolving a TODO about not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards bazelbuild#22680
Work towards bazelbuild#22748

Closes bazelbuild#22772.

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624
github-merge-queue bot pushed a commit that referenced this pull request Jun 20, 2024
…22814)

`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in
`close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on
failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful
evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1
and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in
`DownloadManager` with virtual threads, thereby resolving a TODO about
not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards #22680
Work towards #22748

Closes #22772

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624

Closes #22775
@iancha1992
Copy link
Member

iancha1992 commented Jun 21, 2024

The changes in this PR have been included in Bazel 7.2.1 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.2.1rc2. Thanks!

fmeum added a commit to fmeum/bazel that referenced this pull request Jun 25, 2024
`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in `close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1 and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in `DownloadManager` with virtual threads, thereby resolving a TODO about not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards bazelbuild#22680
Work towards bazelbuild#22748

Closes bazelbuild#22772.

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624
github-merge-queue bot pushed a commit that referenced this pull request Jun 25, 2024
…22883)

`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in
`close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on
failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful
evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1
and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in
`DownloadManager` with virtual threads, thereby resolving a TODO about
not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards #22680
Work towards #22748

Closes #22772.

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624

Closes #22776
fmeum added a commit to fmeum/bazel that referenced this pull request Jul 18, 2024
…azelbuild#22814)

`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in
`close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on
failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful
evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1
and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in
`DownloadManager` with virtual threads, thereby resolving a TODO about
not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards bazelbuild#22680
Work towards bazelbuild#22748

Closes bazelbuild#22772

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624

Closes bazelbuild#22775
oliviernotteghem pushed a commit to uber-common/bazel that referenced this pull request Aug 13, 2024
…azelbuild#22814)

`StarlarkBaseExternalContext` now implements `AutoCloseable` and, in
`close()`:
1. Cancels all pending async tasks.
2. Awaits their termination.
3. Cleans up the working directory (always for module extensions, on
failure for repo rules).
4. Fails if there were pending async tasks in an otherwise successful
evaluation.

Previously, module extensions didn't do any of those. Repo rules did 1
and 4 and sometimes 3, but not in all cases.

This change required replacing the fixed-size thread pool in
`DownloadManager` with virtual threads, thereby resolving a TODO about
not using a fixed-size thread pool for the `GrpcRemoteDownloader`.

Work towards bazelbuild#22680
Work towards bazelbuild#22748

Closes bazelbuild#22772

PiperOrigin-RevId: 644669599
Change-Id: Ib71e5bf346830b92277ac2bd473e11c834cb2624

Closes bazelbuild#22775
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants