--experimental_remote_cache_async hangs forever when using a HTTP remote cache #19273

Closed
vogelsgesang opened this issue Aug 18, 2023 · 3 comments
Labels: P2 (We'll consider working on this in future. Assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: bug

vogelsgesang commented Aug 18, 2023

Description of the bug:

When combining --experimental_remote_cache_async with an HTTP remote cache, Bazel sometimes hangs after the build has finished, printing:

Waiting for remote cache: 1 upload; 1458s

The output is still updating (so it's not completely hanging), but there is no observable progress.
While the example below reproduces the issue for a bazel test invocation, the same issue also occurs for bazel build.

The issue does not appear if:

  • using gRPC instead of HTTP for the remote cache
  • calling bazel clean --expunge right before the test invocation.

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The bug is not deterministic, so it might need a couple of tries.

The following script usually reproduces the hang after 2-5 iterations of the for loop. Run it in a newly created, empty directory:

wget https://github.com/buchgr/bazel-remote/releases/download/v2.4.1/bazel-remote-2.4.1-linux-x86_64
chmod +x bazel-remote-2.4.1-linux-x86_64
./bazel-remote-2.4.1-linux-x86_64 --dir cache-dir --max_size=1 --http_address=0.0.0.0:8080 &

touch WORKSPACE
cat > BUILD.bazel << EOF
sh_test(
   name = "test",
   srcs = ["test.sh"],
)
EOF

bazelisk clean --expunge

for i in `seq 1 100`; do
   echo "try $i"
   # If you enable the following line, the issue disappears
   # bazelisk clean --expunge
   # Recreate `test.sh` with a random comment in it, so we don't get cache hits
   timestamp=`date +"%s"`
   echo "# ${timestamp}" > test.sh
   echo "exit 0" >> test.sh
   chmod +x test.sh
   # This will hang after a couple of tries
   bazelisk test :test --remote_cache=http://localhost:8080 --experimental_remote_cache_async
done
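
Because the hang only shows up after a few iterations, it can help to run each invocation under a deadline so the loop itself reports which try got stuck. The following is a minimal sketch, not part of the original report: the helper name run_with_deadline and the 120-second deadline are assumptions, and it relies on the coreutils timeout command, which exits with status 124 when the deadline expires. How the Bazel server reacts to the client being terminated is not covered here.

```shell
# Hypothetical helper (not from the original report): run a command under a
# deadline and print "ok", "hung", or "failed (<status>)" afterwards.
# $1 = deadline in seconds, remaining arguments = the command to run.
run_with_deadline() {
    deadline="$1"
    shift
    timeout "$deadline" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # coreutils timeout uses exit status 124 when the deadline was hit
        echo "hung"
    elif [ "$rc" -eq 0 ]; then
        echo "ok"
    else
        echo "failed ($rc)"
    fi
    return "$rc"
}

# Possible use inside the loop above (the 120 s deadline is an arbitrary choice):
#   run_with_deadline 120 bazelisk test :test \
#       --remote_cache=http://localhost:8080 --experimental_remote_cache_async
```

With this in place, an iteration that prints "hung" marks the try on which the upload never completed.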

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.3.2

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

not inside any git repository

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

afaict, not a regression

Have you found anything relevant by searching the web?

@Pavank1992 Pavank1992 added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Aug 18, 2023
@joeleba joeleba added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Aug 22, 2023
coeuvre (Member) commented Sep 7, 2023

Might be related to #18296.

jmelahman (Contributor) commented

Just wanna mention that this also occurs when setting an explicit timeout like --remote_timeout=600.

coeuvre (Member) commented Jun 3, 2024

Should be fixed by a804fb1. The fix will be available in 7.2.
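
To check whether a given installation is new enough to include the fix, the output of bazel info release (for example "release 6.3.2", as reported above) can be compared against 7.2. The following is a rough sketch under stated assumptions: the function name has_async_fix is hypothetical, 7.2 is taken to mean 7.2.0, the comparison relies on GNU sort -V, and release candidates or other non-standard version strings are not handled.

```shell
# Hypothetical helper: succeeds (exit 0) if the given `bazel info release`
# output names a version that is at least 7.2.0.
has_async_fix() {
    version="${1#release }"   # strip the leading "release " prefix
    # sort -V orders version strings; if 7.2.0 sorts first (or is equal),
    # the installed version is at least 7.2.0.
    lowest=$(printf '%s\n%s\n' "7.2.0" "$version" | sort -V | head -n 1)
    [ "$lowest" = "7.2.0" ]
}

# Possible use:
#   if has_async_fix "$(bazel info release)"; then
#       echo "this Bazel should contain the fix"
#   fi
```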

@coeuvre coeuvre closed this as completed Jun 3, 2024
7 participants