Description
Description of the bug:
The experimental_repository_downloader_retries setting appears to retry downloads that fail with HTTP 500 responses, but not downloads that fail with HTTP 400. The reproduction section of this report demonstrates the problem. It would be useful to be able to retry 400 errors as well, to mitigate failures such as the following:
2025-04-30T04:51:17.4199017Z Error in download_and_extract: java.io.IOException: Error downloading [https://github.com/indygreg/python-build-standalone/releases/download/20241206/cpython-3.12.8+20241206-x86_64-pc-windows-msvc-shared-install_only.tar.gz] to /home/runner/.cache/bazel/_bazel_runner/206c3fd0b92f3df84e290ddf46260e45/external/rules_python++python+python_3_12_x86_64-pc-windows-msvc/temp10454833902719319927/cpython-3.12.8+20241206-x86_64-pc-windows-msvc-shared-install_only.tar.gz: GET returned 400 Bad Request
These appear to be intermittent problems on GitHub's side. I sometimes observe them in the CI pipeline we use in our organization, and we are not the only ones affected, as these examples show: https://drake-cdash.csail.mit.edu/builds/1851800 and https://drake-cdash.csail.mit.edu/builds/1851811/configure. In our CI pipelines the failure typically goes away after a retry, so if experimental_repository_downloader_retries natively handled 400 errors, it would improve stability.
Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Create a "fake" Python server that returns a 400 error code for all requests (e.g., under ~/fakeserver.py)
from http.server import BaseHTTPRequestHandler, HTTPServer

RESPONSE_CODE = 400


class BadRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        print("get")
        self.send_response(RESPONSE_CODE)
        self.end_headers()
        self.wfile.write(b"Bad Request")

    def do_POST(self):
        print("post")
        self.send_response(RESPONSE_CODE)
        self.end_headers()
        self.wfile.write(b"Bad Request")

    def do_PUT(self):
        print("put")
        self.send_response(RESPONSE_CODE)
        self.end_headers()
        self.wfile.write(b"Bad Request")

    def do_DELETE(self):
        print("delete")
        self.send_response(RESPONSE_CODE)
        self.end_headers()
        self.wfile.write(b"Bad Request")

    def do_HEAD(self):
        print("head")
        self.send_response(RESPONSE_CODE)
        self.end_headers()

    def log_message(self, format, *args):
        # Suppress logging to keep the output clean
        return


def run(server_class=HTTPServer, handler_class=BadRequestHandler, port=8080):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print(f"Starting server on port {port}...")
    httpd.serve_forever()


if __name__ == "__main__":
    run()
git clone https://github.com/bazelbuild/bazel.git
cd bazel
python ~/fakeserver.py # in a separate terminal tab
# in MODULE.bazel replace all occurrences of "https://github.com" with "http://localhost:8080"
bazel build @jq_linux_arm64//file --experimental_repository_downloader_retries=20
# Observe in the fake server tab that there is only one log entry, not 20, which indicates the request was not retried
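The MODULE.bazel substitution in the steps above can also be scripted instead of done by hand. The following is a minimal sketch, not part of the original reproduction; the helper names redirect_urls and redirect_file are hypothetical, and it assumes a plain textual replacement is sufficient.

```python
from pathlib import Path

# Hypothetical helpers for the manual "replace all occurrences" step.
OLD_PREFIX = "https://github.com"
NEW_PREFIX = "http://localhost:8080"


def redirect_urls(text, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return the rewritten text and the number of replacements made."""
    count = text.count(old)
    return text.replace(old, new), count


def redirect_file(path):
    """Rewrite the file at `path` in place and return the replacement count."""
    target = Path(path)
    rewritten, count = redirect_urls(target.read_text())
    target.write_text(rewritten)
    return count
```

Calling redirect_file("MODULE.bazel") from the root of the bazel checkout would apply the change; a return value of 0 would suggest the file contains no matching URLs.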
For comparison, change RESPONSE_CODE = 400 in ~/fakeserver.py to RESPONSE_CODE = 500, restart the server, and run the build again:
bazel build @jq_linux_arm64//file --experimental_repository_downloader_retries=20
You will see in the fake server logs that the request is now retried, which shows that retries occur for HTTP 500 error codes but not for HTTP 400 error codes.
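To make the requested behavior concrete, here is a rough sketch of a retry loop in which the set of retryable status codes is configurable. It is not Bazel's implementation (Bazel's downloader is written in Java). The function name fetch_with_retries, the RETRYABLE_STATUS set, the backoff parameter, and the convention that max_retries counts retries after the first attempt are all assumptions made for illustration.

```python
import time

# Hypothetical policy. Including 400 here is the behavior requested in this
# issue, not what Bazel does today.
RETRYABLE_STATUS = frozenset({400, 500, 502, 503, 504})


def fetch_with_retries(fetch, max_retries, retryable=RETRYABLE_STATUS, base_delay=0.0):
    """Call fetch() until it returns a non-retryable status or retries run out.

    `fetch` is any zero-argument callable returning an HTTP status code.
    Returns (final_status, attempts), where attempts includes the first call.
    """
    attempts = 0
    while True:
        status = fetch()
        attempts += 1
        if status not in retryable or attempts > max_retries:
            return status, attempts
        # Exponential backoff between attempts (assumed; no delay by default).
        time.sleep(base_delay * 2 ** (attempts - 1))


if __name__ == "__main__":
    always_400 = lambda: 400
    # With 400 treated as retryable, the request is repeated.
    print(fetch_with_retries(always_400, max_retries=3))
    # With only 5xx retryable, a 400 fails after a single attempt,
    # which matches the single log entry seen in the reproduction.
    print(fetch_with_retries(always_400, max_retries=3, retryable=frozenset({500})))
```

Under this sketch, whether 400 is retried is a policy decision captured in one set, which is roughly the knob this issue asks for.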
Which operating system are you running Bazel on?
Ubuntu 22.04
What is the output of bazel info release?
release 8.2.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No
Any other information, logs, or outputs that you want to share?
No response