Improve retries in dev/run cluster setup
Noticed we still get connection refused errors during cluster setup for the
nouveau tests. Also retry on connection errors raised as exceptions, not just
on unexpected status codes. Once retries are exhausted, fail as before.
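The retry policy in the message can be distilled into a small, transport-agnostic sketch. Everything here is hypothetical: `retry_request`, `send`, and `RetriesExhausted` are illustrative names, and `send` is assumed to return a `(status, body)` pair or raise. Unlike dev/run, which catches `Exception`, this version narrows the retryable errors to `OSError`, the base class of `ConnectionRefusedError`.

```python
import time


class RetriesExhausted(Exception):
    """Hypothetical error for this sketch: a bad status outlasted the retries."""


def retry_request(send, success_codes, retries=10, retry_dt=1.0):
    # send() is an assumed callable returning (status, body) or raising OSError.
    while True:
        try:
            status, body = send()
            if status in success_codes:
                return status, body
            if retries <= 0:
                # Out of retries on a bad status: fail, like dev/run's assert.
                raise RetriesExhausted("status %s: %r" % (status, body))
        except OSError as e:
            # Connection-level failures (e.g. ConnectionRefusedError) carry no
            # status code, so they need their own retry path.
            if retries <= 0:
                raise
            print("Retrying ... %s" % e)
        retries -= 1
        time.sleep(retry_dt)


if __name__ == "__main__":
    # Simulated node: refuses once, then reports 503, then comes up.
    attempts = iter([ConnectionRefusedError("refused"), (503, b"starting"), (200, b"ok")])

    def flaky_send():
        outcome = next(attempts)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(retry_request(flaky_send, success_codes=(200,), retries=5, retry_dt=0))
```

Because `RetriesExhausted` is not an `OSError`, it propagates straight out of the loop, so status failures and connection failures each give up on their own terms once the shared retry budget reaches zero.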
nickva committed Jan 7, 2024
1 parent ae36a14 commit a39917c
Showing 1 changed file with 17 additions and 11 deletions.
dev/run (28 changes: 17 additions, 11 deletions)
@@ -1104,17 +1104,23 @@ def try_request(
 ):
     while True:
         conn = httpclient.HTTPConnection(host, port)
-        if headers is not None:
-            conn.request(meth, path, body=body, headers=headers)
-        else:
-            conn.request(meth, path, body=body)
-        resp = conn.getresponse()
-        if resp.status in success_codes:
-            result = (resp.status, resp.read())
-            resp.close()
-            return result
-        elif retries <= 0:
-            assert resp.status in success_codes, "%s%s" % (error, resp.read())
+        try:
+            if headers is not None:
+                conn.request(meth, path, body=body, headers=headers)
+            else:
+                conn.request(meth, path, body=body)
+            resp = conn.getresponse()
+            if resp.status in success_codes:
+                result = (resp.status, resp.read())
+                resp.close()
+                return result
+            elif retries <= 0:
+                assert resp.status in success_codes, "%s%s" % (error, resp.read())
+        except Exception as e:
+            if retries <= 0:
+                print("Connection failed %s %s" % (e, error))
+                raise e
+            print("Retrying ... %s " % e)
         retries -= 1
         time.sleep(retry_dt)
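The patched loop can be exercised end to end against a local HTTP server that only starts listening after a delay, which mimics a node that is still booting. This is a sketch under stated assumptions: the hunk elides the real parameter list of `try_request`, so the parameter order and default values below are guesses (calls pass everything by keyword), while the loop body follows the patched version. `free_port`, `OkHandler`, `start_server_later`, and `demo` are hypothetical helpers for this illustration only.

```python
import http.client as httpclient  # alias matches the name used in the hunk
import http.server
import socket
import threading
import time


# Patched loop from dev/run. ASSUMPTION: parameter order and defaults are
# guessed, because the hunk elides the real signature.
def try_request(host, port, meth, path, success_codes, body=None,
                retries=10, retry_dt=1, error="", headers=None):
    while True:
        conn = httpclient.HTTPConnection(host, port)
        try:
            if headers is not None:
                conn.request(meth, path, body=body, headers=headers)
            else:
                conn.request(meth, path, body=body)
            resp = conn.getresponse()
            if resp.status in success_codes:
                result = (resp.status, resp.read())
                resp.close()
                return result
            elif retries <= 0:
                assert resp.status in success_codes, "%s%s" % (error, resp.read())
        except Exception as e:
            # Connection errors such as ConnectionRefusedError now land here.
            if retries <= 0:
                print("Connection failed %s %s" % (e, error))
                raise e
            print("Retrying ... %s " % e)
        retries -= 1
        time.sleep(retry_dt)


def free_port():
    # Ask the OS for an unused port, then release it so nothing listens yet.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


class OkHandler(http.server.BaseHTTPRequestHandler):
    # Hypothetical stand-in for a node endpoint: always answers 200 "ok".
    def do_GET(self):
        payload = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep output quiet


def start_server_later(port, delay):
    # Simulate a node that starts accepting connections after `delay` seconds.
    holder = {}

    def run():
        time.sleep(delay)
        holder["server"] = http.server.HTTPServer(("127.0.0.1", port), OkHandler)
        holder["server"].serve_forever()

    threading.Thread(target=run, daemon=True).start()
    return holder


def demo(delay=0.5):
    port = free_port()
    holder = start_server_later(port, delay)
    try:
        # Early attempts are refused; the loop retries until the server is up.
        return try_request(host="127.0.0.1", port=port, meth="GET", path="/",
                           success_codes=(200,), retries=50, retry_dt=0.1)
    finally:
        server = holder.get("server")
        if server is not None:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because the `except` clause catches `Exception`, the `AssertionError` raised when a bad status outlasts the retries also passes through it: it is printed as "Connection failed ..." and re-raised, so callers still see the assertion failure as before.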
