-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
object: introduce a "curl --max-time 3" in a rgw readiness probe #14144
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PLease add a proper commit message and PR description
https://rook.io/docs/rook/latest-release/Contributing/development-flow/#commit-structure
# --insecure - don't validate ssl if using secure port only | ||
# --silent - don't output progress info | ||
# --output /dev/stderr - output HTML header to stdout (good for debugging) | ||
# --write-out '%{response_code}' - print the HTTP response code to stdout | ||
curl --insecure --silent --output /dev/stderr --write-out '%{response_code}' "$URL" | ||
curl --max-time 3 --insecure --silent --output /dev/stderr --write-out '%{response_code}' "$URL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 3 sec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the healthcheck itself has default timeoutSeconds: 5
. 3 seconds is enough to detect the network problem and stop curl process right away + some more secs for bash script to process the script when the node is not responsive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bash script itself should be able to process the curl result in under a second, so I'm not too worried about the trivial extra time there.
The default curl timeout is more than 2 minutes and may cause a number of abandeoned curl processes within a network hiccup. This fix limits the curl to check the endpoint within 3 seconds. Signed-off-by: kayrus <kay.diam@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the object.yaml the user can configure the readiness and startup probe timeouts. For example, they might specify:
healthCheck:
# Configure the pod probes for the rgw daemon
startupProbe:
disabled: false
probe:
timeoutSeconds: 3
readinessProbe:
disabled: false
probe:
timeoutSeconds: 3
How about implementing this so the curl max-time is set to the timeoutSeconds
if set in the probe? And at the same time we could also have a smart default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to be sure that the best solution is indeed the curl --max-time flag. When I was implementing the RGW probe, I remember looking into that flag and rejecting it, but I forget my rationale at present.
# --insecure - don't validate ssl if using secure port only | ||
# --silent - don't output progress info | ||
# --output /dev/stderr - output HTML header to stdout (good for debugging) | ||
# --write-out '%{response_code}' - print the HTTP response code to stdout | ||
curl --insecure --silent --output /dev/stderr --write-out '%{response_code}' "$URL" | ||
curl --max-time 3 --insecure --silent --output /dev/stderr --write-out '%{response_code}' "$URL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to parameterize the --max-time
value instead of hard-coding it. When we have had hard-coded things like this in previous probes, it has led to situations where some users are encountering an error state that they are unable to work around.
Let's add a PROBE_TIMEOUT="{{ .Timeout }}
value to the beginning of this script and use --max-time "$PROBE_TIMEOUT"
. Because the bash portion of the script should process in under a second, I'm not worried about a few hundred milliseconds of time where something extra may be running.
This will require passing the configured probe timeout to the template config in .go code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to update the commit title and body, and the PR description to reflect that the probe is no longer hard-coded to 3 seconds.
@@ -17,11 +17,12 @@ RGW_URL="$PROBE_PROTOCOL://0.0.0.0:$PROBE_PORT" | |||
|
|||
function check() { | |||
local URL="$1" | |||
# --max-time 3 - override the default 2 minutes wait time down to 3 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to make sure we capture the reason why we are using this flag in this case, instead of just stating what the flag does.
# --max-time 3 - override the default 2 minutes wait time down to 3 seconds | |
# --max-time - curl doesn't always respond to SIGTERM on timeout and can become zombie; make sure it times out |
Issue resolved by this Pull Request:
Resolves #14143
Checklist: