Outlier Detection for non-error status codes #18789
Comments
Seems reasonable to me. I don't think this would be that hard to do.
Sounds good. I will implement this. I think the API should be extended to define which status codes are considered errors, so that one can specify the exact codes that cause a node to be considered an outlier.
@cpakulski I wanted to check whether there are any plans to support the above. This would be an amazingly helpful feature. "I think that the API should be extended to define status codes considered as errors, so one can specify exact codes which will cause a node to be considered an outlier." This would be really helpful for cases where, say, we want to eject on all 5xx except 502 for some reason. 🙏
@gauravojha I still plan to work on this. Your example of excluding 502 is a very good point. Please keep an eye on this issue; I should land a PR within a few weeks.
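To make the idea discussed above concrete, here is a minimal Python sketch of a configurable error-code matcher: inclusive ranges of status codes treated as errors, minus explicit exceptions. The class name `ErrorCodeMatcher` and its fields are hypothetical and are not part of Envoy's API; they only illustrate the "all 5xx except 502" case.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorCodeMatcher:
    """Hypothetical matcher, not Envoy's API; for illustration only."""

    # Inclusive (start, end) ranges of status codes counted as errors.
    error_ranges: tuple = ((500, 599),)
    # Codes excluded even when they fall inside one of the ranges.
    excluded: frozenset = field(default_factory=frozenset)

    def is_error(self, status: int) -> bool:
        # Explicit exclusions win over range membership.
        if status in self.excluded:
            return False
        return any(lo <= status <= hi for lo, hi in self.error_ranges)


# "Eject on all 5xx except 502", as suggested in the thread.
matcher = ErrorCodeMatcher(excluded=frozenset({502}))
```

With this sketch, `matcher.is_error(503)` is true while `matcher.is_error(502)` is false. Adding `(400, 499)` to `error_ranges` would cover the 4xx case from the issue description.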
@cpakulski Any progress on this, please? 🙂
I wrote a proposal and coded a working prototype some time ago. It was then put on hold, but I plan to open a formal PR within the next month.
Title: Support outlier detection of other status codes (particularly 4xx).
Description:
Outliers can be hosts returning an abnormal rate of any status code, not just 5xx. Although 4xx errors are generally considered client errors, a host that starts returning a large number of 4xx responses may be signaling a problem of its own (possibly related to authz, authn, etc.) and should be considered an outlier. At Pinterest, we are interested in being able to identify 4xx outliers in addition to 5xx outliers (although I can imagine this could have a general solution for all 300+ status codes).
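As a rough illustration of the request, the Python sketch below flags a host after a run of consecutive responses that a caller-supplied predicate classifies as errors. This is an assumption-laden sketch, not Envoy's implementation: the issue speaks of an abnormal rate, whereas this uses the simpler consecutive-count variant, and `ConsecutiveErrorDetector`, `record`, and the threshold value are all hypothetical names and choices.

```python
from collections import defaultdict
from typing import Callable


class ConsecutiveErrorDetector:
    """Hypothetical detector: flags a host after `threshold` consecutive errors."""

    def __init__(self, is_error: Callable[[int], bool], threshold: int = 5):
        self.is_error = is_error
        self.threshold = threshold
        self.streaks = defaultdict(int)  # host -> current run of error responses

    def record(self, host: str, status: int) -> bool:
        """Record one response; return True if the host now counts as an outlier."""
        if self.is_error(status):
            self.streaks[host] += 1
        else:
            self.streaks[host] = 0  # any non-error response resets the run
        return self.streaks[host] >= self.threshold


# Treat 4xx as well as 5xx as errors (an assumption made for this example).
detector = ConsecutiveErrorDetector(lambda s: 400 <= s <= 599, threshold=3)
results = [detector.record("host-a", s) for s in (403, 401, 200, 403, 403, 500)]
```

Here the 200 response resets the streak for `host-a`, so only the final response, the third consecutive error after the reset, marks the host as an outlier.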