More graceful handling of failure in DescribeInstances (ec2-discovery) #28452
Comments
@azsolinsky How did you determine that this is the cause? Looking at |
Ok, how did you determine that no caching is happening if it returns an empty list? If an empty list is returned, the SingleObjectCache will certainly cache it; only if the response were null would it not be cached. The code doesn't distinguish between an empty list and a non-empty list, so why do you think it will not cache the empty list? (EDIT: I see now that needsRefresh() was overridden, which makes it refresh on the next call and prevents the empty list from effectively being cached. I missed this in my initial inspection of the code.) |
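To make the behaviour described in the EDIT concrete, here is a minimal sketch of a time-based cache whose needsRefresh() is overridden to return true whenever the last result was empty. It is a simplified stand-in, not the Elasticsearch source: `TimedCache`, `NodesCache`, `fetcher` and `fetchCount` are hypothetical names, and nodes are represented as plain strings.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a single-value cache with a refresh interval.
abstract class TimedCache<T> {
    private final long refreshIntervalMillis;
    private long lastRefreshMillis = 0L;
    private T cached;

    TimedCache(long refreshIntervalMillis, T initial) {
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.cached = initial;
    }

    // Returns the cached value, refreshing it first if needsRefresh() says so.
    T getOrRefresh() {
        if (needsRefresh()) {
            cached = refresh();
            lastRefreshMillis = System.currentTimeMillis();
        }
        return cached;
    }

    // Default policy: refresh only once the interval has elapsed.
    protected boolean needsRefresh() {
        return System.currentTimeMillis() - lastRefreshMillis > refreshIntervalMillis;
    }

    protected abstract T refresh();
}

// Mirrors the override discussed in this thread: an empty result forces a refresh on the next call.
class NodesCache extends TimedCache<List<String>> {
    private final Supplier<List<String>> fetcher; // assumed to return an empty list on failure
    private boolean empty = true;
    int fetchCount = 0; // exposed only so the behaviour can be observed

    NodesCache(long refreshIntervalMillis, Supplier<List<String>> fetcher) {
        super(refreshIntervalMillis, Collections.emptyList());
        this.fetcher = fetcher;
    }

    @Override
    protected boolean needsRefresh() {
        return empty || super.needsRefresh();
    }

    @Override
    protected List<String> refresh() {
        fetchCount++;
        List<String> nodes = fetcher.get();
        empty = nodes.isEmpty();
        return nodes;
    }
}
```

Under these assumptions, every getOrRefresh() call made while the fetcher keeps returning an empty list triggers a new fetch, regardless of the refresh interval. The empty list is still handed to the caller each time, so it is used but never held for the interval.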
@azsolinsky The |
Ok. Won't it still use the empty list for that call though as opposed to using the last good response? Suppose it stays throttled for 30 seconds?
On Feb 7, 2018, at 2:59 PM, Ryan Ernst <notifications@github.com> wrote:
@azsolinsky The empty variable is set if the returned list is empty, which means needsRefresh() will always return true.
|
Also, that means there is a potential issue with retrying the call too often without exponential back-off.
Jeff S
|
I'm not saying the current behavior is good or correct, just that it will not "cache the empty list" as was originally stated in this issue. |
Ok, your point is taken. I took for granted that needsRefresh() was not being overridden, and thought it would not refresh on a second call. I updated the title. However, I think the behavior during throttling still needs to be addressed, to protect the service from spikes in calls to the DescribeInstances API while it is being throttled. |
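One way to limit such spikes is capped exponential back-off between retries. The following is a rough sketch only: `Backoff`, `delayMillis` and `callWithRetry` are hypothetical names rather than anything in the plugin or the AWS SDK. For brevity it retries on any exception, whereas a real implementation would presumably retry only on throttling errors and would likely add random jitter.

```java
import java.util.concurrent.Callable;

// Hypothetical helper illustrating capped exponential back-off.
final class Backoff {
    private Backoff() {}

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at capMillis.
    // The shift is bounded so the multiplier cannot overflow for reasonable base values.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long factor = 1L << Math.min(Math.max(attempt, 0), 30);
        return Math.min(capMillis, baseMillis * factor);
    }

    // Calls `call` up to maxAttempts times, sleeping with increasing delays between failures.
    // Rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last;
    }
}
```

With a base of 100 ms and a cap of 5 s, the delays grow as 100, 200, 400, 800 ms and so on until they reach the cap, so a throttled caller backs away instead of calling again immediately.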
@azsolinsky I've marked this as an |
I'm interested, but I'm not sure about my availability; I'll have to get back to it in the near future. |
@azsolinsky thanks for your interest in improving elasticsearch. We discussed this internally and do not currently intend to work on this in the foreseeable future. I will therefore close this issue. If you still have an interest in improving this, feel free to open a pull request for it. |
If the DescribeInstances call fails from the EC2 Discovery plugin for any reason, the code just returns an empty list of nodes.
~~This is bad because the code currently caches it until the refresh interval expires.~~ This is bad because the code uses the empty list of nodes immediately and will make the call again on the next get, potentially without any retry back-off.
https://github.com/elastic/elasticsearch/blob/139deb535a58de87c602888a121b2791bcd22df2/plugins/discovery-ec2/src/main/java/org/elasticsearch/discovery/ec2/AwsEc2UnicastHostsProvider.java#L106:L119
~~With the default refresh of 10s this is sometimes not catastrophic; however, if throttling is happening a lot it can potentially cause the masters to not be able to communicate with one another and lead to cluster instability.~~
~~Also, with this bug, increasing the refresh interval is dangerous because the empty results list is cached until the refresh interval expires.~~
The code should probably not return an empty list when it is being throttled. Instead, it should continue to use the list from the last successful call, or possibly retry with exponential back-off on throttling exceptions.
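The "last successful call" idea can be sketched as follows. This is an illustration under stated assumptions, not a patch for the plugin: `LastGoodNodes` and `fetcher` are hypothetical names, nodes are plain strings, and the fetcher is assumed to signal failure by throwing rather than by returning an empty list, which is what lets the wrapper tell a failed call apart from a genuinely empty result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical wrapper that falls back to the last successful result when a fetch fails.
final class LastGoodNodes {
    private final Callable<List<String>> fetcher; // assumed to throw on failure (e.g. throttling)
    private volatile List<String> lastGood = Collections.emptyList();

    LastGoodNodes(Callable<List<String>> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns fresh nodes when the fetch succeeds, otherwise the previous successful result.
    List<String> get() {
        try {
            List<String> fresh = fetcher.call();
            // Defensive copy so later changes by the caller cannot alter the remembered list.
            lastGood = Collections.unmodifiableList(new ArrayList<>(fresh));
        } catch (Exception e) {
            // Fetch failed: keep serving the last successful list instead of an empty one.
        }
        return lastGood;
    }
}
```

A successful call that legitimately returns no instances still replaces the remembered list, so only failures are masked. Before the first successful call there is nothing to fall back on, and the wrapper returns an empty list.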