Wait until engine is started up when acquiring searcher #7456
LGTM
Today there is a small window in which a searcher can be acquired while the engine is still starting up. If a caller is fast enough, this causes an NPE that triggers a shard failure. This commit handles the situation gracefully. Closes elastic#7455
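To illustrate the race described above, here is a minimal, self-contained sketch of one way to make searcher acquisition wait for startup. It is not the actual patch in this PR: the class `StartupGate`, its methods, the timeout parameter, and the `String` stand-in for the searcher are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an engine; not Elasticsearch code.
final class StartupGate {
    // Released exactly once, when startup has finished.
    private final CountDownLatch started = new CountDownLatch(1);
    // Placeholder for the searcher source; stays null until start() runs.
    private volatile String searcher;

    void start() {
        searcher = "searcher";   // assign the field first...
        started.countDown();     // ...then let waiting callers through
    }

    // Blocks until start() has completed or the timeout elapses,
    // so callers never observe the unset (null) field.
    String acquireSearcher(long timeoutMillis) {
        try {
            if (!started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("engine not started after " + timeoutMillis + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for engine start", e);
        }
        return searcher;
    }
}
```

With this shape, a caller that arrives during startup either waits briefly or gets a descriptive exception, rather than dereferencing a field that has not been set yet.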
We are seeing an issue with v1.3.2 which may be related to this. We would really appreciate it if someone could confirm whether it is indeed related. After a node restart, the recovery of one of the shards sometimes gets stuck indefinitely, with no updates after the trace
@bleskes can you confirm that #7456 (comment) is a symptom of this (fixed) issue?
@clintongormley this bug caused an NPE (because a variable was not set yet). The stack trace indicates that the recovery is in its translog phase, meaning the engine has already started. @ajhalani it looks like you use a lot of delete-by-query operations. Is that true? Those are replayed when you recover the shard, which takes longer than simple indexing (and we refresh after them by default, which is not needed during recovery; that is a potential optimization we can make here).
Thanks for the response. @bleskes, you're right. My issue doesn't seem to be related to this. It was indeed due to slow recovery from the translog. In the hot threads dump, I see frequent refreshes and flushes, and it was much slower than regular indexing.