Set global checkpoint before open engine from store #27972

Merged
merged 1 commit into from Dec 23, 2017

Conversation

dnhatn
Member

@dnhatn dnhatn commented Dec 23, 2017

In PR #27965, we set the global checkpoint from the translog during a
store recovery. However, we set it only after the engine is opened. This
violates the global checkpoint assertion in TranslogWriter if we are
forced to close the engine before we set the global checkpoint. A
closing engine closes the translog, which in turn reads the current
global checkpoint; however, it is still unassigned and therefore smaller
than the initial global checkpoint from the translog. This PR sets the
global checkpoint before opening the engine from the store.

Closes #27970
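
To make the ordering problem concrete, here is a minimal toy model of it. It is not the Elasticsearch code: `ToyCheckpointTracker`, `ToyTranslogWriter`, `closeSucceeds`, and the persisted value `5` are all hypothetical names and values chosen for illustration, and the close-time check is only assumed to resemble the assertion in TranslogWriter (current global checkpoint must not be smaller than the initial one read from the translog).

```java
import java.util.function.LongSupplier;

// Toy model only; every class and method here is hypothetical.
class GlobalCheckpointOrdering {
    // Sentinel for "no global checkpoint assigned yet" (assumed value for this sketch).
    static final long UNASSIGNED_SEQ_NO = -2L;

    /** Stand-in for whatever holds the shard's current global checkpoint. */
    static final class ToyCheckpointTracker {
        private long globalCheckpoint = UNASSIGNED_SEQ_NO;

        void set(long value) {
            globalCheckpoint = value;
        }

        long get() {
            return globalCheckpoint;
        }
    }

    /** Stand-in for a translog writer that re-reads the global checkpoint on close. */
    static final class ToyTranslogWriter {
        private final long initialGlobalCheckpoint;
        private final LongSupplier globalCheckpointSupplier;

        ToyTranslogWriter(long initialGlobalCheckpoint, LongSupplier globalCheckpointSupplier) {
            this.initialGlobalCheckpoint = initialGlobalCheckpoint;
            this.globalCheckpointSupplier = globalCheckpointSupplier;
        }

        void close() {
            long current = globalCheckpointSupplier.getAsLong();
            // The check that fails when the checkpoint is still unassigned.
            if (current < initialGlobalCheckpoint) {
                throw new IllegalStateException(
                    "global checkpoint " + current + " < initial " + initialGlobalCheckpoint);
            }
        }
    }

    /** Simulates a forced close right after "opening the engine". */
    static boolean closeSucceeds(boolean setCheckpointBeforeOpen) {
        long persistedInTranslog = 5L; // hypothetical checkpoint stored in the translog
        ToyCheckpointTracker tracker = new ToyCheckpointTracker();
        if (setCheckpointBeforeOpen) {
            tracker.set(persistedInTranslog); // the ordering this PR restores
        }
        // "Open the engine": the writer starts from the persisted checkpoint.
        ToyTranslogWriter writer = new ToyTranslogWriter(persistedInTranslog, tracker::get);
        try {
            // Forced close happens before any late tracker.set(...) could run.
            writer.close();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("set after open, forced close:  " + closeSucceeds(false));
        System.out.println("set before open, forced close: " + closeSucceeds(true));
    }
}
```

In this sketch the first case fails the check because the tracker still reports the unassigned sentinel when the writer closes, while the second case passes because the tracker was seeded before the writer existed.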

@dnhatn dnhatn added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >bug v6.2.0 v7.0.0 labels Dec 23, 2017
Contributor

@ywelsch ywelsch left a comment

Good that the assertion caught this. This PR essentially reverts a small change that's part of #27965, which is the simplest thing to do right now to get CI back to green (I can't think of any other simple fix). It further shows that InternalEngine needs some refactoring; the information flow is difficult to reason about right now.

Contributor

@bleskes bleskes left a comment

Thanks @dnhatn for picking this up. I intentionally changed this to rely on the validation in engine that the checkpoint actually means something and that the translog is the translog we want to read from. Sadly this has unexpected consequences as we expose the engine before we fully bootstrap from it. I don't like it but I think it's the easiest for now to do what you did.

@ywelsch ywelsch merged commit 436a243 into elastic:master Dec 23, 2017
ywelsch pushed a commit that referenced this pull request Dec 23, 2017
@dnhatn
Member Author

dnhatn commented Dec 23, 2017

Thanks @ywelsch and @bleskes for reviewing.

@dnhatn dnhatn deleted the set-gcp-before-open-engine branch December 23, 2017 14:32
Labels
>bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.2.0 v7.0.0-beta1
Development

Successfully merging this pull request may close these issues.

[CI] Global checkpoint assertion violated
4 participants