It is still not possible to set flushed frontier to a value with a lower OpId #784
Comments
yugabyte-ci pushed a commit that referenced this issue on Jan 23, 2019:
[#784] Correctly set flushed frontier to the given value in a version edit

Summary: This revision rectifies an earlier omission in how we force-set the flushed frontier to a particular value when restoring a snapshot into a new Raft group.

As a reminder, a "frontier" is a YugaByte extension to RocksDB: a tuple consisting of the OpId and hybrid time of the Raft operations written to a memtable and subsequently flushed, plus the history cutoff of the latest compaction. A "flushed frontier" value is maintained for every version edit, as well as for every new file in a version edit.

The old flushed frontier was still present in per-file metadata in version edits and was restored from there in VersionSet::Recover, causing the subsequent local tablet bootstrap to fail: the apparent committed OpId (coming from the flushed frontier) was too high and no corresponding log record was found. This resulted in a CHECK failure similar to the following in an integration test:

```
F0122 22:13:58.233659 28338 tablet_bootstrap.cc:839] Check failed: state.prev_op_id.term() >= state.committed_op_id.term() && state.prev_op_id.index() >= state.committed_op_id.index() Last: term: 1 index: 4, committed: 1.21033
```

It was preceded by the following message in the log:

```
E0122 22:13:58.216117 28338 version_edit.cc:324] Flushed frontier is present but has to be updated with data from file boundary: flushed_frontier={ op_id: { term: 1 index: 3 } hybrid_time: { physical: 1548195133903792 logical: 1 } history_cutoff: <invalid> }, a file's larget user frontier: { op_id: { term: 1 index: 21033 } hybrid_time: { physical: 1548194599166361 } history_cutoff: <invalid> }
```

This revision also adds a new test, DocDBTest.ForceFlushedFrontier, which flushes a number of files into RocksDB, then decreases the flushed frontier in "force" mode and verifies that the operation succeeds, both before and after reopening RocksDB. This mirrors what we do when restoring a snapshot into a different Raft group (e.g. when restoring into a new cluster).

Another change in this revision: VersionSet::LogAndApply no longer attempts to batch version edits that forcibly override the flushed frontier together with other version edits, since that could lead to unpredictable and unclear behavior.

Yet another change, to the snapshot-restore behavior (Enterprise Edition): we no longer allow compactions until we have finished setting the flushed frontier value, so that it carries the correct OpId of the target tablet; after that we use RocksDB's function to enable compactions. If compactions were enabled right away, a compaction that started before the flushed frontier was force-updated could install a version edit whose per-file flushed frontier metadata is still based on the source Raft group's OpIds, bumping the destination Raft group's flushed OpId too high and causing the same CHECK failure during the next local bootstrap.

Other changes:
- Use `LOG_WITH_PREFIX` instead of `LOG` in functions that create/restore snapshots (Enterprise Edition).
- Treat a history cutoff of `HybridTime::kMin` as `HybridTime::kInvalid` ("falsey" in a boolean context).

Relevant commits / GitHub issues:
- 760cc91 (#773)
- 23f6010 (#704)
- a390c7a (#730)

Test Plan: Jenkins
Reviewers: sergei
Reviewed By: sergei
Subscribers: bharat, kannan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D6041
Fixed in ac4846f.
mbautin added a commit that referenced this issue on Jul 11, 2019:
…ed to the earlier commit ac4846f. Original commit message: [#784] Correctly set flushed frontier to the given value in a version edit (the full message is identical to the one in the commit above).
mbautin added a commit to mbautin/yugabyte-db that referenced this issue on Jul 16, 2019:
… version edit (the commit message is identical to the one above). Note: This commit provides additional functionality that is logically related to the earlier commit yugabyte@ac4846f and supersedes the commit yugabyte@afcc804.
The issue is that we are still writing largest user frontier values to per-new-file records within a version edit. We need to remove them from there.