It is still not possible to set flushed frontier to a value with a lower OpId #784

Closed
mbautin opened this issue Jan 23, 2019 · 1 comment
Labels: kind/bug

mbautin (Collaborator) commented Jan 23, 2019

The issue is that we are still writing the largest user frontier values to per-new-file records
within a version edit. We need to remove them from there.

yugabyte-ci pushed a commit that referenced this issue Jan 23, 2019
[#784] Correctly set flushed frontier to the given value in a version edit

Summary:
This revision rectifies an earlier omission in how we force-set the flushed frontier to a
particular value when restoring a snapshot into a new Raft group. As a reminder, a "frontier" is a
YugaByte extension to RocksDB: a tuple consisting of the OpId and hybrid time of the Raft
operations written to a memtable and subsequently flushed, plus the history cutoff of the latest
compaction. A "flushed frontier" value is maintained for every version edit, as well as for every
new file in a version edit. The old flushed frontier was still present in the per-file metadata of
version edits and was restored from there in VersionSet::Recover, causing a subsequent local
tablet bootstrap to fail: the apparent committed OpId (coming from the flushed frontier) was too
high, and no corresponding log record was found. This resulted in a CHECK failure similar to the
following in an integration test:

```
F0122 22:13:58.233659 28338 tablet_bootstrap.cc:839] Check failed: state.prev_op_id.term() >= state.committed_op_id.term() && state.prev_op_id.index() >= state.committed_op_id.index() Last: term: 1 index: 4, committed: 1.21033
```

This was preceded by the following message in the log:

```
E0122 22:13:58.216117 28338 version_edit.cc:324] Flushed frontier is present but has to be updated with data from file boundary: flushed_frontier={ op_id: { term: 1 index: 3 } hybrid_time: { physical: 1548195133903792 logical: 1 } history_cutoff: <invalid> }, a file's larget user frontier: { op_id: { term: 1 index: 21033 } hybrid_time: { physical: 1548194599166361 } history_cutoff: <invalid> }
```
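
For orientation, the frontier tuple described above can be pictured with the following minimal C++
sketch; the type and field names are illustrative stand-ins, not the actual YugabyteDB definitions:

```
#include <cstdint>

// Illustrative stand-ins for the real YugabyteDB types.
struct OpId {
  int64_t term = 0;
  int64_t index = 0;
};

struct HybridTime {
  uint64_t physical = 0;  // physical clock component, microseconds
  uint32_t logical = 0;   // logical counter for same-microsecond events
};

// A "frontier": the OpId and hybrid time of the Raft operations flushed
// from a memtable, plus the history cutoff of the latest compaction.
struct Frontier {
  OpId op_id;
  HybridTime hybrid_time;
  HybridTime history_cutoff;  // invalid until a compaction has run
};
```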

This revision also adds a new test, DocDBTest.ForceFlushedFrontier, that flushes a number of files
into RocksDB, then decreases the flushed frontier in "force" mode and verifies that the operation
succeeds, both before and after reopening RocksDB. This mirrors what we do when restoring a
snapshot into a different Raft group (e.g. when restoring into a new cluster).
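
A hedged sketch of that test scenario follows; the `FlushSomeFiles`, `MakeFrontier`, and
`GetFlushedFrontier` helpers and the exact `ModifyFlushedFrontier` signature are assumptions for
illustration, not the actual test code:

```
// Sketch of the DocDBTest.ForceFlushedFrontier scenario (gtest style).
TEST_F(DocDBTest, ForceFlushedFrontier) {
  FlushSomeFiles();  // assumed helper: write and flush several SST files

  // Force the flushed frontier *down* to a lower OpId, as happens when
  // restoring a snapshot into a new Raft group.
  auto lower_frontier = MakeFrontier(/* term = */ 1, /* index = */ 1);
  ASSERT_OK(rocksdb_->ModifyFlushedFrontier(
      lower_frontier, rocksdb::FrontierModificationMode::kForce));
  ASSERT_EQ(lower_frontier, GetFlushedFrontier());

  // The forced value must also survive closing and reopening RocksDB.
  ASSERT_OK(ReopenRocksDB());
  ASSERT_EQ(lower_frontier, GetFlushedFrontier());
}
```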

Another change in this revision is that VersionSet::LogAndApply no longer attempts to batch
version edits that forcibly override the flushed frontier together with other version edits, since
doing so could result in unpredictable and unclear behavior.
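
The intent can be sketched as a guard in the manifest-writer batching loop; the member and method
names below are assumptions, not the exact RocksDB/YugabyteDB code:

```
// Sketch: while collecting queued version edits into one manifest write,
// never mix a force-flushed-frontier edit with any other edit.
std::vector<ManifestWriter*> batch;
for (ManifestWriter* writer : manifest_writer_queue) {
  if (writer->edit->force_flushed_frontier()) {
    // Apply a forcing edit strictly on its own: either it is the first
    // queued edit (take just this one) or the batch stops before it.
    if (batch.empty()) {
      batch.push_back(writer);
    }
    break;
  }
  batch.push_back(writer);
}
```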

Yet another change, to the snapshot-restoring behavior (Enterprise Edition): compactions are no
longer allowed until we have finished setting the flushed frontier value, so that it carries the
correct OpId of the target tablet; only after that do we use RocksDB's API to enable compactions.
If compactions were enabled right away, a compaction that started before the flushed frontier was
force-updated could install a version edit whose per-file flushed frontier metadata was still
based on the source Raft group's OpIds, bumping the destination Raft group's flushed OpId too high
and causing the same CHECK failure during the next local bootstrap.
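
With stock RocksDB options, the open-then-enable pattern looks roughly like this; the path and the
force-set step in the middle are placeholders, and the frontier update itself is the
YugabyteDB-specific part:

```
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Open the restored tablet's RocksDB with automatic compactions off, so
// no compaction can run before the flushed frontier is force-updated.
rocksdb::Status RestoreWithPausedCompactions(const std::string& path) {
  rocksdb::Options options;
  options.disable_auto_compactions = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, path, &db);
  if (!s.ok()) return s;

  // ... force-set the flushed frontier to the target tablet's OpId here ...

  // Only now re-enable automatic compactions through RocksDB's API.
  s = db->EnableAutoCompaction({db->DefaultColumnFamily()});
  delete db;
  return s;
}
```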

Other changes:
- Use `LOG_WITH_PREFIX` instead of `LOG` in functions that create/restore snapshots (Enterprise Edition).
- Treat a history cutoff set to `HybridTime::kMin` as `HybridTime::kInvalid` ("falsey" in a
  boolean context); see the sketch after this list.
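
A minimal sketch of that normalization, assuming `HybridTime` exposes `kMin` and `kInvalid`
constants as described:

```
// Treat a history cutoff of kMin as "no cutoff" (kInvalid), so the value
// reads as falsey wherever the cutoff is tested in a boolean context.
HybridTime NormalizeHistoryCutoff(HybridTime history_cutoff) {
  return history_cutoff == HybridTime::kMin ? HybridTime::kInvalid
                                            : history_cutoff;
}
```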

Relevant commits / GitHub issues:
- 760cc91 (#773)
- 23f6010 (#704)
- a390c7a (#730)

Test Plan: Jenkins

Reviewers: sergei

Reviewed By: sergei

Subscribers: bharat, kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6041
kmuthukk added the kind/bug label on Jan 30, 2019
kmuthukk added this to To Do in YBase features via automation on Jan 30, 2019
mbautin (Collaborator, Author) commented Jan 30, 2019

Fixed in ac4846f

mbautin closed this as completed on Jan 30, 2019
YBase features automation moved this from To Do to Done Jan 30, 2019
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the earlier commit ac4846f

Original commit message:

[#784] Correctly set flushed frontier to the given value in a version edit

(The body of the message, from the Summary through the Differential Revision line, is identical to
the commit message quoted above.)
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
[#784] Correctly set flushed frontier to the given value in a version edit

(The body of the message, from the Summary through the Differential Revision line, is identical to
the commit message quoted above; its "Relevant commits" list uses yugabyte@-prefixed references.)

Note: This commit provides additional functionality that is logically related to the earlier
commit yugabyte@ac4846f and supersedes the commit yugabyte@afcc804.