It is still not possible to set flushed frontier to a value with a lower OpId #784

Closed
mbautin opened this issue Jan 23, 2019 · 1 comment
Labels: kind/bug

mbautin (Collaborator) commented Jan 23, 2019

The issue is that we are still writing the largest user frontier values to per-new-file records
within a version edit. We need to remove them from there.

yugabyte-ci pushed a commit that referenced this issue Jan 23, 2019
[#784] Correctly set flushed frontier to the given value in a version edit

Summary:
This revision rectifies an earlier omission in how we force-set the flushed frontier to a
particular value when restoring a snapshot into a new Raft group. As a reminder, a "frontier" is a
YugaByte extension to RocksDB: a tuple consisting of the OpId and hybrid time of the Raft
operations written to a memtable and subsequently flushed, plus the history cutoff of the latest
compaction. A "flushed frontier" value is maintained for every version edit, as well as for every
new file in a version edit. The old flushed frontier was still present in the per-file metadata of
version edits and was restored from there in VersionSet::Recover, causing a subsequent local
tablet bootstrap to fail: the apparent committed OpId (coming from the flushed frontier) was too
high, and no corresponding log record was found. This resulted in a CHECK failure similar to the
following in an integration test:

```
F0122 22:13:58.233659 28338 tablet_bootstrap.cc:839] Check failed: state.prev_op_id.term() >= state.committed_op_id.term() && state.prev_op_id.index() >= state.committed_op_id.index() Last: term: 1 index: 4, committed: 1.21033
```

This was preceded by the following message in the log:

```
E0122 22:13:58.216117 28338 version_edit.cc:324] Flushed frontier is present but has to be updated with data from file boundary: flushed_frontier={ op_id: { term: 1 index: 3 } hybrid_time: { physical: 1548195133903792 logical: 1 } history_cutoff: <invalid> }, a file's larget user frontier: { op_id: { term: 1 index: 21033 } hybrid_time: { physical: 1548194599166361 } history_cutoff: <invalid> }
```
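
For orientation, the frontier tuple described above can be pictured with the following minimal C++
sketch; the type and field names are illustrative stand-ins, not the actual YugabyteDB definitions:

```
#include <cstdint>

// Illustrative stand-ins for the real YugabyteDB types.
struct OpId {
  int64_t term = 0;
  int64_t index = 0;
};

struct HybridTime {
  uint64_t physical = 0;  // physical clock component, microseconds
  uint32_t logical = 0;   // logical counter for same-microsecond events
};

// A "frontier": the OpId and hybrid time of the Raft operations flushed
// from a memtable, plus the history cutoff of the latest compaction.
struct Frontier {
  OpId op_id;
  HybridTime hybrid_time;
  HybridTime history_cutoff;  // invalid until a compaction has run
};
```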

This revision also adds a new test, DocDBTest.ForceFlushedFrontier, that flushes a number of files
into RocksDB, then decreases the flushed frontier in "force" mode and verifies that the operation
succeeds, both before and after reopening RocksDB. This mirrors what we do when restoring a
snapshot into a different Raft group (e.g. when restoring into a new cluster).
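
A hedged sketch of that test scenario follows; the `FlushSomeFiles`, `MakeFrontier`, and
`GetFlushedFrontier` helpers and the exact `ModifyFlushedFrontier` signature are assumptions for
illustration, not the actual test code:

```
// Sketch of the DocDBTest.ForceFlushedFrontier scenario (gtest style).
TEST_F(DocDBTest, ForceFlushedFrontier) {
  FlushSomeFiles();  // assumed helper: write and flush several SST files

  // Force the flushed frontier *down* to a lower OpId, as happens when
  // restoring a snapshot into a new Raft group.
  auto lower_frontier = MakeFrontier(/* term = */ 1, /* index = */ 1);
  ASSERT_OK(rocksdb_->ModifyFlushedFrontier(
      lower_frontier, rocksdb::FrontierModificationMode::kForce));
  ASSERT_EQ(lower_frontier, GetFlushedFrontier());

  // The forced value must also survive closing and reopening RocksDB.
  ASSERT_OK(ReopenRocksDB());
  ASSERT_EQ(lower_frontier, GetFlushedFrontier());
}
```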

Another change in this revision is that VersionSet::LogAndApply no longer attempts to batch
version edits that forcibly override the flushed frontier together with other version edits, since
doing so could result in unpredictable and unclear behavior.
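
The intent can be sketched as a guard in the manifest-writer batching loop; the member and method
names below are assumptions, not the exact RocksDB/YugabyteDB code:

```
// Sketch: while collecting queued version edits into one manifest write,
// never mix a force-flushed-frontier edit with any other edit.
std::vector<ManifestWriter*> batch;
for (ManifestWriter* writer : manifest_writer_queue) {
  if (writer->edit->force_flushed_frontier()) {
    // Apply a forcing edit strictly on its own: either it is the first
    // queued edit (take just this one) or the batch stops before it.
    if (batch.empty()) {
      batch.push_back(writer);
    }
    break;
  }
  batch.push_back(writer);
}
```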

Yet another change, to the snapshot-restoring behavior (Enterprise Edition): compactions are no
longer allowed until we have finished setting the flushed frontier value, so that it carries the
correct OpId of the target tablet; only after that do we use RocksDB's API to enable compactions.
If compactions were enabled right away, a compaction that started before the flushed frontier was
force-updated could install a version edit whose per-file flushed frontier metadata was still
based on the source Raft group's OpIds, bumping the destination Raft group's flushed OpId too high
and causing the same CHECK failure during the next local bootstrap.
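
With stock RocksDB options, the open-then-enable pattern looks roughly like this; the path and the
force-set step in the middle are placeholders, and the frontier update itself is the
YugabyteDB-specific part:

```
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Open the restored tablet's RocksDB with automatic compactions off, so
// no compaction can run before the flushed frontier is force-updated.
rocksdb::Status RestoreWithPausedCompactions(const std::string& path) {
  rocksdb::Options options;
  options.disable_auto_compactions = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, path, &db);
  if (!s.ok()) return s;

  // ... force-set the flushed frontier to the target tablet's OpId here ...

  // Only now re-enable automatic compactions through RocksDB's API.
  s = db->EnableAutoCompaction({db->DefaultColumnFamily()});
  delete db;
  return s;
}
```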

Other changes:
- Use `LOG_WITH_PREFIX` instead of `LOG` in functions that create/restore snapshots (Enterprise Edition).
- Treat a history cutoff set to `HybridTime::kMin` as `HybridTime::kInvalid` ("falsey" in a
  boolean context); see the sketch after this list.
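
A minimal sketch of that normalization, assuming `HybridTime` exposes `kMin` and `kInvalid`
constants as described:

```
// Treat a history cutoff of kMin as "no cutoff" (kInvalid), so the value
// reads as falsey wherever the cutoff is tested in a boolean context.
HybridTime NormalizeHistoryCutoff(HybridTime history_cutoff) {
  return history_cutoff == HybridTime::kMin ? HybridTime::kInvalid
                                            : history_cutoff;
}
```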

Relevant commits / GitHub issues:
- 760cc91 (#773)
- 23f6010 (#704)
- a390c7a (#730)

Test Plan: Jenkins

Reviewers: sergei

Reviewed By: sergei

Subscribers: bharat, kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6041
kmuthukk added the kind/bug label on Jan 30, 2019
kmuthukk added this to To Do in YBase features via automation on Jan 30, 2019
mbautin (Collaborator, Author) commented Jan 30, 2019

Fixed in ac4846f

mbautin closed this as completed on Jan 30, 2019
YBase features automation moved this from To Do to Done Jan 30, 2019
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the earlier commit ac4846f

Original commit message:

[#784] Correctly set flushed frontier to the given value in a version edit

(The body of the message, from the Summary through the Differential Revision line, is identical to
the commit message quoted above.)
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
[#784] Correctly set flushed frontier to the given value in a version edit

(The body of the message, from the Summary through the Differential Revision line, is identical to
the commit message quoted above; its "Relevant commits" list uses yugabyte@-prefixed references.)

Note: This commit provides additional functionality that is logically related to the earlier
commit yugabyte@ac4846f and supersedes the commit yugabyte@afcc804.