Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: sstable missing from EAR registry #106617

Closed
jbowens opened this issue Jul 11, 2023 · 6 comments · Fixed by #107249
Closed

storage: sstable missing from EAR registry #106617

jbowens opened this issue Jul 11, 2023 · 6 comments · Fixed by #107249
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-release-21.2 Used to mark technical advisories for 21.2 branch-release-22.1 Used to mark release blockers and technical advisories for 22.1 branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 branch-release-23.1 Used to mark GA and release blockers and technical advisories for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-storage Storage Team
Projects

Comments

@jbowens
Copy link
Collaborator

jbowens commented Jul 11, 2023

Internal discussion
https://cockroachlabs.slack.com/archives/C057ULDSKC0/p1689098898636449?thread_ts=1688047254.186649&cid=C057ULDSKC0

Jira issue: CRDB-29644

Epic: CRDB-26603

@jbowens jbowens added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Jul 11, 2023
@jbowens jbowens added this to Incoming in Storage via automation Jul 11, 2023
@jbowens jbowens added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jul 11, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 11, 2023

Hi @jbowens, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@jbowens jbowens moved this from Incoming to Investigations in Storage Jul 11, 2023
@itsbilal itsbilal self-assigned this Jul 13, 2023
@itsbilal
Copy link
Member

Taking a look. From code-reading, nothing has particularly stood out to me as an obvious bug in the file registry code or so, around deletions or additions of files.

The missing sstable 67105381 is not mentioned in the latest file registry at all, and I don't see it in a dump of each individual record either - so it must have been added/removed either before the last rotation of the file registry, or was never added at all. I used a modified build of Cockroach to decrypt the manifest without having the rest of the store, and I can see the file being added to the manifest in the second newest manifest. And the file is still linked in the newest Version, so it shouldn't be deleted yet.

The remaining questions:

  1. Did this store always have EAR, or was EAR enabled somewhat recently?
  2. Are there any other things to be aware of about this store (eg. fsync disabled, unique file system)?

@sumeerbhola
Copy link
Collaborator

We have a bug in the file registry which dates from the start of the record writer based registry in 239377a that can lose entries.

The writeToRegistryFile first writes the new batch, and then if the file is too big, creates a new registry file

if r.mu.registryWriter.Size() > maxRegistrySize {
if err := r.createNewRegistryFile(); err != nil {
return errors.Wrap(err, "rotating registry file")
}
}
. The new registry file is initialized with the map. But the map has not yet been updated to include the effect of the batch since that update happens after writeToRegistryFile returns, by calling applyBatch
if err := r.writeToRegistryFile(batch); err != nil {
panic(err)
}
r.applyBatch(batch)
.

So the update provided by the batch gets lost (it is not recorded persistently).
We probably did not discover this earlier if most lost updates were deletes, or when there were new files, those files happened to be GC'd due to compactions before the CockroachDB node was restarted.

@sumeerbhola sumeerbhola self-assigned this Jul 19, 2023
@itsbilal
Copy link
Member

Oh wow, nice catch @sumeerbhola. I read that code multiple times over and didn't catch the bug.

@sumeerbhola
Copy link
Collaborator

I read that code multiple times over and didn't catch the bug.

I didn't either. A new test caught it.

@jbowens jbowens moved this from Investigations to In Progress (this milestone) in Storage Jul 19, 2023
craig bot pushed a commit that referenced this issue Jul 20, 2023
105474: concurrency: hoist TxnMeta from {,un}replicatedLockInfo into the holder  r=nvanbenschoten a=arulajmani

Previously, we were storing the TxnMeta separately for both
{,un}replicatedLockInfo. The lock table is quite dumb when it comes to
replicated locks -- for good reason. As a result, tracking TxnMeta's
for replicated locks isn't of much use, as we don't make use of it,
and this patch does exactly that. This also allows us to hoist the
TxnMeta object one level higher, into the holder struct.

Epic: none

Release note: None

107211: ci: add some retries for `git fetch`es r=rail a=rickystewart

Rarely these can fail. Add retries.

Closes #107087

Epic: none
Releae note: None

107231: server: fix recently introduced bug r=yuzefovich a=yuzefovich

Over in 609230c on `TestTenant.TracerI` we returned the function rather than the underlying tracer.

Epic: None

Release note: None

107249: storage: fix PebbleFileRegistry bug that drops entry on rollover r=jbowens a=sumeerbhola

The writeToRegistryFile method first writes the new batch, containing file mappings, to the registry file, and then if the registry file is too big, creates a new registry file. The new registry file is populated with the contents of the map, which doesn't yet contain the edits in the batch, resulting in a loss of these edits when the file registry is reopened. This PR changes the logic to first rollover if the registry file is too big, and then writes the batch to the new file.

This bug has existed since the record writer based registry was implemented 239377a. When it leads to a loss of a file mapping in the registry, it will be noticed by Pebble as a corruption (so not a silent failure) since the file corresponding to the mapping will be assumed to be unencrypted, but can't be successfully read as an unencrypted file. Since we have not seen this occur in production settings, we suspect that an observable mapping loss is rare because compactions typically rewrite the files in those lost mappings before the file registry is reopened.

Epic: none

Fixes: #106617

Release note: None

107259: changefeedccl: Treat drop descriptors as terminal r=miretskiy a=miretskiy

Prior to this change, changefeed would sometimes treat droped descriptors (ErrDroppedDescriptor) as a terminal error, and sometimes it would be treated as a retryable error (though, upon retry, the error would be upgraded to terminal).

This PR cleans up this logic and ensures that any
"dropped descriptor" error is treated as terminal.

Informs https://github.com/cockroachlabs/support/issues/2408 
Release note: None

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
@craig craig bot closed this as completed in 0e5d38a Jul 20, 2023
Storage automation moved this from In Progress (this milestone) to Done Jul 20, 2023
blathers-crl bot pushed a commit that referenced this issue Jul 20, 2023
The writeToRegistryFile method first writes the new batch, containing file
mappings, to the registry file, and then if the registry file is too big,
creates a new registry file. The new registry file is populated with the
contents of the map, which doesn't yet contain the edits in the batch,
resulting in a loss of these edits when the file registry is reopened. This
PR changes the logic to first rollover if the registry file is too big,
and then writes the batch to the new file.

This bug has existed since the record writer based registry was implemented
239377a.
When it leads to a loss of a file mapping in the registry, it will be
noticed by Pebble as a corruption (so not a silent failure) since the file
corresponding to the mapping will be assumed to be unencrypted, but can't
be successfully read as an unencrypted file. Since we have not seen this
occur in production settings, we suspect that an observable mapping loss
is rare because compactions typically rewrite the files in those lost
mappings before the file registry is reopened.

Epic: none

Fixes: #106617

Release note: None
blathers-crl bot pushed a commit that referenced this issue Jul 20, 2023
The writeToRegistryFile method first writes the new batch, containing file
mappings, to the registry file, and then if the registry file is too big,
creates a new registry file. The new registry file is populated with the
contents of the map, which doesn't yet contain the edits in the batch,
resulting in a loss of these edits when the file registry is reopened. This
PR changes the logic to first rollover if the registry file is too big,
and then writes the batch to the new file.

This bug has existed since the record writer based registry was implemented
239377a.
When it leads to a loss of a file mapping in the registry, it will be
noticed by Pebble as a corruption (so not a silent failure) since the file
corresponding to the mapping will be assumed to be unencrypted, but can't
be successfully read as an unencrypted file. Since we have not seen this
occur in production settings, we suspect that an observable mapping loss
is rare because compactions typically rewrite the files in those lost
mappings before the file registry is reopened.

Epic: none

Fixes: #106617

Release note: None
blathers-crl bot pushed a commit that referenced this issue Jul 20, 2023
The writeToRegistryFile method first writes the new batch, containing file
mappings, to the registry file, and then if the registry file is too big,
creates a new registry file. The new registry file is populated with the
contents of the map, which doesn't yet contain the edits in the batch,
resulting in a loss of these edits when the file registry is reopened. This
PR changes the logic to first rollover if the registry file is too big,
and then writes the batch to the new file.

This bug has existed since the record writer based registry was implemented
239377a.
When it leads to a loss of a file mapping in the registry, it will be
noticed by Pebble as a corruption (so not a silent failure) since the file
corresponding to the mapping will be assumed to be unencrypted, but can't
be successfully read as an unencrypted file. Since we have not seen this
occur in production settings, we suspect that an observable mapping loss
is rare because compactions typically rewrite the files in those lost
mappings before the file registry is reopened.

Epic: none

Fixes: #106617

Release note: None
@itsbilal itsbilal removed their assignment Jul 31, 2023
@jbowens
Copy link
Collaborator Author

jbowens commented Jul 31, 2023

Summary: When the encryption-at-rest registry rolls over (at 128 MiB), the entry that triggered the rollover is lost. This can result in the loss of persistence of encryption metadata for up to 1 file whose creation triggered the rollover. If the file is not deleted (eg, compacted or rotated out) before the node's next restart, the restarted process considers the file to be corrupt, resulting in the loss of the store.

This bug has existed since 21.2 but we believe we've observed at least three occurrences in the past three months: this issue in a roachprod cluster, support#2340, and the CC incident on 2023-07-28.

Xiang-Gu pushed a commit to Xiang-Gu/cockroach that referenced this issue Aug 2, 2023
The writeToRegistryFile method first writes the new batch, containing file
mappings, to the registry file, and then if the registry file is too big,
creates a new registry file. The new registry file is populated with the
contents of the map, which doesn't yet contain the edits in the batch,
resulting in a loss of these edits when the file registry is reopened. This
PR changes the logic to first rollover if the registry file is too big,
and then writes the batch to the new file.

This bug has existed since the record writer based registry was implemented
cockroachdb@239377a.
When it leads to a loss of a file mapping in the registry, it will be
noticed by Pebble as a corruption (so not a silent failure) since the file
corresponding to the mapping will be assumed to be unencrypted, but can't
be successfully read as an unencrypted file. Since we have not seen this
occur in production settings, we suspect that an observable mapping loss
is rare because compactions typically rewrite the files in those lost
mappings before the file registry is reopened.

Epic: none

Fixes: cockroachdb#106617

Release note: None
@rytaft rytaft added C-technical-advisory Caused a technical advisory branch-release-22.1 Used to mark release blockers and technical advisories for 22.1 branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 branch-release-23.1 Used to mark GA and release blockers and technical advisories for 23.1 branch-release-21.2 Used to mark technical advisories for 21.2 labels Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-release-21.2 Used to mark technical advisories for 21.2 branch-release-22.1 Used to mark release blockers and technical advisories for 22.1 branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 branch-release-23.1 Used to mark GA and release blockers and technical advisories for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-storage Storage Team
Projects
Storage
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants