storage: sstable missing from EAR registry #106617
Hi @jbowens, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Taking a look. From code-reading, nothing has stood out to me as an obvious bug in the file registry code around deletions or additions of files. The missing sstable The remaining questions:
We have a bug in the file registry, dating from the introduction of the record-writer-based registry in 239377a, that can lose entries. When the registry file is too big, the rollover code (cockroach/pkg/storage/pebble_file_registry.go, lines 426 to 430 at 30acaf9) creates a new registry file populated from the in-memory map, but the map is only updated after writeToRegistryFile returns, by calling applyBatch (cockroach/pkg/storage/pebble_file_registry.go, lines 379 to 382 at 30acaf9).
So the update provided by the batch gets lost (it is not recorded persistently).
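To make the ordering problem concrete, here is a toy Go model of it. This is a sketch, not the actual PebbleFileRegistry code: the types `record` and `toyRegistry`, the helpers `simulate` and `writeToRegistryFileBuggy`, and the record-count threshold `maxRecords` (standing in for the file-size limit) are all hypothetical. Only the ordering follows the description: the batch is written, the rollover snapshots a map that does not contain the batch yet, and the map is updated afterwards by applyBatch.

```go
package main

import "fmt"

// record is a hypothetical registry entry: a file name mapped to opaque
// encryption settings.
type record struct {
	file, settings string
}

// toyRegistry is an illustrative model, not the real PebbleFileRegistry.
// It keeps an in-memory map plus the records persisted in the current file.
type toyRegistry struct {
	m          map[string]string // in-memory mappings, updated by applyBatch
	file       []record          // records persisted in the current registry file
	maxRecords int               // hypothetical stand-in for the size limit
}

// rollover replaces the registry file with a snapshot of the in-memory map.
func (r *toyRegistry) rollover() {
	snapshot := make([]record, 0, len(r.m))
	for f, s := range r.m {
		snapshot = append(snapshot, record{f, s})
	}
	r.file = snapshot
}

// writeToRegistryFileBuggy models the original ordering: write the batch,
// then roll over if the file is too big. The snapshot taken by rollover
// comes from the map, which does not contain the batch yet.
func (r *toyRegistry) writeToRegistryFileBuggy(batch []record) {
	r.file = append(r.file, batch...)
	if len(r.file) > r.maxRecords {
		r.rollover()
	}
}

// applyBatch updates the in-memory map; it runs only after the write returns.
func (r *toyRegistry) applyBatch(batch []record) {
	for _, rec := range batch {
		r.m[rec.file] = rec.settings
	}
}

// reopen rebuilds the mappings from the persisted records, as on restart.
func (r *toyRegistry) reopen() map[string]string {
	m := make(map[string]string)
	for _, rec := range r.file {
		m[rec.file] = rec.settings
	}
	return m
}

// simulate creates one file per name, writing and then applying each batch.
func simulate(names []string, maxRecords int) *toyRegistry {
	r := &toyRegistry{m: map[string]string{}, maxRecords: maxRecords}
	for _, name := range names {
		batch := []record{{name, "settings-" + name}}
		r.writeToRegistryFileBuggy(batch)
		r.applyBatch(batch)
	}
	return r
}

func main() {
	r := simulate([]string{"000001.sst", "000002.sst", "000003.sst"}, 2)
	_, inMemory := r.m["000003.sst"]
	_, persisted := r.reopen()["000003.sst"]
	fmt.Println("in memory:", inMemory, "after reopen:", persisted)
}
```

Running it prints `in memory: true after reopen: false`: the write for `000003.sst` is the one that pushes the toy file past its limit, so its mapping is present in memory but absent once the registry is rebuilt from the persisted records.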
Oh wow, nice catch @sumeerbhola. I read that code multiple times over and didn't catch the bug.
I didn't either. A new test caught it.
105474: concurrency: hoist TxnMeta from {,un}replicatedLockInfo into the holder r=nvanbenschoten a=arulajmani
Previously, we were storing the TxnMeta separately for both {,un}replicatedLockInfo. The lock table is quite dumb when it comes to replicated locks, for good reason. As a result, tracking TxnMetas for replicated locks isn't of much use, as we don't make use of them, so this patch stops tracking them. This also allows us to hoist the TxnMeta object one level higher, into the holder struct.
Epic: none
Release note: None
107211: ci: add some retries for `git fetch`es r=rail a=rickystewart
Rarely these can fail. Add retries.
Closes #107087
Epic: none
Release note: None
107231: server: fix recently introduced bug r=yuzefovich a=yuzefovich
Over in 609230c, on `TestTenant.TracerI` we returned the function rather than the underlying tracer.
Epic: None
Release note: None
107249: storage: fix PebbleFileRegistry bug that drops entry on rollover r=jbowens a=sumeerbhola
The writeToRegistryFile method first writes the new batch, containing file mappings, to the registry file, and then, if the registry file is too big, creates a new registry file. The new registry file is populated with the contents of the map, which doesn't yet contain the edits in the batch, so these edits are lost when the file registry is reopened. This PR changes the logic to first roll over if the registry file is too big, and then write the batch to the new file.
This bug has existed since the record-writer-based registry was implemented in 239377a. When it leads to the loss of a file mapping in the registry, Pebble will notice it as a corruption (so it is not a silent failure), since the file corresponding to the mapping will be assumed to be unencrypted but can't be successfully read as an unencrypted file. Since we have not seen this occur in production settings, we suspect that an observable mapping loss is rare because compactions typically rewrite the files in those lost mappings before the file registry is reopened.
Epic: none
Fixes: #106617
Release note: None
107259: changefeedccl: Treat drop descriptors as terminal r=miretskiy a=miretskiy
Prior to this change, changefeed would sometimes treat dropped descriptors (ErrDroppedDescriptor) as a terminal error, and sometimes as a retryable error (though, upon retry, the error would be upgraded to terminal). This PR cleans up this logic and ensures that any "dropped descriptor" error is treated as terminal.
Informs https://github.com/cockroachlabs/support/issues/2408
Release note: None
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
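The fix in 107249 reverses the order: roll over first if the registry file is too big, then write the batch to the new file. The sketch below applies that ordering to a toy in-memory model. It is not the real PebbleFileRegistry: `toyRegistry`, `record`, `simulate`, `writeToRegistryFileFixed`, and the `maxRecords` threshold are hypothetical, and deletions are modeled as records with empty settings purely for illustration.

```go
package main

import "fmt"

// record is a hypothetical registry entry; empty settings mark a deletion.
type record struct {
	file, settings string
}

// toyRegistry is an illustrative model, not the real PebbleFileRegistry.
type toyRegistry struct {
	m          map[string]string // in-memory mappings, updated by applyBatch
	file       []record          // records persisted in the current registry file
	maxRecords int               // hypothetical stand-in for the size limit
	rollovers  int               // number of rollovers performed
}

// rollover replaces the registry file with a snapshot of the in-memory map,
// which drops records for files that have since been deleted.
func (r *toyRegistry) rollover() {
	snapshot := make([]record, 0, len(r.m))
	for f, s := range r.m {
		snapshot = append(snapshot, record{f, s})
	}
	r.file = snapshot
	r.rollovers++
}

// writeToRegistryFileFixed models the corrected ordering: roll over first if
// the current file is too big, then append the batch to the (possibly new)
// file, so the batch is persisted even though the map does not contain it yet.
func (r *toyRegistry) writeToRegistryFileFixed(batch []record) {
	if len(r.file) >= r.maxRecords {
		r.rollover()
	}
	r.file = append(r.file, batch...)
}

// apply folds one record into a map; empty settings delete the mapping.
func apply(m map[string]string, rec record) {
	if rec.settings == "" {
		delete(m, rec.file)
	} else {
		m[rec.file] = rec.settings
	}
}

// applyBatch updates the in-memory map after the write returns.
func (r *toyRegistry) applyBatch(batch []record) {
	for _, rec := range batch {
		apply(r.m, rec)
	}
}

// reopen rebuilds the mappings from the persisted records, as on restart.
func (r *toyRegistry) reopen() map[string]string {
	m := make(map[string]string)
	for _, rec := range r.file {
		apply(m, rec)
	}
	return m
}

// simulate creates n files; each new file replaces the previous one, loosely
// imitating a compaction that deletes its input (a hypothetical workload).
func simulate(n, maxRecords int) *toyRegistry {
	r := &toyRegistry{m: map[string]string{}, maxRecords: maxRecords}
	for i := 1; i <= n; i++ {
		name := fmt.Sprintf("%06d.sst", i)
		batch := []record{{name, "settings-" + name}}
		if i > 1 {
			batch = append(batch, record{fmt.Sprintf("%06d.sst", i-1), ""})
		}
		r.writeToRegistryFileFixed(batch)
		r.applyBatch(batch)
	}
	return r
}

func main() {
	r := simulate(5, 3)
	fmt.Println("rollovers:", r.rollovers, "after reopen:", r.reopen())
}
```

With `simulate(5, 3)` the toy performs three rollovers, and the reopened registry matches the in-memory map, containing only `000005.sst`. The snapshot taken at rollover is still missing the current batch, but that no longer matters: the batch is appended to the new file immediately afterwards.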
Summary: When the encryption-at-rest registry rolls over (at 128 MiB), the entry that triggered the rollover is lost. As a result, the encryption metadata for at most one file, the file whose creation triggered the rollover, may not be persisted. If that file is not deleted (e.g., compacted away or rotated out) before the node's next restart, the restarted process considers the file corrupt, resulting in the loss of the store. This bug has existed since 21.2, and we believe we've observed at least three occurrences in the past three months: this issue in a roachprod cluster, support#2340, and the CC incident on 2023-07-28.
Internal discussion
https://cockroachlabs.slack.com/archives/C057ULDSKC0/p1689098898636449?thread_ts=1688047254.186649&cid=C057ULDSKC0
Jira issue: CRDB-29644
Epic: CRDB-26603