storage: sstable missing from EAR registry #106617

jbowens · 2023-07-11T19:23:04Z

Internal discussion
https://cockroachlabs.slack.com/archives/C057ULDSKC0/p1689098898636449?thread_ts=1688047254.186649&cid=C057ULDSKC0

Jira issue: CRDB-29644

Epic: CRDB-26603

blathers-crl · 2023-07-11T19:23:14Z

Hi @jbowens, please add branch-* labels to identify which branch(es) this release-blocker affects.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

itsbilal · 2023-07-17T18:13:22Z

Taking a look. From code-reading, nothing has particularly stood out to me as an obvious bug in the file registry code or so, around deletions or additions of files.

The missing sstable 67105381 is not mentioned in the latest file registry at all, and I don't see it in a dump of each individual record either - so it must have been added/removed either before the last rotation of the file registry, or was never added at all. I used a modified build of Cockroach to decrypt the manifest without having the rest of the store, and I can see the file being added to the manifest in the second newest manifest. And the file is still linked in the newest Version, so it shouldn't be deleted yet.

The remaining questions:

Did this store always have EAR, or was EAR enabled somewhat recently?
Are there any other things to be aware of about this store (eg. fsync disabled, unique file system)?

sumeerbhola · 2023-07-19T19:08:22Z

We have a bug in the file registry which dates from the start of the record writer based registry in 239377a that can lose entries.

The writeToRegistryFile first writes the new batch, and then if the file is too big, creates a new registry file

cockroach/pkg/storage/pebble_file_registry.go

Lines 426 to 430 in 30acaf9

    
           if r.mu.registryWriter.Size() > maxRegistrySize { 
        
           	if err := r.createNewRegistryFile(); err != nil { 
        
           		return errors.Wrap(err, "rotating registry file") 
        
           	} 
        
           }

. The new registry file is initialized with the map. But the map has not yet been updated to include the effect of the batch since that update happens after writeToRegistryFile returns, by calling applyBatch

cockroach/pkg/storage/pebble_file_registry.go

Lines 379 to 382 in 30acaf9

    
           if err := r.writeToRegistryFile(batch); err != nil { 
        
           	panic(err) 
        
           } 
        
           r.applyBatch(batch)

.

So the update provided by the batch gets lost (it is not recorded persistently).
We probably did not discover this earlier if most lost updates were deletes, or when there were new files, those files happened to be GC'd due to compactions before the CockroachDB node was restarted.

itsbilal · 2023-07-19T19:09:51Z

Oh wow, nice catch @sumeerbhola. I read that code multiple times over and didn't catch the bug.

sumeerbhola · 2023-07-19T19:10:55Z

I read that code multiple times over and didn't catch the bug.

I didn't either. A new test caught it.

105474: concurrency: hoist TxnMeta from {,un}replicatedLockInfo into the holder r=nvanbenschoten a=arulajmani Previously, we were storing the TxnMeta separately for both {,un}replicatedLockInfo. The lock table is quite dumb when it comes to replicated locks -- for good reason. As a result, tracking TxnMeta's for replicated locks isn't of much use, as we don't make use of it, and this patch does exactly that. This also allows us to hoist the TxnMeta object one level higher, into the holder struct. Epic: none Release note: None 107211: ci: add some retries for `git fetch`es r=rail a=rickystewart Rarely these can fail. Add retries. Closes #107087 Epic: none Releae note: None 107231: server: fix recently introduced bug r=yuzefovich a=yuzefovich Over in 609230c on `TestTenant.TracerI` we returned the function rather than the underlying tracer. Epic: None Release note: None 107249: storage: fix PebbleFileRegistry bug that drops entry on rollover r=jbowens a=sumeerbhola The writeToRegistryFile method first writes the new batch, containing file mappings, to the registry file, and then if the registry file is too big, creates a new registry file. The new registry file is populated with the contents of the map, which doesn't yet contain the edits in the batch, resulting in a loss of these edits when the file registry is reopened. This PR changes the logic to first rollover if the registry file is too big, and then writes the batch to the new file. This bug has existed since the record writer based registry was implemented 239377a. When it leads to a loss of a file mapping in the registry, it will be noticed by Pebble as a corruption (so not a silent failure) since the file corresponding to the mapping will be assumed to be unencrypted, but can't be successfully read as an unencrypted file. Since we have not seen this occur in production settings, we suspect that an observable mapping loss is rare because compactions typically rewrite the files in those lost mappings before the file registry is reopened. Epic: none Fixes: #106617 Release note: None 107259: changefeedccl: Treat drop descriptors as terminal r=miretskiy a=miretskiy Prior to this change, changefeed would sometimes treat droped descriptors (ErrDroppedDescriptor) as a terminal error, and sometimes it would be treated as a retryable error (though, upon retry, the error would be upgraded to terminal). This PR cleans up this logic and ensures that any "dropped descriptor" error is treated as terminal. Informs https://github.com/cockroachlabs/support/issues/2408 Release note: None Co-authored-by: Arul Ajmani <arulajmani@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>

The writeToRegistryFile method first writes the new batch, containing file mappings, to the registry file, and then if the registry file is too big, creates a new registry file. The new registry file is populated with the contents of the map, which doesn't yet contain the edits in the batch, resulting in a loss of these edits when the file registry is reopened. This PR changes the logic to first rollover if the registry file is too big, and then writes the batch to the new file. This bug has existed since the record writer based registry was implemented 239377a. When it leads to a loss of a file mapping in the registry, it will be noticed by Pebble as a corruption (so not a silent failure) since the file corresponding to the mapping will be assumed to be unencrypted, but can't be successfully read as an unencrypted file. Since we have not seen this occur in production settings, we suspect that an observable mapping loss is rare because compactions typically rewrite the files in those lost mappings before the file registry is reopened. Epic: none Fixes: #106617 Release note: None

jbowens · 2023-07-31T15:09:20Z

Summary: When the encryption-at-rest registry rolls over (at 128 MiB), the entry that triggered the rollover is lost. This can result in the loss of persistence of encryption metadata for up to 1 file whose creation triggered the rollover. If the file is not deleted (eg, compacted or rotated out) before the node's next restart, the restarted process considers the file to be corrupt, resulting in the loss of the store.

This bug has existed since 21.2 but we believe we've observed at least three occurrences in the past three months: this issue in a roachprod cluster, support#2340, and the CC incident on 2023-07-28.

The writeToRegistryFile method first writes the new batch, containing file mappings, to the registry file, and then if the registry file is too big, creates a new registry file. The new registry file is populated with the contents of the map, which doesn't yet contain the edits in the batch, resulting in a loss of these edits when the file registry is reopened. This PR changes the logic to first rollover if the registry file is too big, and then writes the batch to the new file. This bug has existed since the record writer based registry was implemented cockroachdb@239377a. When it leads to a loss of a file mapping in the registry, it will be noticed by Pebble as a corruption (so not a silent failure) since the file corresponding to the mapping will be assumed to be unencrypted, but can't be successfully read as an unencrypted file. Since we have not seen this occur in production settings, we suspect that an observable mapping loss is rare because compactions typically rewrite the files in those lost mappings before the file registry is reopened. Epic: none Fixes: cockroachdb#106617 Release note: None

jbowens added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Jul 11, 2023

jbowens added this to Incoming in Storage via automation Jul 11, 2023

jbowens added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jul 11, 2023

jbowens moved this from Incoming to Investigations in Storage Jul 11, 2023

itsbilal self-assigned this Jul 13, 2023

itsbilal mentioned this issue Jul 18, 2023

storageccl: improve observability around EAR registry bugs #107121

Closed

sumeerbhola self-assigned this Jul 19, 2023

jbowens moved this from Investigations to In Progress (this milestone) in Storage Jul 19, 2023

sumeerbhola mentioned this issue Jul 20, 2023

storage: fix PebbleFileRegistry bug that drops entry on rollover #107249

Merged

craig bot closed this as completed in 0e5d38a Jul 20, 2023

Storage automation moved this from In Progress (this milestone) to Done Jul 20, 2023

blathers-crl bot mentioned this issue Jul 20, 2023

release-22.1: storage: fix PebbleFileRegistry bug that drops entry on rollover #107275

Merged

blathers-crl bot mentioned this issue Jul 20, 2023

release-22.2: storage: fix PebbleFileRegistry bug that drops entry on rollover #107276

Merged

blathers-crl bot mentioned this issue Jul 20, 2023

release-23.1: storage: fix PebbleFileRegistry bug that drops entry on rollover #107277

Merged

exalate-issue-sync bot unassigned sumeerbhola Jul 20, 2023

itsbilal removed their assignment Jul 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: sstable missing from EAR registry #106617

storage: sstable missing from EAR registry #106617

jbowens commented Jul 11, 2023 •

edited by nicktrav

blathers-crl bot commented Jul 11, 2023

itsbilal commented Jul 17, 2023

sumeerbhola commented Jul 19, 2023

itsbilal commented Jul 19, 2023

sumeerbhola commented Jul 19, 2023

jbowens commented Jul 31, 2023

storage: sstable missing from EAR registry #106617

storage: sstable missing from EAR registry #106617

Comments

jbowens commented Jul 11, 2023 • edited by nicktrav

blathers-crl bot commented Jul 11, 2023

itsbilal commented Jul 17, 2023

sumeerbhola commented Jul 19, 2023

itsbilal commented Jul 19, 2023

sumeerbhola commented Jul 19, 2023

jbowens commented Jul 31, 2023

jbowens commented Jul 11, 2023 •

edited by nicktrav