Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storageccl: improve observability around EAR registry bugs #107121

Closed
itsbilal opened this issue Jul 18, 2023 · 1 comment · Fixed by #107446
Closed

storageccl: improve observability around EAR registry bugs #107121

itsbilal opened this issue Jul 18, 2023 · 1 comment · Fixed by #107446
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team
Projects

Comments

@itsbilal
Copy link
Member

Currently, whenever we rotate the records-based file registry in the encryption-at-rest encryptedFS, we delete the previous one instead of keeping it around for debuggability. This doesn't match the behaviour for Pebble manifests where we keep the past two manifests around on disk, a feature that has proved invaluable in multiple past investigations.

As part of this change, also consider adding a log entry any time we elide a file due to it not being present on disk, such as during node restart. This would help a lot in investigating whether a file was removed during load-time encryptedFS.maybeElideEntries.

These changes would help in investigating #106617.

@itsbilal itsbilal added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Jul 18, 2023
@blathers-crl blathers-crl bot added this to Incoming in Storage Jul 18, 2023
@itsbilal
Copy link
Member Author

cc @sumeerbhola in case you want to pick this up

@sumeerbhola sumeerbhola self-assigned this Jul 19, 2023
@jbowens jbowens moved this from Incoming to In Progress (this milestone) in Storage Jul 19, 2023
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jul 24, 2023
Old file registry files are to aid investigations that require file
history, when using encryption-at-rest. We already keep old Pebble
MANIFESTs for such investigations, and not having history for the file
registry was an oversight. Additionally, the file registry logs the
entries it elided at load time.

The test added for this functionality also checks that the file registry
is correctly syncing changes to disk.

Epic: none

Fixes: cockroachdb#107121

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jul 24, 2023
Old file registry files are to aid investigations that require file
history, when using encryption-at-rest. We already keep old Pebble
MANIFESTs for such investigations, and not having history for the file
registry was an oversight. Additionally, the file registry logs the
entries it elided at load time.

The test added for this functionality also checks that the file registry
is correctly syncing changes to disk.

Epic: none

Fixes: cockroachdb#107121

Release note: None
craig bot pushed a commit that referenced this issue Jul 26, 2023
107446: storage: keep two old file registry files, and log entry elision r=itsbilal,jbowens a=sumeerbhola

Old file registry files are to aid investigations that require file history, when using encryption-at-rest. We already keep old Pebble MANIFESTs for such investigations, and not having history for the file registry was an oversight. Additionally, the file registry logs the entries it elided at load time.

The test added for this functionality also checks that the file registry is correctly syncing changes to disk.

Epic: none

Fixes: #107121

Release note: None

107556: changefeedccl: make TestChangefeedRandomExpressions failures fatal r=miretskiy a=jayshrivastava

When this test fails due to a changefeed not being able to handle a predicate (which is exactly what should make this test fail), the test does not terminate. This makes it so the error message get burried in log lines such as in (#107382). In some cases like (#102016), the error is not visible because the test output is truncated and the full output has expired in teamcity. This change makes such failures fatal so the error is the last thing output to the test logs.

Epic: None
Release note: None


Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
@craig craig bot closed this as completed in e1df0e3 Jul 26, 2023
Storage automation moved this from In Progress (this milestone) to Done Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team
Projects
Storage
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants