re-assess severity of duplicate layers: nowadays it cannot happen & we should panic/abort() early if they do #7790
Comments
The analysis is based on the early exit which tests that compaction doesn't go into a loop where L0 compaction returns the same L0 in
I've since added a test case which actually tests the known duplication situation experienced with
Your PR removes both test cases, but this issue only discusses the first one. I do agree with problem 2 and the lateness. However the known duplicated situation with
I was thinking of link + unlink earlier on internal slack thread but yes, this would be simpler (I assume you did it via |
Background
From the time before always-authoritative index_part.json, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: neon/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (lines 41 to 50 in a8e6d25).
As of #5198, we should not be exposed to that problem anymore.
Problem 1
But, we still have
However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production.
What it does instead is return early with an Ok(), so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised.
That "return early" would be a bug in the routine if it happened in production.
So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior.
Problem 2
Further, if production code did create a duplicate layer (nowadays it doesn't!), I think the code in Pageserver that handles that condition (item 1 above) is too little, too late:
struct Layer
Solution
RENAME_NOREPLACE to detect this correctly
Concern originally raised in #7707 (comment)
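
As an illustration of "detect the duplicate at the moment the file is put in place", here is a minimal sketch of the link + unlink variant mentioned in the comments, using only the Rust standard library. It is not the Pageserver's code: the function name `publish_no_replace` and the file names are hypothetical, and a real implementation might instead call `renameat2` with `RENAME_NOREPLACE` through a syscall binding. The sketch assumes both paths are on the same filesystem. Note that, unlike a no-replace rename, link + unlink is two steps, so a crash in between can leave the temporary name behind.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: move `tmp` to `dst`, but fail instead of
/// overwriting if `dst` already exists.
fn publish_no_replace(tmp: &Path, dst: &Path) -> io::Result<()> {
    // Creating a hard link fails with `AlreadyExists` if `dst` is present,
    // so an existing file is never silently replaced.
    fs::hard_link(tmp, dst)?;
    // The link succeeded; drop the temporary name.
    fs::remove_file(tmp)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("no_replace_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let dst = dir.join("layer");
    let tmp1 = dir.join("layer.tmp1");
    let tmp2 = dir.join("layer.tmp2");
    fs::write(&tmp1, "first")?;
    fs::write(&tmp2, "second")?;

    // First publish succeeds.
    publish_no_replace(&tmp1, &dst)?;

    // Second publish to the same name is detected as a duplicate.
    let err = publish_no_replace(&tmp2, &dst).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::AlreadyExists);

    // The original content is untouched.
    assert_eq!(fs::read_to_string(&dst)?, "first");

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

With this shape the caller sees the duplicate as an error at publish time and can decide to panic or abort there, rather than discovering an overwritten file later.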