
re-assess severity of duplicate layers: nowadays they cannot happen & we should panic/abort() early if they do #7790

Closed

Description

@problame

Background

Before index_part.json became always-authoritative, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen:

The implications of the above are primarily problematic for compaction.
Specifically, the part of it that compacts L0 layers into L1 layers.
Remember that compaction takes a set of L0 layers and reshuffles the delta records in them into L1 layer files.
Once the L1 layer files are written to disk, it atomically removes the L0 layers from the layer map and adds the L1 layers to the layer map.
It then deletes the L0 layers locally, and schedules an upload of the L1 layers and an updated index part.
If we crash before deleting L0s, but after writing out L1s, the next compaction after restart will re-digest the L0s and produce new L1s.
This means the compaction after restart will **overwrite** the previously written L1s.
Currently we also schedule an S3 upload of the overwritten L1.

As of #5198, we should not be exposed to that problem anymore.
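For concreteness, here is a minimal sketch of the compaction sequence from the quoted RFC, with the crash window that used to produce duplicate layers marked. All names (`LayerFile`, `write_l1_layers`, ...) are illustrative stand-ins, not the pageserver's real API:

```rust
// Illustrative only: stub types and functions stand in for the real pageserver code.
struct LayerFile;

fn write_l1_layers(_l0: &[LayerFile]) -> Vec<LayerFile> { vec![LayerFile, LayerFile] }
fn swap_in_layer_map(_remove: &[LayerFile], _add: &[LayerFile]) {}
fn delete_local(_layers: &[LayerFile]) {}
fn schedule_upload_with_index_part(_layers: &[LayerFile]) {}

fn compact_level0(l0_layers: Vec<LayerFile>) {
    // 1. Reshuffle the delta records of the L0 layers into new L1 layer files on disk.
    let l1_layers = write_l1_layers(&l0_layers);

    // CRASH WINDOW: from here until the L0s are deleted below, a crash leaves
    // both the L0s and the freshly written L1s on disk; the next compaction
    // after restart re-digests the same L0s and overwrites these L1s.

    // 2. Atomically remove the L0s from the layer map and add the L1s.
    swap_in_layer_map(&l0_layers, &l1_layers);

    // 3. Delete the L0 layer files locally.
    delete_local(&l0_layers);

    // 4. Schedule upload of the L1 layers and an updated index_part.json.
    schedule_upload_with_index_part(&l1_layers);
}

fn main() {
    compact_level0(vec![LayerFile, LayerFile, LayerFile]);
}
```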

Problem 1

But we still have:

  1. code in Pageserver that handles duplicate layers
  2. tests in the test suite that demonstrate the problem using a failpoint

However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production.
What it does instead is return early with an Ok(), so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised.

That "return early" would be a bug in the routine if it happened in production.
So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior.

Problem 2

Further, if production code did create a duplicate layer (it nowadays doesn't!), I think the code in Pageserver that handles that condition (item 1 above) is too little, too late:

  • the code handles it by discarding the newer struct Layer
  • however, on disk, we have already overwritten the old layer file with the new one
  • the fact that the overwrite is atomic doesn't matter, because if the new layer file is not bit-identical, we have a cache coherency problem (see the sketch below):
    • the PS PageCache block cache still holds the old bit pattern
    • blob_io offsets stored in variables were computed against the pre-overwrite bit pattern / offsets
      • => reading from the new file at those offsets might yield different data than before
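The sketch referenced above: a self-contained, purely hypothetical illustration (not pageserver code) of the stale-offset problem. A reader that cached an offset computed against the old file contents reads unrelated bytes once the file has been replaced with non-bit-identical contents, even though each rename is atomic:

```rust
use std::fs;
use std::io::{Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // Version 1 of a "layer file": a blob happens to start at offset 8.
    fs::write("layer.tmp", b"HDR0000_blob-one")?;
    fs::rename("layer.tmp", "layer")?;

    // A reader (think: blob_io) computes the blob's offset from the v1 layout
    // and keeps it in a variable; the PS PageCache likewise holds the v1 bytes.
    let cached_offset = 8u64;

    // A repeated compaction overwrites the file with non-bit-identical contents.
    // The rename is atomic, but the old layout is gone.
    fs::write("layer.tmp", b"HDR00_blob-two__")?;
    fs::rename("layer.tmp", "layer")?;

    // Reading at the cached offset now returns bytes that no longer correspond
    // to the blob the reader thinks it is reading.
    let mut f = fs::File::open("layer")?;
    f.seek(SeekFrom::Start(cached_offset))?;
    let mut buf = [0u8; 8];
    f.read_exact(&mut buf)?;
    assert_ne!(&buf, b"blob-one"); // stale offset, different data
    Ok(())
}
```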

Solution

  • Remove the test suite code
  • Remove the Pageserver code that handles duplicate layers too late
  • Add a panic/abort in the Pageserver code for when we'd overwrite a layer
    • Use RENAME_NOREPLACE to detect this correctly (see the sketch below)
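A minimal sketch of the RENAME_NOREPLACE idea, assuming a Linux target with a glibc recent enough to provide renameat2(2) and the `libc` crate as a dependency; the helper name and error handling are illustrative, not the actual pageserver change:

```rust
use std::ffi::CString;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Move a freshly written layer file into its final place, refusing to
/// overwrite an existing file. Hypothetical helper, Linux-only.
fn move_layer_into_place(tmp: &Path, dst: &Path) {
    let old = CString::new(tmp.as_os_str().as_bytes()).unwrap();
    let new = CString::new(dst.as_os_str().as_bytes()).unwrap();
    // renameat2(2) with RENAME_NOREPLACE fails with EEXIST instead of
    // silently replacing the destination.
    let ret = unsafe {
        libc::renameat2(
            libc::AT_FDCWD,
            old.as_ptr(),
            libc::AT_FDCWD,
            new.as_ptr(),
            libc::RENAME_NOREPLACE,
        )
    };
    if ret == 0 {
        return;
    }
    let err = std::io::Error::last_os_error();
    if err.raw_os_error() == Some(libc::EEXIST) {
        // Duplicate layer: per this issue, this should be impossible nowadays,
        // so fail loudly instead of overwriting the file under readers' feet.
        panic!("layer file {} already exists, refusing to overwrite", dst.display());
    }
    panic!("renameat2({}, {}) failed: {}", tmp.display(), dst.display(), err);
}
```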

Concern originally raised in #7707 (comment)
