Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zenith_test_evict mode and clear_buffer_cache() invalidate buffers incorrectly. #7802

Open
hlinnaka opened this issue May 17, 2024 · 0 comments
Assignees
Labels
t/bug Issue Type: Bug

Comments

@hlinnaka
Copy link
Contributor

We have added this code in UnpinBuffer() to support the zenith_test_evict mode, which is used in tests to aggressively evict pages from page cache, and also by the clear_buffer_cache() test function used to invalidate all pages in page cache:

+
+               if (zenith_test_evict && !InRecovery)
+               {
+                       buf_state = LockBufHdr(buf);
+                       if (BUF_STATE_GET_REFCOUNT(buf_state) == 0)
+                       {
+                               if (buf_state & BM_DIRTY)
+                               {
+                                       ReservePrivateRefCountEntry();
+                                       PinBuffer_Locked(buf);
+                                       if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf),
+                                                                                                LW_SHARED))
+                                       {
+                                               FlushOneBuffer(b);
+                                               LWLockRelease(BufferDescriptorGetContentLock(buf));
+                                       }
+                                       UnpinBuffer(buf);
+                               }
+                               else
+                               {
+                                       InvalidateBuffer(buf);
+                               }
+                       }
+                       else
+                               UnlockBufHdr(buf, buf_state);
+               }

So this uses InvalidateBuffer(). The comment on InvalidateBuffer says:

 * This is used only in contexts such as dropping a relation.  We assume
 * that no other backend could possibly be interested in using the page,
 * so the only reason the buffer might be pinned is if someone else is
 * trying to write it out.  We have to let them finish before we can
 * reclaim the buffer.

In our tests, we are using the function in context that that assumption doesn't hold.

I bumped into this while debugging #7791. My repro script calls clear_buffer_cache() very aggressively, while WAL redo tries to read the pages at the same time. Every now and then the InvalidateBuffer() call would invalidate a dirty buffer. Writing out a dirty buffer is a no-op in Neon, except that the last-written LSN doesn't get updated if you don't call the smgrwrite() function. That's what I saw in my test case: the last-written LSN of a page was not updated when it was invalidated by the clear_buffer_cache() call, and the page was concurrently pinned and updated by the redo function.

@hlinnaka hlinnaka added the t/bug Issue Type: Bug label May 17, 2024
@hlinnaka hlinnaka self-assigned this May 17, 2024
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 17, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 17, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 17, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit that referenced this issue May 17, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue #7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 17, 2024
hlinnaka added a commit that referenced this issue May 18, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue #7802
hlinnaka added a commit that referenced this issue May 19, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue #7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit to neondatabase/postgres that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue neondatabase/neon#7802
hlinnaka added a commit that referenced this issue May 21, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue #7802
hlinnaka added a commit that referenced this issue May 22, 2024
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue #7802
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t/bug Issue Type: Bug
Projects
None yet
Development

No branches or pull requests

1 participant