New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable purging of BufferPool pages based on time-since-last-unpinned #11441
Conversation
This PR added the Perhaps we can make a table function that gives us insight into the active/freed buffers and using But that will probably require a statement to force trigger this purging |
aa272d8
to
a17d273
Compare
I might have misjudged the overhead of steady_clock. I'll verify before re-publishing. |
Question: is this ever relevant mid-query or only at query boundaries? If the second, maybe this could be exposed to SQL via a |
3946ed7
to
7f4680d
Compare
@carlopi: Queries can run quite a while. I think it should be possible for a C++ client to invoke cleanup independently of query execution, while queries are running, either on a background thread or something else. For the purposes of testing though, I think a more constrained path for exercising the code would probably be adequate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good - comments below:
src/include/duckdb/common/chrono.hpp
Outdated
@@ -17,4 +17,9 @@ using std::chrono::high_resolution_clock; | |||
using std::chrono::milliseconds; | |||
using std::chrono::system_clock; | |||
using std::chrono::time_point; | |||
|
|||
// The portable way to do this (steady_clock) is not performant on all platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand on what platforms this isn't performant, and why it isn't performant? Could we use steady_clock::now()
on platforms where it is performant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial implementation used steady_clock because initial research suggested that it should be broadly performant. However, when I submitted the PR, there was an unmistakable perf regression reported by CI. I switched to this implementation and the regression went away.
Unfortunately, since steady_clock makes no actual guarantees about performance, it seemed safer to provide an implementation that uses the most performant method on each platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you point out to where the performance regression is in the CI? The regression tests sometimes have false positives. Unless the perf regression is consistent/severe I would prefer to stick to the portable implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That other PR didn't complain about perf. Should I switch this back to steady_clock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's switch it to steady_clock, we can always switch back
src/storage/buffer/buffer_pool.cpp
Outdated
@@ -50,7 +56,8 @@ bool BufferPool::AddToEvictionQueue(shared_ptr<BlockHandle> &handle) { | |||
// or the block handle is still a local variable (ConvertToPersistent) | |||
|
|||
D_ASSERT(handle->readers == 0); | |||
auto ts = ++handle->eviction_timestamp; | |||
auto ts = ++handle->eviction_seq_num; | |||
handle->lru_timestamp_msec = GetCoarseMonotonicMillisecondTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the eviction isn't enabled by default and PurgeAgedBlocks
is never called in the common case - can we avoid making this call in all cases?
For example can we have a setting (e.g. buffer_manager_eviction_timestamps
) - that defaults to false - and when turned off we will avoid making this call here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can make this change.
The test failure is almost certainly unrelated to this PR. It's a concurrency test, so it's possibly a real issue in trunk that repros only occasionally. Rerun might pass, but I don't want to do anything that would lose state, if it's useful for debugging. |
Indeed it's most likely related to my recent checkpoint rework - I'll have a look. |
Thanks! |
Merge pull request duckdb/duckdb#11441 from jkub/age_based_eviction
The bufferpool currently uses a fixed-size LRU, but will not evict pages unless it is at the size limit. Stale pages can accumulate indefinitely despite not having been recently referenced. Because these pages are 'anonymous' the OS can only reclaim memory by swapping them, which is often counterproductive, compared to simply evicting them. This PR enables evicting pages that have not been referenced in some recency window, defined by the number of seconds since they were last unpinned.
Overhead
The overhead involves 8 bytes added to the BlockHandle (which is insignificant on this already heavy data structure) and a call to steady_clock::now (which should have a latency measured in ~10 nanoseconds on most platforms, and again be dwarfed by the other work happening when we queue a page for eviction).
Status
I haven't included tests, in part because I'm not sure the best way to expose this feature. As a DUCKDB_API on the database object? As a table function, similar to checkpoint()? I'd appreciate suggestions. Thanks.