Experiment with asynchronous readers #26791
We can use
for synchronous reads + thread pool for asynchronous reads. |
I manually checked all the tests that slowed down in the performance test. At the same time I remember that But reproducing this difference in multiple runs of the performance test will be suspicious.
Checked on mtlog-perftest03j:
9000 - old, 9001 - new. |
Ok. I've found the reason - performance test has
but it's not used in production. In this PR, the user has to activate |
LGTM
I think it's Ok for the old (master) version? |
Integration -- no related failures. |
Experiment: compare
Then run two concurrent queries
So, the Regardless of setting different
It has failed with
Note: tables may have different read performance due to different placement of files on HDD platters (near or far from the center). The effect of
High priority query: 147.093 sec.
Swapped tables: High priority query: 175.486 sec.
But the low priority query still slows down the higher priority query.
Now the low priority query waited until the high priority query completely finished. High priority query: 121.232 sec.
93.583 sec. It is still 25% better than for concurrent queries.
Without prefetch: 162.786 sec. and 202.415 sec. So, the
160.886 sec. and 162.142 sec. |
But we also need to test on EBS and local SSD. |
When reading data from page cache, |
Experiment: read data from page cache, single column and compare various methods:
/// Check if data is already in page cache with preadv2 syscall.

/// We don't want to depend on new Linux kernel.
static std::atomic<bool> has_pread_nowait_support{true};
std::atomic is superfluous, right? Initializing a static variable is thread safe.
Also, maybe it is worth initializing it before any read, to avoid the need for an atomic?
It's lazily initialized, which is not thread safe.
@alexey-milovidov interesting numbers, but what about real queries (when the data is in the local filesystem) in a production environment? Do you think it will give any benefit?
Especially when the underlying storage is some RAID device; for a regular HDD, 16 threads do not look optimal.
When data is not in page cache it is beneficial to limit parallelism for HDD.
This server is using RAID of 8 HDDs. |
Experiment: compare The server has 80 vCPU.
There is no difference between There is no difference between Result: |
Experiment: multicolumn short query on a server with EBS in Yandex Cloud.
When reading from page cache there is no difference between |
Also I've tested on Intel Optane SSD. |
Experiment: testing
First session:
Second session:
Data is being read at a constant speed of 430 MB/sec (the limit of EBS). The first query started processing data instantly. The second query started only after several tens of seconds and then processed data at nearly the same speed, as it picks up data from the page cache. The second query also paused a few times in the middle: that is when it has to read something from disk but gets preempted by the first query. The first query finished in 618.832 sec.
612.859 sec. The same speed as for two concurrent queries (within 1% difference). This is a very good result.
689.694 sec. Result: |
I see that we need to enable
Should be addressed in #30191 |
Changelog category (leave one):
Currently unrelated, and with the current setup it makes performance slightly worse most of the time.
But it is aimed at fixing the absolutely terrible performance of point queries from external storage (#23199)
by requesting reads of multiple columns at once (vertical parallelization).
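As a rough illustration of requesting several columns at once, the sketch below issues one blocking read per column file concurrently and then collects the results in order. It is not how the PR implements this: `readColumnPrefix` and `readColumnsAtOnce` are hypothetical names, one `std::async` task per column stands in for a real thread pool, and each "column" is assumed to be a separate file.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

/// Hypothetical helper: blocking read of up to `size` bytes from the start of one column file.
/// Returns whatever could be read; an empty string if the file cannot be opened.
std::string readColumnPrefix(const std::string & path, size_t size)
{
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return {};

    std::string buf(size, '\0');
    size_t total = 0;
    while (total < size)
    {
        ssize_t n = ::pread(fd, buf.data() + total, size - total, static_cast<off_t>(total));
        if (n <= 0)
            break;  /// End of file or error: return what we have so far.
        total += static_cast<size_t>(n);
    }
    ::close(fd);
    buf.resize(total);
    return buf;
}

/// Hypothetical helper: start the reads for all columns first, then wait for them,
/// so the storage sees all requests together instead of one after another.
std::vector<std::string> readColumnsAtOnce(const std::vector<std::string> & paths, size_t size)
{
    std::vector<std::future<std::string>> pending;
    pending.reserve(paths.size());
    for (const auto & path : paths)
        pending.push_back(std::async(std::launch::async, readColumnPrefix, path, size));

    std::vector<std::string> columns;
    columns.reserve(pending.size());
    for (auto & task : pending)
        columns.push_back(task.get());
    return columns;
}
```

With high-latency external storage, the total wait in this scheme approaches the latency of the slowest single request rather than the sum over all columns, which is the effect a point query needs.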