
ANALYZE not throttled by max_bytes_per_sec configuration #15072

Closed
karynzv opened this issue Nov 20, 2023 · 7 comments

Comments

@karynzv
Contributor

karynzv commented Nov 20, 2023

CrateDB version

5.3.3

CrateDB setup information

CR1 - 3 node cluster
Related settings:
stats.service.max_bytes_per_sec = 40mb
stats.service.interval = 24h
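
(Equivalently, as SET statements — a sketch assuming the dotted setting names, matching the statements used later in this thread:)

cr> set global stats.service.max_bytes_per_sec = '40mb';
cr> set global stats.service.interval = '24h';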

Regarding the pg_stats table:

SELECT schemaname, count(*) FROM pg_catalog.pg_stats GROUP BY schemaname LIMIT 100;                                                                                                                             
+------------+----------+
| schemaname | count(*) |
+------------+----------+
| doc        |     1238 |
+------------+----------+

Problem description

Whenever the automated ANALYZE runs, it exceeds the 40mb/s established by the default max_bytes_per_sec configuration.

Steps to Reproduce

This was observed on a production cluster; we haven't managed to reproduce it yet.

Actual Result

Whenever ANALYZE is automatically triggered, the cluster goes beyond the 40mb/s limit:
[screenshot: disk read throughput exceeding the limit during ANALYZE]

edit: this cluster stores ~3 TiB (i.e. 1.5 TiB in primaries) and apparently reads through the complete volume in an ANALYZE run

Expected Result

The throttling configuration should be enforced for ANALYZE, preserving the responsiveness of other queries.

@karynzv karynzv added the triage An issue that needs to be triaged by a maintainer label Nov 20, 2023
@mkleen mkleen changed the title ANALYZER not throttled by max_bytes_per_sec configuration ANALYZE not throttled by max_bytes_per_sec configuration Nov 20, 2023
@mfussenegger mfussenegger self-assigned this Nov 21, 2023
@mfussenegger
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

You can observe this by using different values. For example, in my tests the duration decreased each time I increased the max_bytes_per_sec:

I cleaned the fs cache after each run with echo 3 | sudo tee /proc/sys/vm/drop_caches

cr> set global stats.service.max_bytes_per_sec = '5mb';
SET OK, 1 row affected (0.053 sec)
cr> analyze;
ANALYZE OK, 1 row affected (22.300 sec)


cr> set global stats.service.max_bytes_per_sec = '10mb';
SET OK, 1 row affected (0.019 sec)
cr> analyze;
ANALYZE OK, 1 row affected (11.781 sec)


cr> set global stats.service.max_bytes_per_sec = '20mb';
SET OK, 1 row affected (0.021 sec)
cr> analyze;
ANALYZE OK, 1 row affected (6.389 sec)
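
(Rough arithmetic on those runs: 5mb/s × 22.3s ≈ 112mb, 10mb/s × 11.8s ≈ 118mb, 20mb/s × 6.4s ≈ 128mb — each run reads roughly the same ~110-130mb of logical data at the configured rate, which is what a working throttle should look like.)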

I made some changes that should reduce the number of disk hits and make the throttle work a bit more accurately: #15078
But that change probably won't be backported.

@proddata
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

@mfussenegger
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

Can you provide minimal reproduction steps then?

@proddata
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

Can you provide minimal reproduction steps then?

It is a bit hard to reproduce this with a minimal example when it isn't even documented what ANALYZE really does or what could affect its performance. So is it expected that ANALYZE reads > 1-2 TiB from disk every day on a cluster storing ~3 TiB of data (i.e. roughly 1.5 TiB in primaries)?

@mfussenegger mfussenegger removed their assignment Nov 21, 2023
@mfussenegger mfussenegger added needs reproduction and removed triage An issue that needs to be triaged by a maintainer labels Nov 21, 2023
@mfussenegger
Member

mfussenegger commented Nov 22, 2023

Okay, so what I could verify is that reads on the disks don't correspond 1:1 to the "application" reads.
The bytes/s rate limit is applied to how ANALYZE accesses the values. But disk reads happen at a lower level, and how exactly depends on the store.type of a table. E.g. for mmap the kernel takes care of the file access. Then there are disk page sizes, compression, etc., all of which can translate a 1 byte lookup into a much higher disk read value.
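
To illustrate the layering, here is a minimal sketch of application-level throttling, assuming Lucene's RateLimiter.SimpleRateLimiter; the class and hook names are illustrative, not CrateDB's actual code:

import org.apache.lucene.store.RateLimiter;

public class ThrottledSampler {
    // 2 MB/s limit on *logical* bytes, i.e. the bytes ANALYZE asks for.
    private final RateLimiter limiter = new RateLimiter.SimpleRateLimiter(2.0);
    private long bytesSincePause = 0;

    // Hypothetical hook, called for each value the sampler reads.
    void onBytesRead(long logicalBytes) {
        bytesSincePause += logicalBytes;
        if (bytesSincePause >= limiter.getMinPauseCheckBytes()) {
            // Sleeps just long enough to keep the average *logical* rate at 2 MB/s.
            // Underneath, a 1-byte logical read can still fault in a whole 4 KiB
            // mmap page and decompress a much larger stored block, so dstat
            // reports far more disk I/O than the limiter ever sees.
            limiter.pause(bytesSincePause);
            bytesSincePause = 0;
        }
    }
}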

E.g. this is the dstat output of ANALYZE with a 2mb rate limit:

-dsk/total-
 read  writ
  38M   43M
   0     0
2608k   11M
2800k  136k
  16k  584k
  91M  440k
 484M    0
 681M  512k
 469M    0
 438M    0
 532M  584k
 616M    0
 433M    0
 456M    0
 469M    0
 288M  584k
 393M    0
 367M    0
 329M    0
 277M    0
 277M    0
 211M    0
 250M    0
 205M    0
 198M    0
 260M  584k
 178M    0
 237M    0
 211M    0
 258M    0
 189M    0
 186M    0
 209M    0
 198M  136k
 181M   10M
 151M  960k
 145M    0
 172M    0
 159M   16k
 130M    0
 143M    0
 106M    0
 173M  272k
 124M  160k
 163M    0
 146M  584k
 115M    0
 126M    0
 163M  136k
 127M  248k
 120M   24M
   0     0

Compared to an 8000mb limit:

-dsk/total-
 read  writ
  39M   43M
   0     0
5936k   17M
 432k  592k
 280k 1792k
  74M 1040k
1366M  136k
1976M  520k
1719M  328k
1457M    0
1196M  584k
 944M   40k
 852M    0
 873M    0
 688M  184k
 246M   37M
   0     0

To apply the rate limit closer to the source we'd probably have to wrap the IndexInput, but I'm not aware of an easy way to do that.
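
For illustration, a minimal sketch of what such a wrapper could look like, assuming Lucene's IndexInput and RateLimiter APIs. The class is hypothetical (not existing CrateDB or Lucene code), and a naive per-read pause like this would need batching via getMinPauseCheckBytes() in practice:

import java.io.IOException;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.RateLimiter;

public class RateLimitedIndexInput extends IndexInput {
    private final IndexInput in;
    private final RateLimiter limiter;

    public RateLimitedIndexInput(IndexInput in, RateLimiter limiter) {
        super("RateLimitedIndexInput(" + in + ")");
        this.in = in;
        this.limiter = limiter;
    }

    @Override
    public byte readByte() throws IOException {
        limiter.pause(1); // throttle at the file-access layer, before the read
        return in.readByte();
    }

    @Override
    public void readBytes(byte[] b, int offset, int len) throws IOException {
        limiter.pause(len);
        in.readBytes(b, offset, len);
    }

    @Override public void close() throws IOException { in.close(); }
    @Override public long getFilePointer() { return in.getFilePointer(); }
    @Override public void seek(long pos) throws IOException { in.seek(pos); }
    @Override public long length() { return in.length(); }

    @Override
    public IndexInput slice(String desc, long offset, long length) throws IOException {
        // Slices and clones must stay throttled too, sharing the same limiter.
        return new RateLimitedIndexInput(in.slice(desc, offset, length), limiter);
    }

    @Override
    public IndexInput clone() {
        return new RateLimitedIndexInput(in.clone(), limiter);
    }
}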

@mfussenegger
Member

Closing this as the throttle is generally working. #15087 should also reduce the number of disk reads.

@seut
Member

seut commented Nov 27, 2023

Okay, so what I could verify is that reads on the disks don't correspond 1:1 to the "application" reads. The bytes/s rate limit is applied to how ANALYZE accesses the values. But disk reads happen at a lower level, and how exactly depends on the store.type of a table. E.g. for mmap the kernel takes care of the file access. Then there are disk page sizes, compression, etc., all of which can translate a 1 byte lookup into a much higher disk read value.

Maybe worth adding a note about this to the documentation of the max_bytes_per_sec setting?
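
(E.g. something along the lines of: "The limit applies to the logical bytes ANALYZE reads; actual disk I/O can be considerably higher depending on the table's store.type (e.g. mmap), disk page size, and compression." — suggested wording only, based on the explanation above.)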
