
ANALYZE not throttled by max_bytes_per_sec configuration #15072

Closed
karynzv opened this issue Nov 20, 2023 · 7 comments

Comments

@karynzv
Contributor

karynzv commented Nov 20, 2023

CrateDB version

5.3.3

CrateDB setup information

CR1 - 3 node cluster
Related settings:
stats.service.max_bytes_per_sec = 40mb
stats.service.interval = 24h
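
(Equivalently, as SET statements — a sketch assuming the dotted setting names, matching the statements used later in this thread:)

cr> set global stats.service.max_bytes_per_sec = '40mb';
cr> set global stats.service.interval = '24h';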

Regarding the pg_stats table:

SELECT schemaname, count(*) FROM pg_catalog.pg_stats GROUP BY schemaname LIMIT 100;                                                                                                                             
+------------+----------+
| schemaname | count(*) |
+------------+----------+
| doc        |     1238 |
+------------+----------+

Problem description

Whenever the automated ANALYZE runs, it exceeds the 40mb/s established by the default max_bytes_per_sec configuration.

Steps to Reproduce

This was observed on a production cluster; we haven't managed to reproduce it yet.

Actual Result

Whenever ANALYZE is automatically triggered, the cluster goes beyond the 40mb/s limit:
[screenshot: disk read throughput exceeding the limit during ANALYZE]

edit: this cluster stores ~3 TiB (i.e. 1.5 TiB in primaries) and apparently reads through the complete volume in an ANALYZE run

Expected Result

The throttling configuration should be enforced for ANALYZE, preserving the responsiveness of other queries.

@karynzv karynzv added the triage An issue that needs to be triaged by a maintainer label Nov 20, 2023
@mkleen mkleen changed the title ANALYZER not throttled by max_bytes_per_sec configuration ANALYZE not throttled by max_bytes_per_sec configuration Nov 20, 2023
@mfussenegger mfussenegger self-assigned this Nov 21, 2023
@mfussenegger
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

You can observe this by using different values. For example, in my tests the duration decreased each time I increased the max_bytes_per_sec:

I cleaned the fs cache after each run with echo 3 | sudo tee /proc/sys/vm/drop_caches

cr> set global stats.service.max_bytes_per_sec = '5mb';
SET OK, 1 row affected (0.053 sec)
cr> analyze;
ANALYZE OK, 1 row affected (22.300 sec)


cr> set global stats.service.max_bytes_per_sec = '10mb';
SET OK, 1 row affected (0.019 sec)
cr> analyze;
ANALYZE OK, 1 row affected (11.781 sec)


cr> set global stats.service.max_bytes_per_sec = '20mb';
SET OK, 1 row affected (0.021 sec)
cr> analyze;
ANALYZE OK, 1 row affected (6.389 sec)
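
(Rough arithmetic on those runs: 5mb/s × 22.3s ≈ 112mb, 10mb/s × 11.8s ≈ 118mb, 20mb/s × 6.4s ≈ 128mb — each run reads roughly the same ~110-130mb of logical data at the configured rate, which is what a working throttle should look like.)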

I made some changes that should reduce the number of disk hits and make the throttle work a bit more accurately: #15078
But that change probably won't be backported.

@proddata
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

@mfussenegger
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

Can you provide minimal reproduction steps then?

@proddata
Member

I had a look at this, but as far as I can tell the throttle mechanism is generally working.

The original post clearly indicates that the throttle mechanism is not functioning as expected.

Can you provide minimal reproduction steps then?

It is a bit hard to reproduce this with a minimal example when it isn't even documented what ANALYZE really does or what could affect its performance. So is it expected that ANALYZE reads > 1-2 TiB from disk every day on a cluster storing ~3 TiB of data (i.e. roughly 1.5 TiB in primaries)?

@mfussenegger mfussenegger removed their assignment Nov 21, 2023
@mfussenegger mfussenegger added needs reproduction and removed triage An issue that needs to be triaged by a maintainer labels Nov 21, 2023
@mfussenegger
Member

mfussenegger commented Nov 22, 2023

Okay, so what I could verify is that reads on the disks don't correspond 1:1 to the "application" reads.
The bytes/s rate limit is applied to how ANALYZE accesses the values. But disk reads happen at a lower level, and how exactly depends on the store.type of a table. E.g. for mmap the kernel takes care of the file access. Then there are disk page sizes, compression, etc., all of which can translate a 1 byte lookup into a much higher disk read value.
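
To illustrate the layering, here is a minimal sketch of application-level throttling, assuming Lucene's RateLimiter.SimpleRateLimiter; the class and hook names are illustrative, not CrateDB's actual code:

import org.apache.lucene.store.RateLimiter;

public class ThrottledSampler {
    // 2 MB/s limit on *logical* bytes, i.e. the bytes ANALYZE asks for.
    private final RateLimiter limiter = new RateLimiter.SimpleRateLimiter(2.0);
    private long bytesSincePause = 0;

    // Hypothetical hook, called for each value the sampler reads.
    void onBytesRead(long logicalBytes) {
        bytesSincePause += logicalBytes;
        if (bytesSincePause >= limiter.getMinPauseCheckBytes()) {
            // Sleeps just long enough to keep the average *logical* rate at 2 MB/s.
            // Underneath, a 1-byte logical read can still fault in a whole 4 KiB
            // mmap page and decompress a much larger stored block, so dstat
            // reports far more disk I/O than the limiter ever sees.
            limiter.pause(bytesSincePause);
            bytesSincePause = 0;
        }
    }
}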

E.g. this is the dstat output of ANALYZE with a 2mb rate limit:

-dsk/total-
 read  writ
  38M   43M
   0     0
2608k   11M
2800k  136k
  16k  584k
  91M  440k
 484M    0
 681M  512k
 469M    0
 438M    0
 532M  584k
 616M    0
 433M    0
 456M    0
 469M    0
 288M  584k
 393M    0
 367M    0
 329M    0
 277M    0
 277M    0
 211M    0
 250M    0
 205M    0
 198M    0
 260M  584k
 178M    0
 237M    0
 211M    0
 258M    0
 189M    0
 186M    0
 209M    0
 198M  136k
 181M   10M
 151M  960k
 145M    0
 172M    0
 159M   16k
 130M    0
 143M    0
 106M    0
 173M  272k
 124M  160k
 163M    0
 146M  584k
 115M    0
 126M    0
 163M  136k
 127M  248k
 120M   24M
   0     0

Compared to an 8000mb limit:

-dsk/total-
 read  writ
  39M   43M
   0     0
5936k   17M
 432k  592k
 280k 1792k
  74M 1040k
1366M  136k
1976M  520k
1719M  328k
1457M    0
1196M  584k
 944M   40k
 852M    0
 873M    0
 688M  184k
 246M   37M
   0     0

To apply the rate limit closer to the source we'd probably have to wrap the IndexInput, but I'm not aware of an easy way to do that.
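
For illustration, a minimal sketch of what such a wrapper could look like, assuming Lucene's IndexInput and RateLimiter APIs. The class is hypothetical (not existing CrateDB or Lucene code), and a naive per-read pause like this would need batching via getMinPauseCheckBytes() in practice:

import java.io.IOException;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.RateLimiter;

public class RateLimitedIndexInput extends IndexInput {
    private final IndexInput in;
    private final RateLimiter limiter;

    public RateLimitedIndexInput(IndexInput in, RateLimiter limiter) {
        super("RateLimitedIndexInput(" + in + ")");
        this.in = in;
        this.limiter = limiter;
    }

    @Override
    public byte readByte() throws IOException {
        limiter.pause(1); // throttle at the file-access layer, before the read
        return in.readByte();
    }

    @Override
    public void readBytes(byte[] b, int offset, int len) throws IOException {
        limiter.pause(len);
        in.readBytes(b, offset, len);
    }

    @Override public void close() throws IOException { in.close(); }
    @Override public long getFilePointer() { return in.getFilePointer(); }
    @Override public void seek(long pos) throws IOException { in.seek(pos); }
    @Override public long length() { return in.length(); }

    @Override
    public IndexInput slice(String desc, long offset, long length) throws IOException {
        // Slices and clones must stay throttled too, sharing the same limiter.
        return new RateLimitedIndexInput(in.slice(desc, offset, length), limiter);
    }

    @Override
    public IndexInput clone() {
        return new RateLimitedIndexInput(in.clone(), limiter);
    }
}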

@mfussenegger
Member

Closing this as the throttle is generally working. #15087 should also reduce the number of disk reads.

@seut
Member

seut commented Nov 27, 2023

Okay, so what I could verify is that reads on the disks don't correspond 1:1 to the "application" reads. The bytes/s rate limit is applied to how ANALYZE accesses the values. But disk reads happen at a lower level, and how exactly depends on the store.type of a table. E.g. for mmap the kernel takes care of the file access. Then there are disk page sizes, compression, etc., all of which can translate a 1 byte lookup into a much higher disk read value.

Maybe worth adding a note about this to the documentation of the max_bytes_per_sec setting?
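
(E.g. something along the lines of: "The limit applies to the logical bytes ANALYZE reads; actual disk I/O can be considerably higher depending on the table's store.type (e.g. mmap), disk page size, and compression." — suggested wording only, based on the explanation above.)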
