Add ability to limit the maximum number of CPU cores used for queries #291

Closed
valyala opened this issue Jan 20, 2020 · 3 comments
Labels
enhancement (New feature or request)

Comments


valyala commented Jan 20, 2020

Is your feature request related to a problem? Please describe.
A single heavy query in VictoriaMetrics can occupy all the available CPU cores. This is good for returning query results as soon as possible, but it can negatively affect the data ingestion pipeline, which may starve for CPU resources while heavy queries are running. This has been observed in production.

Describe the solution you'd like
It would be great to add a -search.maxCPUs command-line flag for limiting the maximum number of CPU cores that can be used by the query pipeline. This would leave a guaranteed number of CPU cores for the data ingestion pipeline. For instance, if a system has 64 CPU cores, then -search.maxCPUs=60 would guarantee that at least 4 CPU cores are always available for the data ingestion path.
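
As a rough illustration of the proposed limit, here is a minimal sketch of capping the query pipeline with a worker semaphore; `queryWorkerSem`, `initQueryCPULimit` and `runQueryTask` are hypothetical names for illustration, not VictoriaMetrics internals:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// queryWorkerSem is a hypothetical semaphore that caps the number of
// goroutines doing query work. Ingestion goroutines never acquire it,
// so the remaining CPU cores stay available for the ingestion path.
var queryWorkerSem chan struct{}

// initQueryCPULimit reserves maxCPUs slots for query workers. With 64 cores
// and maxCPUs=60, at least 4 cores remain free for ingestion even when
// queries saturate all of their slots.
func initQueryCPULimit(maxCPUs int) {
	if maxCPUs <= 0 || maxCPUs > runtime.NumCPU() {
		maxCPUs = runtime.NumCPU()
	}
	queryWorkerSem = make(chan struct{}, maxCPUs)
}

// runQueryTask executes f only after acquiring a query worker slot.
func runQueryTask(f func()) {
	queryWorkerSem <- struct{}{}        // acquire a slot
	defer func() { <-queryWorkerSem }() // release it when done
	f()
}

func main() {
	initQueryCPULimit(2) // leave the remaining cores for data ingestion

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			runQueryTask(func() { fmt.Println("query task", n) })
		}(i)
	}
	wg.Wait()
}
```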

Describe alternatives you've considered
An alternative is to reduce -search.maxConcurrentRequests. This reduces the number of concurrently running queries, leaving more chances for the ingestion path to get the CPU resources it needs. Note that -search.maxConcurrentRequests doesn't limit the number of CPU cores that can be used by queries, since a single heavy query can still take all the available CPU cores, as described above.
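
For contrast, here is a sketch of why a request-level limit alone doesn't bound CPU usage; the names are illustrative only:

```go
package main

import "sync"

// concurrentRequestsCh mirrors the idea behind -search.maxConcurrentRequests:
// it bounds how many queries run at once, not how many cores each query uses.
var concurrentRequestsCh = make(chan struct{}, 4)

// handleQuery admits at most 4 queries concurrently, yet each admitted query
// still fans out one goroutine per data block, so a single heavy query can
// keep every CPU core busy.
func handleQuery(processBlock func(), blocks int) {
	concurrentRequestsCh <- struct{}{} // wait for a request slot
	defer func() { <-concurrentRequestsCh }()

	var wg sync.WaitGroup
	for i := 0; i < blocks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			processBlock()
		}()
	}
	wg.Wait()
}

func main() {
	// A single admitted query with thousands of blocks can still saturate all cores.
	handleQuery(func() {}, 1000)
}
```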

@valyala valyala added the enhancement label on Jan 20, 2020
valyala added a commit that referenced this issue Jul 5, 2020
Heavy queries could result in a lack of CPU resources for processing the current data ingestion stream.
Prevent this by delaying query execution until free resources are available for data ingestion.

Expose the `vm_search_delays_total` metric, which may be used for alerting when there aren't enough CPU resources
for data ingestion and/or for executing heavy queries.

Updates #291
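
A minimal sketch of this kind of pacing, assuming a hypothetical `ingestionNeedsCPU` signal and a local counter in the spirit of `vm_search_delays_total`; the names are assumptions for illustration, not the actual VictoriaMetrics internals:

```go
package main

import (
	"sync/atomic"
	"time"
)

// searchDelaysTotal counts postponed query starts, in the spirit of the
// vm_search_delays_total metric.
var searchDelaysTotal atomic.Uint64

// ingestionNeedsCPU is a hypothetical signal set by the ingestion path when
// it is starved for CPU (for example, when its pending-rows buffer grows).
var ingestionNeedsCPU atomic.Bool

// pauseSearchIfNeeded delays the start of query work while ingestion reports
// CPU starvation, for at most maxWait.
func pauseSearchIfNeeded(maxWait time.Duration) {
	if !ingestionNeedsCPU.Load() {
		return // ingestion is healthy, start the query immediately
	}
	searchDelaysTotal.Add(1)
	deadline := time.Now().Add(maxWait)
	for ingestionNeedsCPU.Load() && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	pauseSearchIfNeeded(time.Second)
	// ... run the query ...
}
```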

valyala commented Jul 8, 2020

FYI, VictoriaMetrics implements a mechanism for prioritizing data ingestion over querying starting from v1.38.0.

valyala added a commit that referenced this issue Jul 23, 2020
Also prioritize small merges over big merges.

Updates #291
Updates #648

valyala commented Jul 24, 2020

FYI, the v1.39.0 release should further improve the prioritization of data ingestion over heavy queries.

valyala added a commit that referenced this issue Jan 16, 2023
…s during assisted merges

Updates #3647
Updates #3641
Updates #648
Updates #291
valyala added a commit that referenced this issue Jan 26, 2024
…lable

- Maintain a separate worker pool for each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts of mixed types at once. For example,
  it could simultaneously merge an in-memory part and a big file part.
  Such a merge could take hours for a big file part. For the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  within the configured -inmemoryDataFlushInterval.

  Another common issue, which could happen when parts of mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  are busy with merging big files. Such growth could lead to significant performance
  degradation for queries, since every query needs to check an ever-growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per part type elegantly resolves all three issues
  (a rough sketch of this pooling appears after this commit message):
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging of file parts is now instantly canceled on graceful shutdown.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate the -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run a forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdowns
  in a VictoriaMetrics cluster during e.g. re-routing events, when some of the vmstorage nodes
  become temporarily unavailable.

- Prevent a possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c6.
Thanks to @misutoth for the inspiration at #5212

Updates #5190
Updates #3790
Updates #3551
Updates #3337
Updates #3425
Updates #3647
Updates #3641
Updates #648
Updates #291
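
A minimal sketch of the per-part-type merge pools described in the commit message above, assuming hypothetical `mergeSet` and `startMergeWorkers` names rather than the actual VictoriaMetrics internals:

```go
package main

import (
	"fmt"
	"sync"
)

// partType distinguishes the kinds of parts that are merged independently.
type partType int

const (
	partInMemory partType = iota
	partSmallFile
	partBigFile
)

// mergeSet is a hypothetical queue of pending merges for a single part type.
type mergeSet struct {
	typ   partType
	tasks chan func()
}

// startMergeWorkers launches a dedicated pool of workers for one part type,
// so a long-running big-file merge can never block in-memory or small merges.
func startMergeWorkers(ms *mergeSet, workers int, wg *sync.WaitGroup) {
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for task := range ms.tasks {
				task()
			}
		}()
	}
}

func main() {
	var wg sync.WaitGroup
	pools := map[partType]*mergeSet{
		partInMemory:  {typ: partInMemory, tasks: make(chan func(), 16)},
		partSmallFile: {typ: partSmallFile, tasks: make(chan func(), 16)},
		partBigFile:   {typ: partBigFile, tasks: make(chan func(), 16)},
	}
	for _, ms := range pools {
		startMergeWorkers(ms, 2, &wg) // per-type pool sized from available CPU cores
	}
	pools[partInMemory].tasks <- func() { fmt.Println("merge in-memory parts") }
	pools[partBigFile].tasks <- func() { fmt.Println("merge big file parts") }
	for _, ms := range pools {
		close(ms.tasks)
	}
	wg.Wait()
}
```

Because every part type gets its own task queue and workers in this sketch, a slow big-file merge can only exhaust its own pool.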

valyala commented Feb 7, 2024

FYI, VictoriaMetrics supports the -search.maxWorkersPerQuery command-line flag starting from the v1.95.0 release - see this pull request for details.

This allows limiting the number of CPU cores that can be used by a single query. VictoriaMetrics also provides the ability to configure the maximum number of concurrent queries via the -search.maxConcurrentRequests command-line flag. See these docs for details. Together, these two command-line flags limit the number of CPU cores used for queries.
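
As a rough illustration of how the two flags compose (the flag values below are hypothetical, not defaults):

```go
package main

import "fmt"

func main() {
	// Hypothetical settings, not defaults:
	//   -search.maxConcurrentRequests=8
	//   -search.maxWorkersPerQuery=4
	maxConcurrentRequests := 8
	maxWorkersPerQuery := 4

	// Upper bound on CPU cores that query processing can keep busy at once.
	fmt.Println(maxConcurrentRequests * maxWorkersPerQuery) // 32
}
```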

Closing the feature request as done.

@valyala valyala closed this as completed Feb 7, 2024