Mutex contention in vmselect #5087
Hello @misutoth, thank you for the detailed report! Currently, VictoriaMetrics uses Go's sync.Map, which is optimized for this use case:
So there is not much we can optimize on our side without removing … @valyala, could you advise if it would make sense to try replacing …
Hello all. Thanks for the quick response.
Yes, a lot of time series are used. However, there are not that many new time series on each execution of the queries. I am thinking that the …
Previously, lock contention could happen on machines with a big number of CPU cores due to enabled string interning: sync.Map was a choke point for all aggregation requests. Now, instead of interning, a new string is created. This may increase CPU and memory usage in some cases. #5087
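The interning approach discussed above can be sketched as follows. This is an illustrative example of string interning backed by `sync.Map` (the names `internedStrings` and `intern` are my own, not the actual VictoriaMetrics implementation): `sync.Map` performs well when each key is written once and then read many times by many goroutines, but under high churn or many cores the store path can become a choke point, which is the contention this commit removes.

```go
package main

import (
	"fmt"
	"sync"
)

// internedStrings maps a string to its canonical interned copy.
var internedStrings sync.Map // string -> string

// intern returns a canonical copy of s, so that repeated label values
// share a single string. sync.Map is optimized for keys that are written
// once and then read many times, which matches low-churn time series.
func intern(s string) string {
	if v, ok := internedStrings.Load(s); ok {
		return v.(string)
	}
	// LoadOrStore returns the existing value if another goroutine
	// stored the same key concurrently.
	v, _ := internedStrings.LoadOrStore(s, s)
	return v.(string)
}

func main() {
	a := intern("metric_name")
	b := intern("metric_name")
	fmt.Println(a == b) // → true
}
```

Dropping interning trades this shared-map contention for extra allocations, which is exactly the CPU/memory tradeoff the commit message describes.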
I haven't reached that conclusion yet, but I believe you. 😄 I was just thinking that with a more generous eviction time it might still be useful, even in an environment with many CPU cores and multiple vmselects, because in most deployments churn is supposed to be low most of the time, so the interned strings may be read multiple times throughout their lifecycle. If you give me a reasonable eviction time, I can run the performance test in our environment with that.
@misutoth, could you build …
FYI, the fix, which reduces mutex contention at … P.S. The fix will also be included in the upcoming v1.95.0 release.
All looks good with v1.93.6. Thank you all! ❤️
Closing the issue as fixed |
- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage, so move it from the `resource usage limits` chapter to the `troubleshooting` chapter in docs/Single-server-VictoriaMetrics.md
- Make the description of the `-search.maxWorkersPerQuery` command-line flag clearer
- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md
- Limit the maximum value that can be passed to `-search.maxWorkersPerQuery` to GOMAXPROCS, because bigger values may worsen query performance and increase CPU usage
- Improve the description of the change in docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX, since it is closer to a feature than a bugfix.

Updates #5087
* app/vmselect: limit the number of parallel workers by 32

The change should improve performance and memory usage during query processing on machines with a big number of CPU cores. The number of parallel workers for query processing is controlled via the `-search.maxWorkersPerQuery` command-line flag. By default, the number of workers is limited by the number of available CPU cores, but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage, so move it from the `resource usage limits` chapter to the `troubleshooting` chapter in docs/Single-server-VictoriaMetrics.md
- Make the description of the `-search.maxWorkersPerQuery` command-line flag clearer
- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md
- Limit the maximum value that can be passed to `-search.maxWorkersPerQuery` to GOMAXPROCS, because bigger values may worsen query performance and increase CPU usage
- Improve the description of the change in docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX, since it is closer to a feature than a bugfix.

Updates #5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
FYI, the next release of VictoriaMetrics will support …
… string when storing a value by map key

The assigned map key shouldn't change over time; otherwise the map won't work properly.

This is a follow-up for 1f91f22. Updates VictoriaMetrics#5087
Describe the bug
We are using the VictoriaMetrics cluster version with vmselect v1.93.3-cluster. Some queries execute fast, but some take orders of magnitude longer. See the screenshots below.
We are using HPA, but apparently the system does not want to scale up. It seems there is some problem preventing parallelization. There are "only" 6 vmselect pods, so it seems there is not a lack of CPU resources. And indeed, when I execute the following on vmselect:
$ go tool pprof -web http://localhost:53808/debug/pprof/goroutine
We can see that the goroutines are waiting on the mutex that guards access to items in the InternString cache. Please see
Mutex contention.pdf
When I disable caching with
internStringDisableCache=true
the contention disappears, the vmselect instance count increases to 9, and the response time drops and stabilizes: No contention.pdf
To Reproduce
We have a performance test environment to simulate the production load. We simulate about 430 million unique time series, stored with replication factor of 2. The ingestion speed is 6.5 million samples per second. We have about 700 recording rules generating queries quite frequently.
Version
/ # /vmselect-prod --version
vmselect-20230902-002725-tags-v1.93.3-cluster-0-gf78d8b994d
/ # /vmstorage-prod --version
vmstorage-20230902-002932-tags-v1.93.3-cluster-0-gf78d8b994d
/ # /vminsert-prod --version
vminsert-20230902-002549-tags-v1.93.3-cluster-0-gf78d8b994d
Logs
may not be needed
Screenshots
Mutex contention.pdf
No contention.pdf
Used command-line flags
vmselect:
dedup.minScrapeInterval=1ms
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
http.maxGracefulShutdownDuration=2m
loggerFormat=json
memory.allowedBytes=40GiB
replicationFactor=2
search.maxConcurrentRequests=150
search.maxLookback=6m
search.maxQueryDuration=300s
search.maxSamplesPerQuery=5000000000
search.maxSeries=500000
search.maxStalenessInterval=6m
search.maxUniqueTimeseries=100000000
search.minStalenessInterval=5m
vmalert.proxyURL=http://alert-rules:8880
vmstorageDialTimeout=1s
vmstorage:
retentionPeriod=1
storageDataPath=/storage
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
loggerFormat=json
memory.allowedBytes=220GiB
search.maxUniqueTimeseries=100000000
smallMergeConcurrency=4
vminsert:
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
insert.maxQueueDuration=30s
loggerFormat=json
maxConcurrentInserts=50
maxInsertRequestSize=1GiB
maxLabelsPerTimeseries=70
memory.allowedBytes=20GiB
opentsdbhttp.maxInsertRequestSize=512MiB
replicationFactor=2
vmstorageDialTimeout=2s
Additional information
I have just realized as I was writing my report and was looking at the source code that I can disable this kind of caching using the
internStringDisableCache
flag. Originally I wanted to report it as a regression and compare it with v1.81.2. Disabling this cache is a good workaround, but presumably the motivation for the cache was to tolerate high concurrency, so this bug report may still be useful.