
Mutex contention in vmselect #5087

Closed

misutoth opened this issue Sep 28, 2023 · 11 comments
@misutoth

Describe the bug

We are using the VictoriaMetrics cluster version with vmselect v1.93.3-cluster. Some queries execute fast, but some take orders of magnitude longer. See Varying executions.
We use HPA, but apparently the system does not want to scale up; it seems something prevents parallelization. There are "only" 6 vmselect pods, so there is no lack of CPU resources. And indeed, when I execute the following on vmselect:

$ go tool pprof -web http://localhost:53808/debug/pprof/goroutine

We can see that the goroutines are waiting on the mutex that guards the InternString cache. Please see
Mutex contention.pdf
Goroutines waiting for mutex

When I disable caching with internStringDisableCache=true, the contention disappears, the vmselect instance count increases to 9, and response time drops and stabilizes:
No contention.pdf
Response time drops

To Reproduce

We have a performance test environment to simulate the production load. We simulate about 430 million unique time series, stored with replication factor of 2. The ingestion speed is 6.5 million samples per second. We have about 700 recording rules generating queries quite frequently.

Version

/ # /vmselect-prod --version
vmselect-20230902-002725-tags-v1.93.3-cluster-0-gf78d8b994d

/ # /vmstorage-prod --version
vmstorage-20230902-002932-tags-v1.93.3-cluster-0-gf78d8b994d

/ # /vminsert-prod --version
vminsert-20230902-002549-tags-v1.93.3-cluster-0-gf78d8b994d

Logs

may not be needed

Screenshots

(Eight screenshots attached, along with Mutex contention.pdf and No contention.pdf; captions: "Goroutines waiting for mutex", "Varying executions".)

Used command-line flags

vmselect:
dedup.minScrapeInterval=1ms
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
http.maxGracefulShutdownDuration=2m
loggerFormat=json
memory.allowedBytes=40GiB
replicationFactor=2
search.maxConcurrentRequests=150
search.maxLookback=6m
search.maxQueryDuration=300s
search.maxSamplesPerQuery=5000000000
search.maxSeries=500000
search.maxStalenessInterval=6m
search.maxUniqueTimeseries=100000000
search.minStalenessInterval=5m
vmalert.proxyURL=http://alert-rules:8880
vmstorageDialTimeout=1s

vmstorage:
retentionPeriod=1
storageDataPath=/storage
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
loggerFormat=json
memory.allowedBytes=220GiB
search.maxUniqueTimeseries=100000000
smallMergeConcurrency=4

vminsert:
envflag.enable=true
envflag.prefix=VM_
http.connTimeout=15s
insert.maxQueueDuration=30s
loggerFormat=json
maxConcurrentInserts=50
maxInsertRequestSize=1GiB
maxLabelsPerTimeseries=70
memory.allowedBytes=20GiB
opentsdbhttp.maxInsertRequestSize=512MiB
replicationFactor=2
vmstorageDialTimeout=2s

Additional information

While writing this report and looking at the source code, I realized that I can disable this kind of caching using the internStringDisableCache flag. Originally I wanted to report this as a regression and compare it with v1.81.2.

Disabling this cache is a good workaround, but the cache was presumably meant to tolerate high concurrency, so this bug report may still be useful.

@misutoth misutoth added the bug Something isn't working label Sep 28, 2023
@zekker6 zekker6 self-assigned this Sep 29, 2023
@zekker6
Contributor

zekker6 commented Sep 29, 2023

Hello @misutoth , thank you for the detailed report!

Currently, VictoriaMetrics uses Go's sync.Map, which is optimized for this use case (1):

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

So there is not much we can optimize on our side without removing sync.Map.
Is it possible that during alert / recording rules calculation a lot of unique series are fetched?
If so, a lot of unique strings will be fetched at every evaluation, which makes it almost harmful to intern and cache all these values, as they will not be reused later on.

@valyala Could you advise if it would make sense to try replacing sync.Map with fastcache here?

@hagen1778 hagen1778 added the TBD To Be Done label Oct 2, 2023
@f41gh7 f41gh7 assigned f41gh7 and unassigned zekker6 Oct 2, 2023
@misutoth
Author

misutoth commented Oct 2, 2023

Hello all. Thanks for the quick response.

Is it possible that during alert / recording rules calculation a lot of unique series is fetched?

Yes, a lot of time series are used. However, there are not that many new time series on each execution of the queries.

I am thinking that the internStringCacheExpireDuration default of 6 minutes may be too short in our use case. We have around 8-12 instances of vmselect which are load balanced. Also, many of the rules have a 3 minute interval. All of this increases the chance that the interned strings get evicted. That could lead to such inefficiency, couldn't it?

@hagen1778
Collaborator

@misutoth if I'm not mistaken, it is a bug that interning is still used in vmselect. It shouldn't be there.
AFAIK, @f41gh7 is going to remove it as soon as he has time.

f41gh7 added a commit that referenced this issue Oct 3, 2023
Previously, lock contention could happen on machines with a large number of CPUs due to enabled string interning: sync.Map was a choke point for all aggregation requests.
Now, instead of interning, a new string is created. This may increase CPU and memory usage in some cases.
#5087
@misutoth
Author

misutoth commented Oct 4, 2023

@misutoth if I'm not mistaken, it is a bug that interning is still used in vmselect. It shouldn't be there. AFAIK, @f41gh7 is going to remove it as soon as he has time.

I haven't reached that conclusion yet, but I believe you. 😄 I was just thinking that with a more generous eviction time the cache might still be useful, even in an environment with many CPU cores and multiple vmselects, because in most deployments churn is supposed to be low most of the time. So the interned strings may be read multiple times throughout their lifecycle. If you give me a reasonable eviction time, I can run the performance test in our environment with that.

f41gh7 added a commit that referenced this issue Oct 4, 2023
hagen1778 pushed a commit that referenced this issue Oct 9, 2023
hagen1778 pushed a commit that referenced this issue Oct 10, 2023
hagen1778 pushed a commit that referenced this issue Oct 10, 2023
valyala added a commit that referenced this issue Oct 15, 2023
… string when storing a value by map key

The assigned map key shouldn't change over time, otherwise the map won't work properly.

This is a follow-up for 1f91f22
Updates #5087
valyala pushed a commit that referenced this issue Oct 16, 2023
valyala pushed a commit that referenced this issue Oct 16, 2023
valyala added a commit that referenced this issue Oct 16, 2023
valyala added a commit that referenced this issue Oct 16, 2023
valyala added a commit that referenced this issue Oct 16, 2023
@valyala
Collaborator

valyala commented Oct 16, 2023

@misutoth , could you build vmselect from the commit 4278b00 according to these instructions and verify whether it resolves the mutex contention during querying under your workload?

@valyala valyala removed the TBD To Be Done label Oct 16, 2023
@valyala
Collaborator

valyala commented Oct 16, 2023

FYI, the fix, which reduces mutex contention at vmselect on machines with a large number of CPU cores, has been included in VictoriaMetrics v1.93.6. @misutoth, could you verify whether this release fixes the high mutex contention at vmselect under your workload?

P.S. The fix will be also included in the upcoming v1.95.0 release.

@misutoth
Author

All looks good with v1.93.6. Thank you all! ❤️

@valyala
Collaborator

valyala commented Oct 18, 2023

@misutoth, thanks for the confirmation that the fix resolves scalability issues for vmselect running on systems with a large number of CPU cores. FYI, there is yet another pull request, which can further improve vmselect scalability on such systems - #5036 .

@valyala
Collaborator

valyala commented Oct 18, 2023

Closing the issue as fixed

@valyala valyala closed this as completed Oct 18, 2023
valyala added a commit that referenced this issue Oct 18, 2023
- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates #5087
valyala added a commit that referenced this issue Oct 18, 2023
* app/vmselect: limit the number of parallel workers to 32

The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates #5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
valyala added a commit that referenced this issue Oct 26, 2023
valyala added a commit that referenced this issue Oct 26, 2023
valyala added a commit that referenced this issue Oct 26, 2023
@valyala
Collaborator

valyala commented Oct 27, 2023

FYI, the next release of VictoriaMetrics will support -search.maxWorkersPerQuery command-line flag, which can be used for fine-tuning query performance on systems with more than 16 CPU cores for particular workloads. See the description of this flag in the Troubleshooting docs for more details.

AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
@valyala
Collaborator

valyala commented Nov 15, 2023

The -search.maxWorkersPerQuery command-line flag is available starting from v1.95.0 release.
