
Series limit applied at vmagent, but churn rate is high #3660

Closed
blesswinsamuel opened this issue Jan 16, 2023 · 6 comments
Labels
bug, vmagent

Comments

@blesswinsamuel

blesswinsamuel commented Jan 16, 2023

Describe the bug

When I use the series limit to limit the series ingested into VictoriaMetrics, the automatic metrics generated by vmagent show that new series are being dropped (scrape_series_limit_samples_dropped), which is expected. However, the VictoriaMetrics Grafana dashboard shows a high churn rate, which suggests that even though vmagent reports the series as dropped, they are still being ingested into VictoriaMetrics. I expect the samples that vmagent reports as dropped not to be ingested into VictoriaMetrics.

I used avalanche (https://github.com/prometheus-community/avalanche) to run this test.

To Reproduce

Start victoriametrics:

./victoria-metrics-prod

Start avalanche:

git clone https://github.com/prometheus-community/avalanche.git
cd avalanche
go run cmd/avalanche.go --metric-count=500 --series-count=30 --port=9101

Start vmagent with seriesLimitPerTarget setting:

scrape-config.yaml

global:
  scrape_interval: 30s
  scrape_timeout: 10s
scrape_configs:
  - job_name: 'vmagent'
    scrape_interval: 30s
    static_configs:
      - targets:
        - 'localhost:8429'
        labels:
          pod: vmagent-pod
  - job_name: 'victoriametrics'
    scrape_interval: 30s
    static_configs:
      - targets:
        - 'localhost:8428'
        labels:
          pod: victoriametrics-pod
  - job_name: 'avalanche'
    scrape_interval: 30s
    static_configs:
      - targets:
        - 'localhost:9101'
        labels:
          pod: avalanche-pod
./vmagent-prod -remoteWrite.url "http://localhost:8428/api/v1/write" -promscrape.config /tmp/scrape-config.yaml -promscrape.seriesLimitPerTarget 5000 -promscrape.streamParse=true
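
As a sanity check before looking at the dashboards (a hedged aside, assuming the default ports from the commands above), vmagent's target status page and the automatic per-target metric mentioned earlier can be inspected directly:

# all three targets should be listed as up on vmagent's target status page
curl -s 'http://localhost:8429/targets'

# the per-target counter of samples dropped by the series limit, queried from VictoriaMetrics
curl -s 'http://localhost:8428/api/v1/query' \
  --data-urlencode 'query=scrape_series_limit_samples_dropped'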

Version

❯ ./vmutils-darwin-amd64-v1.86.1/vmagent-prod --version
vmagent-20230111-093903-tags-v1.86.1-0-g351fc152e
❯ ./victoria-metrics-prod --version
victoria-metrics-20230111-093558-tags-v1.86.1-0-g351fc152e

Logs

Not relevant to this issue.

Screenshots

The automatic metrics generated by vmagent show that samples are being dropped (this is as expected because avalanche is generating completely new series every minute):

[screenshot: vmagent automatic metrics showing dropped samples]

The churn rate panel in the VictoriaMetrics cluster Grafana dashboard is consistently high, and the new series over 24h count keeps increasing:

[screenshot: churn rate and new series over 24h panels from the VictoriaMetrics Grafana dashboard]

Here, churn rate = 250 series/sec = 250x60 = 15,000 series/min.
The number of series generated by avalanche on every scrape (based on the above configuration) is 500x30 = 15,000 series. By default, avalanche generates entirely new series every 60s. So, judging by the churn rate, none of the series are being dropped by vmagent.

I expect the churn rate to be close to 0 here since all the new metrics emitted by avalanche should be dropped by vmagent.
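
(A hedged way to read this number without the Grafana dashboard, assuming single-node VictoriaMetrics on the default :8428; to my understanding the dashboard's churn rate panel is derived from the vm_new_timeseries_created_total counter:)

# churn rate: new time series registered per second, averaged over the last 5 minutes
curl -s 'http://localhost:8428/api/v1/query' \
  --data-urlencode 'query=sum(rate(vm_new_timeseries_created_total[5m]))'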

Used command-line flags

Mentioned under the "To Reproduce" section.

Additional information

No response

blesswinsamuel added the bug label on Jan 16, 2023
@hagen1778
Collaborator

Hello! Can confirm the issue.

hagen1778 added a commit that referenced this issue Jan 17, 2023
There are two changes here:
1. Do not account for `sw.Config.NoStaleMarkers`. Otherwise, disabling staleness markers
would also disable the `seriesLimiter`;
2. Prevent sending staleness markers if the series limit has been exceeded.
To send staleness markers we need to check which series disappeared between the current and
previous scrapes. But when the series limit is dropping series, there is no easy way
to calculate this anymore, so wrong markers could be sent to remote storage.
Moreover, each series rejected by the `seriesLimiter` would then be accounted as
a new time series by the VM TSDB once received in the form of a stale marker.
See #3660

Signed-off-by: hagen1778 <roman@victoriametrics.com>
valyala added a commit that referenced this issue Jan 17, 2023
Fix the following issues:

- The series limit wasn't applied when staleness tracking was disabled.
- The series limit didn't prevent sending staleness markers for new series exceeding the limit.

Updates #3660

Thanks to @hagen1778 for the initial attempt to fix the issue
at #3665
@valyala
Collaborator

valyala commented Jan 17, 2023

@blesswinsamuel , thanks for filing the detailed bug report! The issue should be fixed in commit 289af65. This commit will be included in the next release. In the meantime you can build vmagent or single-node VictoriaMetrics from this commit and verify whether they correctly apply the series limit. See the build instructions below:
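
(A minimal sketch of those build steps, assuming Go is installed and the victoria-metrics-prod / vmagent-prod Makefile targets are unchanged:)

git clone https://github.com/VictoriaMetrics/VictoriaMetrics.git
cd VictoriaMetrics
git checkout 289af65          # the commit mentioned above
make victoria-metrics-prod    # builds bin/victoria-metrics-prod
make vmagent-prod             # builds bin/vmagent-prod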

@blesswinsamuel
Author

@valyala @hagen1778 Thanks for fixing this so fast! I built vmagent from the commit a844b97, and it is working as expected. Thank you!

[screenshot]

One more thing: memory usage goes up a lot when running avalanche with the same configuration (15k series in total, changing completely every minute). I can raise a separate issue to track this if you'd like.

[screenshot: vmagent memory usage]

I have streamParse enabled, which, according to the docs, should help when targets export a big number of metrics. Unfortunately, I cleared the data in VictoriaMetrics before trying out the new vmagent, so I don't have the memory usage details from before the update.

@hagen1778
Collaborator

hagen1778 commented Jan 18, 2023

vmagent can require more memory than usual if seriesLimitPerTarget is enabled. To check whether a specific series has already been seen before, vmagent maintains a Bloom filter in memory. The filter requires memory proportional to the seriesLimitPerTarget limit (the higher the limit, the more memory is needed), and a separate Bloom filter is created per target (the more targets, the more memory is needed). This could be the reason why it needs more memory.

To verify this, please capture a memory profile and attach it to the issue. And yes, creating a new issue for the memory usage is preferable.
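
(One possible way to capture such a profile, assuming vmagent listens on the default :8429 and its pprof endpoints are enabled:)

# collect the heap profile from the running vmagent and inspect it locally
curl -s http://localhost:8429/debug/pprof/heap > vmagent-mem.pprof
go tool pprof -top vmagent-mem.pprof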

@blesswinsamuel
Author

@hagen1778 thanks for your response. I created a new issue #3675 with more details about the memory spike. After doing some tests, it looks like the memory spike happens when the target exposes a large number of previously unseen series on every scrape.

@valyala
Collaborator

valyala commented Jan 18, 2023

vmagent should properly apply series limit starting from v1.86.2. Closing this issue as fixed.

@valyala valyala closed this as completed Jan 18, 2023
valyala added a commit that referenced this issue Jan 21, 2024
…markers … (#5577)"

This reverts commit cfec258.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true if stale markers are disabled.
This prevents storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also reintroduce issue #3660

Updates #5577