Attempts to configure VM for small memory footprint don't yield expected results #6276

aprospero opened this issue on May 15, 2024 · 3 comments

@aprospero

Is your question request related to a specific component?

VictoriaMetrics

Describe the question in detail

Abstract

I am evaluating VM on an embedded device with limited resources. My ultimate goal is to allocate only 20-30 MiB of RAM to VM.

Setup

Platform:

  • armv7 single core
  • 512 MB RAM
  • 1GB Flash

OS:

  • yocto dunfell
  • kernel: 5.4.219
  • go runtime 1.4

VM Version:

  • victoria-metrics-20240301-013527-tags-v1.99.0-0-g9cd4b0537

Test data:

  • 1.5 months
  • 160 series
  • 5min sample rate
  • retention period 5 months

Test setup:

  • no ingestion
  • single queries for varying series over varying timespans
  • OpenTSDB over HTTP via the query_range API, and raw data as JSON line protocol via the export API (both yield similar results; example queries below)
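
For illustration, the benchmark queries look roughly like this (the metric name and time range are placeholders, not the actual benchmark values):

  curl -g 'http://localhost:8428/api/v1/query_range?query=some_metric&start=2024-05-15T09:00:00Z&end=2024-05-15T10:00:00Z&step=5m'
  curl -g 'http://localhost:8428/api/v1/export?match[]=some_metric&start=2024-05-15T09:00:00Z&end=2024-05-15T10:00:00Z'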

Command line flags:

  • varying, always in conjunction with -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5
  • -memory.allowedPercent=5
  • -memory.allowedBytes=30MiB
  • -search.maxMemoryPerQuery=4MiB -search.maxConcurrentRequests=1

Also tried, but only sporadically:

  • -http.disableResponseCompression
  • -internStringDisableCache
  • -loggerLevel="PANIC"
  • -prevCacheRemovalPercent=0.8
  • -search.maxConcurrentRequests=1
  • -search.maxExportSeries=1000
  • -search.queryStats.lastQueriesCount=0
  • -search.maxWorkersPerQuery=1
  • -search.maxUniqueTimeseries=200
  • -search.maxTSDBStatusSeries=1

Regardless of the combination of the listed command-line flags, the result is always pretty much the same (see below).
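
For reference, a run combining several of the sporadically tried flags would be started roughly like this (illustrative; the exact combination varied between runs):

  /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 \
    -memory.allowedBytes=30MiB -search.maxMemoryPerQuery=4MiB -search.maxConcurrentRequests=1 \
    -internStringDisableCache -loggerLevel=PANIC -search.maxUniqueTimeseries=200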

Observed behavior

The RSS of the vm process starts at around 60 MB right after startup (which is already way more than expected) and rises continually as the benchmark proceeds. This behaviour continues until the system runs out of free memory pages and the kernel kills the vm process.
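
RSS here refers to the resident set size of the vm process as reported by the OS; for reference, it can be checked with something like the following (illustrative commands), or via the metric VM itself exposes:

  grep VmRSS /proc/$(pidof vm)/status
  curl -s http://localhost:8428/metrics | grep process_resident_memory_bytes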

A typical VM startup log looks like this:

 /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 -memory.allowedBytes=30MiB -search.maxConcurrentRequests=1 -search.maxMemoryPerQuery=4MiB
2024-05-15T10:13:54.082Z	info	VictoriaMetrics/lib/logger/flag.go:12	build version: victoria-metrics-20240301-013527-tags-v1.99.0-0-g9cd4b0537
2024-05-15T10:13:54.085Z	info	VictoriaMetrics/lib/logger/flag.go:13	command-line flags
2024-05-15T10:13:54.089Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -memory.allowedBytes="30MiB"
2024-05-15T10:13:54.091Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -opentsdbHTTPListenAddr=":4242"
2024-05-15T10:13:54.095Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -retentionPeriod="5"
2024-05-15T10:13:54.097Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -search.maxConcurrentRequests="1"
2024-05-15T10:13:54.098Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -search.maxMemoryPerQuery="4MiB"
2024-05-15T10:13:54.099Z	info	VictoriaMetrics/app/victoria-metrics/main.go:73	starting VictoriaMetrics at "[:8428]"...
2024-05-15T10:13:54.101Z	info	VictoriaMetrics/app/vmstorage/main.go:106	opening storage at "victoria-metrics-data" with -retentionPeriod=5
2024-05-15T10:13:54.130Z	info	VictoriaMetrics/lib/memory/memory.go:46	limiting caches to 31457280 bytes, leaving 492752896 bytes to the OS according to -memory.allowedBytes=30MiB
2024-05-15T10:13:55.855Z	info	VictoriaMetrics/lib/storage/storage.go:958	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/curr_hour_metric_ids, since it contains outdated hour; got 476583; want 476602
2024-05-15T10:13:55.859Z	info	VictoriaMetrics/lib/storage/storage.go:958	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/prev_hour_metric_ids, since it contains outdated hour; got 476582; want 476601
2024-05-15T10:13:56.138Z	info	VictoriaMetrics/lib/storage/storage.go:919	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/next_day_metric_ids_v2, since it contains data for stale date; got 19857; want 19858
2024-05-15T10:13:56.834Z	info	VictoriaMetrics/app/vmstorage/main.go:120	successfully opened storage "victoria-metrics-data" in 2.731 seconds; partsCount: 34; blocksCount: 4998; rowsCount: 1817056; sizeBytes: 1351094
2024-05-15T10:13:56.852Z	info	VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:126	loading rollupResult cache from "victoria-metrics-data/cache/rollupResult"...
2024-05-15T10:13:58.504Z	info	VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:155	loaded rollupResult cache from "victoria-metrics-data/cache/rollupResult" in 1.644 seconds; entriesCount: 459, sizeBytes: 20119552
2024-05-15T10:13:58.508Z	info	VictoriaMetrics/lib/ingestserver/opentsdbhttp/server.go:35	starting HTTP OpenTSDB server at ":4242"
2024-05-15T10:13:58.516Z	info	VictoriaMetrics/app/victoria-metrics/main.go:84	started VictoriaMetrics in 4.415 seconds
2024-05-15T10:13:58.526Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:118	starting server at http://127.0.0.1:8428/
2024-05-15T10:13:58.528Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:119	pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/
2024/05/15 10:30:33 ERROR: metrics: cannot read process_io_* metrics from "/proc/self/io", so these metrics won't be updated until the error is fixed; see https://github.com/VictoriaMetrics/metrics/issues/42 ; The error: open /proc/self/io: no such file or directory
Killed

The typical output of our benchmark looks like this:

VictoriaMetrics Benchmark
The system clock ticks at 0.001 µs, steadiness false. The steady clock ticks at 0.001 µs.

10000 Queries for   1 measurements over     5 minutes took (ms) min/avg/max:     2.63/    7.87/  496.86, median:     6.80, standard deviation:     6.73. Resultcount was (pts) min/avg/max:        0/      0.96/       1, median:        1, standard deviation:     0.20.
10000 Queries for   1 measurements over    60 minutes took (ms) min/avg/max:     2.89/    8.19/   66.47, median:     7.15, standard deviation:     4.24. Resultcount was (pts) min/avg/max:        0/     11.54/      17, median:       12, standard deviation:     2.41.
10000 Queries for   1 measurements over  1440 minutes took (ms) min/avg/max:     3.32/   15.31/  153.49, median:    13.12, standard deviation:     8.42. Resultcount was (pts) min/avg/max:        0/    275.96/     333, median:      288, standard deviation:    58.26.
  100 Queries for   1 measurements over 86400 minutes took (ms) min/avg/max:   527.22/ 1109.86/ 1741.76, median:  1107.97, standard deviation:   220.34. Resultcount was (pts) min/avg/max:        1/  11438.42/   12778, median:    11771, standard deviation:  2015.65.
10000 Queries for   4 measurements over     5 minutes took (ms) min/avg/max:     8.77/   23.99/  147.99, median:    19.02, standard deviation:    14.52. Resultcount was (pts) min/avg/max:        1/      3.83/       4, median:        4, standard deviation:     0.40.
10000 Queries for   4 measurements over    60 minutes took (ms) min/avg/max:     9.22/   27.19/  233.38, median:    21.39, standard deviation:    16.00. Resultcount was (pts) min/avg/max:       12/     45.90/      53, median:       48, standard deviation:     5.08.
 5871 - 1152Request error: (1) Failed to connect to localhost port 8428: Connection refused

The benchmark starts with tiny queries for only one series over the minimum timespan of 5 min. It then increases the timespan and series count step by step.

Phase                                               RSS
startup                                             59 MByte
after 10.000 queries, 1 series over last 5 min      64 MByte
after 10.000 queries, 1 series over last 1 hour     67 MByte
after 10.000 queries, 1 series over last 1 day      78 MByte
after 100 queries, 1 series over last 1.5 months¹   87 MByte
after 10.000 queries, 4 series over last 5 min      128 MByte
after 10.000 queries, 4 series over last 1 hour     168 MByte
after 10.000 queries, 4 series over last 1 day      178 MByte

¹ divided into 45 smaller consecutive 1-day queries

Observed VM Metrics

I was asked to add the following metrics to the issue description. I'm happy to provide more if necessary.

Metric                       Value
vm_allowed_memory_bytes      31457280
vm_available_memory_bytes    524210176
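
Both values can be read from VM's /metrics endpoint, e.g.:

  curl -s http://localhost:8428/metrics | grep -E 'vm_(allowed|available)_memory_bytes'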

Expected Behaviour

I understand that the real memory consumption does not depend solely on the provided command-line flags, but what was definitely unexpected was the RSS rising indefinitely until no free memory pages are available anymore.

I expected VM to either

  • reject queries that can't be handled within the provided memory, or
  • garbage collect / clear caches when reaching a certain memory allocation.

Further comments

  • I even tried disabling the cache with -search.disableCache, but even that didn't change VM's memory behaviour at all, although the average query duration went up.
  • I'm not familiar with Go programming and Go runtime behaviour, although I read about its greedy allocation scheme. I tried a run with the environment variables GOMEMLIMIT=60MiB and GOGC=100 set (sketched below), but again to no avail; VM's behaviour was the same.
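
The GOMEMLIMIT/GOGC run was started roughly like this (sketch; same flags as in the startup log above):

  GOMEMLIMIT=60MiB GOGC=100 /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 \
    -memory.allowedBytes=30MiB -search.maxConcurrentRequests=1 -search.maxMemoryPerQuery=4MiB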

Epilog

I'm out of ideas on how to tame VM's memory consumption. I'd say I'm not expecting too much from it, since all queries are limited to 1-day timespans and only a handful of series. Even the biggest queries return around 300 data points.

Any idea is appreciated, even a comment that my effort is in vain because VM simply won't run with that fistful of RAM!

@aprospero added the question label on May 15, 2024
@AndrewChubatiuk
Contributor

Hey @aprospero,
thanks for the question. Could you please share a memory profile?
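
A heap profile can be collected from the pprof endpoint listed in your startup log and attached here, e.g.:

  curl -s http://localhost:8428/debug/pprof/heap > mem.pprof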

@aprospero
Author

Hey @AndrewChubatiuk, thanks for having a look into this!

Could you specify in more detail what you need? Do you mean some specific VM profiler info? I'm not familiar with Go, so how can I extract that info?

@aprospero
Author

@AndrewChubatiuk Bump
