Attempts to configure VM for small memory footprint don't yield expected results #6276

aprospero opened this issue on May 15, 2024 · 3 comments

@aprospero

Is your question request related to a specific component?

VictoriaMetrics

Describe the question in detail

Abstract

I am evaluating VM on an embedded device with limited resources. My ultimate goal is to allocate only 20-30 MiB of RAM to VM.

Setup

Platform:

  • armv7 single core
  • 512 MB RAM
  • 1GB Flash

OS:

  • yocto dunfell
  • kernel: 5.4.219
  • go runtime 1.4

VM Version:

  • victoria-metrics-20240301-013527-tags-v1.99.0-0-g9cd4b0537

Test data:

  • 1.5 months
  • 160 series
  • 5min sample rate
  • retention period 5 months

Test setup:

  • no ingestion
  • single queries for varying series over varying timespans
  • OpenTSDB over HTTP via the query_range API, and raw data as JSON line protocol via the export API (both yield similar results; example queries below)
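
For illustration, the benchmark queries look roughly like this (the metric name and time range are placeholders, not the actual benchmark values):

  curl -g 'http://localhost:8428/api/v1/query_range?query=some_metric&start=2024-05-15T09:00:00Z&end=2024-05-15T10:00:00Z&step=5m'
  curl -g 'http://localhost:8428/api/v1/export?match[]=some_metric&start=2024-05-15T09:00:00Z&end=2024-05-15T10:00:00Z'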

Command line flags:

  • varying, always in conjunction with -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5
  • -memory.allowedPercent=5
  • -memory.allowedBytes=30MiB
  • -search.maxMemoryPerQuery=4MiB -search.maxConcurrentRequests=1

Also tried, but only sporadically:

  • -http.disableResponseCompression
  • -internStringDisableCache
  • -loggerLevel="PANIC"
  • -prevCacheRemovalPercent=0.8
  • -search.maxConcurrentRequests=1
  • -search.maxExportSeries=1000
  • -search.queryStats.lastQueriesCount=0
  • -search.maxWorkersPerQuery=1
  • -search.maxUniqueTimeseries=200
  • -search.maxTSDBStatusSeries=1

Regardless of the combination of the listed command-line flags, the result is always pretty much the same (see below).
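
For reference, a run combining several of the sporadically tried flags would be started roughly like this (illustrative; the exact combination varied between runs):

  /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 \
    -memory.allowedBytes=30MiB -search.maxMemoryPerQuery=4MiB -search.maxConcurrentRequests=1 \
    -internStringDisableCache -loggerLevel=PANIC -search.maxUniqueTimeseries=200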

Observed behavior

The RSS of the vm process starts at around 60 MB right after startup (which is already way more than expected) and rises continually as the benchmark proceeds. This behaviour continues until the system runs out of free memory pages and the kernel kills the vm process.
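
RSS here refers to the resident set size of the vm process as reported by the OS; for reference, it can be checked with something like the following (illustrative commands), or via the metric VM itself exposes:

  grep VmRSS /proc/$(pidof vm)/status
  curl -s http://localhost:8428/metrics | grep process_resident_memory_bytes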

A typical VM startup log looks like this:

 /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 -memory.allowedBytes=30MiB -search.maxConcurrentRequests=1 -search.maxMemoryPerQuery=4MiB
2024-05-15T10:13:54.082Z	info	VictoriaMetrics/lib/logger/flag.go:12	build version: victoria-metrics-20240301-013527-tags-v1.99.0-0-g9cd4b0537
2024-05-15T10:13:54.085Z	info	VictoriaMetrics/lib/logger/flag.go:13	command-line flags
2024-05-15T10:13:54.089Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -memory.allowedBytes="30MiB"
2024-05-15T10:13:54.091Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -opentsdbHTTPListenAddr=":4242"
2024-05-15T10:13:54.095Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -retentionPeriod="5"
2024-05-15T10:13:54.097Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -search.maxConcurrentRequests="1"
2024-05-15T10:13:54.098Z	info	VictoriaMetrics/lib/logger/flag.go:20	  -search.maxMemoryPerQuery="4MiB"
2024-05-15T10:13:54.099Z	info	VictoriaMetrics/app/victoria-metrics/main.go:73	starting VictoriaMetrics at "[:8428]"...
2024-05-15T10:13:54.101Z	info	VictoriaMetrics/app/vmstorage/main.go:106	opening storage at "victoria-metrics-data" with -retentionPeriod=5
2024-05-15T10:13:54.130Z	info	VictoriaMetrics/lib/memory/memory.go:46	limiting caches to 31457280 bytes, leaving 492752896 bytes to the OS according to -memory.allowedBytes=30MiB
2024-05-15T10:13:55.855Z	info	VictoriaMetrics/lib/storage/storage.go:958	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/curr_hour_metric_ids, since it contains outdated hour; got 476583; want 476602
2024-05-15T10:13:55.859Z	info	VictoriaMetrics/lib/storage/storage.go:958	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/prev_hour_metric_ids, since it contains outdated hour; got 476582; want 476601
2024-05-15T10:13:56.138Z	info	VictoriaMetrics/lib/storage/storage.go:919	discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/next_day_metric_ids_v2, since it contains data for stale date; got 19857; want 19858
2024-05-15T10:13:56.834Z	info	VictoriaMetrics/app/vmstorage/main.go:120	successfully opened storage "victoria-metrics-data" in 2.731 seconds; partsCount: 34; blocksCount: 4998; rowsCount: 1817056; sizeBytes: 1351094
2024-05-15T10:13:56.852Z	info	VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:126	loading rollupResult cache from "victoria-metrics-data/cache/rollupResult"...
2024-05-15T10:13:58.504Z	info	VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:155	loaded rollupResult cache from "victoria-metrics-data/cache/rollupResult" in 1.644 seconds; entriesCount: 459, sizeBytes: 20119552
2024-05-15T10:13:58.508Z	info	VictoriaMetrics/lib/ingestserver/opentsdbhttp/server.go:35	starting HTTP OpenTSDB server at ":4242"
2024-05-15T10:13:58.516Z	info	VictoriaMetrics/app/victoria-metrics/main.go:84	started VictoriaMetrics in 4.415 seconds
2024-05-15T10:13:58.526Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:118	starting server at http://127.0.0.1:8428/
2024-05-15T10:13:58.528Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:119	pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/
2024/05/15 10:30:33 ERROR: metrics: cannot read process_io_* metrics from "/proc/self/io", so these metrics won't be updated until the error is fixed; see https://github.com/VictoriaMetrics/metrics/issues/42 ; The error: open /proc/self/io: no such file or directory
Killed

The typical output of our benchmark looks like this:

VictoriaMetrics Benchmark
The system clock ticks at 0.001 µs, steadiness false. The steady clock ticks at 0.001 µs.

10000 Queries for   1 measurements over     5 minutes took (ms) min/avg/max:     2.63/    7.87/  496.86, median:     6.80, standard deviation:     6.73. Resultcount was (pts) min/avg/max:        0/      0.96/       1, median:        1, standard deviation:     0.20.
10000 Queries for   1 measurements over    60 minutes took (ms) min/avg/max:     2.89/    8.19/   66.47, median:     7.15, standard deviation:     4.24. Resultcount was (pts) min/avg/max:        0/     11.54/      17, median:       12, standard deviation:     2.41.
10000 Queries for   1 measurements over  1440 minutes took (ms) min/avg/max:     3.32/   15.31/  153.49, median:    13.12, standard deviation:     8.42. Resultcount was (pts) min/avg/max:        0/    275.96/     333, median:      288, standard deviation:    58.26.
  100 Queries for   1 measurements over 86400 minutes took (ms) min/avg/max:   527.22/ 1109.86/ 1741.76, median:  1107.97, standard deviation:   220.34. Resultcount was (pts) min/avg/max:        1/  11438.42/   12778, median:    11771, standard deviation:  2015.65.
10000 Queries for   4 measurements over     5 minutes took (ms) min/avg/max:     8.77/   23.99/  147.99, median:    19.02, standard deviation:    14.52. Resultcount was (pts) min/avg/max:        1/      3.83/       4, median:        4, standard deviation:     0.40.
10000 Queries for   4 measurements over    60 minutes took (ms) min/avg/max:     9.22/   27.19/  233.38, median:    21.39, standard deviation:    16.00. Resultcount was (pts) min/avg/max:       12/     45.90/      53, median:       48, standard deviation:     5.08.
 5871 - 1152Request error: (1) Failed to connect to localhost port 8428: Connection refused

The benchmark starts with tiny queries for only one series over the minimum timespan of 5 min. It then increases the timespan and series count step by step.

Phase                                               RSS
startup                                             59 MByte
after 10.000 queries, 1 series over last 5 min      64 MByte
after 10.000 queries, 1 series over last 1 hour     67 MByte
after 10.000 queries, 1 series over last 1 day      78 MByte
after 100 queries, 1 series over last 1.5 months¹   87 MByte
after 10.000 queries, 4 series over last 5 min      128 MByte
after 10.000 queries, 4 series over last 1 hour     168 MByte
after 10.000 queries, 4 series over last 1 day      178 MByte

¹ divided into 45 smaller consecutive 1-day queries

Observed VM Metrics

I was asked to add the following metrics to the issue description. I'm happy to provide more if necessary.

Metric                       Value
vm_allowed_memory_bytes      31457280
vm_available_memory_bytes    524210176
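
Both values can be read from VM's /metrics endpoint, e.g.:

  curl -s http://localhost:8428/metrics | grep -E 'vm_(allowed|available)_memory_bytes'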

Expected Behaviour

I understand that the real memory consumption does not depend solely on the provided command-line flags, but what was definitely unexpected was the RSS rising indefinitely until no free memory pages are available anymore.

I expected VM to either

  • reject queries that can't be handled within the provided memory, or
  • garbage collect / clear caches when reaching a certain memory allocation.

Further comments

  • I even tried disabling the cache with -search.disableCache, but even that didn't change VM's memory behaviour at all, although the average query duration went up.
  • I'm not familiar with Go programming and Go runtime behaviour, although I read about its greedy allocation scheme. I tried a run with the environment variables GOMEMLIMIT=60MiB and GOGC=100 set (sketched below), but again to no avail; VM's behaviour was the same.
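
The GOMEMLIMIT/GOGC run was started roughly like this (sketch; same flags as in the startup log above):

  GOMEMLIMIT=60MiB GOGC=100 /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 \
    -memory.allowedBytes=30MiB -search.maxConcurrentRequests=1 -search.maxMemoryPerQuery=4MiB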

Epilog

I'm out of ideas on how to tame VM's memory consumption. I'd say I'm not expecting too much from it, since all queries are limited to 1-day timespans and only a handful of series. Even the biggest queries return around 300 data points.

Any idea is appreciated, even a comment that my effort is in vain because VM simply won't run with that fistful of RAM!

@aprospero added the question label on May 15, 2024
@AndrewChubatiuk
Contributor

Hey @aprospero,
thanks for the question. Could you please share a memory profile?
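
A heap profile can be collected from the pprof endpoint listed in your startup log and attached here, e.g.:

  curl -s http://localhost:8428/debug/pprof/heap > mem.pprof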

@aprospero
Author

Hey @AndrewChubatiuk, thanks for having a look into this!

Could you specify in more detail what you need? Do you mean some specific VM profiler info? I'm not familiar with Go, so how can I extract that info?

@aprospero
Author

@AndrewChubatiuk Bump
