Following are error logs from our production cluster.
{"CODE_FILE":"src/core/unit.c","CODE_FUNC":"unit_log_failure","CODE_LINE":"5564","INVOCATION_ID":"4baa083c21b8431b84bc6c10261f651b","MESSAGE_ID":"d9b373ed55a64feb8242e02dbe79a49c","PRIORITY":"4","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"systemd","TID":"1","UNIT":"clickhouse-server.service","UNIT_RESULT":"oom-kill","_BOOT_ID":"f9ad45b850bc4ff5b0ccf0fe309d1394","_CAP_EFFECTIVE":"1ffffffffff","_CMDLINE":"/sbin/init","_COMM":"systemd","_EXE":"/usr/lib/systemd/systemd","_GID":"0","_MACHINE_ID":"0aabf60b01e14472a0138d6cae5fc8d5","_PID":"1","_SELINUX_CONTEXT":"unconfined\n","_SOURCE_REALTIME_TIMESTAMP":"1742086974123395","_SYSTEMD_CGROUP":"/init.scope","_SYSTEMD_SLICE":"-.slice","_SYSTEMD_UNIT":"init.scope","_TRANSPORT":"journal","_UID":"0","__MONOTONIC_TIMESTAMP":"2112531994843","__REALTIME_TIMESTAMP":"1742086974123678","message":"clickhouse-server.service: Failed with result 'oom-kill'."}
Company or project name
DeepL SE
Describe the unexpected behaviour
Our ClickHouse nodes are frequently OOM-killed, and we have observed that the ClickHouse process's RSS is higher than our max_server_memory_usage_to_ram_ratio = 0.9 configuration allows.
How to reproduce
I ran some experiments with a ClickHouse server in a Docker container and sampled the ClickHouse process's RSS using the top command.
Prepare test data
Run lightweight delete
This leads to the Docker container being killed by the OS.
Plot the ClickHouse memory consumption with data from the top command. The ClickHouse RSS was frequently above the 90% limit set by max_server_memory_usage_to_ram_ratio.
Expected behavior
The ClickHouse process should honor max_server_memory_usage_to_ram_ratio and kill memory-intensive queries or background processes to keep its memory consumption under control.
Error message and/or stacktrace
The systemd journal on our production cluster reports that clickhouse-server.service failed with result 'oom-kill'.
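To put a number on how often this happens, the journal entries can be filtered for OOM kills of the service. The following is only a rough sketch: it assumes the entries are available as one JSON object per line with the UNIT, UNIT_RESULT and __REALTIME_TIMESTAMP fields seen in our logs, and the function name oom_kills is made up for illustration.

```python
import json
from datetime import datetime, timezone


def oom_kills(lines, unit="clickhouse-server.service"):
    """Yield the UTC time of every entry where `unit` failed with result oom-kill."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("UNIT") == unit and entry.get("UNIT_RESULT") == "oom-kill":
            # __REALTIME_TIMESTAMP is expressed in microseconds since the epoch.
            micros = int(entry["__REALTIME_TIMESTAMP"])
            yield datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)


# Trimmed sample entry: only the three fields used above, values copied from our logs.
sample_lines = [
    '{"UNIT":"clickhouse-server.service","UNIT_RESULT":"oom-kill",'
    '"__REALTIME_TIMESTAMP":"1742086974123678"}',
]

for when in oom_kills(sample_lines):
    print(when.isoformat())
```

Counting the yielded timestamps per day would give the kill frequency per node.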
Additional context
ClickHouse version: 24.12.5.81
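For anyone who wants to repeat the RSS comparison without reading top output by hand, here is a minimal sketch that reads the same numbers from /proc on Linux. It assumes the limit is simply 0.9 times MemTotal; inside a container with a cgroup memory limit the base that ClickHouse actually uses may differ. The helper names parse_kib, rss_over_limit and sample are hypothetical and not part of ClickHouse.

```python
import re
from pathlib import Path

RATIO = 0.9  # max_server_memory_usage_to_ram_ratio from our configuration


def parse_kib(text: str, key: str) -> int:
    """Return the KiB value of a 'Key:   123 kB' line from a /proc file."""
    match = re.search(rf"^{re.escape(key)}:\s+(\d+)\s+kB$", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{key} not found")
    return int(match.group(1))


def rss_over_limit(status_text: str, meminfo_text: str, ratio: float = RATIO):
    """Return (rss_kib, limit_kib, exceeded) for one sample."""
    rss_kib = parse_kib(status_text, "VmRSS")
    # Assumption: the limit is ratio * total RAM as reported by MemTotal.
    limit_kib = int(parse_kib(meminfo_text, "MemTotal") * ratio)
    return rss_kib, limit_kib, rss_kib > limit_kib


def sample(pid: int):
    """Take one sample for the process with the given PID (Linux only)."""
    status = Path(f"/proc/{pid}/status").read_text()
    meminfo = Path("/proc/meminfo").read_text()
    return rss_over_limit(status, meminfo)
```

Calling sample() with the clickhouse-server PID in a loop and logging the tuples gives the same series we plotted from top.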