
Clickhouse nodes OOM Kill #77705

@Magicbeanbuyer


Company or project name

DeepL SE

Describe the unexpected behaviour

Our ClickHouse nodes are frequently OOM-killed. We have observed that the ClickHouse process's RSS exceeds the limit implied by our max_server_memory_usage_to_ram_ratio = 0.9 configuration.

How to reproduce

I ran some experiments with a ClickHouse server in a Docker container and sampled the ClickHouse process's RSS with the top command.
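For anyone who wants to repeat the measurement without parsing top output, here is a minimal sketch that reads the same RSS figure from Linux's /proc. The helper names (parse_kib_field, read_rss_bytes, read_total_ram_bytes) are hypothetical and not part of the original experiment. It assumes a Linux host where /proc/<pid>/status and /proc/meminfo are readable. Inside a container with a memory limit, the total that the server compares against may be the cgroup limit rather than MemTotal, so treat the total as something to verify.

```python
import re


def parse_kib_field(text: str, field: str) -> int:
    """Return the value of a 'Field:   12345 kB' line, converted to bytes."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field} not found")
    return int(match.group(1)) * 1024


def read_rss_bytes(pid: int) -> int:
    """Resident set size of a process, from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_kib_field(f.read(), "VmRSS")


def read_total_ram_bytes() -> int:
    """Total RAM visible to the kernel, from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return parse_kib_field(f.read(), "MemTotal")
```

Calling read_rss_bytes with the server's PID once per second and appending the result to a list gives a time series comparable to the one collected with top.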

Prepare test data

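# Assumption: `client` is a clickhouse_connect client (the library is not named in this report), e.g.
# import clickhouse_connect; client = clickhouse_connect.get_client(host="localhost")
client.query(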
"""CREATE OR REPLACE TABLE test_merge_parts 
(
  id UInt64,
  value String
)
ENGINE=MergeTree 
ORDER BY id"""
)

client.query("SYSTEM STOP MERGES test_merge_parts")

for i in range(680):
    client.query(
        f"""INSERT INTO test_merge_parts 
        SELECT
            number as id,
            toString(number) as value
        FROM numbers(700000)"""
    )

Run lightweight delete

client.command("SET optimize_throw_if_noop=1")
client.command("SYSTEM START MERGES test_merge_parts")
client.query("DELETE FROM test_merge_parts WHERE modulo(id, 2) != 0;")

This leads to the Docker container being killed by the OS.

I plotted the ClickHouse memory consumption using the data from the top command. The ClickHouse RSS was frequently above the 90% limit set by max_server_memory_usage_to_ram_ratio.
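To put a number on "frequently above", the collected samples can be compared against the limit directly. This is a rough sketch, assuming the samples are RSS values in bytes and that the limit is simply ratio × total RAM; the function name share_above_limit is hypothetical.

```python
def share_above_limit(rss_samples, total_ram_bytes, ratio=0.9):
    """Fraction of RSS samples that exceed ratio * total RAM."""
    if not rss_samples:
        return 0.0
    # Limit implied by max_server_memory_usage_to_ram_ratio (assumed: ratio * total RAM).
    limit = ratio * total_ram_bytes
    above = sum(1 for rss in rss_samples if rss > limit)
    return above / len(rss_samples)
```

For example, with a total of 100 bytes and samples of 80, 95, 91, and 50, two of the four samples exceed the 90-byte limit, so the function returns 0.5.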

[Image: plot of Clickhouse RSS over time, from top data]

Expected behavior

The ClickHouse process should honor max_server_memory_usage_to_ram_ratio and kill memory-intensive queries or background processes to keep its memory consumption under that limit.

Error message and/or stacktrace

The following error logs are from our production cluster.

{"CODE_FILE":"src/core/unit.c","CODE_FUNC":"unit_log_failure","CODE_LINE":"5564","INVOCATION_ID":"4baa083c21b8431b84bc6c10261f651b","MESSAGE_ID":"d9b373ed55a64feb8242e02dbe79a49c","PRIORITY":"4","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"systemd","TID":"1","UNIT":"clickhouse-server.service","UNIT_RESULT":"oom-kill","_BOOT_ID":"f9ad45b850bc4ff5b0ccf0fe309d1394","_CAP_EFFECTIVE":"1ffffffffff","_CMDLINE":"/sbin/init","_COMM":"systemd","_EXE":"/usr/lib/systemd/systemd","_GID":"0","_MACHINE_ID":"0aabf60b01e14472a0138d6cae5fc8d5","_PID":"1","_SELINUX_CONTEXT":"unconfined\n","_SOURCE_REALTIME_TIMESTAMP":"1742086974123395","_SYSTEMD_CGROUP":"/init.scope","_SYSTEMD_SLICE":"-.slice","_SYSTEMD_UNIT":"init.scope","_TRANSPORT":"journal","_UID":"0","__MONOTONIC_TIMESTAMP":"2112531994843","__REALTIME_TIMESTAMP":"1742086974123678","message":"clickhouse-server.service: Failed with result 'oom-kill'."}


Additional context

Clickhouse Version: 24.12.5.81
