[Bug] IoTDB pods crash with OOM because memory is calculated from the node's resources instead of the pod's #17764

@MileaRobertStefan

Search before asking

  • I searched in the issues and found nothing similar.

Version

latest

Describe the bug and provide the minimal reproduce step

Start IoTDB DataNode and ConfigNode pods with a memory limit (for example 8 GB) and allocate 8 GB of resources to each pod.

Pods keep crashing with OOM errors because the JVM is trying to allocate 16 GB of memory.

What did you expect to see?

            # When running in a container/pod, use cgroup memory limit instead of host memory
            if [ -f /sys/fs/cgroup/memory.max ]; then
                # cgroup v2
                cgroup_mem=`cat /sys/fs/cgroup/memory.max`
                if [ "$cgroup_mem" != "max" ]; then
                    cgroup_mem_in_mb=`expr $cgroup_mem / 1024 / 1024`
                    if [ "$cgroup_mem_in_mb" -lt "$system_memory_in_mb" ]; then
                        system_memory_in_mb=$cgroup_mem_in_mb
                    fi
                fi
            elif [ -f /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
                # cgroup v1
                cgroup_mem=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`
                cgroup_mem_in_mb=`expr $cgroup_mem / 1024 / 1024`
                if [ "$cgroup_mem_in_mb" -lt "$system_memory_in_mb" ]; then
                    system_memory_in_mb=$cgroup_mem_in_mb
                fi
            fi
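
The same check can be wrapped in a function so it is testable outside a pod. This is a minimal sketch, not the project's code: the name `effective_memory_mb` and the optional cgroup-root argument are hypothetical and exist only so the lookup can be pointed at a temporary directory. It also treats any non-numeric content (such as `max`) as "no limit".

```shell
#!/bin/sh
# Hypothetical helper: prints the smaller of host memory and the cgroup limit, in MB.
#   $1 = host memory in MB (for example the value derived from `free -m`)
#   $2 = cgroup root; defaults to /sys/fs/cgroup (overridable only for testing)
effective_memory_mb() {
    host_mb=$1
    cgroup_root=${2:-/sys/fs/cgroup}
    limit_bytes=""

    if [ -f "$cgroup_root/memory.max" ]; then
        # cgroup v2: the file holds a byte count or the literal string "max"
        limit_bytes=$(cat "$cgroup_root/memory.max")
    elif [ -f "$cgroup_root/memory/memory.limit_in_bytes" ]; then
        # cgroup v1: a byte count; "unlimited" shows up as a very large number
        limit_bytes=$(cat "$cgroup_root/memory/memory.limit_in_bytes")
    fi

    case "$limit_bytes" in
        ""|*[!0-9]*)
            # no cgroup file, "max", or unexpected content: keep the host value
            echo "$host_mb"
            return 0
            ;;
    esac

    limit_mb=$((limit_bytes / 1024 / 1024))
    if [ "$limit_mb" -lt "$host_mb" ]; then
        echo "$limit_mb"
    else
        echo "$host_mb"
    fi
}
```

In the env script this would be used as `system_memory_in_mb=$(effective_memory_mb "$system_memory_in_mb")`, leaving the rest of the auto-sizing logic unchanged.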

8 GB

I would expect the memory to be auto-calculated based on the pod resources (8 GB), not the node resources (32 GB).

In scripts/conf/datanode-env.sh, the following line reads the node's total memory, so it returns 32 GB instead of the pod's 8 GB limit:
system_memory_in_mb=`free -m | sed -n '2p' | awk '{print \$2}'`

What did you see instead?

32 GB and a lot of pod restarts

Anything else?

# iotdb\WORKING_CONFIGS.md

## 2) JVM Memory (Linux)

Edit these files:
- conf/confignode-env.sh
- conf/datanode-env.sh

Set MEMORY_SIZE explicitly to avoid auto-sizing surprises.

### ConfigNode memory

```bash
# conf/confignode-env.sh
MEMORY_SIZE=2G
```

### DataNode memory

```bash
# conf/datanode-env.sh
MEMORY_SIZE=8G
```

Why are there no environment variables for this setting?
Do you expect users in cloud environments to go in and change this limit manually?

This is a hack, and we should not have to do this in a pod:

- IOTDB_JMX_OPTS=-Xmx4G
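
One way an environment override could look, as a rough sketch only: `IOTDB_MEMORY_SIZE`, `size_to_mb`, and `resolve_memory_mb` are illustrative names I made up, not existing IoTDB settings or functions. An explicit value in the form `8G` or `512M` wins; otherwise the auto-sized value is kept.

```shell
#!/bin/sh
# Hypothetical helper: converts a size such as "8G" or "512M" to megabytes.
# Prints nothing and returns 1 if the value is not in that form.
size_to_mb() {
    case "$1" in
        *[Gg]) number=${1%[Gg]}; factor=1024 ;;
        *[Mm]) number=${1%[Mm]}; factor=1 ;;
        *) return 1 ;;
    esac
    case "$number" in
        ""|*[!0-9]*) return 1 ;;
    esac
    echo $((number * factor))
}

# Assumption: IOTDB_MEMORY_SIZE is an illustrative variable name only.
#   $1 = memory in MB produced by auto-sizing
resolve_memory_mb() {
    auto_mb=$1
    if [ -n "${IOTDB_MEMORY_SIZE:-}" ] && override_mb=$(size_to_mb "$IOTDB_MEMORY_SIZE"); then
        # a valid explicit setting takes precedence over auto-sizing
        echo "$override_mb"
    else
        echo "$auto_mb"
    fi
}
```

With something like this, a pod spec could set the variable next to its memory limit instead of editing conf/datanode-env.sh inside the image.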

Are you willing to submit a PR?

  • I'm willing to submit a PR!
