Pusher OOM: glibc malloc fragmentation — switch to jemalloc #6938

@beastoin


Problem

Pusher pods hit the 4.5GB memory limit and get OOMKilled. After the deque fix (#6762), Python object queues are properly bounded, but RSS still grows linearly until OOM.

Root Cause

Live-process profiling via gdb injection into PID 1 confirmed:

  • 0 bytearrays and 0 bytes objects leaking — Python properly frees audio buffers via reference counting
  • glibc malloc fragmentation is the actual cause:
    • malloc arena: ~580MB allocated from OS
    • Actually in-use: ~270MB
    • ~305MB (53%) is freed but trapped in fragmented arena chunks glibc can't return to OS
    • malloc_trim(0) only reclaims ~6MB (topmost chunk) — interior fragments are stuck
    • Arena keeps expanding because new allocations can't reuse fragmented free space
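
The fragmentation figure above is free-but-held bytes divided by the bytes obtained from the OS. A minimal sketch of how such a reading could be taken from inside a Python process, assuming glibc 2.33 or newer (where `mallinfo2` exists); the helper names `fragmentation_ratio` and `read_mallinfo2` are illustrative and are not taken from the pusher code:

```python
import ctypes
import ctypes.util


class MallInfo2(ctypes.Structure):
    # Field layout of glibc's `struct mallinfo2`; every field is a size_t.
    _fields_ = [(name, ctypes.c_size_t) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost",
    )]


def fragmentation_ratio(arena_bytes, free_bytes):
    """Fraction of the heap obtained from the OS that is free but still held."""
    return free_bytes / arena_bytes if arena_bytes else 0.0


def read_mallinfo2():
    """Return a MallInfo2 for this process, or None if mallinfo2 is unavailable."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        fn = libc.mallinfo2
    except (OSError, AttributeError):
        return None  # not glibc, or glibc older than 2.33
    fn.restype = MallInfo2
    fn.argtypes = []
    return fn()


if __name__ == "__main__":
    # Figures quoted in this issue, in MB: ~580 from the OS, ~305 free.
    print(f"issue figures: {fragmentation_ratio(580, 305):.0%}")
    info = read_mallinfo2()
    if info is not None:
        print(f"this process: {fragmentation_ratio(info.arena, info.fordblks):.0%}")
```

In `struct mallinfo2`, `uordblks` is the in-use total, `fordblks` the free total, and `keepcost` the releasable space at the top of the heap. These counters describe glibc's allocator only, so they stop being meaningful for the process heap once another allocator is preloaded.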

The pattern: each WebSocket connection continuously allocates → extends → copies → frees bytearray audio buffers. glibc's per-thread arenas fragment under this high-churn pattern, and freed memory is never returned to the OS.
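
The allocate → extend → copy → free cycle can be pictured with a small sketch. The pusher's actual buffering code is not shown in this issue, so `churn_once`, `chunks`, and `chunk_size` are hypothetical stand-ins for one connection's audio handling:

```python
def churn_once(chunks, chunk_size=4096):
    """One buffer lifetime: allocate, extend, copy, free. Returns bytes handled."""
    buf = bytearray()                   # allocate an empty buffer
    for _ in range(chunks):
        buf.extend(bytes(chunk_size))   # extend; growth may realloc and move the data
    snapshot = bytes(buf)               # copy the accumulated audio for downstream use
    del buf                             # free the growing buffer
    return len(snapshot)


def churn_many(connections, rounds):
    """Interleave buffer lifetimes of different sizes, as concurrent connections would."""
    total = 0
    for _ in range(rounds):
        for conn in range(connections):
            # Vary the size per connection so freed blocks differ in size.
            total += churn_once(chunks=1 + conn % 8)
    return total
```

Interleaving lifetimes of different sizes is what leaves free holes between live blocks: a hole is only reusable by a later request that fits it, and it cannot be returned to the OS while live data sits above it in the same heap.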

Solution

Preload jemalloc with LD_PRELOAD=libjemalloc.so.2, a widely used remedy for this allocation pattern.

jemalloc uses thread-local caches and size-class slabs, which avoid most of the fragmentation glibc produces under this workload. It is used by Redis (the default allocator on Linux since 2.4), Firefox, and many long-running servers with high-churn allocations.

Implementation

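Two instructions in the pusher Dockerfile:

# Refresh package lists first; a bare `apt-get install` fails when the base image ships without them
RUN apt-get update && apt-get install -y --no-install-recommends libjemalloc2 && rm -rf /var/lib/apt/lists/*
# Path is for x86_64 (amd64) Debian/Ubuntu images; other architectures use a different multiarch directory
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2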

No code changes needed. Drop-in replacement at the allocator level.
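
A preload fails quietly if the library path is wrong: the dynamic loader prints a warning and the process keeps using glibc malloc. So it may be worth confirming that the library is actually mapped into the pusher process. A minimal sketch, assuming the pusher runs as PID 1 in the container; `jemalloc_mapped` and `check_pid` are illustrative names:

```python
def jemalloc_mapped(maps_text):
    """True if any line of /proc/<pid>/maps content mentions jemalloc."""
    return any("jemalloc" in line for line in maps_text.splitlines())


def check_pid(pid=1):
    """Return True/False for a readable process, or None if maps cannot be read."""
    try:
        with open(f"/proc/{pid}/maps") as f:
            return jemalloc_mapped(f.read())
    except OSError:
        return None  # not Linux, no such PID, or insufficient permissions


if __name__ == "__main__":
    print("jemalloc mapped in PID 1:", check_pid(1))
```

Checking PID 1 rather than the checking process itself matters here: the `ENV` setting applies to every process started in the container, so the check script would show jemalloc in its own maps regardless of what the pusher loaded.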

Expected result

RSS stays flat near actual in-use memory (~300MB) instead of climbing until it hits the 4.5GB limit.

Verification

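After deploy, monitor pod RSS over 24h; it should plateau instead of growing linearly. Check the pusher process itself (PID 1), since /proc/self would report the short-lived python3 process started by kubectl exec:

kubectl exec <pod> -- python3 -c "
# Read the status of PID 1 (the pusher), not of this helper process
with open('/proc/1/status') as f:
    for line in f:
        if line.startswith('VmRSS'): print(line.strip())
"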

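To turn "plateau instead of linear growth" into a number, periodic VmRSS readings can be fitted with a straight line and the slope inspected. A minimal sketch under the assumption that the pusher is PID 1; the function names and the sampling scheme are illustrative, not part of the pusher:

```python
def parse_vmrss_kb(status_text):
    """Extract VmRSS in kB from the contents of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # the kernel reports this value in kB
    return None


def read_rss_kb(pid=1):
    """Read the current VmRSS of a process in kB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss_kb(f.read())
    except OSError:
        return None


def growth_mb_per_hour(samples):
    """Least-squares slope of (seconds, rss_kb) samples, converted to MB/hour."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        return 0.0  # all samples share one timestamp
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    return (num / den) * 3600 / 1024
```

Collecting `(time.time(), read_rss_kb())` every few minutes and passing the list to `growth_mb_per_hour` gives a slope that should sit near zero once RSS has plateaued.
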
Evidence

  • Profiling method: gdb -batch -p 1 → PyGILState_Ensure → PyRun_SimpleFile (gc.get_objects + mallinfo2)
  • Two snapshots 10 min apart: GC-tracked objects grew only ~10MB, but RSS grew ~60MB — all in malloc arena expansion
  • mallinfo2() confirmed 53% fragmentation ratio
  • All deque queues bounded (maxlen=20), coroutine/task counts stable and proportional to connection count
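
A script injected this way could take its Python-side snapshot roughly as follows. The injected script itself is not included in this issue, so this is a sketch under assumptions with illustrative names. One caveat when reproducing the measurement: `gc.get_objects()` returns only GC-tracked objects, and `bytes` and `bytearray` are not tracked, so the sketch finds them through the containers that refer to them:

```python
import gc
import sys


def gc_tracked_bytes():
    """Approximate total size of all GC-tracked objects (shallow sizes)."""
    return sum(sys.getsizeof(obj, 0) for obj in gc.get_objects())


def count_buffers():
    """Count bytes/bytearray objects directly referenced by GC-tracked objects.

    Returns (object_count, total_shallow_size_in_bytes).
    """
    seen = set()
    total = 0
    for container in gc.get_objects():
        for ref in gc.get_referents(container):
            if isinstance(ref, (bytes, bytearray)) and id(ref) not in seen:
                seen.add(id(ref))
                total += sys.getsizeof(ref, 0)
    return len(seen), total


if __name__ == "__main__":
    count, size = count_buffers()
    print(f"GC-tracked objects: {gc_tracked_bytes() / 1e6:.1f} MB")
    print(f"bytes/bytearray reachable from containers: {count} objects, {size / 1e6:.1f} MB")
```

Comparing two such snapshots a few minutes apart, alongside RSS, separates growth in live Python objects from growth that happens only at the allocator level.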
