
O(n²) string concatenation in readLines causes OOM on large stdout (>1 MB) #251

@mohan-garimella

Description

Bug: O(n²) string concatenation in readLines causes OOM on large stdout

Summary

The readLines() function in @e2b/code-interpreter uses buffer += chunk to accumulate the HTTP response body. This is O(n²) string concatenation in JavaScript — each += copies the entire existing buffer plus the new chunk into a fresh string. For large stdout outputs (>1 MB), this causes massive memory amplification and multi-second event loop stalls, leading to OOM kills on the host process.
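The effect is easy to reproduce in isolation. Below is a hypothetical micro-benchmark (not SDK code) that mirrors the readLines() pattern: append a chunk, then scan the whole buffer for '\n', which forces V8 to flatten the rope that `+=` builds.

```ts
// Hypothetical micro-benchmark, not SDK code: accumulate 8 MiB in 16 KiB
// chunks two ways. The indexOf() after each append mirrors readLines(),
// which scans the buffer for '\n' and forces V8 to flatten the `+=` rope.
const chunk = 'x'.repeat(16_384)
const n = 512 // 512 × 16 KiB = 8 MiB

let t = performance.now()
let s = ''
for (let i = 0; i < n; i++) {
  s += chunk
  s.indexOf('\n') // flatten + full scan, every iteration
}
const concatMs = performance.now() - t

t = performance.now()
const parts: string[] = []
for (let i = 0; i < n; i++) {
  parts.push(chunk)
}
const joined = parts.join('') // one copy at the end
joined.indexOf('\n')          // one scan at the end
const joinMs = performance.now() - t

console.log({ concatMs, joinMs, equal: s === joined })
```

At larger sizes the gap widens quadratically; at 22 MB the concatenation path dominates the run time entirely.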

Environment

  • @e2b/code-interpreter: 2.3.3 (also confirmed on 2.4.0 — same code)
  • Node.js: v22
  • OS: Linux (Kubernetes pods, 4 GB memory limit)

Reproduction

1. Create a sandbox and run code that produces ~22 MB of stdout:

   ```ts
   import { Sandbox } from '@e2b/code-interpreter';

   const sandbox = await Sandbox.create();
   // Generate ~22 MB of stdout
   const execution = await sandbox.runCode(`print("x" * 22_000_000)`);
   ```

2. Monitor the host Node.js process memory. The heap will spike to 1–1.5 GB and the event loop will stall for 10–20 seconds.
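To quantify step 2 without an external profiler, Node's built-in perf_hooks event-loop delay histogram works. A sketch, where a synchronous busy-wait stands in for the GC pauses (in a real run you would `await sandbox.runCode(...)` in its place):

```ts
import { monitorEventLoopDelay } from 'node:perf_hooks'
import { setTimeout as sleep } from 'node:timers/promises'

const h = monitorEventLoopDelay({ resolution: 10 })
h.enable()
await sleep(20)

// Stand-in for `await sandbox.runCode(...)`: block the loop for ~200 ms
const start = Date.now()
while (Date.now() - start < 200) { /* busy-wait simulates GC pauses */ }

await sleep(20)
h.disable()
console.log('max event-loop delay (ms):', Math.round(h.max / 1e6))
```

With the 22 MB reproduction above, the reported max delay lands in the 10–20 second range described below.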

Root Cause

In js/src/utils.ts, readLines():

```ts
buffer += new TextDecoder().decode(value);  // line 14
```

Each iteration creates a new string of size len(buffer) + len(chunk) while the old buffer is still referenced. For a 22 MB response arriving in ~1,400 chunks of ~16 KB:

  • Total bytes copied: Σ(i × 16KB) for i = 1..1400 ≈ 15.7 GB of string allocations
  • Peak heap: 1.5 GB+ (V8 can't GC fast enough under allocation pressure)
  • Event loop stalls: 10–20 seconds (GC pauses)
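The first bullet checks out arithmetically (assuming decimal 16 kB chunks):

```ts
// Back-of-envelope check: each `+=` copies the whole buffer accumulated
// so far, i.e. chunk i costs i × 16 kB, summed over all 1,400 chunks.
const chunkBytes = 16_000
const numChunks = 1_400
const totalCopied = chunkBytes * (numChunks * (numChunks + 1)) / 2
console.log(`${(totalCopied / 1e9).toFixed(1)} GB`) // → 15.7 GB
```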

Evidence

We captured a V8 heap snapshot on a production worker after processing 22 MB of stdout. A single retained string — {"type":"stdout","text":"..."} — consumed 119,920 kB (117 MB, 37% of the heap). The retainer chain traces directly to the readLines async generator's parameters_and_registers (the buffer local variable), held through the ReadableStream reader → Promise chain → fetch Request body.

Measured Impact

| Metric | Current (`buffer +=`) | Fixed (array + join) |
| --- | --- | --- |
| Peak heap | 211 MB | 20 MB |
| Peak RSS | 329 MB | 95 MB |
| Elapsed time | 29.8 s | 4.1 s |
| Memory amplification | 9× | 0.9× |

(Standalone benchmark with 22 MB stdout. Production workers with existing heap pressure show 80x amplification.)

Suggested Fix

Replace quadratic string concatenation with array-based buffering:

```ts
// js/src/utils.ts – readLines()
export async function* readLines(stream: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  const chunks: string[] = []    // ← array instead of string

  try {
    while (true) {
      const { done, value } = await reader.read()
      if (value !== undefined) {
        chunks.push(decoder.decode(value, { stream: true }))
      }
      if (done) {
        const tail = decoder.decode() // flush any buffered partial code point
        if (tail.length > 0) {
          chunks.push(tail)
        }
        const remaining = chunks.join('')
        if (remaining.length > 0) {
          yield remaining
        }
        break
      }

      // Only pay for a join when the newest chunk can complete a line;
      // otherwise appending stays O(1) per chunk (this keeps newline-free
      // output, like the 22 MB single-line repro, linear as well)
      if (!chunks[chunks.length - 1]?.includes('\n')) {
        continue
      }
      const buffer = chunks.join('')
      let newlineIdx: number
      let start = 0
      while ((newlineIdx = buffer.indexOf('\n', start)) !== -1) {
        yield buffer.slice(start, newlineIdx)
        start = newlineIdx + 1
      }
      // Keep only the remainder after the last newline
      chunks.length = 0
      if (start < buffer.length) {
        chunks.push(buffer.slice(start))
      }
    }
  } finally {
    reader.releaseLock()
  }
}
```

This changes the complexity from O(n²) to O(n) and eliminates the OOM risk for large outputs.
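As a quick sanity check of the line-splitting behavior, here is a self-contained harness with a simplified re-implementation of the array-based approach (illustration only, not the SDK source), fed chunks that split lines at awkward boundaries:

```ts
// Simplified re-implementation of the array-buffering approach, for
// illustration only. Yields complete lines, then any trailing remainder.
async function* readLines(stream: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  const chunks: string[] = []
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (value !== undefined) chunks.push(decoder.decode(value, { stream: true }))
      if (done) {
        const rest = chunks.join('')
        if (rest.length > 0) yield rest
        break
      }
      const buffer = chunks.join('')
      let start = 0
      let idx: number
      while ((idx = buffer.indexOf('\n', start)) !== -1) {
        yield buffer.slice(start, idx)
        start = idx + 1
      }
      chunks.length = 0
      if (start < buffer.length) chunks.push(buffer.slice(start))
    }
  } finally {
    reader.releaseLock()
  }
}

// Feed chunks that split lines mid-word:
const enc = new TextEncoder()
const stream = new ReadableStream<Uint8Array>({
  start(controller) {
    for (const part of ['hel', 'lo\nwor', 'ld\ntail']) {
      controller.enqueue(enc.encode(part))
    }
    controller.close()
  },
})

const lines: string[] = []
for await (const line of readLines(stream)) lines.push(line)
console.log(lines) // → [ 'hello', 'world', 'tail' ]
```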

Happy to open a PR if this approach looks good.
