Unbounded goroutine fan-out exhausts file descriptors and memory on large trees #7

@dolph

Description

Summary

WalkDir spawns one goroutine per directory entry, and each subdirectory's goroutine recursively calls WalkDir in turn:

for _, file := range files {
    childFile := NewFile(filepath.Join(f.Path, file.Name()))
    wg.Add(1)
    go func() {
        defer wg.Done()
        fr.HandleFile(childFile)
    }()
}

There is no concurrency cap, so the number of live goroutines is bounded only by the size of the tree. Each leaf goroutine may hold a *os.File, a temp file, and a strings.Builder containing the file's contents in memory.

Impact (Reliability: High)

  • On a large tree, the process can hit EMFILE (open files limit, default 1024) and fatally abort all in-progress writes.
  • Peak memory scales with the combined size of all concurrently-open files, not just the largest single file.
  • Massive parallelism actually slows the tool on rotating disks (and even on SSDs in many cases) due to seek thrashing — this is a perf bug as well.
  • The tool is supposed to be "fast", but a single run over a 1M-file tree is likely to OOM or hit FD limits.

Suggested Fix

Use a bounded worker pool sized by either runtime.NumCPU() or a small multiple thereof, with a shared queue of paths. Walk synchronously per directory and dispatch only file-content work to the pool, keeping FD usage proportional to pool size.

Files

  • find_replace.go:54-73 (WalkDir)
