Summary
WalkDir spawns one goroutine per directory entry, which in turn calls WalkDir on each subdirectory, recursively:
for _, file := range files {
    childFile := NewFile(filepath.Join(f.Path, file.Name()))
    wg.Add(1)
    go func() {
        defer wg.Done()
        fr.HandleFile(childFile)
    }()
}
There is no concurrency cap. Each leaf goroutine may hold a *os.File, a temp file, and a strings.Builder containing the file's contents in memory.
Impact (Reliability: High)
- On a large tree, the process can hit EMFILE (the per-process open-file limit, commonly 1024) and fatally abort all in-progress writes.
- Memory usage scales with the size of the largest concurrently-open files times the parallelism, not just the largest file.
- Massive parallelism can actually slow the tool down on rotating disks (and in many cases on SSDs) due to seek thrashing, so this is a performance bug as well.
- The tool is supposed to be "fast", but a single run on a 1M-file tree is likely to OOM or exhaust file descriptors.
Suggested Fix
Use a bounded worker pool sized by either runtime.NumCPU() or a small multiple thereof, with a shared queue of paths. Walk synchronously per directory and dispatch only file-content work to the pool, keeping FD usage proportional to pool size.
Files
find_replace.go:54-73 (WalkDir)