pi-fff / FileFinder({ aiMode: true }) on macOS holds ~1 fd per watched directory, breaks posix_spawn in large trees #439

@dbachelder

Summary

Filing this against fff because the trigger is @ff-labs/fff-node, but I encountered it through the @ff-labs/pi-fff extension for @mariozechner/pi-coding-agent — happy to re-route or cross-post if you'd prefer it live elsewhere.

In a large monorepo (~4,400 subdirectories, 100k+ files), the extension's call to FileFinder.create({ basePath, aiMode: true }) + waitForScan causes the host Node process to retain ~1 file descriptor per watched directory for the entire session on macOS. In our case the host accumulated ~7,100 open DIR handles, eventually wedging child_process.spawn with EBADF and breaking unrelated tooling for the rest of the session. Removing the pi-fff extension (and therefore the fff-node finder) drops the host to ~31 total fds and the wedge disappears entirely.

Environment

  • macOS (Apple Silicon)

  • Host: Node.js 25.3.0, running pi-coding-agent with the @ff-labs/pi-fff extension loaded

  • @ff-labs/pi-fff calls @ff-labs/fff-node like this:

    import { FileFinder } from "@ff-labs/fff-node";
    
    const result = FileFinder.create({
      basePath: cwd,
      frecencyDbPath,
      historyDbPath,
      aiMode: true,
    });
    finder = result.value;
    await finder.waitForScan(15000);

    destroyFinder() is only invoked on session_shutdown, so the finder lives for the entire multi-hour session.

  • ulimit -n: 1,048,575 (so this is not RLIMIT exhaustion)

  • Working directory: a ~100k-file, ~4,400-subdirectory monorepo

Observed behavior

lsof against the host Node process after waitForScan completes, with pi-fff loaded:

Total fds:                      7,148
  DIR     7,117
  REG         8
  CHR         7
  KQUEUE      5
  PIPE        4
  unix        3
  systm       1
  NPOLICY     1
  IPv4        1
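For anyone wanting to produce the same per-type breakdown, here is a minimal sketch that groups captured lsof output by its TYPE column. It assumes the default `lsof -p <pid>` layout (COMMAND PID USER FD TYPE ...) and COMMAND/USER values without spaces; `tallyLsofTypes` is a hypothetical helper name, not part of fff-node or pi-fff.

```typescript
// Count lsof rows per TYPE (DIR, REG, KQUEUE, ...).
// Assumption: default column order, where TYPE is the 5th whitespace-separated field.
function tallyLsofTypes(lsofOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  const rows = lsofOutput.split("\n").slice(1); // drop the header row
  for (const row of rows) {
    const fields = row.trim().split(/\s+/);
    const fdType = fields[4];
    if (fdType === undefined) continue; // blank or truncated line
    counts.set(fdType, (counts.get(fdType) ?? 0) + 1);
  }
  return counts;
}

// Usage with a tiny hand-written sample (not real output from the host):
const lsofSample = [
  "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME",
  "node 123 dan cwd DIR 1,18 640 111 /Users/dan/repo",
  "node 123 dan 20r DIR 1,18 96 222 /Users/dan/repo/a",
  "node 123 dan 21u KQUEUE count=0, state=0",
].join("\n");
console.log(tallyLsofTypes(lsofSample)); // DIR counted twice, KQUEUE once
```

Like a raw line count of lsof output, this also counts rows such as `cwd` and `txt` that are not numbered descriptors.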

Of the 7,117 DIR entries:

  • 4,360 are distinct subdirectories under the monorepo root — i.e. roughly one per directory in the tree.
  • The remaining ~2,760 are common ancestor paths (/Users/dan/, /System/Volumes/, etc.) duplicated by lsof's path-resolution display, all referencing the same kernel handle table.
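The split between directories under the monorepo root and everything else can be computed from the NAME column of the DIR rows. This is a sketch under assumptions: the paths are already extracted as plain strings, and "under the root" means a strict descendant of basePath. `splitDirEntries` and `DirBreakdown` are hypothetical names used only for illustration.

```typescript
interface DirBreakdown {
  entriesUnderRoot: number;  // DIR rows whose path is a strict descendant of basePath
  distinctUnderRoot: number; // the same rows, deduplicated by path
  otherEntries: number;      // basePath itself, ancestors, and unrelated paths
}

function splitDirEntries(dirPaths: string[], basePath: string): DirBreakdown {
  // Normalise so "/repo" and "/repo/" behave the same, then match on "/repo/".
  const root = basePath.endsWith("/") ? basePath.slice(0, -1) : basePath;
  const prefix = root + "/";
  const distinct = new Set<string>();
  let entriesUnderRoot = 0;
  let otherEntries = 0;
  for (const dirPath of dirPaths) {
    if (dirPath.startsWith(prefix)) {
      entriesUnderRoot += 1;
      distinct.add(dirPath);
    } else {
      otherEntries += 1;
    }
  }
  return { entriesUnderRoot, distinctUnderRoot: distinct.size, otherEntries };
}
```

Matching on the prefix with a trailing slash keeps sibling directories such as `/Users/dan/repo-other` from being counted as part of `/Users/dan/repo`.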

After unloading pi-fff (and therefore fff-node) and restarting the host, the same process holds ~31 total fds with 4 DIR and 3 KQUEUE entries — a ~230× drop. No other variables changed.

How it manifests downstream

This part is not a bug in fff itself, but explains why it became visible:

  1. With ~7k fds inherited, macOS posix_spawn eventually returns EBADF (not EMFILE) for child_process.spawn. The most plausible explanation is that one of the inherited kqueue/watcher fds has been auto-closed by the kernel (e.g. on a watched directory being deleted/unmounted) while libuv still holds a stale mirror; posix_spawn walking the parent fd table then trips on it.
  2. Once a single spawn fails this way, libuv's stdio pipe pair is half-allocated, and every subsequent spawn in that process fails with EBADF — including trivial echo / pwd. The session never recovers without a process restart.

So a single heavy grep -r (which spikes transient fds and apparently nudges things over the edge) permanently bricks child_process.spawn for the rest of the session, even though fff's own NAPI tools (ffgrep, fffind, etc.) keep working fine because they never fork.
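Since the wedge shows up as repeated EBADF rather than EMFILE, a host could at least detect it and surface a "restart required" message instead of failing silently. The following is only a sketch of that idea: `SpawnFailureTracker` and its threshold of 3 are hypothetical, and it relies solely on the `code` string that Node attaches to spawn errors (e.g. `"EBADF"`, `"EMFILE"`, `"ENFILE"`).

```typescript
type SpawnHealth = "suspect" | "wedged" | "fd-exhaustion" | "other";

class SpawnFailureTracker {
  private consecutiveEbadf = 0;
  private readonly threshold: number;

  // threshold: how many EBADF failures in a row count as wedged (arbitrary default).
  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Call after any spawn that started successfully.
  recordSuccess(): void {
    this.consecutiveEbadf = 0;
  }

  // Call with err.code from the child's "error" event or a thrown spawn error.
  recordFailure(code: string | undefined): SpawnHealth {
    if (code === "EBADF") {
      this.consecutiveEbadf += 1;
      return this.consecutiveEbadf >= this.threshold ? "wedged" : "suspect";
    }
    this.consecutiveEbadf = 0; // only an unbroken EBADF streak counts
    if (code === "EMFILE" || code === "ENFILE") return "fd-exhaustion";
    return "other";
  }
}
```

In a real host this would be fed from something like `child.on("error", (err) => tracker.recordFailure((err as NodeJS.ErrnoException).code))`. It only observes the symptom; it does not release any descriptors.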

Reproduction sketch

# 1. Start any Node host that loads @ff-labs/fff-node with aiMode: true
#    over a directory tree with several thousand subdirectories.
PID=<host pid>   # substitute the host Node process ID
lsof -p "$PID" | awk '$5=="DIR"' | wc -l
# → roughly equal to the number of subdirectories under basePath

# 2. From inside the host, run a few hundred child_process.spawn calls,
#    or one heavy `grep -r` over the same tree. Eventually:
#    Error: spawn EBADF
# 3. Every subsequent spawn in that process also fails with EBADF.

What I'm reporting

  • Per-directory fd retention scales linearly with directory count under basePath on macOS when aiMode: true.
  • For trees in the low hundreds of directories this is invisible; somewhere in the few-thousand range it starts interacting badly with posix_spawn in the same process.
  • I don't have visibility into whether the watcher uses kqueue, FSEvents, or something else under the hood — happy to run any diagnostics that would be useful (e.g. dtruss, fs_usage, or a debug build).

Not asking for a specific fix; just putting the data on the record in case it's useful for prioritization or for anyone else hitting spawn EBADF near fff-node (especially when reached via pi-fff).
