
fix: watcher FD leak — respect .gitignore + unified skip list#129

Merged
jamestexas merged 4 commits into main from fix/watcher-fd-leak
Mar 24, 2026


@jamestexas
Contributor

Summary

  • The watcher's shouldIgnoreDir used a skip list that had diverged from the engine's ShouldSkipDir: it didn't skip target/, dist/, or build/.
  • On macOS (kqueue), each watched directory consumes a file descriptor; watching ley-line's 11K target/ subdirectories cascaded into 129K leaked FDs.
  • Fix: the watcher now uses ShouldSkipDir (the canonical list) plus .gitignore rules via WithGitignore, so project-specific ignores are respected automatically.
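To make the fix concrete, here is a minimal Go sketch of a unified directory check. It assumes the exported GitignoreMatcher interface discussed in this thread; shouldSkipDir, canonicalSkip, the Watcher fields, and the dirSet test matcher are hypothetical stand-ins, not the actual ingest code. The idea, as the summary describes it, is that a directory rejected here is never added to the watcher, so on kqueue it never consumes a descriptor.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// GitignoreMatcher mirrors the interface this PR exports from the ingest package.
type GitignoreMatcher interface {
	Match(rel string, isDir bool) bool
}

// canonicalSkip is an illustrative subset of the engine's ShouldSkipDir list
// (only the entries named in this PR); the real list lives in engine.go.
var canonicalSkip = map[string]bool{
	"target": true, "dist": true, "build": true, "vendor": true,
}

// shouldSkipDir is a hypothetical stand-in for ShouldSkipDir: it looks only at
// the directory's base name.
func shouldSkipDir(name string) bool { return canonicalSkip[name] }

// Watcher keeps only the fields this sketch needs.
type Watcher struct {
	rootDir   string
	gitignore GitignoreMatcher // nil when no .gitignore rules were supplied
}

// shouldIgnoreDir checks the canonical list first, then any gitignore rules,
// using a slash-separated path relative to the watch root.
func (w *Watcher) shouldIgnoreDir(path string) bool {
	if shouldSkipDir(filepath.Base(path)) {
		return true
	}
	if w.gitignore == nil {
		return false
	}
	rel, err := filepath.Rel(w.rootDir, path)
	if err != nil {
		return false
	}
	return w.gitignore.Match(filepath.ToSlash(rel), true)
}

// dirSet is a toy matcher that ignores exact relative directory paths.
type dirSet map[string]bool

func (d dirSet) Match(rel string, isDir bool) bool { return isDir && d[rel] }

func main() {
	w := &Watcher{rootDir: "/repo", gitignore: dirSet{".terraform": true}}
	for _, p := range []string{"/repo/target", "/repo/.terraform", "/repo/src"} {
		fmt.Printf("%-18s ignored=%v\n", p, w.shouldIgnoreDir(p))
	}
}
```

Checking the canonical list first keeps the common case (target/, vendor/) cheap and independent of whether a .gitignore exists at all.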

What changed

| File | Change |
|---|---|
| gitignore.go | Export LoadGitignore |
| engine.go | Add Gitignore() accessor; add vendor to ShouldSkipDir |
| watcher.go | WithGitignore option; shouldIgnoreDir/shouldIgnorePath become methods that use gitignore + the canonical skip list |
| serve.go | Wire engine.Gitignore() into the watcher |
| watcher_test.go | TestWatcher_TargetIgnored, TestWatcher_GitignoreSkipsDirs, plus updated existing tests |
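The WithGitignore option suggests Go's functional-options pattern. A minimal sketch, assuming a hypothetical NewWatcher(rootDir, opts...) constructor; the real constructor also sets up the underlying OS watcher, which is omitted here.

```go
package main

import "fmt"

// GitignoreMatcher mirrors the interface exported in this PR.
type GitignoreMatcher interface {
	Match(rel string, isDir bool) bool
}

type Watcher struct {
	rootDir   string
	gitignore GitignoreMatcher
}

// WatcherOption configures a Watcher at construction time.
type WatcherOption func(*Watcher)

// WithGitignore attaches gitignore rules; a nil matcher leaves them disabled.
func WithGitignore(m GitignoreMatcher) WatcherOption {
	return func(w *Watcher) { w.gitignore = m }
}

// NewWatcher is a hypothetical constructor that applies options in order.
func NewWatcher(rootDir string, opts ...WatcherOption) *Watcher {
	w := &Watcher{rootDir: rootDir}
	for _, opt := range opts {
		opt(w)
	}
	return w
}

// matchAll is a toy matcher used only for the demo.
type matchAll struct{}

func (matchAll) Match(string, bool) bool { return true }

func main() {
	plain := NewWatcher("/repo")
	// In serve.go the argument would come from engine.Gitignore().
	wired := NewWatcher("/repo", WithGitignore(matchAll{}))
	fmt.Println(plain.gitignore == nil, wired.gitignore != nil)
}
```

A practical upside of the option form is that callers without an engine, such as unit tests, can simply omit it and get the canonical skip list alone.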

Test plan

  • TestWatcher_TargetIgnored — target/, dist/, build/ dirs don't trigger callbacks
  • TestWatcher_GitignoreSkipsDirs — custom gitignore patterns respected by watcher
  • TestWatcher_VendorIgnored — existing test still passes
  • TestShouldIgnoreDir — updated to verify target/dist/build in canonical list
  • Full task test green

…reset schemas

Add curated preset schemas for every compiled-in tree-sitter grammar so
auto-detection produces language-aware projections instead of falling
back to generic FCA inference.

New schemas: javascript, typescript, java, c, cpp, ruby, php, kotlin,
swift, scala, elixir, yaml (12 new).

Improved: rust (added use imports), terraform (added terraform{} and
moved{} blocks).

Wired: sourceCodePresets now maps all 18 detected languages to their
preset schemas. presetSchemas registry updated (21 total presets).

Closes: mache-6bb5e7
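The commit describes a two-step lookup: a detected language maps to a preset through sourceCodePresets, and the preset name maps to an embedded schema path through presetSchemas. A sketch of that lookup, assuming both are map[string]string (the presetSchemas shape follows cmd/schemas.go as quoted in this PR); only four entries are reproduced, and presetFor is a hypothetical helper.

```go
package main

import "fmt"

// presetSchemas: preset name -> embedded schema path (subset of cmd/schemas.go).
var presetSchemas = map[string]string{
	"go":        "schemas/go.json",
	"rust":      "schemas/rust.json",
	"terraform": "schemas/terraform.json",
	"yaml":      "schemas/yaml.json",
}

// sourceCodePresets: detected language -> preset name (shape assumed).
var sourceCodePresets = map[string]string{
	"go":        "go",
	"rust":      "rust",
	"terraform": "terraform",
	"yaml":      "yaml",
}

// presetFor returns the schema path for a detected language. ok is false when
// no preset is wired, in which case the caller falls back to FCA inference.
func presetFor(lang string) (string, bool) {
	name, ok := sourceCodePresets[lang]
	if !ok {
		return "", false
	}
	path, ok := presetSchemas[name]
	return path, ok
}

func main() {
	for _, lang := range []string{"terraform", "hcl", "yaml"} {
		if path, ok := presetFor(lang); ok {
			fmt.Printf("%s -> %s\n", lang, path)
		} else {
			fmt.Printf("%s -> generic FCA inference\n", lang)
		}
	}
}
```

A language name with no entry, such as hcl in this sketch, silently takes the fallback path, so keeping language names consistent across lookups matters.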
…YAML depth

- Align langForExt to return "terraform" (not "hcl") matching
  DetectLanguageFromExt, so multi-language namespace filtering works.
  Also updates GetLanguage, RegisterAddressRefQuery, and tests.
- Remove moved{} block from terraform schema (no unique name → collision).
  Keep terraform{} which is typically singleton per module.
- YAML: anchor query to document root via stream>document>block_node
  path so only top-level mapping pairs are captured.
- Filed mache-a21b69 for selector compilation test (follow-up).

The file watcher's shouldIgnoreDir used a skip list that had diverged
from the engine's ShouldSkipDir: it didn't skip target/, dist/, or
build/. On macOS (kqueue), each watched directory consumes an FD;
watching ley-line's 11K target/ subdirs cascaded to 129K leaked FDs.

Fix: make the watcher use the engine's canonical ShouldSkipDir list
as a baseline, then layer on .gitignore rules via WithGitignore so
project-specific ignores (target/, .terraform/, custom build dirs)
are respected automatically without maintaining a hardcoded list.

Changes:
- Export LoadGitignore + add Engine.Gitignore() accessor
- Add WithGitignore WatcherOption, wire in buildServeGraph
- Unify shouldIgnoreDir/shouldIgnorePath as Watcher methods
- Add vendor to ShouldSkipDir canonical list
- Regression tests: TestWatcher_TargetIgnored, TestWatcher_GitignoreSkipsDirs

Copilot AI left a comment


Pull request overview

This PR aims to prevent runaway file-descriptor usage in the filesystem watcher (notably on macOS/kqueue) by unifying directory-skip behavior with the ingestion engine and honoring .gitignore rules, while also broadening language preset/schema support (including a Terraform/HCL language-name unification).

Changes:

  • Watcher now uses the engine’s canonical ShouldSkipDir list and can optionally apply .gitignore rules via WithGitignore.
  • Engine exports access to its loaded gitignore matcher and expands ShouldSkipDir (adds vendor); Terraform/HCL language naming is unified to "terraform" in ingestion/query registration.
  • Adds many new embedded preset schemas and updates preset mappings to cover more tree-sitter languages.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| internal/ingest/watcher.go | Adds gitignore-aware skipping and switches ignore logic to use canonical ShouldSkipDir. |
| internal/ingest/watcher_test.go | Adds regression tests for skipping target/dist/build and honoring .gitignore. |
| internal/ingest/gitignore.go | Exports LoadGitignore for reuse by watcher/engine wiring. |
| internal/ingest/engine.go | Unifies .tf/.hcl language name to "terraform", adds vendor to ShouldSkipDir, exposes Gitignore() accessor. |
| internal/ingest/engine_languages.go | Updates Terraform/HCL address-ref query registration key. |
| internal/ingest/address_refs_test.go | Updates tests/examples to use "terraform" language name. |
| cmd/serve.go | Wires engine.Gitignore() into watcher via WithGitignore. |
| cmd/schemas.go | Expands embedded preset schema registry to many additional languages. |
| cmd/infer.go | Expands sourceCodePresets to match the broader preset schema set. |
| cmd/schemas/yaml.json | Adds preset schema for YAML. |
| cmd/schemas/typescript.json | Adds preset schema for TypeScript. |
| cmd/schemas/terraform.json | Extends Terraform preset schema (adds terraform block extraction). |
| cmd/schemas/swift.json | Adds preset schema for Swift. |
| cmd/schemas/scala.json | Adds preset schema for Scala. |
| cmd/schemas/rust.json | Extends Rust preset schema (adds import extraction). |
| cmd/schemas/ruby.json | Adds preset schema for Ruby. |
| cmd/schemas/php.json | Adds preset schema for PHP. |
| cmd/schemas/kotlin.json | Adds preset schema for Kotlin. |
| cmd/schemas/javascript.json | Adds preset schema for JavaScript. |
| cmd/schemas/java.json | Adds preset schema for Java. |
| cmd/schemas/elixir.json | Adds preset schema for Elixir. |
| cmd/schemas/cpp.json | Adds preset schema for C++. |
| cmd/schemas/c.json | Adds preset schema for C. |
Comments suppressed due to low confidence (1)

internal/ingest/gitignore.go:33

  • LoadGitignore’s docstring says it “Returns nil if no .gitignore exists at all”, but the current implementation always returns a non-nil matcher even when the root .gitignore is missing and no nested patterns are found. Either update the docstring, or (preferably) return nil when both the root patterns and m.nested are empty so callers can skip extra Rel/Match work.
```go
// LoadGitignore reads .gitignore from rootDir and discovers nested .gitignore
// files in the tree. Returns nil if no .gitignore exists at all.
func LoadGitignore(rootDir string) *gitignoreMatcher {
	m := &gitignoreMatcher{
		rootDir: rootDir,
		nested:  make(map[string][]gitignorePattern),
	}
```


Comment on lines 74 to 78
```diff
 // HCL: variable "VAR_NAME" { ... } → env:VAR_NAME
-RegisterAddressRefQuery("hcl", "env", `
+RegisterAddressRefQuery("terraform", "env", `
 (block
 (identifier) @_type
 (string_lit) @ref
```

Copilot AI Mar 24, 2026


HCL/Terraform files now use langName="terraform" (via langForExt), but the non-default Tree-sitter ref/call query is still only registered under "hcl". That makes ExtractCalls(..., "terraform") fall back to defaultCallQuery, which doesn’t compile against the HCL grammar (repeated query compile errors + missing refs). Register the same RegisterRefQuery under "terraform" as well (and optionally keep "hcl" for backwards compatibility).

Copilot uses AI. Check for mistakes.
Contributor Author


Fixed in 7a6a282 — changed RegisterRefQuery("hcl", ...) to RegisterRefQuery("terraform", ...) to match the langForExt rename.
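The fix renamed the registration key. The reviewer also mentioned optionally keeping "hcl" for backwards compatibility; one alternative to registering a query twice is to normalize aliases at both registration and lookup. A sketch with a hypothetical registry (the query strings are opaque placeholders, not real tree-sitter queries), which also shows how an unregistered name falls back to the default query, the failure mode described in the review:

```go
package main

import "fmt"

// defaultCallQuery stands in for the generic fallback query.
const defaultCallQuery = "<default call query>"

// refQueries is a hypothetical registry keyed by canonical language name.
var refQueries = map[string]string{}

// langAliases maps legacy names onto canonical ones.
var langAliases = map[string]string{"hcl": "terraform"}

func canonicalLang(lang string) string {
	if c, ok := langAliases[lang]; ok {
		return c
	}
	return lang
}

// registerRefQuery stores a query under the canonical name.
func registerRefQuery(lang, query string) {
	refQueries[canonicalLang(lang)] = query
}

// refQueryFor returns the registered query, or the default when none exists.
func refQueryFor(lang string) string {
	if q, ok := refQueries[canonicalLang(lang)]; ok {
		return q
	}
	return defaultCallQuery
}

func main() {
	registerRefQuery("hcl", "<hcl ref query>") // legacy key still resolves
	for _, lang := range []string{"terraform", "hcl", "go"} {
		fmt.Printf("%s -> %s\n", lang, refQueryFor(lang))
	}
}
```

The trade-off: aliases keep old call sites working, while a plain rename (what the PR did) keeps a single name everywhere, at the cost of silently falling back for any caller still using the old name.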

Comment thread internal/ingest/engine.go Outdated
Comment on lines +287 to +290
```go
// Gitignore returns the gitignore matcher loaded during Ingest, or nil if none
// was loaded. Pass this to WithGitignore when creating a Watcher so the watcher
// skips the same directories the engine does.
func (e *Engine) Gitignore() *gitignoreMatcher {
```

Copilot AI Mar 24, 2026


Engine.Gitignore() is exported but returns an unexported concrete type (*gitignoreMatcher). This makes it hard for other packages to store/forward the value except via type inference and passing it straight back into ingest APIs. Consider exporting a GitignoreMatcher type (or returning a small interface like Match(rel string, isDir bool) bool) and updating WithGitignore / Engine.Gitignore to use that exported surface.

Suggested change
```diff
-// Gitignore returns the gitignore matcher loaded during Ingest, or nil if none
-// was loaded. Pass this to WithGitignore when creating a Watcher so the watcher
-// skips the same directories the engine does.
-func (e *Engine) Gitignore() *gitignoreMatcher {
+// GitignoreMatcher describes the minimal interface for matching paths against
+// .gitignore-style rules. The concrete implementation is internal.
+type GitignoreMatcher interface {
+	Match(rel string, isDir bool) bool
+}
+
+// Gitignore returns the gitignore matcher loaded during Ingest, or nil if none
+// was loaded. Pass this to WithGitignore when creating a Watcher so the watcher
+// skips the same directories the engine does.
+func (e *Engine) Gitignore() GitignoreMatcher {
```

Contributor Author


Fixed in 7a6a282 — exported GitignoreMatcher as an interface (Match(rel string, isDir bool) bool). WithGitignore and Engine.Gitignore() now use the exported surface.
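Returning an exported interface has one Go-specific pitfall worth checking. If Engine stores the matcher as a *gitignoreMatcher and Gitignore() returns that field directly, a nil pointer becomes a non-nil interface value, so a watcher's `w.gitignore != nil` guard would pass. Whether the merged code is affected is not visible in this thread; the sketch below, with a hypothetical gi field, only demonstrates the language behavior and the usual guard.

```go
package main

import "fmt"

type GitignoreMatcher interface {
	Match(rel string, isDir bool) bool
}

type gitignoreMatcher struct{ patterns []string }

// Match tolerates a nil receiver as a second line of defense.
func (m *gitignoreMatcher) Match(rel string, isDir bool) bool {
	if m == nil {
		return false
	}
	for _, p := range m.patterns {
		if p == rel {
			return true
		}
	}
	return false
}

type Engine struct{ gi *gitignoreMatcher } // nil when no .gitignore was found

// gitignoreNaive returns the field directly: a nil *gitignoreMatcher becomes
// a non-nil interface value that holds a nil pointer.
func (e *Engine) gitignoreNaive() GitignoreMatcher { return e.gi }

// Gitignore converts a nil pointer into an untyped nil interface so callers'
// nil checks behave as expected.
func (e *Engine) Gitignore() GitignoreMatcher {
	if e.gi == nil {
		return nil
	}
	return e.gi
}

func main() {
	e := &Engine{}
	fmt.Println(e.gitignoreNaive() == nil) // false: typed nil inside an interface
	fmt.Println(e.Gitignore() == nil)      // true
}
```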

Comment thread internal/ingest/watcher.go Outdated
Comment on lines +247 to +252
```go
// Check gitignore for files inside ignored directories.
if w.gitignore != nil {
	rel, err := filepath.Rel(w.rootDir, path)
	if err == nil {
		rel = filepath.ToSlash(rel)
		if w.gitignore.Match(rel, false) {
```

Copilot AI Mar 24, 2026


The comment "Check gitignore for files inside ignored directories" is narrower than the actual behavior: Match(rel, false) will ignore any path matched by gitignore (including file patterns like *.log), not just files under ignored directories. Consider updating the comment to reflect the real semantics so future changes/tests don’t encode the wrong expectation.

Contributor Author


Fixed in 7a6a282 — updated comment to accurately reflect that it matches any gitignore rule, not just directory-scoped patterns.
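Copilot's point is easiest to see with a pattern that names files rather than directories. The hypothetical simpleMatcher below implements a deliberately reduced subset of gitignore semantics: a trailing `/` means directories only, and a pattern with no other `/` matches the base name at any depth. Negation, `**`, leading-`/` anchoring, and inheritance from ignored parent directories are not modeled. Under these rules `*.log` matches `src/app/debug.log` even though no directory on that path is ignored.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type simpleMatcher struct{ patterns []string }

// Match applies a simplified subset of gitignore rules to a slash-separated
// path relative to the repository root.
func (m simpleMatcher) Match(rel string, isDir bool) bool {
	for _, p := range m.patterns {
		dirOnly := strings.HasSuffix(p, "/")
		p = strings.TrimSuffix(p, "/")
		if dirOnly && !isDir {
			continue // "target/" never matches a plain file
		}
		target := rel
		if !strings.Contains(p, "/") {
			target = path.Base(rel) // slash-free patterns match at any depth
		}
		if ok, _ := path.Match(p, target); ok {
			return true
		}
	}
	return false
}

func main() {
	m := simpleMatcher{patterns: []string{"*.log", "target/"}}
	fmt.Println(m.Match("src/app/debug.log", false))
	fmt.Println(m.Match("target", true))
	fmt.Println(m.Match("src/main.go", false))
}
```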

Comment thread cmd/schemas.go
Comment on lines 15 to +20
```diff
 var presetSchemas = map[string]string{
-	"go": "schemas/go.json",
-	"python": "schemas/python.json",
-	"rust": "schemas/rust.json",
-	"terraform": "schemas/terraform.json",
-	"sql": "schemas/sql.json",
-	"toml": "schemas/toml.json",
+	// Source-code languages (mapped from DetectLanguageFromExt)
+	"go":        "schemas/go.json",
+	"python":    "schemas/python.json",
+	"rust":      "schemas/rust.json",
+	"terraform": "schemas/terraform.json",
```

Copilot AI Mar 24, 2026


The PR description’s “What changed” table doesn’t mention the large expansion of embedded preset schemas / auto-detection mappings added here (many new schemas/*.json plus new entries in presetSchemas). If this is intentional, please update the PR description (or split into a separate PR) so the review scope and rollout risk are clear.

Contributor Author


The schema files are from prior commits on this branch (29476ce, 0c5aa3c) that were already reviewed. This PR's new work is the watcher fix + regression tests only. The diff includes them because the base is main.

…, comment

- RegisterRefQuery("hcl" → "terraform") to match langForExt rename
- Export GitignoreMatcher as interface for cross-package use
- Fix misleading comment in shouldIgnorePath
jamestexas merged commit 98a2f86 into main on Mar 24, 2026
14 checks passed
jamestexas deleted the fix/watcher-fd-leak branch on March 24, 2026 at 20:17