fix: watcher FD leak — respect .gitignore + unified skip list#129
fix: watcher FD leak — respect .gitignore + unified skip list#129jamestexas merged 4 commits intomainfrom
Conversation
…reset schemas
Add curated preset schemas for every compiled-in tree-sitter grammar so
auto-detection produces language-aware projections instead of falling
back to generic FCA inference.
New schemas: javascript, typescript, java, c, cpp, ruby, php, kotlin,
swift, scala, elixir, yaml (12 new).
Improved: rust (added use imports), terraform (added terraform{} and
moved{} blocks).
Wired: sourceCodePresets now maps all 18 detected languages to their
preset schemas. presetSchemas registry updated (21 total presets).
Closes: mache-6bb5e7
…YAML depth
- Align langForExt to return "terraform" (not "hcl") matching
DetectLanguageFromExt, so multi-language namespace filtering works.
Also updates GetLanguage, RegisterAddressRefQuery, and tests.
- Remove moved{} block from terraform schema (no unique name → collision).
Keep terraform{} which is typically singleton per module.
- YAML: anchor query to document root via stream>document>block_node
path so only top-level mapping pairs are captured.
- Filed mache-a21b69 for selector compilation test (follow-up).
The file watcher's shouldIgnoreDir had a diverged skip list from the engine's ShouldSkipDir. It didn't skip target/, dist/, or build/. On macOS (kqueue), each watched directory consumes an FD — watching ley-line's 11K target/ subdirs cascaded to 129K leaked FDs. Fix: make the watcher use the engine's canonical ShouldSkipDir list as a baseline, then layer on .gitignore rules via WithGitignore so project-specific ignores (target/, .terraform/, custom build dirs) are respected automatically without maintaining a hardcoded list. Changes: - Export LoadGitignore + add Engine.Gitignore() accessor - Add WithGitignore WatcherOption, wire in buildServeGraph - Unify shouldIgnoreDir/shouldIgnorePath as Watcher methods - Add vendor to ShouldSkipDir canonical list - Regression tests: TestWatcher_TargetIgnored, TestWatcher_GitignoreSkipsDirs
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent runaway file-descriptor usage in the filesystem watcher (notably on macOS/kqueue) by unifying directory-skip behavior with the ingestion engine and honoring .gitignore rules, while also broadening language preset/schema support (including a Terraform/HCL language-name unification).
Changes:
- Watcher now uses the engine’s canonical
ShouldSkipDirlist and can optionally apply.gitignorerules viaWithGitignore. - Engine exports access to its loaded gitignore matcher and expands
ShouldSkipDir(addsvendor); Terraform/HCL language naming is unified to"terraform"in ingestion/query registration. - Adds many new embedded preset schemas and updates preset mappings to cover more tree-sitter languages.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/ingest/watcher.go | Adds gitignore-aware skipping and switches ignore logic to use canonical ShouldSkipDir. |
| internal/ingest/watcher_test.go | Adds regression tests for skipping target/dist/build and honoring .gitignore. |
| internal/ingest/gitignore.go | Exports LoadGitignore for reuse by watcher/engine wiring. |
| internal/ingest/engine.go | Unifies .tf/.hcl language name to "terraform", adds vendor to ShouldSkipDir, exposes Gitignore() accessor. |
| internal/ingest/engine_languages.go | Updates Terraform/HCL address-ref query registration key. |
| internal/ingest/address_refs_test.go | Updates tests/examples to use "terraform" language name. |
| cmd/serve.go | Wires engine.Gitignore() into watcher via WithGitignore. |
| cmd/schemas.go | Expands embedded preset schema registry to many additional languages. |
| cmd/infer.go | Expands sourceCodePresets to match the broader preset schema set. |
| cmd/schemas/yaml.json | Adds preset schema for YAML. |
| cmd/schemas/typescript.json | Adds preset schema for TypeScript. |
| cmd/schemas/terraform.json | Extends Terraform preset schema (adds terraform block extraction). |
| cmd/schemas/swift.json | Adds preset schema for Swift. |
| cmd/schemas/scala.json | Adds preset schema for Scala. |
| cmd/schemas/rust.json | Extends Rust preset schema (adds import extraction). |
| cmd/schemas/ruby.json | Adds preset schema for Ruby. |
| cmd/schemas/php.json | Adds preset schema for PHP. |
| cmd/schemas/kotlin.json | Adds preset schema for Kotlin. |
| cmd/schemas/javascript.json | Adds preset schema for JavaScript. |
| cmd/schemas/java.json | Adds preset schema for Java. |
| cmd/schemas/elixir.json | Adds preset schema for Elixir. |
| cmd/schemas/cpp.json | Adds preset schema for C++. |
| cmd/schemas/c.json | Adds preset schema for C. |
Comments suppressed due to low confidence (1)
internal/ingest/gitignore.go:33
LoadGitignore’s docstring says it “Returns nil if no .gitignore exists at all”, but the current implementation always returns a non-nil matcher even when the root.gitignoreis missing and no nested patterns are found. Either update the docstring, or (preferably) returnnilwhen both the root patterns andm.nestedare empty so callers can skip extraRel/Matchwork.
// LoadGitignore reads .gitignore from rootDir and discovers nested .gitignore
// files in the tree. Returns nil if no .gitignore exists at all.
func LoadGitignore(rootDir string) *gitignoreMatcher {
m := &gitignoreMatcher{
rootDir: rootDir,
nested: make(map[string][]gitignorePattern),
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // HCL: variable "VAR_NAME" { ... } → env:VAR_NAME | ||
| RegisterAddressRefQuery("hcl", "env", ` | ||
| RegisterAddressRefQuery("terraform", "env", ` | ||
| (block | ||
| (identifier) @_type | ||
| (string_lit) @ref |
There was a problem hiding this comment.
HCL/Terraform files now use langName="terraform" (via langForExt), but the non-default Tree-sitter ref/call query is still only registered under "hcl". That makes ExtractCalls(..., "terraform") fall back to defaultCallQuery, which doesn’t compile against the HCL grammar (repeated query compile errors + missing refs). Register the same RegisterRefQuery under "terraform" as well (and optionally keep "hcl" for backwards compatibility).
There was a problem hiding this comment.
Fixed in 7a6a282 — changed RegisterRefQuery("hcl", ...) to RegisterRefQuery("terraform", ...) to match the langForExt rename.
| // Gitignore returns the gitignore matcher loaded during Ingest, or nil if none | ||
| // was loaded. Pass this to WithGitignore when creating a Watcher so the watcher | ||
| // skips the same directories the engine does. | ||
| func (e *Engine) Gitignore() *gitignoreMatcher { |
There was a problem hiding this comment.
Engine.Gitignore() is exported but returns an unexported concrete type (*gitignoreMatcher). This makes it hard for other packages to store/forward the value except via type inference and passing it straight back into ingest APIs. Consider exporting a GitignoreMatcher type (or returning a small interface like Match(rel string, isDir bool) bool) and updating WithGitignore / Engine.Gitignore to use that exported surface.
| // Gitignore returns the gitignore matcher loaded during Ingest, or nil if none | |
| // was loaded. Pass this to WithGitignore when creating a Watcher so the watcher | |
| // skips the same directories the engine does. | |
| func (e *Engine) Gitignore() *gitignoreMatcher { | |
| // GitignoreMatcher describes the minimal interface for matching paths against | |
| // .gitignore-style rules. The concrete implementation is internal. | |
| type GitignoreMatcher interface { | |
| Match(rel string, isDir bool) bool | |
| } | |
| // Gitignore returns the gitignore matcher loaded during Ingest, or nil if none | |
| // was loaded. Pass this to WithGitignore when creating a Watcher so the watcher | |
| // skips the same directories the engine does. | |
| func (e *Engine) Gitignore() GitignoreMatcher { |
There was a problem hiding this comment.
Fixed in 7a6a282 — exported GitignoreMatcher as an interface (Match(rel string, isDir bool) bool). WithGitignore and Engine.Gitignore() now use the exported surface.
| // Check gitignore for files inside ignored directories. | ||
| if w.gitignore != nil { | ||
| rel, err := filepath.Rel(w.rootDir, path) | ||
| if err == nil { | ||
| rel = filepath.ToSlash(rel) | ||
| if w.gitignore.Match(rel, false) { |
There was a problem hiding this comment.
The comment "Check gitignore for files inside ignored directories" is narrower than the actual behavior: Match(rel, false) will ignore any path matched by gitignore (including file patterns like *.log), not just files under ignored directories. Consider updating the comment to reflect the real semantics so future changes/tests don’t encode the wrong expectation.
There was a problem hiding this comment.
Fixed in 7a6a282 — updated comment to accurately reflect that it matches any gitignore rule, not just directory-scoped patterns.
| var presetSchemas = map[string]string{ | ||
| "go": "schemas/go.json", | ||
| "python": "schemas/python.json", | ||
| "rust": "schemas/rust.json", | ||
| "terraform": "schemas/terraform.json", | ||
| "sql": "schemas/sql.json", | ||
| "toml": "schemas/toml.json", | ||
| // Source-code languages (mapped from DetectLanguageFromExt) | ||
| "go": "schemas/go.json", | ||
| "python": "schemas/python.json", | ||
| "rust": "schemas/rust.json", | ||
| "terraform": "schemas/terraform.json", |
There was a problem hiding this comment.
The PR description’s “What changed” table doesn’t mention the large expansion of embedded preset schemas / auto-detection mappings added here (many new schemas/*.json plus new entries in presetSchemas). If this is intentional, please update the PR description (or split into a separate PR) so the review scope and rollout risk are clear.
…, comment
- RegisterRefQuery("hcl" → "terraform") to match langForExt rename
- Export GitignoreMatcher as interface for cross-package use
- Fix misleading comment in shouldIgnorePath
Summary
shouldIgnoreDirhad a diverged skip list from the engine'sShouldSkipDir— didn't skiptarget/,dist/, orbuild/target/subdirs cascaded to 129K leaked FDsShouldSkipDir(canonical list) +.gitignorerules viaWithGitignore, so project-specific ignores are respected automaticallyWhat changed
gitignore.goLoadGitignoreengine.goGitignore()accessor, addvendortoShouldSkipDirwatcher.goWithGitignoreoption,shouldIgnoreDir/shouldIgnorePathas methods using gitignore + canonical skip listserve.goengine.Gitignore()into watcherwatcher_test.goTestWatcher_TargetIgnored,TestWatcher_GitignoreSkipsDirs+ updated existing testsTest plan
TestWatcher_TargetIgnored— target/, dist/, build/ dirs don't trigger callbacksTestWatcher_GitignoreSkipsDirs— custom gitignore patterns respected by watcherTestWatcher_VendorIgnored— existing test still passesTestShouldIgnoreDir— updated to verify target/dist/build in canonical listtask testgreen