
refactor: unified language registry — one source of truth, zero drift #130

Merged
jamestexas merged 7 commits into main from feat/unified-lang-registry
Mar 25, 2026

@jamestexas
Contributor

Summary

Replaces 10 independent language/extension switch statements with a single internal/lang registry. Adding a new language now means editing ONE file.

  • -45 net lines (580 added, 625 deleted) — added a whole package and still deleted more
  • 9 bugs fixed across watcher, writeback, mount, config, and ingest dedup
  • 18 languages fully supported everywhere (was 3-14 depending on which switch you hit)

What changed

| File | Change |
| --- | --- |
| internal/lang/lang.go | NEW: Language struct, Registry, derived lookups (ForExt, ForName, ForPath, IsSourceExt, Extensions) |
| internal/lang/lang_test.go | NEW: exhaustive tests (completeness, aliases, case-insensitivity, no duplicate extensions) |
| internal/ingest/engine.go | langForExt + GetLanguage → thin wrappers over lang; processTreeSitterResult extracted (dedup) |
| internal/ingest/language.go | DetectLanguageFromExt → 4-line wrapper; deleted GetLanguageProfile, enrichHCLNode, LanguageProfile |
| internal/ingest/sitter_flatten.go | Uses lang.ForName().EnrichNode instead of deleted GetLanguageProfile |
| internal/ingest/watcher.go | Deleted sourceExtensions map; isSourceFile → lang.IsSourceExt() |
| internal/writeback/validate.go | LanguageForPath → 4-line wrapper over lang.ForPath() (was 8 langs, now 18) |
| cmd/schemas.go | presetSchemas derived from lang.Registry in init() |
| cmd/infer.go | sourceCodePresets derived from lang.Registry; uses lang.ForExt |
| cmd/mount.go | 40-line infer switch → 3-line lang.ForExt() lookup |
| cmd/config.go | detectProjectType + sentinelFiles derived from lang.Registry (was 3 langs, now 18) |
| cmd/build.go | golang.GetLanguage() → lang.ForName("go").Grammar() |
| internal/linter/linter.go | golang.GetLanguage() → lang.ForName("go").Grammar() |
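
To make the shape of the change concrete, here is a minimal sketch of a registry with derived lookups. It is not the code in internal/lang/lang.go: the field names, the three sample entries, and their extensions are assumptions for illustration, and the real Language type also exposes Grammar() and EnrichNode, which are omitted here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// Language is a hypothetical registry entry. The fields below are
// assumptions; the real struct also carries a grammar and an enrich hook.
type Language struct {
	Name       string
	Aliases    []string
	Extensions []string
}

// Registry is the single list that every lookup is derived from.
// The entries and extensions here are illustrative only.
var Registry = []Language{
	{Name: "go", Extensions: []string{".go"}},
	{Name: "python", Extensions: []string{".py"}},
	{Name: "terraform", Aliases: []string{"hcl"}, Extensions: []string{".tf", ".hcl"}},
}

// Lookup indexes, built once from Registry and never edited by hand.
var (
	byExt  = map[string]*Language{}
	byName = map[string]*Language{}
)

func init() {
	for i := range Registry {
		l := &Registry[i]
		byName[strings.ToLower(l.Name)] = l
		for _, a := range l.Aliases {
			byName[strings.ToLower(a)] = l
		}
		for _, e := range l.Extensions {
			byExt[strings.ToLower(e)] = l
		}
	}
}

// ForExt returns the language for an extension such as ".go", or nil.
func ForExt(ext string) *Language { return byExt[strings.ToLower(ext)] }

// ForName resolves a canonical name or alias, ignoring case.
func ForName(name string) *Language { return byName[strings.ToLower(name)] }

// ForPath resolves a language from a file path's extension.
func ForPath(path string) *Language { return ForExt(filepath.Ext(path)) }

// IsSourceExt reports whether any registered language owns the extension.
func IsSourceExt(ext string) bool { return ForExt(ext) != nil }

// Extensions returns every registered extension in sorted order.
func Extensions() []string {
	out := make([]string, 0, len(byExt))
	for e := range byExt {
		out = append(out, e)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(ForName("hcl").Name)          // alias resolves to terraform
	fmt.Println(ForPath("cmd/Mount.GO").Name) // case-insensitive extension
	fmt.Println(IsSourceExt(".md"))
	fmt.Println(Extensions())
}
```

With this layout, supporting another language is one new row in Registry; the watcher, writeback, and CLI call sites pick it up through the lookups without further edits.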

Bugs fixed

  1. Watcher missing 8 languages — Java/C/C++/Ruby/PHP/Kotlin/Swift/Scala file changes silently ignored
  2. Writeback missing 10 languages — validation broken for most non-original languages
  3. mount.go missing elixir, toml: --infer broken for these
  4. detectProjectType missing 15 languages — only detected Go/Python/SQL
  5. Broken file ID collision — sequential ingest used basename (collision), now SHA256
  6. Missing _project_files fallback — sequential path silently dropped unmatched files
  7. Missing RecordFile — sequential path broke incremental re-ingestion
  8. HCL ref query under wrong key — registered as "hcl" but langForExt returns "terraform" (fixed in fix: watcher FD leak — respect .gitignore + unified skip list #129)
  9. langForExt / DetectLanguageFromExt / GetLanguage — three copies of the same switch

Test plan

  • internal/lang — 13 tests (completeness, all extensions, aliases, case-insensitivity, sorted output, no duplicates, EnrichNode)
  • All existing tests pass unchanged
  • grep -r 'case ".go"' internal/ cmd/ returns zero matches (no more switches)
  • lang.ForName("hcl") returns terraform (backward compat alias)
  • task check passes (fmt + vet + lint + test + validate)
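
As an illustration of the "no duplicates" check in the plan above, the core of such a test could be a helper like the one below. This is a sketch under assumptions: the entry type and the sample data are hypothetical, and the actual tests in internal/lang/lang_test.go may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical stand-in for one registry row.
type entry struct {
	Name       string
	Extensions []string
}

// duplicateExtensions returns every extension that appears more than once
// across the registry, compared case-insensitively and sorted for stable output.
func duplicateExtensions(reg []entry) []string {
	seen := map[string]int{}
	for _, e := range reg {
		for _, ext := range e.Extensions {
			seen[strings.ToLower(ext)]++
		}
	}
	var dups []string
	for ext, n := range seen {
		if n > 1 {
			dups = append(dups, ext)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// Illustrative data only; not the real registry contents.
	clean := []entry{{"go", []string{".go"}}, {"c", []string{".c", ".h"}}}
	clash := append(clean, entry{"cpp", []string{".cpp", ".H"}})

	fmt.Println(duplicateExtensions(clean)) // []
	fmt.Println(duplicateExtensions(clash)) // [.h]
}
```

In a `_test.go` file the same helper would be called on the real registry and the test would fail if the returned slice is non-empty.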

Bead

mache-7gp

…switches

- langForExt (engine.go) → thin wrapper over lang.ForExt
- GetLanguage (engine.go) → thin wrapper over lang.ForName (deprecated)
- DetectLanguageFromExt (language.go) → thin wrapper over lang.ForExt
- GetLanguageProfile + LanguageProfile + enrichHCLNode (language.go) → deleted,
  sitter_flatten.go uses lang.ForName().EnrichNode directly
- sourceExtensions (watcher.go) → deleted, isSourceFile uses lang.IsSourceExt
- mount.go newCallExtractor → uses lang.ForName instead of ingest.GetLanguage
- All 18 tree-sitter grammar imports removed from engine.go (now only in internal/lang)
Replaces 8-language switch with lang.ForPath() — adds toml, elixir,
java, c, cpp, ruby, php, kotlin, swift, scala to write-back validation.
Removes 8 direct tree-sitter grammar imports from validate.go.
…rdcoded

- schemas.go: presetSchemas built from lang.Registry at init
- infer.go: sourceCodePresets built from lang.Registry at init,
  detectProjectLanguages + inferLanguages use lang.ForExt
- mount.go: 40-line infer switch → 3-line lang.ForExt lookup,
  18 tree-sitter grammar imports removed
- config.go: sentinelFiles built from lang.Registry at init,
  detectProjectType uses lang.ForExt for all languages (was 3)
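
The "built from lang.Registry at init" pattern in this commit can be sketched as follows. The Sentinels field, the sample sentinel file names, and the concrete types of presetSchemas and sentinelFiles are assumptions made for this example; only the idea of deriving both tables from one list comes from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// Language is a hypothetical registry row. Sentinels (marker files that
// identify a project type) is an assumed field for this sketch.
type Language struct {
	Name      string
	Sentinels []string
}

// Registry holds illustrative entries, not the real 18 languages.
var Registry = []Language{
	{Name: "python", Sentinels: []string{"pyproject.toml", "setup.py"}},
	{Name: "go", Sentinels: []string{"go.mod"}},
	{Name: "elixir", Sentinels: []string{"mix.exs"}},
}

// Derived tables: filled once at init, so they cannot drift from Registry.
var (
	presetSchemas []string
	sentinelFiles = map[string]string{} // sentinel file name -> language name
)

func init() {
	for _, l := range Registry {
		presetSchemas = append(presetSchemas, l.Name)
		for _, s := range l.Sentinels {
			sentinelFiles[s] = l.Name
		}
	}
	sort.Strings(presetSchemas)
}

func main() {
	fmt.Println(presetSchemas)            // [elixir go python]
	fmt.Println(sentinelFiles["mix.exs"]) // elixir
}
```

Because the derived tables are computed rather than typed out, a language added to Registry appears in schema presets and project detection without touching cmd/.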
…sing fallback, missing RecordFile

Extract processTreeSitterResult() shared by both parallel and sequential
tree-sitter ingestion paths. Fixes three bugs in the sequential path:

1. Broken file IDs now use SHA256(path) instead of basename (no collision)
2. Unmatched files route to _project_files/ instead of being silently dropped
3. RecordFile is called for incremental re-ingestion caching
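
The first fix is easy to see in isolation. The sketch below contrasts a basename-derived ID with one derived from SHA256 of the full path; the function names are hypothetical, and the real code may encode or truncate the digest differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// basenameID mimics the old sequential behaviour: two files with the
// same name in different directories end up with the same ID.
func basenameID(path string) string { return filepath.Base(path) }

// hashID derives the ID from the full path, so different paths map to
// different IDs (up to SHA256 collisions).
func hashID(path string) string {
	sum := sha256.Sum256([]byte(path))
	return hex.EncodeToString(sum[:])
}

func main() {
	a, b := "cmd/main.go", "tools/gen/main.go"

	fmt.Println(basenameID(a) == basenameID(b)) // true: IDs collide
	fmt.Println(hashID(a) == hashID(b))         // false: IDs are distinct
	fmt.Println(len(hashID(a)))                 // 64 hex characters
}
```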

Copilot AI left a comment

Pull request overview

Refactors language detection/mapping across the codebase to use a single internal/lang registry, eliminating duplicated extension/name switch statements and aligning ingestion, watcher behavior, schema presets, and project detection on one canonical source of truth.

Changes:

  • Introduces internal/lang with a Language registry and derived lookup helpers (ForExt, ForName, ForPath, IsSourceExt, Extensions) plus tests.
  • Updates ingest pipeline, watcher, writeback validation, and AST flattening/enrichment to delegate language/grammar selection to internal/lang.
  • Derives CLI preset schema lists and project-type detection sentinel files from the registry instead of hardcoded maps/switches.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| internal/lang/lang.go | New central language registry + lookup helpers; migrates HCL enrichment here. |
| internal/lang/lang_test.go | Adds tests to validate registry completeness, aliases, extensions, sorting, and enrichment hooks. |
| internal/ingest/engine.go | Replaces per-language switches with registry lookups; extracts processTreeSitterResult to dedupe logic. |
| internal/ingest/language.go | Collapses DetectLanguageFromExt into a thin wrapper over the registry; removes profile/enrichment types. |
| internal/ingest/sitter_flatten.go | Switches enrichment lookup to lang.ForName(...).EnrichNode. |
| internal/ingest/watcher.go | Replaces local sourceExtensions set with lang.IsSourceExt. |
| internal/writeback/validate.go | Delegates extension→grammar mapping to lang.ForPath(...).Grammar(). |
| internal/linter/linter.go | Uses registry-provided Go grammar instead of direct tree-sitter import. |
| cmd/schemas.go | Builds presetSchemas from lang.Registry at init time. |
| cmd/infer.go | Builds source-code preset mapping from lang.Registry; uses registry in language detection and parsing. |
| cmd/mount.go | Replaces large infer switch with lang.ForExt lookup; updates call extractor to use registry. |
| cmd/config.go | Derives sentinel files and extension counting from lang.Registry for project-type detection. |
| cmd/build.go | Uses lang.ForName("go").Grammar() instead of direct Go grammar import. |


Comment thread internal/lang/lang.go Outdated
@jamestexas jamestexas merged commit c3ab4f9 into main Mar 25, 2026
14 checks passed
@jamestexas jamestexas deleted the feat/unified-lang-registry branch March 25, 2026 01:49