refactor: unified language registry — one source of truth, zero drift #130
Merged
jamestexas merged 7 commits into main on Mar 25, 2026
…switches
- langForExt (engine.go) → thin wrapper over lang.ForExt
- GetLanguage (engine.go) → thin wrapper over lang.ForName (deprecated)
- DetectLanguageFromExt (language.go) → thin wrapper over lang.ForExt
- GetLanguageProfile + LanguageProfile + enrichHCLNode (language.go) → deleted; sitter_flatten.go uses lang.ForName().EnrichNode directly
- sourceExtensions (watcher.go) → deleted; isSourceFile uses lang.IsSourceExt
- mount.go newCallExtractor → uses lang.ForName instead of ingest.GetLanguage
- All 18 tree-sitter grammar imports removed from engine.go (now only in internal/lang)
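A minimal sketch of the thin-wrapper pattern this commit describes, assuming a trimmed-down `Language` type and a two-entry registry (both illustrative; the real `internal/lang` entries also carry tree-sitter grammars). The old entry points keep their signatures but only delegate. The "(deprecated)" note is expressed here with Go's conventional `Deprecated:` doc-comment paragraph, which is one way to do it, not necessarily what the PR uses.

```go
package main

import (
	"fmt"
	"strings"
)

// Language is a hypothetical, trimmed-down registry entry.
type Language struct {
	Name       string
	Extensions []string
}

// registry stands in for lang.Registry (contents assumed for illustration).
var registry = []*Language{
	{Name: "go", Extensions: []string{".go"}},
	{Name: "python", Extensions: []string{".py"}},
}

// forExt stands in for lang.ForExt: a case-insensitive extension lookup.
func forExt(ext string) *Language {
	ext = strings.ToLower(ext)
	for _, l := range registry {
		for _, e := range l.Extensions {
			if e == ext {
				return l
			}
		}
	}
	return nil
}

// forName stands in for lang.ForName.
func forName(name string) *Language {
	name = strings.ToLower(name)
	for _, l := range registry {
		if l.Name == name {
			return l
		}
	}
	return nil
}

// DetectLanguageFromExt keeps its old signature for existing callers but
// no longer owns a switch; it only delegates to the registry.
func DetectLanguageFromExt(ext string) string {
	if l := forExt(ext); l != nil {
		return l.Name
	}
	return "" // unknown extension
}

// GetLanguage is kept only for backward compatibility.
//
// Deprecated: look languages up through the registry (forName) instead.
func GetLanguage(name string) *Language { return forName(name) }

func main() {
	fmt.Println(DetectLanguageFromExt(".PY"))       // python
	fmt.Println(DetectLanguageFromExt(".rs") == "") // true: not registered in this sketch
}
```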
Replaces the 8-language switch with lang.ForPath(), adding toml, elixir, java, c, cpp, ruby, php, kotlin, swift, and scala to write-back validation. Removes 8 direct tree-sitter grammar imports from validate.go.
…rdcoded
- schemas.go: presetSchemas built from lang.Registry at init
- infer.go: sourceCodePresets built from lang.Registry at init; detectProjectLanguages + inferLanguages use lang.ForExt
- mount.go: 40-line infer switch → 3-line lang.ForExt lookup; 18 tree-sitter grammar imports removed
- config.go: sentinelFiles built from lang.Registry at init; detectProjectType uses lang.ForExt for all languages (was 3)
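Deriving lookup tables in `init()` rebuilds them from the registry on every program start, so they cannot drift out of sync with it. The sketch below assumes a hypothetical `SentinelFiles` field and treats `presetSchemas` as a plain sorted list of names; the PR does not show the real field names or the shape of the schema presets.

```go
package main

import (
	"fmt"
	"sort"
)

// Language is a hypothetical registry entry; SentinelFiles lists files whose
// presence marks a project as using the language (field name assumed).
type Language struct {
	Name          string
	SentinelFiles []string
}

// Registry is illustrative, not the PR's actual contents.
var Registry = []Language{
	{Name: "go", SentinelFiles: []string{"go.mod"}},
	{Name: "rust", SentinelFiles: []string{"Cargo.toml"}},
	{Name: "python", SentinelFiles: []string{"pyproject.toml", "setup.py"}},
}

// Derived tables: populated once in init() from Registry, never hand-edited.
var (
	presetSchemas []string          // one preset name per language (shape assumed)
	sentinelFiles map[string]string // sentinel filename -> language name
)

func init() {
	sentinelFiles = make(map[string]string)
	for _, l := range Registry {
		presetSchemas = append(presetSchemas, l.Name)
		for _, f := range l.SentinelFiles {
			sentinelFiles[f] = l.Name
		}
	}
	sort.Strings(presetSchemas) // stable ordering, e.g. for CLI help text
}

func main() {
	fmt.Println(presetSchemas)               // [go python rust]
	fmt.Println(sentinelFiles["Cargo.toml"]) // rust
}
```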
…sing fallback, missing RecordFile
Extract processTreeSitterResult(), shared by both the parallel and sequential tree-sitter ingestion paths. Fixes three bugs in the sequential path:
1. Broken file IDs now use SHA256(path) instead of basename (no collision)
2. Unmatched files route to _project_files/ instead of being silently dropped
3. RecordFile is called for incremental re-ingestion caching
…l via internal/lang
Pull request overview
Refactors language detection/mapping across the codebase to use a single internal/lang registry, eliminating duplicated extension/name switch statements and aligning ingestion, watcher behavior, schema presets, and project detection on one canonical source of truth.
Changes:
- Introduces internal/lang with a Language registry and derived lookup helpers (ForExt, ForName, ForPath, IsSourceExt, Extensions), plus tests.
- Updates the ingest pipeline, watcher, write-back validation, and AST flattening/enrichment to delegate language/grammar selection to internal/lang.
- Derives CLI preset schema lists and project-type detection sentinel files from the registry instead of hardcoded maps/switches.
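A dependency-free sketch of how such a registry and its derived lookups might fit together. The `Language` fields, the three sample entries, and the `.hcl` extension are assumptions (the real entries also carry `Grammar()` and `EnrichNode`); the `"hcl"` alias resolving to terraform mirrors the backward-compat check in the test plan.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// Language is a hypothetical subset of the registry entry; grammar and
// enrichment hooks are omitted to keep the sketch stdlib-only.
type Language struct {
	Name       string
	Aliases    []string
	Extensions []string
}

// Registry is illustrative; the PR's registry covers 18 languages.
var Registry = []*Language{
	{Name: "go", Extensions: []string{".go"}},
	{Name: "terraform", Aliases: []string{"hcl"}, Extensions: []string{".tf", ".hcl"}},
	{Name: "python", Extensions: []string{".py"}},
}

// Lookup indexes, derived once from Registry.
var byExt, byName = map[string]*Language{}, map[string]*Language{}

func init() {
	for _, l := range Registry {
		byName[l.Name] = l
		for _, a := range l.Aliases {
			byName[a] = l
		}
		for _, e := range l.Extensions {
			byExt[e] = l
		}
	}
}

// ForExt is case-insensitive: ".GO" and ".go" resolve the same way.
func ForExt(ext string) *Language { return byExt[strings.ToLower(ext)] }

// ForName resolves canonical names and aliases (e.g. "hcl" -> terraform).
func ForName(name string) *Language { return byName[strings.ToLower(name)] }

// ForPath looks a file up by its extension; nil means unsupported.
func ForPath(path string) *Language { return ForExt(filepath.Ext(path)) }

// IsSourceExt reports whether any registered language claims ext.
func IsSourceExt(ext string) bool { return ForExt(ext) != nil }

// Extensions returns every registered extension, sorted; map keys are unique,
// so there are no duplicates.
func Extensions() []string {
	out := make([]string, 0, len(byExt))
	for e := range byExt {
		out = append(out, e)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(ForName("hcl").Name)           // terraform
	fmt.Println(ForPath("infra/main.TF").Name) // terraform
	fmt.Println(IsSourceExt(".md"))            // false
	fmt.Println(Extensions())                  // [.go .hcl .py .tf]
}
```

Because every index is built from `Registry` in `init()`, adding a language in this sketch touches only the slice literal, which is the property the PR is after.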
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/lang/lang.go | New central language registry + lookup helpers; migrates HCL enrichment here. |
| internal/lang/lang_test.go | Adds tests to validate registry completeness, aliases, extensions, sorting, and enrichment hooks. |
| internal/ingest/engine.go | Replaces per-language switches with registry lookups; extracts processTreeSitterResult to dedupe logic. |
| internal/ingest/language.go | Collapses DetectLanguageFromExt into a thin wrapper over the registry; removes profile/enrichment types. |
| internal/ingest/sitter_flatten.go | Switches enrichment lookup to lang.ForName(...).EnrichNode. |
| internal/ingest/watcher.go | Replaces local sourceExtensions set with lang.IsSourceExt. |
| internal/writeback/validate.go | Delegates extension→grammar mapping to lang.ForPath(...).Grammar(). |
| internal/linter/linter.go | Uses registry-provided Go grammar instead of direct tree-sitter import. |
| cmd/schemas.go | Builds presetSchemas from lang.Registry at init time. |
| cmd/infer.go | Builds source-code preset mapping from lang.Registry; uses registry in language detection and parsing. |
| cmd/mount.go | Replaces large infer switch with lang.ForExt lookup; updates call extractor to use registry. |
| cmd/config.go | Derives sentinel files and extension counting from lang.Registry for project-type detection. |
| cmd/build.go | Uses lang.ForName("go").Grammar() instead of direct Go grammar import. |
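Extension counting for project-type detection could look roughly like this. The hand-written `extLang` map stands in for an index derived from `lang.Registry`, sentinel-file handling is omitted, and the alphabetical tie-break is an assumption added to keep the result deterministic (Go map iteration order is randomized); the PR does not describe how `detectProjectType` actually breaks ties.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extLang is a hypothetical extension -> language index; in the PR this
// would be derived from lang.Registry rather than written by hand.
var extLang = map[string]string{".go": "go", ".py": "python", ".rs": "rust"}

// detectProjectType picks the language that owns the most files.
// Ties break alphabetically so the answer does not depend on map order.
func detectProjectType(paths []string) string {
	counts := map[string]int{}
	for _, p := range paths {
		if name, ok := extLang[strings.ToLower(filepath.Ext(p))]; ok {
			counts[name]++
		}
	}
	best, bestN := "", 0
	for name, n := range counts {
		if n > bestN || (n == bestN && name < best) {
			best, bestN = name, n
		}
	}
	return best // "" when no file has a registered extension
}

func main() {
	files := []string{"main.go", "util.go", "scripts/gen.py", "README.md"}
	fmt.Println(detectProjectType(files)) // go
}
```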
Summary
Replaces 10 independent language/extension switch statements with a single internal/lang registry. Adding a new language now means editing ONE file.
What changed
| File | Change |
|---|---|
| internal/lang/lang.go | Language struct, Registry, derived lookups (ForExt, ForName, ForPath, IsSourceExt, Extensions) |
| internal/lang/lang_test.go | |
| internal/ingest/engine.go | langForExt + GetLanguage → thin wrappers over lang; processTreeSitterResult extracted (dedup) |
| internal/ingest/language.go | DetectLanguageFromExt → 4-line wrapper; deleted GetLanguageProfile, enrichHCLNode, LanguageProfile |
| internal/ingest/sitter_flatten.go | lang.ForName().EnrichNode instead of deleted GetLanguageProfile |
| internal/ingest/watcher.go | Deleted sourceExtensions map; isSourceFile → lang.IsSourceExt() |
| internal/writeback/validate.go | LanguageForPath → 4-line wrapper over lang.ForPath() (was 8 langs, now 18) |
| cmd/schemas.go | presetSchemas derived from lang.Registry in init() |
| cmd/infer.go | sourceCodePresets derived from lang.Registry; uses lang.ForExt |
| cmd/mount.go | Infer switch → lang.ForExt() lookup |
| cmd/config.go | detectProjectType + sentinelFiles derived from lang.Registry (was 3 langs, now 18) |
| cmd/build.go | golang.GetLanguage() → lang.ForName("go").Grammar() |
| internal/linter/linter.go | golang.GetLanguage() → lang.ForName("go").Grammar() |
Bugs fixed
- --infer broken for these
- _project_files fallback: sequential path silently dropped unmatched files
- langForExt / DetectLanguageFromExt / GetLanguage: three copies of the same switch
Test plan
- internal/lang: 13 tests (completeness, all extensions, aliases, case-insensitivity, sorted output, no duplicates, EnrichNode)
- grep -r 'case ".go"' internal/ cmd/ returns zero matches (no more switches)
- lang.ForName("hcl") returns terraform (backward-compat alias)
- task check passes (fmt + vet + lint + test + validate)
Bead
mache-7gp