fix(extraction): index nested non-submodule git repos (#193) #217
Merged
`codegraph init -i` from a git super-repo containing independent nested git repositories (not submodules) reported "No files found to index": `git ls-files` reports an embedded repo only as an opaque `subdir/` entry and never lists its files. Detect embedded repos via that trailing-slash signal and recurse `git ls-files` into each, indexing tracked + untracked source and honoring each repo's own `.gitignore`.

Reported by @timxx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #193: `codegraph init -i` reported "No files found to index" when run from a top-level workspace that is itself a git repo and contains nested independent git repositories (separate clones, not submodules; a common CMake "super-repo" layout). Indexing each sub-repo individually worked; indexing from the workspace root did not.

Root cause
`git ls-files` treats an embedded (nested, non-submodule) repo as opaque: tracked output skips it, and untracked output reports it only as a `subdir/` directory entry, never its files. The filtered result was empty, so the git fast-path in `getGitVisibleFiles` returned zero source files.

Fix
Extracted a recursive `collectGitFiles()` that detects embedded repos via git's trailing-slash signal (a normal untracked dir expands to its files; only an embedded repo shows as `subdir/`), guards with a `.git` existence check, and re-runs `git ls-files` inside each, indexing tracked + untracked source with correct prefixed paths and honoring each repo's own `.gitignore`. Composes with the existing `--recurse-submodules` handling (#147).

Per the documented "respect `.gitignore` at all levels" principle, a sub-repo that the parent repo's `.gitignore` excludes is intentionally still skipped.

Tests
`Nested non-submodule git repos` block: embedded repos discovered from a git super-repo (committed and untracked source), and per-repo `.gitignore` respected.
`codegraph init -i` against the issue's exact layout now indexes the nested files.

Reported by @timxx.
🤖 Generated with Claude Code
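
For readers who want the shape of the fix without opening the diff, here is a minimal TypeScript sketch of the trailing-slash detection and recursion described under Fix. It is not the merged code. The name `collectGitFilesSketch`, the injected `listFiles` and `isGitRepo` callbacks, and the fake workspace layout are all assumptions made for illustration; a real implementation would run `git ls-files` and check the filesystem for `.git`, where this sketch takes both as parameters so it runs without git.

```typescript
// Illustrative sketch only: all names here are hypothetical, not the project's API.

// Stand-in for running `git ls-files` (tracked plus untracked, non-ignored) in a repo.
type ListFiles = (repoDir: string) => string[];
// Stand-in for the `.git` existence check that guards the recursion.
type IsGitRepo = (dir: string) => boolean;

function collectGitFilesSketch(
  repoDir: string,
  listFiles: ListFiles,
  isGitRepo: IsGitRepo,
  prefix = "",
): string[] {
  const files: string[] = [];
  for (const entry of listFiles(repoDir)) {
    if (entry.endsWith("/")) {
      // Trailing slash: git reported a directory rather than a file,
      // which is the signal for an embedded repo.
      const name = entry.slice(0, -1);
      const childDir = `${repoDir}/${name}`;
      if (isGitRepo(childDir)) {
        // Recurse so the nested repo lists its own files (and applies its
        // own ignore rules), then prefix paths relative to the root.
        files.push(
          ...collectGitFilesSketch(childDir, listFiles, isGitRepo, `${prefix}${name}/`),
        );
      }
      continue; // A directory entry is never indexed as a file itself.
    }
    files.push(prefix + entry);
  }
  return files;
}

// Fake super-repo layout (assumed for the example, not taken from the issue).
const fakeListings = new Map<string, string[]>([
  ["/ws", ["CMakeLists.txt", "libfoo/", "notes/"]],
  ["/ws/libfoo", ["src/foo.cpp", "include/foo.h"]],
]);
const fakeRepos = new Set<string>(["/ws", "/ws/libfoo"]);

const indexed = collectGitFilesSketch(
  "/ws",
  (dir) => fakeListings.get(dir) ?? [],
  (dir) => fakeRepos.has(dir),
);
console.log(indexed);
// → [ 'CMakeLists.txt', 'libfoo/src/foo.cpp', 'libfoo/include/foo.h' ]
```

In the fake layout, `notes/` has a trailing slash but no `.git`, so the guard skips it. Injecting the two callbacks is a choice made for this sketch so the recursion can be checked against a fixed listing; it says nothing about how the project structures its own code.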