fix(extraction): index nested non-submodule git repos (#193) #217

Merged
colbymchenry merged 1 commit into main from fix/193-nested-git-repos on May 20, 2026

@colbymchenry
Owner

Summary

Fixes #193. codegraph init -i reported "No files found to index" when run from a top-level workspace that is itself a git repo and contains nested independent git repositories (separate clones, not submodules; a common CMake "super-repo" layout). Indexing each sub-repo individually worked; indexing from the workspace root did not.

Root cause

git ls-files treats an embedded (nested, non-submodule) repo as opaque: tracked output skips it, and untracked output reports it only as a subdir/ directory entry, never its files. The filtered result was empty, so the git fast-path in getGitVisibleFiles returned zero source files.
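
To make the failure mode concrete, here is a minimal sketch of how the filtered result ends up empty. The extension list, the file names, and the `isSourceFile` helper are hypothetical stand-ins for illustration; the actual filter in getGitVisibleFiles is not shown in this PR.

```typescript
// Hypothetical allow-list of source extensions (not the real indexer's list).
const SOURCE_EXTENSIONS = [".c", ".cpp", ".h", ".ts", ".py"];

function isSourceFile(path: string): boolean {
  return SOURCE_EXTENSIONS.some((ext) => path.endsWith(ext));
}

// Illustrative listing at the super-repo root: only build/metadata files are
// tracked, and each embedded repo appears as a bare directory entry.
const trackedAtRoot = ["CMakeLists.txt", "README.md"];
const untrackedAtRoot = ["libfoo/", "libbar/"];

// Neither a ".txt"/".md" file nor a "subdir/" entry matches a source
// extension, so nothing survives the filter.
const visibleAtRoot = [...trackedAtRoot, ...untrackedAtRoot].filter(isSourceFile);
console.log(visibleAtRoot.length); // 0
```

Because the `libfoo/` and `libbar/` entries never expand to their files, no amount of extension matching at the root can recover the nested sources.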

Fix

Extracted a recursive collectGitFiles() that detects embedded repos via git's trailing-slash signal (a normal untracked dir expands to its files; only an embedded repo shows as subdir/), guards with a .git existence check, and re-runs git ls-files inside each. This indexes tracked and untracked source with correctly prefixed paths and honors each repo's own .gitignore. It composes with the existing --recurse-submodules handling (#147).

Per the documented "respect .gitignore at all levels" principle, a sub-repo that the parent repo's .gitignore excludes is intentionally still skipped.
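
The recursion described above can be sketched roughly as follows. This is not the merged collectGitFiles() implementation: the `GitEnv` interface, `collectGitFilesSketch`, and the fake repo data are hypothetical, and the git calls are injected as plain functions so the sketch runs without a real checkout. The real code presumably shells out to git ls-files; the flags named in the comments are only examples.

```typescript
// What one listing pass over a single repo root returns (hypothetical shape).
interface GitListing {
  tracked: string[];   // e.g. from `git ls-files`
  untracked: string[]; // e.g. from `git ls-files --others --exclude-standard`
}

// Injected dependencies standing in for git and the filesystem.
interface GitEnv {
  listFiles(repoDir: string): GitListing;
  hasDotGit(dir: string): boolean;
}

function joinPath(base: string, rel: string): string {
  return base === "" ? rel : `${base}/${rel}`;
}

function collectGitFilesSketch(
  env: GitEnv,
  repoDir = "",
  seen = new Set<string>(),
): string[] {
  if (seen.has(repoDir)) return []; // never list the same repo twice
  seen.add(repoDir);

  const { tracked, untracked } = env.listFiles(repoDir);
  // Prefix every path so results stay relative to the workspace root.
  const files = tracked.map((p) => joinPath(repoDir, p));

  for (const entry of untracked) {
    if (!entry.endsWith("/")) {
      files.push(joinPath(repoDir, entry)); // ordinary untracked file
      continue;
    }
    // Trailing slash: candidate embedded repo. Confirm with a .git check
    // before recursing, then list it using its own ignore rules.
    const nested = joinPath(repoDir, entry.slice(0, -1));
    if (env.hasDotGit(nested)) {
      files.push(...collectGitFilesSketch(env, nested, seen));
    }
  }
  return files;
}

// Fake super-repo with one embedded repo, for illustration only.
const fakeRepos: Record<string, GitListing> = {
  "": { tracked: ["CMakeLists.txt"], untracked: ["libfoo/", "notes.txt"] },
  "libfoo": { tracked: ["src/foo.cpp"], untracked: ["src/wip.cpp"] },
};
const fakeEnv: GitEnv = {
  listFiles: (dir) => fakeRepos[dir] ?? { tracked: [], untracked: [] },
  hasDotGit: (dir) => dir in fakeRepos,
};

const collected = collectGitFilesSketch(fakeEnv);
console.log(collected);
// [ 'CMakeLists.txt', 'libfoo/src/foo.cpp', 'libfoo/src/wip.cpp', 'notes.txt' ]
```

A sub-repo excluded by the parent's .gitignore never appears in the parent's untracked listing when ignore rules are applied, so in this sketch it is skipped without any extra check, which matches the behavior described above.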

Tests

  • New "Nested non-submodule git repos" test block: embedded repos are discovered from a git super-repo (both committed and untracked source), and each repo's own .gitignore is respected.
  • Full extraction + security suites green (268 tests).
  • Verified end-to-end: real codegraph init -i against the issue's exact layout now indexes the nested files.

Reported by @timxx.

🤖 Generated with Claude Code

`codegraph init -i` from a git super-repo containing independent nested
git repositories (not submodules) reported "No files found to index":
git ls-files reports an embedded repo only as an opaque `subdir/` entry
and never lists its files. Detect embedded repos via that trailing-slash
signal and recurse `git ls-files` into each, indexing tracked + untracked
source and honoring each repo's own .gitignore.

Reported by @timxx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

codegraph init -i does not detect source files when run from a top-level CMake workspace containing multiple nested repos
