merging with upstream#3
Merged
…ing prebuilds (#44)

Resolves #43.

On Linux/Node combinations where one ast-grep grammar package's prebuilt parser binary is missing for the host architecture, the v1.8.3 loader silently failed to register every dynamic grammar in the batch, not just the broken one. registerDynamicLanguage iterates and accesses each module's lazy libraryPath getter; one throwing getter aborts the call atomically and zero grammars end up registered.

Fix: pre-validate each grammar's libraryPath getter inside the per-grammar try/catch so a missing prebuild is contained to that grammar. Build the batch object with only the survivors and make ONE atomic registerDynamicLanguage call. Standard environments are unaffected because all grammars pass pre-validation. Affected environments lose only the unloadable grammar; the rest register cleanly.

Also captures the actual error reason (the previous empty `catch {}` discarded it), bumps symbol- and import-extraction failure logs from debug to warn with one-shot dedupe per language, exposes loaded/failed grammars via a new getDynamicLanguageStatus() API, and renders an "AST grammars" block in codebase_graph_status output so users see loader state without enabling debug logging.

Empirically verified against @ast-grep/napi@0.40.5 in a clean Node environment. Two probes confirmed the napi semantics: sequential register({A}); register({B}) calls are REPLACING (so per-language registration is broken), and batch register with one bad getter is ATOMIC (so pre-validation before the batch call is the only correct pattern).

All 721 existing unit tests continue to pass unchanged. Adds 9 new tests for the loader status API; total 730 pass. typecheck and biome clean. CodeRabbit returned no findings on this diff.

Co-authored-by: X-Adam <X-Adam@users.noreply.github.com>
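For reviewers, the pre-validate-then-batch pattern described in this commit can be sketched roughly as below. This is an illustration only: `GrammarModule`, `LoaderStatus`, `registerSurvivors`, and the injected `registerBatch` callback are hypothetical names standing in for the real loader and for registerDynamicLanguage, whose actual signatures are not shown in the commit message.

```typescript
// Hypothetical shapes for illustration; the real loader and the
// @ast-grep/napi types are not shown in this commit message and may differ.
interface GrammarModule {
  readonly libraryPath: string; // a lazy getter in practice; may throw
}

interface LoaderStatus {
  loaded: string[];
  failed: { language: string; reason: string }[];
}

function registerSurvivors(
  candidates: Record<string, GrammarModule>,
  registerBatch: (batch: Record<string, GrammarModule>) => void,
): LoaderStatus {
  const survivors: Record<string, GrammarModule> = {};
  const failed: LoaderStatus["failed"] = [];

  for (const [language, grammar] of Object.entries(candidates)) {
    try {
      // Touch the getter now, so a missing prebuild throws here (per grammar)
      // rather than inside the atomic batch call.
      void grammar.libraryPath;
      survivors[language] = grammar;
    } catch (err) {
      // Keep the reason instead of discarding it in an empty catch.
      failed.push({ language, reason: err instanceof Error ? err.message : String(err) });
    }
  }

  const loaded = Object.keys(survivors);
  // One batch call containing only the grammars that passed pre-validation.
  if (loaded.length > 0) registerBatch(survivors);
  return { loaded, failed };
}

// Usage with a fake registrar and one grammar whose getter throws.
const registeredBatches: string[][] = [];
const loaderStatus = registerSurvivors(
  {
    python: { libraryPath: "/fake/python-parser.so" },
    go: {
      get libraryPath(): string {
        throw new Error("no prebuild for this architecture");
      },
    },
  },
  (batch) => {
    registeredBatches.push(Object.keys(batch));
  },
);
console.log(loaderStatus);
```

Injecting the registrar keeps the sketch independent of the native module; the single `registerBatch` call reflects the commit's finding that sequential register calls replace each other.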
…repos

Resolves #46. Reported by @mrsuit92.

Python projects where each top-level directory is a runnable application root (a common service-style monorepo layout) had `import config` from `service-a/main.py` produce 0 dependency edges, even when `service-a/config.py` sits next to the importer. At runtime Python resolves this correctly because the importer's directory is sys.path[0] when the file is run as `python main.py` from inside its own directory. The static resolver did not check that path.

The Python case in graph-resolution.ts only tried:

  <projectPath>/<module>.py
  <projectPath>/src/<module>.py
  <projectPath>/lib/<module>.py

It did not try `<sourceDir>/<module>.py`, so non-relative sibling imports never resolved. Relative imports (`from .config import ...`) already used sourceDir and worked.

Fix: add `<sourceDir>/<module>.py` as the LAST fallback, after the existing project-root and src/lib checks. Tried last to preserve project-root precedence, so any layout that resolved before this PR continues to resolve to the same file. resolveRelativePath also handles the `<sourceDir>/<module>/__init__.py` package case via its built-in Python init fallback, so package-style sibling imports work too.

Tests: 5 new cases in tests/unit/graph-resolution.test.ts covering sibling-flat resolution, dotted module paths, package via __init__.py, project-root precedence preservation, and the negative case (no match anywhere). Existing 730 tests continue to pass; total now 735. typecheck, biome, and CodeRabbit local review all clean.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
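The lookup order described above can be illustrated with a small sketch. `resolvePythonModule` is a hypothetical helper, not the code in graph-resolution.ts; it works on a set of project-relative paths instead of the filesystem, and it assumes (for simplicity) that the `__init__.py` package fallback is tried at every base directory, which the commit message only confirms for the sourceDir case.

```typescript
// Hypothetical helper for illustration; not the resolver in graph-resolution.ts.
// Paths are project-relative with forward slashes; "" means the project root.
function resolvePythonModule(
  moduleName: string, // e.g. "config" or "pkg.util"
  sourceDir: string, // directory of the importing file
  files: ReadonlySet<string>,
): string | null {
  const rel = moduleName.split(".").join("/");
  const join = (dir: string, p: string): string => (dir === "" ? p : `${dir}/${p}`);

  // Project root, src/ and lib/ keep precedence; the importer's own
  // directory is tried LAST so earlier resolutions do not change.
  const bases = ["", "src", "lib", sourceDir];
  for (const base of bases) {
    // Assumption: module file first, then the package __init__.py form.
    const candidates = [join(base, `${rel}.py`), join(base, `${rel}/__init__.py`)];
    for (const candidate of candidates) {
      if (files.has(candidate)) return candidate;
    }
  }
  return null;
}

// Usage: a service-style monorepo where config.py sits next to main.py.
const monorepoFiles = new Set(["service-a/main.py", "service-a/config.py", "service-b/main.py"]);
console.log(resolvePythonModule("config", "service-a", monorepoFiles));
```

Trying `sourceDir` last is the design point: a project that already has a root-level `config.py` keeps resolving to it, and only previously unresolved sibling imports gain an edge.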
fix(graph): resolve Python sibling-flat imports in service-style monorepos
Resolves #45. Reported by @mrsuit92.

Go projects produced 0 dependency edges in codebase_graph_query and codebase_graph_stats even though import extraction worked correctly. The Go case in resolveImport returned null unconditionally, with a comment that resolution required go.mod analysis. This patch adds that analysis and wires it into the resolver, mirroring the existing buildJvmSuffixMap and buildCsNamespaceMap patterns.

Mechanism:

- buildGoModuleInfo reads <projectPath>/go.mod once at graph-build time, parses the `module <path>` directive, and walks the file set to build a directory-to-representative-file map for every Go package. _test.go files are excluded from representative selection because Go does not allow them to be imported from non-test code in other packages. Files are sorted lexicographically for deterministic representative selection across machines and runs. Returns null when go.mod is missing or has no parseable module directive; the resolver treats null as "no Go resolution available" and behaves exactly as before this patch in those cases.
- The Go case in resolveImport now strips the module path prefix from the import (handling the bare-module-path root case as well as subpackage paths) and looks up the resulting directory in the package map. Imports outside the module path return null and are treated as external dependencies (or stdlib already filtered upstream by isExternalModule).
- Map keys are forward-slash paths, not OS-native, so resolution works on Windows: Go imports are always forward-slash regardless of host OS, but path.dirname produces backslashes on Windows for nested directories. Normalising the key to forward slashes at build time keeps the lookup correct across platforms.

Limitations (deferred to follow-up issues if any user reports them):

- The parenthesised module ( path ) form in go.mod is not parsed. Not used by any mainstream Go project (verified against cobra, gin-gonic/gin, uber-go/zap real-world go.mod files).
- vendor/ directory shadowing of external imports is not honoured.
- replace directives in go.mod are not honoured.
- go.work multi-module workspaces are not handled (each workspace module would need its own go.mod read and prefix matching).

These are real Go features but each one widens the patch and narrowly affects specific user populations. They can be added as separate small PRs if a real user hits them.

Tests: 16 new cases in tests/unit/graph-resolution.test.ts covering:

- The new buildGoModuleInfo function: parses simple go.mod, handles leading whitespace and trailing content, returns null on missing or malformed go.mod, excludes _test.go from representative selection, omits test-only directories, uses forward-slash keys for nested packages.
- The Go resolveImport case: back-compat null without goModuleInfo, subpackage import resolves to lex-smallest non-test .go file, root-package import resolves to a project-root .go file, external imports return null, missing or malformed go.mod returns null, _test.go excluded from representative selection, lexically smallest file picked deterministically, similar-prefix imports do not falsely resolve, nested-package imports work cross-platform.

Existing 735 unit tests continue to pass unchanged. Total: 751. typecheck, biome, and CodeRabbit local review all clean. CodeRabbit caught a real Windows path-separator bug in the first iteration; the fix and a regression test for it are included.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
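The mechanism described in this commit can be sketched as follows. The function names follow the commit message, but the signatures, the `GoModuleInfo` shape, and `resolveGoImport` are assumptions made for illustration: go.mod text and the file list are passed in directly instead of being read from disk, and the parsing regex is a guess at one reasonable way to read the `module <path>` directive.

```typescript
// Illustrative sketch; signatures and types here are assumptions, not the real API.
interface GoModuleInfo {
  modulePath: string;
  // Package directory ("" for the module root, forward slashes) -> representative file.
  packages: Map<string, string>;
}

function buildGoModuleInfo(goModText: string | null, files: readonly string[]): GoModuleInfo | null {
  if (goModText === null) return null; // go.mod missing
  // Matches `module <path>`; the parenthesised form is deliberately not matched.
  const modulePath = /^\s*module\s+([^\s(]\S*)/m.exec(goModText)?.[1];
  if (!modulePath) return null;

  const packages = new Map<string, string>();
  // Normalise to forward slashes, then sort for a deterministic representative.
  const sorted = [...files].map((f) => f.replace(/\\/g, "/")).sort();
  for (const file of sorted) {
    if (!file.endsWith(".go") || file.endsWith("_test.go")) continue;
    const slash = file.lastIndexOf("/");
    const dir = slash === -1 ? "" : file.slice(0, slash);
    if (!packages.has(dir)) packages.set(dir, file); // first hit is the lexically smallest
  }
  return { modulePath, packages };
}

function resolveGoImport(importPath: string, info: GoModuleInfo | null): string | null {
  if (info === null) return null; // no Go resolution available
  let dir: string;
  if (importPath === info.modulePath) {
    dir = ""; // bare module path: root package
  } else if (importPath.startsWith(`${info.modulePath}/`)) {
    dir = importPath.slice(info.modulePath.length + 1);
  } else {
    return null; // outside the module: external dependency or stdlib
  }
  return info.packages.get(dir) ?? null;
}

// Usage: one path uses backslashes to mimic a Windows file listing.
const goInfo = buildGoModuleInfo("module example.com/app\n\ngo 1.22\n", [
  "main.go",
  "internal/db/z.go",
  "internal/db/a.go",
  "internal/db/a_test.go",
  "internal\\util\\util.go",
]);
console.log(resolveGoImport("example.com/app/internal/db", goInfo));
```

Requiring the trailing `/` after the module path is what keeps similar-prefix imports such as `example.com/application/x` from resolving by accident.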
… paths

Address CodeRabbit review on PR #48.

The early `isExternalModule` check in resolveImport was filtering out any import starting with `golang.org/` before the Go case had a chance to match it against the local module path. This blocked legitimate local imports for any project whose own module path starts with `golang.org/` (the Go team's own packages like golang.org/x/sync, golang.org/x/net, etc., where each one's go.mod declares `module golang.org/x/<name>`).

Skip the early external check for Go specifically. The Go case in resolveImport already does its own module-path-aware classification and returns null for everything outside the local module, including stdlib and third-party deps. No regression in those cases.

New regression test asserts that `module golang.org/x/custom` + `import "golang.org/x/custom/internal"` resolves to the local internal/ package. Confirmed the test fails without the fix and passes with it. Total: 752 unit tests pass.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
fix(graph): resolve Go imports via go.mod module path
Resolves #49. Reported by @awbait.

When sharing a single Qdrant server across multiple applications (SocratiCode + Open-WebUI + custom RAG, etc.) or across multiple SocratiCode instances (per-project, per-environment, per-user), the fixed `codebase_<id>` / `codegraph_<id>` / `context_<id>` / `<id>_symgraph_*` / `socraticode_metadata` collection names risk colliding with other apps and prevent isolation between SocratiCode instances.

This patch adds an optional QDRANT_COLLECTION_PREFIX env var that, when set, is prepended verbatim to every Qdrant collection name SocratiCode creates, queries, lists, or deletes. Default empty string preserves the existing collection names exactly: fully backwards compatible.

Touchpoints (mechanical, no logic changes):

- src/constants.ts: new QDRANT_COLLECTION_PREFIX export with eager validation. Qdrant accepts only [a-zA-Z0-9_-] in collection names; an invalid prefix throws at module load with a message naming the offending value, before any Qdrant call is attempted.
- src/config.ts: all six collection-name generators (collectionName, graphCollectionName, contextCollectionName, symgraphMetaCollectionName, symgraphFileCollectionName, symgraphIndexCollectionName) prepend the prefix. Generator semantics are otherwise unchanged.
- src/services/qdrant.ts: METADATA_COLLECTION (the global socraticode_metadata collection used for cross-project state) also honours the prefix, so two SocratiCode instances on one Qdrant keep their metadata isolated as well as their per-project collections. The two startsWith() filters in listCodebaseCollections (used by codebase_list_projects to discover this instance's collections) build the match prefix from QDRANT_COLLECTION_PREFIX so a prefixed instance only sees its own collections, not those of co-tenants.
- src/tools/manage-tools.ts: codebase_list_projects similarly uses the prefix in its filters. The projectId extraction (formerly c.replace("codebase_", "")) now slices the full ${prefix}codebase_ token so the recovered id is correct under any prefix; the codegraph cross-reference uses the same prefixed name.

Tests: 20 new test cases in tests/unit/qdrant-collection-prefix.test.ts covering:

- Default empty prefix preserves the legacy collection-name forms for all six generators (regression guard against backward-compat break).
- Empty-string env var is treated identically to unset.
- Non-empty prefix prepends correctly to all six generators, including the suffix-style symgraph names.
- Two different prefixes produce disjoint collection-name sets for the same projectId (the multi-instance isolation property).
- Validation rejects whitespace, slash, colon, and unicode characters.
- The error message includes the offending value for discoverability.
- Validation accepts the full set of legal characters.

Existing 752 unit tests continue to pass unchanged. Total: 772. typecheck, biome, and CodeRabbit local review all clean.

README updated to document the new env var alongside the other QDRANT_* settings, including the user-side responsibility to remove old collections when changing prefix between runs.

Co-authored-by: awbait <awbait@users.noreply.github.com>
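As a rough illustration of the prefix handling described in this commit, the sketch below validates a prefix and derives a few collection names from it. `validatePrefix` and `makeCollectionNames` are hypothetical helpers (the real code lives in src/constants.ts and src/config.ts and is not reproduced here); the symgraph generators are left out because their exact suffixes are not given in the commit message.

```typescript
// Hypothetical helpers for illustration; not the code in src/constants.ts or src/config.ts.
const COLLECTION_PREFIX_CHARS = /^[a-zA-Z0-9_-]*$/;

function validatePrefix(raw: string | undefined): string {
  const prefix = raw ?? ""; // unset and empty string behave identically
  if (!COLLECTION_PREFIX_CHARS.test(prefix)) {
    // Name the offending value so a misconfiguration is easy to spot.
    throw new Error(`Invalid QDRANT_COLLECTION_PREFIX "${prefix}": only [a-zA-Z0-9_-] is allowed`);
  }
  return prefix;
}

function makeCollectionNames(prefix: string) {
  const codebaseToken = `${prefix}codebase_`;
  return {
    collectionName: (projectId: string): string => `${codebaseToken}${projectId}`,
    graphCollectionName: (projectId: string): string => `${prefix}codegraph_${projectId}`,
    contextCollectionName: (projectId: string): string => `${prefix}context_${projectId}`,
    metadataCollection: `${prefix}socraticode_metadata`,
    // Slice the full prefixed token so the recovered id is correct under any prefix;
    // collections belonging to other tenants yield null.
    projectIdFromCollection: (name: string): string | null =>
      name.startsWith(codebaseToken) ? name.slice(codebaseToken.length) : null,
  };
}

// Usage: a prefixed instance next to an unprefixed one on the same server.
const teamNames = makeCollectionNames(validatePrefix("team1_"));
const legacyNames = makeCollectionNames(validatePrefix(undefined));
console.log(teamNames.collectionName("abc123"), legacyNames.collectionName("abc123"));
```

Filtering with the full `${prefix}codebase_` token, not a bare `codebase_` substring, is what keeps a prefixed instance from listing a co-tenant's collections.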
feat(qdrant): add QDRANT_COLLECTION_PREFIX env var for shared instances
#52)

indexAllArtifacts and ensureArtifactsIndexed previously called saveContextMetadata only once, after the entire indexing pass completed. When the underlying loop took longer than the MCP client's tool-call timeout, completed artifacts appeared unindexed because their state was never persisted, and partial progress was lost.

This patch saves the metadata snapshot after every successfully indexed artifact, so each artifact's success is durable as soon as the indexing for it returns. It also seeds the in-flight stateMap from the previously-loaded existingStates so that interrupted runs can preserve completed work for artifacts already finished, and uses that same original snapshot to identify orphan artifacts that need cleanup when the config has changed.

Backwards compatible: a successful full run produces exactly the same final on-disk state as before. The only behavioural difference is in the interrupted-mid-run case, where the new code retains more state instead of losing everything since the last full pass.

Tests: 3 new cases in tests/unit/context-artifacts-checkpoint.test.ts covering the checkpointing path during full indexing, preservation of earlier successes when a later artifact fails, and preservation of up-to-date states while re-indexing stale ones. Existing unit tests continue to pass unchanged.

Co-authored-by: jackblackjack <chugarev@gmail.com>
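The checkpointing behaviour described in this commit can be sketched as below. This is a simplified model, not the real indexAllArtifacts: `indexWithCheckpoints`, `ArtifactState`, and the injected `indexOne` / `saveMetadata` callbacks are hypothetical, the callbacks are synchronous to keep the example short (the real indexing is presumably asynchronous), and staleness is modelled with an assumed `contentHash` field.

```typescript
// Simplified, synchronous model for illustration; names and shapes are assumptions.
interface ArtifactState {
  name: string;
  contentHash: string;
}

function indexWithCheckpoints(
  artifacts: readonly ArtifactState[], // artifacts currently configured
  existingStates: readonly ArtifactState[], // snapshot loaded before the run
  indexOne: (artifact: ArtifactState) => void, // throws on failure
  saveMetadata: (snapshot: ArtifactState[]) => void,
): { failed: string[]; orphans: string[] } {
  // Seed from the previous snapshot so already-finished work survives an interrupted run.
  const stateMap = new Map<string, ArtifactState>();
  for (const state of existingStates) stateMap.set(state.name, state);

  // Orphans are identified against the ORIGINAL snapshot, not the in-flight map.
  const configured = new Set(artifacts.map((a) => a.name));
  const orphans = existingStates.filter((s) => !configured.has(s.name)).map((s) => s.name);
  for (const name of orphans) stateMap.delete(name);

  const failed: string[] = [];
  for (const artifact of artifacts) {
    if (stateMap.get(artifact.name)?.contentHash === artifact.contentHash) continue; // up to date
    try {
      indexOne(artifact);
    } catch {
      failed.push(artifact.name); // earlier checkpoints are unaffected
      continue;
    }
    stateMap.set(artifact.name, { ...artifact });
    saveMetadata(Array.from(stateMap.values())); // checkpoint after each success
  }
  saveMetadata(Array.from(stateMap.values())); // final snapshot, as before the patch
  return { failed, orphans };
}

// Usage: "schema" is stale, "api" fails, "legacy" was removed from the config.
const checkpointLog: string[] = [];
const checkpointOutcome = indexWithCheckpoints(
  [
    { name: "readme", contentHash: "r1" },
    { name: "schema", contentHash: "s2" },
    { name: "api", contentHash: "a1" },
  ],
  [
    { name: "readme", contentHash: "r1" },
    { name: "schema", contentHash: "s1" },
    { name: "legacy", contentHash: "l1" },
  ],
  (artifact) => {
    if (artifact.name === "api") throw new Error("simulated indexing failure");
  },
  (snapshot) => {
    checkpointLog.push(snapshot.map((s) => `${s.name}@${s.contentHash}`).join(","));
  },
);
console.log(checkpointOutcome, checkpointLog);
```

Because the snapshot is written inside the loop, a run that is cut off after `schema` has already persisted that success; only the artifact in flight is lost.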
…indexes (#53)

Adds an optional `projectId` field to `.socraticode.json` so teams can commit a stable project identifier to the repo. Without this field the project ID is derived from the SHA-256 of the absolute checkout path, which means the same project resolves to a different Qdrant collection on every machine, OS user, filesystem layout, or worktree. With it, every checkout addresses the same `codebase_*`, `codegraph_*`, and `context_*` collections regardless of where the working tree lives on disk.

This is the path-independent, multi-project complement to the existing `SOCRATICODE_PROJECT_ID` env var. The env var is process-scoped and global to all projects in a host, so it does not scale to a developer who works on several projects on one laptop. The file is per-project and shared across teammates via git.

Resolution precedence (highest first):

1. `SOCRATICODE_PROJECT_ID` env var (per-machine override)
2. `projectId` in .socraticode.json (committed, team-wide)
3. SHA-256 prefix of the absolute path (existing default)

Both override paths trim whitespace, validate against `[a-zA-Z0-9_-]+`, and throw on invalid characters so a misconfigured value cannot silently route a project to the wrong (or empty) collection. Malformed JSON, missing fields, wrong types, and empty/whitespace-only values fall through to the next precedence level so the MCP server stays resilient against hand-edited config files. Branch-aware mode is suppressed for either explicit override since explicit identifiers are stable by intent.

Also fixes a pre-existing bug in `resolveLinkedCollections`: linked projects were resolved via `coreProjectId(linkedPath)` (path hash only), so a linked project that pinned its own `projectId` in `.socraticode.json` would silently miss its actual data during cross-project search. Linked-project resolution now goes through a new `effectiveBaseProjectId` helper that honors the committed value, preserving symmetry: a project addresses the same Qdrant collection whether it is the current root or a linked dependency. Dedup is tightened to use the same effective base ID, so two paths pinning the same shared identifier collapse to a single result. The env var deliberately does not leak into linked-project collection names. It is process-scoped and applying it as a single value to every linked path would collapse them onto the env-var collection, silently losing per-project isolation.

Tests: 16 new cases in tests/unit/config.test.ts, written TDD-style (RED to GREEN). Coverage:

- `projectIdFromPath` (13): file resolution, ignores path differences when file projectId is set, whitespace trimming, throws on invalid characters, falls back to hash on empty/whitespace/wrong-type/null/missing-file/malformed-JSON, env-var precedence over file, branch-suffix suppression, and coexistence with `linkedProjects` in the same file.
- `resolveLinkedCollections` (3): linked project's committed projectId honored, dedup on shared committed projectId, env var does not leak into linked-project collection names.

The new branch-aware-suppression test explicitly disables git `commit.gpgsign` and `tag.gpgsign` in its throwaway-repo fixture so the test is robust against the developer's global git config.

Backwards compatible: zero behaviour change for users who do not adopt the new field. The `SocratiCodeConfig` interface gains an optional field; existing `linkedProjects` parsing is functionally identical (routed through the new shared `loadSocratiCodeConfig` helper). Composes cleanly with the recently-added `QDRANT_COLLECTION_PREFIX`: prefix + projectId combine into `<prefix>codebase_<projectId>` as expected.

README and DEVELOPER documentation updated: new "Team-Shared Index (committed `projectId`)" section in README between Git Worktrees and Cross-Project Search, and the env-var table notes the new precedence. DEVELOPER's "Project ID & Collection Naming" section now documents the three-level precedence and explains why both override paths suppress the branch-aware suffix.

Co-authored-by: airmonitor <tomasz.szuster@gmail.com>
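The three-level precedence described in this commit can be sketched as a pure function. `resolveProjectId` and `cleanOverride` are hypothetical names, not the real `projectIdFromPath`; the env value, the raw `.socraticode.json` text, and the path-derived hash ID are passed in as arguments so the example needs no filesystem or crypto access.

```typescript
// Hypothetical sketch of the precedence rules; not the real projectIdFromPath.
const PROJECT_ID_CHARS = /^[a-zA-Z0-9_-]+$/;

type IdSource = "env" | "file" | "path";

// Returns a cleaned override, null to fall through, or throws on invalid characters.
function cleanOverride(value: unknown, label: string): string | null {
  if (typeof value !== "string") return null; // missing field or wrong type
  const trimmed = value.trim();
  if (trimmed === "") return null; // empty or whitespace-only
  if (!PROJECT_ID_CHARS.test(trimmed)) {
    throw new Error(`Invalid project ID from ${label}: "${trimmed}"`);
  }
  return trimmed;
}

function resolveProjectId(
  envValue: string | undefined, // SOCRATICODE_PROJECT_ID
  configJson: string | null, // raw .socraticode.json text, or null if absent
  pathHashId: string, // assumed to be the SHA-256-derived default
): { id: string; source: IdSource } {
  const fromEnv = cleanOverride(envValue, "SOCRATICODE_PROJECT_ID");
  if (fromEnv !== null) return { id: fromEnv, source: "env" };

  let parsed: unknown = null;
  if (configJson !== null) {
    try {
      parsed = JSON.parse(configJson);
    } catch {
      parsed = null; // malformed JSON falls through to the path hash
    }
  }
  const field =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)["projectId"]
      : undefined;
  const fromFile = cleanOverride(field, ".socraticode.json projectId");
  if (fromFile !== null) return { id: fromFile, source: "file" };

  return { id: pathHashId, source: "path" };
}

// Usage: the committed value wins when no env override is set.
const resolvedFromFile = resolveProjectId(undefined, '{"projectId": "  team-index  "}', "a1b2c3");
console.log(resolvedFromFile);
```

In the real implementation the `source` would also decide whether the branch-aware suffix is applied, since the commit suppresses it for both explicit overrides.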