
merging with upstream#3

Merged
airmonitor merged 16 commits into airmonitor:main from giancarloerra:main on May 6, 2026


@airmonitor
Owner

Summary

Changes

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Test coverage improvement

Testing

  • Unit tests pass (npm run test:unit)
  • Integration tests pass (npm run test:integration) — if applicable
  • TypeScript compiles cleanly (npx tsc --noEmit)
  • New tests added for new/changed functionality

Checklist

  • My code follows the existing code style and conventions
  • I have added/updated JSDoc comments where appropriate
  • I have updated documentation (README.md / DEVELOPER.md) if needed
  • I have addressed all CodeRabbit review comments (or marked as resolved with explanation)
  • I have read the Contributing Guide
  • I agree to the Contributor License Agreement

Related issues

giancarloerra and others added 16 commits May 4, 2026 19:14
…ing prebuilds (#44)

Resolves #43.

On Linux/Node combinations where one ast-grep grammar package's prebuilt
parser binary is missing for the host architecture, the v1.8.3 loader
silently failed to register every dynamic grammar in the batch, not just
the broken one. registerDynamicLanguage iterates and accesses each
module's lazy libraryPath getter; one throwing getter aborts the call
atomically and zero grammars end up registered.

Fix: pre-validate each grammar's libraryPath getter inside the per-
grammar try/catch so a missing prebuild is contained to that grammar.
Build the batch object with only the survivors and make ONE atomic
registerDynamicLanguage call. Standard environments are unaffected
because all grammars pass pre-validation. Affected environments lose
only the unloadable grammar, the rest register cleanly.
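The per-grammar pre-validation can be shown with a minimal TypeScript sketch. This is not SocratiCode's loader. `GrammarModule`, `preValidateAndRegister` and the injected `register` callback are hypothetical names. The callback stands in for the single batch `registerDynamicLanguage` call.

```typescript
// Hypothetical shape of a dynamic grammar module. Only the lazy
// `libraryPath` getter matters here: accessing it is what throws when the
// prebuilt parser binary is missing for the host architecture.
interface GrammarModule {
  readonly libraryPath: string;
}

interface GrammarLoadResult {
  loaded: string[];
  failed: Record<string, string>; // language -> captured error reason
}

// Hypothetical helper: validate each grammar in isolation, then make one
// batch registration call that contains only the grammars that passed.
function preValidateAndRegister(
  modules: Record<string, GrammarModule>,
  register: (batch: Record<string, GrammarModule>) => void,
): GrammarLoadResult {
  const batch: Record<string, GrammarModule> = {};
  const failed: Record<string, string> = {};
  for (const [language, mod] of Object.entries(modules)) {
    try {
      // Touch the lazy getter inside a per-grammar try/catch so a missing
      // prebuild is contained to this one grammar.
      void mod.libraryPath;
      batch[language] = mod;
    } catch (err) {
      // Keep the reason instead of discarding it in an empty catch.
      failed[language] = err instanceof Error ? err.message : String(err);
    }
  }
  // Single batch call: sequential per-language calls would replace each
  // other, and the batch call is atomic, so only survivors are passed in.
  if (Object.keys(batch).length > 0) {
    register(batch);
  }
  return { loaded: Object.keys(batch), failed };
}
```

The key design point is ordering. Every getter is accessed before the batch is built, so a throwing getter can never reach the atomic call.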

Also captures the actual error reason (the previous empty `catch {}`
discarded it), bumps symbol- and import-extraction failure logs from
debug to warn with one-shot dedupe per language, exposes loaded/failed
grammars via a new getDynamicLanguageStatus() API, and renders an "AST
grammars" block in codebase_graph_status output so users see loader
state without enabling debug logging.
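The one-shot warning dedupe and the status block can be sketched roughly as follows, assuming a status object with `loaded` and `failed` fields. Several details here are assumptions, not the actual `getDynamicLanguageStatus()` shape or `codebase_graph_status` output:

- the field names;
- the helpers `warnOncePerLanguage` and `renderGrammarStatus`;
- keying the dedupe on extraction kind plus language;
- the rendered text.

```typescript
// Hypothetical status shape: loaded grammars plus failed grammars with
// their captured reasons.
interface DynamicLanguageStatus {
  loaded: string[];
  failed: Record<string, string>;
}

// One-shot dedupe: warn the first time a (kind, language) pair fails and
// stay silent on repeats, so a large indexing run does not flood the log.
const warnedKeys = new Set<string>();

function warnOncePerLanguage(
  kind: "symbol" | "import",
  language: string,
  message: string,
  warn: (line: string) => void,
): void {
  const key = `${kind}:${language}`;
  if (warnedKeys.has(key)) return;
  warnedKeys.add(key);
  warn(`${kind} extraction failed for ${language}: ${message}`);
}

// Render a plain-text block in the spirit of the "AST grammars" section;
// the exact format is an assumption.
function renderGrammarStatus(status: DynamicLanguageStatus): string {
  const lines = ["AST grammars:", `  loaded: ${status.loaded.join(", ") || "(none)"}`];
  for (const [language, reason] of Object.entries(status.failed)) {
    lines.push(`  failed: ${language} (${reason})`);
  }
  return lines.join("\n");
}
```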

Empirically verified against @ast-grep/napi@0.40.5 in a clean Node
environment. Two probes confirmed the napi semantics: sequential
register({A}); register({B}) calls are REPLACING (so per-language
registration is broken), and batch register with one bad getter is
ATOMIC (so pre-validation before the batch call is the only correct
pattern). All 721 existing unit tests continue to pass unchanged. Adds
9 new tests for the loader status API; total 730 pass. typecheck and
biome clean. CodeRabbit returned no findings on this diff.

Co-authored-by: X-Adam <X-Adam@users.noreply.github.com>
…repos

Resolves #46. Reported by @mrsuit92.

Python projects where each top-level directory is a runnable application
root (a common service-style monorepo layout) had `import config` from
`service-a/main.py` produce 0 dependency edges, even when
`service-a/config.py` sits next to the importer. At runtime Python
resolves this correctly because the importer's directory is sys.path[0]
when the file is run as `python main.py` from inside its own directory.
The static resolver did not check that path.

The Python case in graph-resolution.ts only tried:

  <projectPath>/<module>.py
  <projectPath>/src/<module>.py
  <projectPath>/lib/<module>.py

It did not try `<sourceDir>/<module>.py`, so non-relative sibling
imports never resolved. Relative imports (`from .config import ...`)
already used sourceDir and worked.

Fix: add `<sourceDir>/<module>.py` as the LAST fallback, after the
existing project-root and src/lib checks. Tried last to preserve
project-root precedence, so any layout that resolved before this PR
continues to resolve to the same file. resolveRelativePath also handles
the `<sourceDir>/<module>/__init__.py` package case via its built-in
Python init fallback, so package-style sibling imports work too.
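The resulting lookup order can be sketched as a plain candidate list. This is a hypothetical illustration, not the code in graph-resolution.ts: `pythonCandidates`, `resolvePythonImport` and the injected `exists` check are made up. The message does not say whether the project-root checks also try `__init__.py`, so the sketch applies the package fallback only to `<sourceDir>`.

```typescript
// Hypothetical sketch of the candidate order for a non-relative Python
// import such as `import config` or `import utils.helpers`. Paths are
// forward-slash strings; `exists` stands in for a filesystem check.
function pythonCandidates(projectPath: string, sourceDir: string, moduleName: string): string[] {
  const rel = moduleName.split(".").join("/"); // dotted module path -> directories
  return [
    `${projectPath}/${rel}.py`,
    `${projectPath}/src/${rel}.py`,
    `${projectPath}/lib/${rel}.py`,
    // Last-resort fallbacks: the importer's own directory (sys.path[0] at
    // runtime), first as a flat module, then as a package.
    `${sourceDir}/${rel}.py`,
    `${sourceDir}/${rel}/__init__.py`,
  ];
}

function resolvePythonImport(
  projectPath: string,
  sourceDir: string,
  moduleName: string,
  exists: (path: string) => boolean,
): string | null {
  for (const candidate of pythonCandidates(projectPath, sourceDir, moduleName)) {
    // First hit wins, so project-root layouts keep their precedence.
    if (exists(candidate)) return candidate;
  }
  return null;
}
```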

Tests: 5 new cases in tests/unit/graph-resolution.test.ts covering
sibling-flat resolution, dotted module paths, package via __init__.py,
project-root precedence preservation, and the negative case (no match
anywhere). Existing 730 tests continue to pass; total now 735.

typecheck, biome, and CodeRabbit local review all clean.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
fix(graph): resolve Python sibling-flat imports in service-style monorepos
Resolves #45. Reported by @mrsuit92.

Go projects produced 0 dependency edges in codebase_graph_query and
codebase_graph_stats even though import extraction worked correctly.
The Go case in resolveImport returned null unconditionally, with a
comment that resolution required go.mod analysis. This patch adds
that analysis and wires it into the resolver, mirroring the existing
buildJvmSuffixMap and buildCsNamespaceMap patterns.

Mechanism:

- buildGoModuleInfo reads <projectPath>/go.mod once at graph-build
  time, parses the `module <path>` directive, and walks the file set
  to build a directory-to-representative-file map for every Go
  package. _test.go files are excluded from representative selection
  because Go does not allow them to be imported from non-test code in
  other packages. Files are sorted lexicographically for
  deterministic representative selection across machines and runs.
  Returns null when go.mod is missing or has no parseable module
  directive; the resolver treats null as "no Go resolution available"
  and behaves exactly as before this patch in those cases.

- The Go case in resolveImport now strips the module path prefix
  from the import (handling the bare-module-path root case as well
  as subpackage paths) and looks up the resulting directory in the
  package map. Imports outside the module path return null and are
  treated as external dependencies (or stdlib already filtered
  upstream by isExternalModule).

- Map keys are forward-slash paths, not OS-native, so resolution
  works on Windows: Go imports are always forward-slash regardless
  of host OS, but path.dirname produces backslashes on Windows for
  nested directories. Normalising the key to forward slashes at
  build time keeps the lookup correct across platforms.
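A compact sketch of the two pieces described above follows. It assumes the go.mod text and a project-relative file list are passed in directly; the real buildGoModuleInfo reads `<projectPath>/go.mod` itself. `buildGoModuleInfoSketch`, `resolveGoImportSketch` and the `GoModuleInfo` shape are hypothetical.

```typescript
interface GoModuleInfo {
  modulePath: string;
  // Package directory (forward slashes, "." for the module root) ->
  // representative non-test .go file.
  packages: Map<string, string>;
}

function buildGoModuleInfoSketch(goModText: string | null, files: string[]): GoModuleInfo | null {
  if (goModText === null) return null; // go.mod missing
  const modulePath = /^\s*module\s+(\S+)/m.exec(goModText)?.[1];
  // No directive, or the unsupported parenthesised `module ( path )` form.
  if (!modulePath || modulePath.startsWith("(")) return null;
  const goFiles = files
    .map((f) => f.replace(/\\/g, "/")) // forward-slash keys on every OS
    .filter((f) => f.endsWith(".go") && !f.endsWith("_test.go"))
    .sort(); // lexicographic order -> deterministic representative
  const packages = new Map<string, string>();
  for (const file of goFiles) {
    const slash = file.lastIndexOf("/");
    const dir = slash === -1 ? "." : file.slice(0, slash);
    if (!packages.has(dir)) packages.set(dir, file); // keep lex-smallest
  }
  return { modulePath, packages };
}

function resolveGoImportSketch(info: GoModuleInfo | null, importPath: string): string | null {
  if (!info) return null; // no go.mod: behave as before
  const { modulePath, packages } = info;
  let dir: string;
  if (importPath === modulePath) {
    dir = "."; // bare module path -> root package
  } else if (importPath.startsWith(`${modulePath}/`)) {
    dir = importPath.slice(modulePath.length + 1);
  } else {
    return null; // outside the module: external dependency or stdlib
  }
  return packages.get(dir) ?? null;
}
```

The trailing `/` in the prefix check is what stops a module `example.com/app` from falsely matching an import such as `example.com/application/x`.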

Limitations (deferred to follow-up issues if any user reports them):

- The parenthesised `module ( path )` form in go.mod is not parsed.
  None of the mainstream Go projects checked use it (verified against
  the real-world go.mod files of cobra, gin-gonic/gin and uber-go/zap).
- vendor/ directory shadowing of external imports is not honoured.
- replace directives in go.mod are not honoured.
- go.work multi-module workspaces are not handled (each workspace
  module would need its own go.mod read and prefix matching).

These are real Go features but each one widens the patch and
narrowly affects specific user populations. They can be added as
separate small PRs if a real user hits them.

Tests: 16 new cases in tests/unit/graph-resolution.test.ts covering
the new buildGoModuleInfo function (parses simple go.mod, handles
leading whitespace and trailing content, returns null on missing or
malformed go.mod, excludes _test.go from representative selection,
omits test-only directories, uses forward-slash keys for nested
packages) and the Go resolveImport case (back-compat null without
goModuleInfo, subpackage import resolves to lex-smallest non-test
.go file, root-package import resolves to a project-root .go file,
external imports return null, missing or malformed go.mod returns
null, _test.go excluded from representative selection, lexically
smallest file picked deterministically, similar-prefix imports do
not falsely resolve, nested-package imports work cross-platform).

Existing 735 unit tests continue to pass unchanged. Total: 751.

typecheck, biome, and CodeRabbit local review all clean. CodeRabbit
caught a real Windows path-separator bug in the first iteration; the
fix and a regression test for it are included.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
… paths

Address CodeRabbit review on PR #48. The early `isExternalModule` check
in resolveImport was filtering out any import starting with `golang.org/`
before the Go case had a chance to match it against the local module
path. This blocked legitimate local imports for any project whose own
module path starts with `golang.org/` (the Go team's own packages like
golang.org/x/sync, golang.org/x/net, etc., where each one's go.mod
declares `module golang.org/x/<name>`).

Skip the early external check for Go specifically. The Go case in
resolveImport already does its own module-path-aware classification
and returns null for everything outside the local module, including
stdlib and third-party deps. No regression in those cases.

New regression test asserts that
`module golang.org/x/custom` + `import "golang.org/x/custom/internal"`
resolves to the local internal/ package. Confirmed the test fails
without the fix and passes with it. Total: 752 unit tests pass.
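The ordering change amounts to bypassing the generic filter for Go only. In this hypothetical sketch, `isExternalModuleSketch` is a stand-in and its prefix list is an assumption. `resolveImportSketch` models only the Go branch; resolution for other languages is omitted.

```typescript
// Hypothetical early filter that treats some well-known hosts (including
// golang.org/) as external. The real isExternalModule rules may differ.
function isExternalModuleSketch(importPath: string): boolean {
  return ["golang.org/", "github.com/", "gopkg.in/"].some((p) => importPath.startsWith(p));
}

function resolveImportSketch(
  language: string,
  importPath: string,
  goModulePath: string | null,
  goPackages: Map<string, string>,
): string | null {
  // The fix: Go skips the early external check and classifies imports
  // itself, based on its own module path.
  if (language !== "go" && isExternalModuleSketch(importPath)) return null;
  if (language === "go") {
    if (!goModulePath) return null;
    if (importPath === goModulePath) return goPackages.get(".") ?? null;
    // Anything outside the local module (stdlib, third-party) stays external.
    if (!importPath.startsWith(`${goModulePath}/`)) return null;
    return goPackages.get(importPath.slice(goModulePath.length + 1)) ?? null;
  }
  return null; // other languages' resolution omitted from this sketch
}
```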

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
fix(graph): resolve Go imports via go.mod module path
Resolves #49. Reported by @awbait.

When sharing a single Qdrant server across multiple applications
(SocratiCode + Open-WebUI + custom RAG, etc.) or across multiple
SocratiCode instances (per-project, per-environment, per-user), the
fixed `codebase_<id>` / `codegraph_<id>` / `context_<id>` /
`<id>_symgraph_*` / `socraticode_metadata` collection names risk
colliding with other apps and prevent isolation between SocratiCode
instances.

This patch adds an optional QDRANT_COLLECTION_PREFIX env var that, when
set, is prepended verbatim to every Qdrant collection name SocratiCode
creates, queries, lists, or deletes. Default empty string preserves the
existing collection names exactly: fully backwards compatible.

Touchpoints (mechanical, no logic changes):

- src/constants.ts: new QDRANT_COLLECTION_PREFIX export with eager
  validation. Qdrant accepts only [a-zA-Z0-9_-] in collection names; an
  invalid prefix throws at module load with a message naming the
  offending value, before any Qdrant call is attempted.
- src/config.ts: all six collection-name generators
  (collectionName, graphCollectionName, contextCollectionName,
  symgraphMetaCollectionName, symgraphFileCollectionName,
  symgraphIndexCollectionName) prepend the prefix. Generator semantics
  are otherwise unchanged.
- src/services/qdrant.ts: METADATA_COLLECTION (the global
  socraticode_metadata collection used for cross-project state) also
  honours the prefix, so two SocratiCode instances on one Qdrant keep
  their metadata isolated as well as their per-project collections.
  The two startsWith() filters in listCodebaseCollections — used by
  codebase_list_projects to discover this instance's collections —
  build the match prefix from QDRANT_COLLECTION_PREFIX so a prefixed
  instance only sees its own collections, not those of co-tenants.
- src/tools/manage-tools.ts: codebase_list_projects similarly uses the
  prefix in its filters. The projectId extraction (formerly
  c.replace("codebase_", "")) now slices the full
  ${prefix}codebase_ token so the recovered id is correct under any
  prefix; the codegraph cross-reference uses the same prefixed name.
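A minimal sketch of the validation and naming rules above follows. It assumes the prefix is passed in rather than read from `process.env` at module load. Only the codebase, codegraph, context and metadata names are shown, because the exact symgraph suffixes are not spelled out here. `validateCollectionPrefix`, `metadataCollectionName` and `listProjectIds` are hypothetical helpers. The three generators only mirror the names of the real ones in src/config.ts.

```typescript
const LEGAL_PREFIX = /^[a-zA-Z0-9_-]*$/; // empty prefix is legal

function validateCollectionPrefix(raw: string | undefined): string {
  const prefix = raw ?? ""; // unset and "" behave identically
  if (!LEGAL_PREFIX.test(prefix)) {
    // Name the offending value so misconfiguration is easy to spot.
    throw new Error(`Invalid QDRANT_COLLECTION_PREFIX ${JSON.stringify(prefix)}: only [a-zA-Z0-9_-] is allowed`);
  }
  return prefix;
}

const collectionName = (prefix: string, projectId: string): string => `${prefix}codebase_${projectId}`;
const graphCollectionName = (prefix: string, projectId: string): string => `${prefix}codegraph_${projectId}`;
const contextCollectionName = (prefix: string, projectId: string): string => `${prefix}context_${projectId}`;
const metadataCollectionName = (prefix: string): string => `${prefix}socraticode_metadata`;

// Discover this instance's projects: match and slice the full
// `${prefix}codebase_` token so co-tenants' collections are ignored.
function listProjectIds(prefix: string, collections: string[]): string[] {
  const token = `${prefix}codebase_`;
  return collections.filter((c) => c.startsWith(token)).map((c) => c.slice(token.length));
}
```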

Tests: 20 new test cases in tests/unit/qdrant-collection-prefix.test.ts
covering:

- Default empty prefix preserves the legacy collection-name forms for
  all six generators (regression guard against backward-compat break).
- Empty-string env var is treated identically to unset.
- Non-empty prefix prepends correctly to all six generators, including
  the suffix-style symgraph names.
- Two different prefixes produce disjoint collection-name sets for the
  same projectId (the multi-instance isolation property).
- Validation rejects whitespace, slash, colon, and unicode characters.
- The error message includes the offending value for discoverability.
- Validation accepts the full set of legal characters.

Existing 752 unit tests continue to pass unchanged. Total: 772.

typecheck, biome, and CodeRabbit local review all clean. README
updated to document the new env var alongside the other QDRANT_*
settings, including the user-side responsibility to remove old
collections when changing prefix between runs.

Co-authored-by: awbait <awbait@users.noreply.github.com>
feat(qdrant): add QDRANT_COLLECTION_PREFIX env var for shared instances
#52)

indexAllArtifacts and ensureArtifactsIndexed previously called saveContextMetadata only once, after the entire indexing pass completed. When the underlying loop took longer than the MCP client's tool-call timeout, completed artifacts appeared unindexed because their state was never persisted, and partial progress was lost.

This patch saves the metadata snapshot after every successfully indexed artifact, so each artifact's success is durable as soon as its indexing returns. It also seeds the in-flight stateMap from the previously loaded existingStates, so an interrupted run keeps the state of artifacts that had already finished. It uses that same original snapshot to identify orphan artifacts that need cleanup when the config has changed.
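The checkpointing behaviour can be sketched synchronously with injected callbacks. `Artifact`, `ArtifactState` and `indexArtifactsWithCheckpoints` are hypothetical. The real functions are presumably async, and they also clean up orphan data rather than only reporting it.

```typescript
interface Artifact {
  name: string;
  hash: string;
}

// Hypothetical per-artifact state; real metadata likely carries more fields.
interface ArtifactState {
  hash: string;
}

function indexArtifactsWithCheckpoints(
  artifacts: Artifact[],
  existingStates: Map<string, ArtifactState>,
  indexOne: (artifact: Artifact) => void,
  saveSnapshot: (states: Map<string, ArtifactState>) => void,
): { orphans: string[]; failed: string[] } {
  // Orphans come from the original snapshot: previously indexed artifacts
  // that are no longer configured.
  const configured = new Set(artifacts.map((a) => a.name));
  const orphans = Array.from(existingStates.keys()).filter((name) => !configured.has(name));

  // Seed in-flight state from the previous snapshot so finished work
  // survives an interrupted run.
  const stateMap = new Map(existingStates);
  for (const name of orphans) stateMap.delete(name);

  const failed: string[] = [];
  for (const artifact of artifacts) {
    if (stateMap.get(artifact.name)?.hash === artifact.hash) continue; // up to date
    try {
      indexOne(artifact);
      stateMap.set(artifact.name, { hash: artifact.hash });
      saveSnapshot(new Map(stateMap)); // checkpoint after every success
    } catch {
      failed.push(artifact.name); // earlier checkpoints remain on disk
    }
  }
  return { orphans, failed };
}
```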

Backwards compatible: a successful full run produces exactly the same final on-disk state as before. The only behavioural difference is in the interrupted-mid-run case, where the new code retains more state instead of losing everything since the last full pass.

Tests: 3 new cases in tests/unit/context-artifacts-checkpoint.test.ts covering the checkpointing path during full indexing, preservation of earlier successes when a later artifact fails, and preservation of up-to-date states while re-indexing stale ones. Existing unit tests continue to pass unchanged.

Co-authored-by: jackblackjack <chugarev@gmail.com>
…indexes (#53)

Adds an optional `projectId` field to `.socraticode.json` so teams can
commit a stable project identifier to the repo. Without this field the
project ID is derived from the SHA-256 of the absolute checkout path,
which means the same project resolves to a different Qdrant collection
on every machine, OS user, filesystem layout, or worktree. With it,
every checkout addresses the same `codebase_*`, `codegraph_*`, and
`context_*` collections regardless of where the working tree lives on
disk.

This is the path-independent, multi-project complement to the existing
`SOCRATICODE_PROJECT_ID` env var. The env var is process-scoped and
global to all projects in a host, so it does not scale to a developer
who works on several projects on one laptop. The file is per-project
and shared across teammates via git.

Resolution precedence (highest first):

  1. `SOCRATICODE_PROJECT_ID` env var (per-machine override)
  2. `projectId` in .socraticode.json (committed, team-wide)
  3. SHA-256 prefix of the absolute path (existing default)

Both override paths trim whitespace, validate against `[a-zA-Z0-9_-]+`,
and throw on invalid characters so a misconfigured value cannot
silently route a project to the wrong (or empty) collection. Malformed
JSON, missing fields, wrong types, and empty/whitespace-only values
fall through to the next precedence level so the MCP server stays
resilient against hand-edited config files. Branch-aware mode is
suppressed for either explicit override since explicit identifiers
are stable by intent.
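The three-level precedence and its validation rules can be sketched as follows. The sketch assumes the env value and the raw `.socraticode.json` text are passed in, and that the SHA-256 path hash is injected as `pathHash`. `normalizeOverride`, `committedProjectId`, `resolveProjectIdSketch` and the `suppressBranchSuffix` flag are hypothetical names. Treating an empty env value the same as an unset one is also an assumption.

```typescript
const PROJECT_ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Trim; treat non-string, empty, or whitespace-only as "not set"; throw on
// invalid characters so a bad value cannot silently pick a collection.
function normalizeOverride(raw: unknown, source: string): string | null {
  if (typeof raw !== "string") return null;
  const value = raw.trim();
  if (value === "") return null;
  if (!PROJECT_ID_PATTERN.test(value)) {
    throw new Error(`Invalid project id from ${source}: ${JSON.stringify(value)}`);
  }
  return value;
}

// A missing file or malformed JSON yields undefined, i.e. fall through.
function committedProjectId(configText: string | null): unknown {
  if (configText === null) return undefined;
  try {
    const parsed: unknown = JSON.parse(configText);
    return parsed !== null && typeof parsed === "object"
      ? (parsed as { projectId?: unknown }).projectId
      : undefined;
  } catch {
    return undefined;
  }
}

function resolveProjectIdSketch(
  envValue: string | undefined,
  configText: string | null,
  pathHash: () => string,
): { projectId: string; suppressBranchSuffix: boolean } {
  const fromEnv = normalizeOverride(envValue, "SOCRATICODE_PROJECT_ID");
  if (fromEnv !== null) return { projectId: fromEnv, suppressBranchSuffix: true };
  const fromFile = normalizeOverride(committedProjectId(configText), ".socraticode.json");
  if (fromFile !== null) return { projectId: fromFile, suppressBranchSuffix: true };
  return { projectId: pathHash(), suppressBranchSuffix: false };
}
```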

Also fixes a pre-existing bug in `resolveLinkedCollections`: linked
projects were resolved via `coreProjectId(linkedPath)` (path hash
only), so a linked project that pinned its own `projectId` in
`.socraticode.json` would silently miss its actual data during
cross-project search. Linked-project resolution now goes through a
new `effectiveBaseProjectId` helper that honors the committed value,
preserving symmetry: a project addresses the same Qdrant collection
whether it is the current root or a linked dependency. Dedup is
tightened to use the same effective base ID, so two paths pinning the
same shared identifier collapse to a single result.

The env var deliberately does not leak into linked-project collection
names. It is process-scoped and applying it as a single value to every
linked path would collapse them onto the env-var collection, silently
losing per-project isolation.
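Linked-project resolution can be sketched in the same style. `effectiveBaseProjectIdSketch` and `resolveLinkedCollectionsSketch` are hypothetical; the real helper is `effectiveBaseProjectId`, whose signature is not shown here. Validation is omitted. The env var is deliberately not a parameter, which matches the no-leak rule.

```typescript
// Each linked path uses its own committed projectId when present, else its
// path hash. The process-wide env var is intentionally not consulted.
function effectiveBaseProjectIdSketch(committedProjectId: string | undefined, pathHash: string): string {
  const trimmed = committedProjectId?.trim();
  return trimmed ? trimmed : pathHash;
}

function resolveLinkedCollectionsSketch(
  linkedPaths: string[],
  readCommittedProjectId: (path: string) => string | undefined,
  hashPath: (path: string) => string,
): string[] {
  const seen = new Set<string>();
  const collections: string[] = [];
  for (const linkedPath of linkedPaths) {
    const id = effectiveBaseProjectIdSketch(readCommittedProjectId(linkedPath), hashPath(linkedPath));
    if (seen.has(id)) continue; // paths pinning the same id collapse to one
    seen.add(id);
    collections.push(`codebase_${id}`);
  }
  return collections;
}
```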

Tests: 16 new cases in tests/unit/config.test.ts, written TDD-style
(RED to GREEN). Coverage:

  - `projectIdFromPath` (13): file resolution, ignores path
    differences when file projectId is set, whitespace trimming,
    throws on invalid characters, falls back to hash on
    empty/whitespace/wrong-type/null/missing-file/malformed-JSON,
    env-var precedence over file, branch-suffix suppression, and
    coexistence with `linkedProjects` in the same file.
  - `resolveLinkedCollections` (3): linked project's committed
    projectId honored, dedup on shared committed projectId, env var
    does not leak into linked-project collection names.

The new branch-aware-suppression test explicitly disables git
`commit.gpgsign` and `tag.gpgsign` in its throwaway-repo fixture so
the test is robust against the developer's global git config.

Backwards compatible: zero behaviour change for users who do not adopt
the new field. The `SocratiCodeConfig` interface gains an optional
field; existing `linkedProjects` parsing is functionally identical
(routed through the new shared `loadSocratiCodeConfig` helper).
Composes cleanly with the recently-added `QDRANT_COLLECTION_PREFIX`:
prefix + projectId combine into `<prefix>codebase_<projectId>` as
expected.

README and DEVELOPER documentation updated: new "Team-Shared Index
(committed `projectId`)" section in README between Git Worktrees and
Cross-Project Search, and the env-var table notes the new precedence.
DEVELOPER's "Project ID & Collection Naming" section now documents
the three-level precedence and explains why both override paths
suppress the branch-aware suffix.

Co-authored-by: airmonitor <tomasz.szuster@gmail.com>
airmonitor merged commit a081400 into airmonitor:main on May 6, 2026