Releases: ClayGendron/vfs

v0.0.22

23 Apr 00:34

What's Changed

Added

  • PostgresFileSystem (story 003) — native Postgres backend with pgvector-backed embeddings, native lexical (tsvector + plainto_tsquery) and semantic search, regex pushdown, and explicit schema verification. Respects the model-declared pgvector operator class (vector_cosine_ops, vector_ip_ops, vector_l2_ops) for both distance operator selection (<=>, <#>, <->) and score normalization.
  • Native Postgres pattern-search contract — partial trigram GIN + B-tree text_pattern_ops indexes on vfs_objects.path and .content; verify_native_search_schema fails fast if the required artifacts are missing.
  • Native Postgres meeting_subgraph (story 004) — server-side PL/pgSQL Steiner-tree traversal installed via install_native_graph_schema() and verified at startup.
  • Native MSSQL meeting_subgraph — T-SQL stored procedure with a TVP (GroverSeedList) seed-list input; override routes through a raw aioodbc cursor since SQLAlchemy bindparam can't plumb TVPs.
  • Native MSSQL pushdown for predecessors, successors, ancestors, descendants, neighborhood — previously delegated to the in-memory rustworkx path. Seed/exclusion sets bind as JSON strings unpacked via OPENJSON with an explicit NVARCHAR(450) schema; multi-hop traversals drive a Python BFS loop over single-hop SQL queries to avoid MSSQL's recursive-CTE UNION ALL cycle hazard.
  • /.vfs sidecar namespace (story 002) — canonical /.vfs/<path>/__meta__/... layout for edge storage and metadata. Replaces the prior child-of-file metadata namespace; mkconn is removed.
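The multi-hop strategy described for the MSSQL pushdown can be pictured as a breadth-first loop in which every level issues one single-hop lookup. This is a minimal sketch rather than the library's code: `bfs_descendants`, its signature, and the `single_hop` callable (standing in for one SQL round trip) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Set


def bfs_descendants(
    seeds: Iterable[str],
    single_hop: Callable[[Set[str]], Set[str]],
    max_depth: int = 10,
) -> Set[str]:
    """Collect nodes reachable from `seeds` by repeating a single-hop lookup."""
    start = set(seeds)
    visited = set(start)
    frontier = set(start)
    for _ in range(max_depth):
        if not frontier:
            break
        # One lookup per level; a real backend would run one SQL query here.
        reached = single_hop(frontier)
        # Discarding already-visited nodes is what stops the loop on cycles.
        frontier = reached - visited
        visited |= frontier
    # This sketch reports reachable nodes other than the seeds themselves.
    return visited - start


# Hypothetical in-memory edge list containing a cycle: a -> b -> c -> a.
EDGES = {"a": {"b"}, "b": {"c"}, "c": {"a"}}


def hop(frontier: Set[str]) -> Set[str]:
    """Stand-in for the single-hop query: successors of every frontier node."""
    found: Set[str] = set()
    for node in frontier:
        found |= EDGES.get(node, set())
    return found
```

Because the visited set is kept in Python, the loop terminates on cyclic graphs without relying on the database to detect the cycle.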

Changed

  • Result schema unified — the Candidate / Detail chain is collapsed into a flat Entry row with operation metadata hoisted onto the envelope. Backends and the query executor now reason in terms of column projection instead of shape dispatch.
  • Constitution rewritten around four primitives (Namespace, Entry, Revision, Operation) with RFC 2119 precedence language.
  • in_degree / out_degree dropped from VFSObjectBase, Entry projection, and result defaults — graph degrees are no longer part of the entry payload.
  • path / parent_path max length restored to 4096 so SQLModel.metadata.create_all targets SQL Server's VARCHAR(8000) ceiling; deepest sidecar paths (/.vfs/<path>/__meta__/versions) now have headroom.
  • MSSQLFileSystem._grep_impl — WHERE body extracted so the two FROM / ORDER BY branches no longer duplicate the filter body.
  • context/ workspace — research/ migrated into context/standards/, context/stories/, context/learnings/; legacy docs and demo DB fixtures pruned.

Fixed

  • VectorType.process_result_value — reject str / bytes before the iterable fallback so the expected ValueError("expected iterable pgvector value") is raised instead of a downstream per-character float parse error.
  • CI coverage restored above the 99% gate with focused tests for Postgres inverse-edge projections, path/vector helper error handling, model/result rendering, and routing/parser/base/permission edge cases.
  • CI lint / format / type-check regressions from the native search backend work — touched files now pass ruff check, ruff format --check, and ty check src/.
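The `VectorType.process_result_value` fix comes down to check ordering: `str` and `bytes` are themselves iterable, so they have to be rejected before any generic iterable handling. The sketch below illustrates that ordering only. The function name `coerce_vector` and the `None` passthrough are assumptions; the error message is the one quoted in the notes.

```python
from typing import Any, List, Optional


def coerce_vector(value: Any) -> Optional[List[float]]:
    """Convert a driver-returned value into a list of floats (illustrative)."""
    if value is None:
        # Assumption: NULL columns pass through unchanged.
        return None
    # str / bytes would otherwise be iterated character by character and fail
    # later with a confusing per-character float parse error.
    if isinstance(value, (str, bytes)):
        raise ValueError("expected iterable pgvector value")
    try:
        return [float(item) for item in value]
    except TypeError as exc:
        # Non-iterable input (for example an int) ends up here.
        raise ValueError("expected iterable pgvector value") from exc
```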

Full Changelog: v0.0.21...v0.0.22

v0.0.21

19 Apr 02:41

What's Changed

Changed

  • Python package renamed from grover to vfs — imports change from from grover import ... to from vfs import .... Matches the vfs-py PyPI distribution name. This is a breaking change with no compatibility shim.
  • Class identifiers renamed to a VFS* / VirtualFileSystem scheme:
    • GroverFileSystem → VirtualFileSystem (in vfs.base)
    • Grover → VFSClient (sync facade)
    • GroverAsync → VFSClientAsync (async router)
    • GroverResult → VFSResult
    • GroverObject / GroverObjectBase → VFSObject / VFSObjectBase
    • GroverError → VFSError
  • DB table renamed grover_objects → vfs_objects (and the ix_grover_objects_ext_kind index → ix_vfs_objects_ext_kind). No migration script is shipped — existing databases need their tables recreated by the consumer.
  • scripts/bump_version.py — updated to point at src/vfs/__init__.py.
  • README.md — examples, install commands, badges, and class names updated for the new package and identifiers.
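Since no compatibility shim is shipped, downstream code has to be updated by hand or by a search-and-replace. The helper below is a hypothetical sketch of such a rewrite, not something the package provides: it applies the renames listed above with whole-word matching so that `Grover` does not clobber `GroverAsync`.

```python
import re

# Old name -> new name, taken from the rename list in these notes.
RENAMES = {
    "GroverFileSystem": "VirtualFileSystem",
    "GroverAsync": "VFSClientAsync",
    "GroverResult": "VFSResult",
    "GroverObjectBase": "VFSObjectBase",
    "GroverObject": "VFSObject",
    "GroverError": "VFSError",
    "Grover": "VFSClient",
    "ix_grover_objects_ext_kind": "ix_vfs_objects_ext_kind",
    "grover_objects": "vfs_objects",
    "grover": "vfs",
}

# Longest names first, wrapped in word boundaries, so only whole identifiers match.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True))
    + r")\b"
)


def rename_identifiers(source: str) -> str:
    """Rewrite old Grover identifiers in `source` to their VFS equivalents."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)
```

A textual rewrite like this also touches comments and string literals, so the result should be reviewed before committing.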

Full Changelog: v0.0.20...v0.0.21

v0.0.20

17 Apr 20:34

What's Changed

Changed

  • PyPI package renamed from grover to vfs-py — install with pip install vfs-py. Python imports are unchanged (import grover). Versions 0.0.18 and earlier remain available on PyPI under the grover name; new releases publish to vfs-py.

Full Changelog: v0.0.18...v0.0.20

v0.0.18

10 Apr 22:38

What's Changed

Changed

  • DB read paths now hydrate Candidate.content — ls, glob, delete, move, tree, and lexical_search always populate content on the candidates they emit. The underlying select(self._model) was already pulling the column over the wire; the include_content=False default on to_candidate was discarding it during projection. Removed the parameter entirely so every read-path projection returns content, eliminating the redundant follow-up read(...) round trip when a downstream stage needs the content.
  • MSSQL _grep_impl, _glob_impl, and _lexical_search_impl delegate to the base class when candidates is supplied — Once the candidate set has been transferred to Python (and now carries content), there is nothing left for SQL Server to do. The base class runs the regex via _collect_line_matches and BM25 via BM25Scorer over the in-memory content with zero round trips. Full-tree pushdowns (CONTAINSTABLE, REGEXP_LIKE) are unchanged for the no-candidates path.
  • MSSQL _glob_impl no-candidates branch — SELECT path, kind, content instead of SELECT path, kind, so glob results carry content directly out of the pushdown.
  • MSSQL _lexical_search_impl no-candidates branch — Follows the CONTAINSTABLE pushdown with one small batched SELECT path, content WHERE path IN (top_k) so the top-k results return hydrated. k is bounded (default 15), so the second round trip is tiny.
  • Base _lexical_search_impl — Threads content through _LexicalDoc into the result candidates (previously dropped during result construction).
  • Base _glob_impl upstream-candidates branch — Preserves prior content and metrics via Candidate.model_copy(...) instead of constructing a fresh Candidate that drops them.
  • _read_impl — Skips the SELECT for already-hydrated candidates and only fetches the gaps. Makes read(candidates=...) cheap when content is already on the candidates from a prior stage.
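The "only fetches the gaps" behaviour of `_read_impl` can be sketched as a partition step followed by a single batched fetch. Everything here is illustrative: the `Candidate` dataclass is a simplified stand-in for the real model, and `hydrate` and `fetch` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Simplified stand-in for the library's candidate row."""
    path: str
    content: Optional[str] = None


def hydrate(
    candidates: Sequence[Candidate],
    fetch: Callable[[List[str]], Dict[str, str]],
) -> List[Candidate]:
    """Fill in missing content, calling `fetch` at most once and only for gaps."""
    missing = [c.path for c in candidates if c.content is None]
    # Skip the round trip entirely when every candidate already carries content.
    fetched = fetch(missing) if missing else {}
    return [
        c if c.content is not None else replace(c, content=fetched.get(c.path))
        for c in candidates
    ]
```

When an earlier stage such as glob has already populated content, `fetch` is never called, which is the cheap path the notes describe.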

Removed

  • include_content parameter on GroverObjectBase.to_candidate — Always populates content now. The 5 callers that explicitly passed include_content=True (_write_impl, _read_impl, bulk write) drop the redundant kwarg.
  • MSSQL _grep_with_candidate_chunks helper — Dead after the candidates path delegates to the base class.

Fixed

  • glob | grep on MSSQL drops from three round trips to one — Previously the executor pre-hydrated content via read(...), then MSSQL _grep_impl ignored the hydrated content and re-queried with REGEXP_LIKE against the same paths. Now glob returns hydrated content directly and grep runs the regex in Python on the in-memory candidates.

Full Changelog: v0.0.17...v0.0.18

v0.0.17

10 Apr 21:27

What's Changed

Fixed

  • glob and grep mount-prefix routing — Absolute patterns and literal paths filters are now stripped of the mount prefix before dispatch to each mount. Previously the non-candidate fanout in _route_fanout forwarded the full pattern to every mount, but mounts store paths mount-relative — so glob('/data/**/*.py') and grep with paths=('/data/src',) silently returned empty while read on the same path worked. New dedicated _route_glob_fanout / _route_grep_fanout use exact-rewrite when provable (literal-prefix or single-segment glob consumption against the mount name) and fall back to a /** superset query plus router-side authoritative re-filter when the pattern's leading segment is **. globs_not is never silently dropped — exclusions that cannot be exactly pushed are enforced at the router after rebase. max_count is deferred to the router whenever any mount uses the post-filter fallback so it cannot truncate pre-filter candidates. Wildcard mount selectors (/*/, /d?ta/, /d[ae]ta/) and multi-hop chains (GroverAsync → router → router → leaf) work correctly. The candidate-input path (glob/grep with candidates=) had the same mismatch and is fixed by filtering at the router with the original absolute pattern before grouping by terminal.
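The per-mount decision described above (exact rewrite when provable, superset query plus router-side re-filter otherwise) can be sketched as follows. This is a simplified illustration, not the shipped `rewrite_glob_for_mount`, whose signature is not given in these notes; `plan_glob_for_mount` and its return shape are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple


def plan_glob_for_mount(pattern: str, mount: str) -> Tuple[Optional[str], bool]:
    """Return (mount_relative_pattern, needs_router_refilter).

    The pattern is None when the absolute pattern cannot match this mount.
    """
    segments = pattern.strip("/").split("/")
    head, rest = segments[0], segments[1:]
    if head == "**":
        # '**' may or may not consume the mount name, so an exact rewrite is
        # not provable: query a superset and re-filter at the router.
        return "/**", True
    if fnmatchcase(mount, head):
        # The leading segment (literal or single-segment glob) consumes the
        # mount name exactly; the remainder is already mount-relative.
        return "/" + "/".join(rest), False
    return None, False
```

For example, `plan_glob_for_mount("/data/**/*.py", "data")` yields a mount-relative `/**/*.py` with no re-filter needed, while a pattern starting with `**` is flagged for re-filtering against the original absolute pattern.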

Added

  • grover.routing module — Pure helpers (rewrite_glob_for_mount, rewrite_path_for_mount, first_segment, glob_segment_matches) and plan dataclasses (GlobMountPlan, GrepMountPlan) backing the new fanout strategy.

Changed

  • MSSQLFileSystem glob pushdown — _glob_impl now structurally decomposes glob patterns via the new decompose_glob() helper into a literal path prefix and a trailing **/*.<ext> tail, then pushes both into SQL: the prefix becomes a sargable LIKE predicate and the ext narrows the (ext, kind) composite index seek. When the pattern is fully expressible as prefix + **/*.<ext> the authoritative REGEXP_LIKE residual is dropped entirely and kind is narrowed to 'file' so the planner picks ix_grover_objects_ext_kind instead of relying on ext IS NULL for directory exclusion. Caller-supplied ext and paths remain authoritative (intersected with the decomposed values) so explicit narrowing is never broadened by the optimization. Patterns the decomposer doesn't recognize fall through to the existing regex path with no behavior change.
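To make the prefix + `**/*.<ext>` decomposition concrete, here is a minimal sketch under stated assumptions. It is not the real `decompose_glob()`: the function names, the return shape, and the choice of a backslash `ESCAPE` character for the `LIKE` value are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Literal segments (no glob metacharacters) followed by exactly '/**/*.<ext>'.
_PREFIX_EXT = re.compile(
    r"^(?P<prefix>(?:/[^/*?\[\]]+)*)/\*\*/\*\.(?P<ext>[A-Za-z0-9]+)$"
)


def decompose_glob_sketch(pattern: str) -> Optional[Tuple[str, str]]:
    """Return (literal_prefix, ext), or None if the pattern has another shape."""
    match = _PREFIX_EXT.match(pattern)
    if match is None:
        # Unrecognized shapes would fall through to the regex path.
        return None
    return match.group("prefix"), match.group("ext")


def like_prefix(prefix: str) -> str:
    """Build a LIKE value for 'everything under prefix'.

    Assumes the query uses ESCAPE with a backslash, so the LIKE wildcards
    '%' and '_' inside the literal prefix are escaped first.
    """
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "/%"
```

A pattern such as `/src/app/**/*.py` decomposes into the prefix `/src/app` and the extension `py`, while `/src/*/main.py` returns `None` and would keep using the regex path.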

Full Changelog: v0.0.16...v0.0.17

v0.0.16

10 Apr 18:58

What's Changed

Added

  • Ripgrep-compatible filter surface on grep and glob — Structural filters (ext, positional paths, globs, output modes files/lines/count, context windows -A/-B/-C, case_mode, max_count) now push into SQL through a composable clause builder instead of forcing every search through a full-content scan. DatabaseFileSystem still issues a single query per grep/glob call; MSSQLFileSystem picks between four SQL templates (CONTAINSTABLE/Direct × lines/files) and skips content transfer entirely for -l (files-only) mode.
  • Indexed ext column on grover_objects — Derived from path and indexed so -t py on a million-row corpus becomes an index seek rather than a table scan. Maintained automatically on write.
  • docs/ai_agent_glob_grep_patterns.md — Reference for agents on rg-equivalent query patterns against Grover.

Changed

  • grep / glob kwargs aligned with ripgrep — case_sensitive → case_mode (smart/insensitive/sensitive), max_results → max_count. This is a breaking change for callers of the old kwargs.
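The three `case_mode` values mirror ripgrep's case handling, where smart case means "insensitive unless the pattern contains an uppercase character". The sketch below shows one way to map the modes onto Python regex flags. `compile_for_case_mode` is a hypothetical helper, and the uppercase check is a simplification: it also reacts to uppercase letters inside escapes such as `\S`, which ripgrep itself does not count.

```python
import re


def compile_for_case_mode(pattern: str, case_mode: str) -> re.Pattern:
    """Compile `pattern` with flags chosen from a ripgrep-style case mode."""
    if case_mode == "sensitive":
        flags = 0
    elif case_mode == "insensitive":
        flags = re.IGNORECASE
    elif case_mode == "smart":
        # Simplified smart case: any uppercase character forces sensitivity.
        flags = 0 if any(ch.isupper() for ch in pattern) else re.IGNORECASE
    else:
        raise ValueError(f"unknown case_mode: {case_mode!r}")
    return re.compile(pattern, flags)
```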

Full Changelog: v0.0.15...v0.0.16

v0.0.15

10 Apr 14:13

What's Changed

Fixed

  • MSSQLFileSystem schema resolution for raw text() SQL — Raw text() queries in verify_fulltext_schema, _lexical_search_impl, _grep_impl, _grep_with_candidate_chunks, and _glob_impl used self._model.__tablename__ directly, bypassing SQLAlchemy's schema_translate_map (which only applies when compiling Table references). Mounts pointing at a non-default schema hit Invalid object name 'grover_objects' on every search call. Fixed by adding a schema kwarg to GroverFileSystem that stores self._schema and applies schema_translate_map={None: schema} to every session via _use_session() so ORM queries continue to resolve correctly, plus a _resolve_table() helper on MSSQLFileSystem that qualifies the bare __tablename__ with self._schema for raw SQL. Works uniformly across engine= and session_factory= construction and supports multiple filesystems sharing one factory with different schemas (per-session connection options). Closes #3.
  • verify_fulltext_schema DDL hint key column — The suggested CREATE UNIQUE NONCLUSTERED INDEX referenced (path), but path is max_length=4096 and exceeds SQL Server's 900-byte index key limit, so the DDL would always fail. The Full-Text KEY INDEX now targets (id), the 36-character UUID primary key.

Added

  • schema kwarg on GroverFileSystem — Optional, forwarded through DatabaseFileSystem.__init__ and MSSQLFileSystem.__init__. When set, _use_session() applies schema_translate_map={None: schema} per session so ORM queries resolve unqualified tables, and MSSQLFileSystem raw queries qualify the table name with it.
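The raw-SQL half of the fix amounts to qualifying the bare table name with the configured schema before it is interpolated into a `text()` query, since `schema_translate_map` never sees those strings. The sketch below is an assumption about how such a helper could look, not the actual `_resolve_table()`; the bracket quoting in particular is a choice made for this example.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def resolve_table(tablename: str, schema: Optional[str] = None) -> str:
    """Return the table name, schema-qualified when a schema is configured."""
    if schema is None:
        # No schema configured: leave resolution to the connection default.
        return quote_ident(tablename)
    return f"{quote_ident(schema)}.{quote_ident(tablename)}"
```

With a schema of `tenant_a`, raw queries would then reference `[tenant_a].[grover_objects]` instead of the unqualified name that triggered the "Invalid object name" error.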

Full Changelog: v0.0.14...v0.0.15

v0.0.14

10 Apr 01:48

What's Changed

Added

  • MSSQLFileSystem (alpha) — SQL Server / Azure SQL backend with full-text search and native regex pushdown. Subclass of DatabaseFileSystem that overrides _lexical_search_impl, _grep_impl, and _glob_impl to push work into SQL Server 2025+ via CONTAINSTABLE and REGEXP_LIKE. CRUD, versions, chunks, connections, graph, and vector search are inherited unchanged. Includes verify_fulltext_schema() startup check, a dialect parameter budget of 2000, and a Docker dev environment (SQL Server 2025 + Full-Text Search + ODBC Driver 18) with mssql_up.sh / mssql_down.sh / mssql_test.sh helpers. Install via grover[mssql] (requires aioodbc>=0.5 and pyodbc>=5.0). Operators must provision the Full-Text catalog and index outside the application. Integration tests gated on pytest --mssql; helpers run unconditionally in CI. src/grover/backends/mssql.py is excluded from the coverage gate until a SQL Server 2025 service container is wired into CI.
  • Mount-level permissions — read / read_write flag on add_mount() for coarse-grained access control. Read-only mounts reject all write operations at the facade boundary.
  • Directory-level permissions via PermissionMap — fine-grained per-directory permission rules layered on top of mount permissions. Routing checks both mount and directory permissions before dispatching to the backend.
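One way to picture the two permission layers is a lookup that starts from the mount flag and then applies the most specific directory rule. The sketch below is hypothetical throughout: it is not the `PermissionMap` API, and both the longest-prefix rule and the idea that a read-only mount can never be widened are assumptions made for illustration.

```python
from typing import Dict

READ = "read"
READ_WRITE = "read_write"


def effective_permission(mount_permission: str, rules: Dict[str, str], path: str) -> str:
    """Combine a mount-level flag with per-directory rules (longest prefix wins)."""
    if mount_permission == READ:
        # Assumption: directory rules cannot widen a read-only mount.
        return READ
    result = mount_permission
    best_length = -1
    for directory, permission in rules.items():
        prefix = directory.rstrip("/")
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > best_length:
            best_length = len(prefix)
            result = permission
    return result


def can_write(mount_permission: str, rules: Dict[str, str], path: str) -> bool:
    """True when both layers allow writes to `path`."""
    return effective_permission(mount_permission, rules, path) == READ_WRITE
```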

Full Changelog: v0.0.13...v0.0.14

v0.0.13

08 Apr 03:17

What's Changed

Added

  • GroverObjectBase.clone() — Fast (~1.7µs) method to create a detached copy of a model instance with independent SQLAlchemy state. Uses shallow copy + fresh InstanceState so clones can be safely added to any session.

Fixed

  • write(objects=...) no longer mutates input objects — _group_objects_by_terminal now clones objects before stripping mount prefixes, preserving the caller's original list.
  • add_prefix path normalization — Prefixes are now normalized via normalize_path() before concatenation, ensuring paths always have a leading / regardless of prefix format.
  • strip_prefix safety — Now validates the prefix matches the start of the path and raises ValueError on mismatch instead of blindly slicing. Prefixes are normalized before comparison.
  • _rederive_path_fields normalization — Calls normalize_path() as a safety net, guaranteeing all post-mutation paths are valid before reaching the database.
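The prefix fixes share one idea: normalize both sides first, then validate rather than slice blindly. The functions below are simplified reimplementations written for illustration. They reuse the names from these notes, but their argument order and exact behaviour are assumptions, not the library's code.

```python
import posixpath


def normalize_path(path: str) -> str:
    """Return an absolute, slash-normalized path with a single leading '/'."""
    return posixpath.normpath("/" + path.strip("/"))


def add_prefix(prefix: str, path: str) -> str:
    """Join a mount prefix and a mount-relative path."""
    prefix, path = normalize_path(prefix), normalize_path(path)
    if prefix == "/":
        return path
    return prefix if path == "/" else prefix + path


def strip_prefix(prefix: str, path: str) -> str:
    """Remove `prefix` from `path`, raising ValueError when it does not match."""
    prefix, path = normalize_path(prefix), normalize_path(path)
    if prefix == "/":
        return path
    if path == prefix:
        return "/"
    # Compare on a segment boundary so '/data' does not match '/database'.
    if not path.startswith(prefix + "/"):
        raise ValueError(f"{path!r} does not start with prefix {prefix!r}")
    return path[len(prefix):]
```

Checking on a segment boundary is the part that blind slicing misses: `/database/a` shares its first five characters with `/data` but does not live under that mount.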

Full Changelog: v0.0.12...v0.0.13

v0.0.12

03 Apr 18:30

What's Changed

Changed

  • Unified client API — All Grover sync methods now return GroverResult, matching GroverFileSystem exactly. Single-path CRUD methods (read, write, edit, delete, stat, mkdir, mkconn) no longer unwrap to Candidate.
  • add_mount simplified — Accepts both "data" and "/data", rejects nested paths. No more factory kwargs (engine_url, session_factory, etc.) — construct DatabaseFileSystem explicitly and pass it in.
  • No overrides in facades — Mount normalization, engine disposal, and close() live on GroverFileSystem. GroverAsync is now a one-liner subclass. Grover sync wrapper is a pure delegation layer.
  • Batch parameters added to sync Grover — candidates param on read, stat, edit, delete, ls; edits list on edit; moves/copies batch lists on move/copy; objects on write.
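The simplified `add_mount` rule (accept "data" and "/data", reject nested paths) can be expressed as a small validation step. This is a sketch of that rule only: `normalize_mount_name` is a hypothetical helper, and tolerating a trailing slash is an assumption not stated in the notes.

```python
def normalize_mount_name(name: str) -> str:
    """Return the canonical '/name' form of a single-segment mount name."""
    stripped = name.strip("/")
    if not stripped:
        raise ValueError("mount name must not be empty")
    if "/" in stripped:
        # Nested mount points such as '/a/b' are rejected.
        raise ValueError(f"nested mount paths are not allowed: {name!r}")
    return "/" + stripped
```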

Fixed

  • Path length limit test — Account for /.versions/1 suffix when testing max path length against the 4096-char column limit.

Full Changelog: v0.0.11...v0.0.12