feat: add FK-aware schema indexing with metadata across vector stores by Zay-M3 · Pull Request #9 · Zay-M3/NaturalSQL

Zay-M3 · 2026-04-04T17:36:01Z

Summary by CodeRabbit

Release Notes

New Features
- Database relationship and foreign-key information are now extracted and indexed alongside table definitions for more comprehensive schema awareness.
- Vector search now independently queries tables and relationships, merging results for better relevance.
Improvements
- Prompt generation now uses chat-style message formatting for improved SQL generation and business query handling.
- Metadata filtering added to vector store operations for more granular document organization and retrieval.

coderabbitai · 2026-04-04T17:36:17Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR refactors the vector indexing pipeline to support document-based indexing with metadata and relationship tracking. It introduces structured document payloads containing id, content, and metadata fields; updates vector store interfaces to enable kind-based filtering; refactors schema extraction to return both tables and relationships; and converts prompt generation from string output to role-based message lists.

Changes

Cohort / File(s)	Summary
Schema Extraction & Relationship Discovery `naturalsql/sql/sqlschema.py`	Changed `extract_schema()` return format to bundle `{"tables": {...}, "relationships": [...]}` with relationship edge extraction via new `_parse_relationship_rows()`. Updated `formated_for_ia()` to accept schema bundle and return `list[dict]` with `{id, content, metadata}` entries for both tables and relationships instead of plain strings.
Vector Store Base Interface `naturalsql/vector/stores/base.py`	Extended `upsert()` with optional `metadatas: List[dict[str, Any]] \| None` parameter and `query()` with optional `kind: str \| None` filter parameter for kind-based document retrieval.
Vector Store Implementations `naturalsql/vector/stores/chroma_store.py`, `naturalsql/vector/stores/sqlite_store.py`	Implemented metadata persistence via `metadatas` parameter in `upsert()` and kind-based filtering via `where` clause in `query()`. SQLite additionally added `metadata_json` column, runtime schema migration, and foreign key enforcement.
Vector Manager & Indexing `naturalsql/controller/controllervector.py`	Added new `index_documents(documents_payload: list[dict[str, Any]])` method accepting document dicts with `{id, content, metadata}`. Refactored `index_tables()` as compatibility wrapper delegating to `index_documents()`. Updated `search_relevant_tables()` to query separately by `kind="table"` and `kind="relationship"`, merge/sort results by distance, and enforce limit on collected results.
API & Integration `naturalsql/api.py`	Updated `build_vector_db()` to invoke `vm.index_documents(documents_payload)` instead of `vm.index_tables(formatted)`, computing payload via `extractor.formated_for_ia(schema_bundle)` and reporting indexed document count via `len(documents_payload)`.
Prompt Generation `naturalsql/utils/prompt.py`	Changed `build_prompt()` and `prompt_query()` return types from `str` to `list[dict[str, str]]` (role-based message format). Added schema-wrapping in `<schema>` tags as separate system message, consolidated SQL constraints into "Mandatory rules" block, and updated `prompt_query()` to defensively read response fields via `.get()` with fallback for missing keys.

Sequence Diagram

sequenceDiagram
    participant API as NaturalSQL.build_vector_db()
    participant Extractor as SQLSchemaExtractor
    participant VectorMgr as VectorManager
    participant Store as VectorStore (Chroma/SQLite)
    
    API->>Extractor: extract_schema()
    Extractor-->>API: {"tables": {...}, "relationships": [...]}
    
    API->>Extractor: formated_for_ia(schema_bundle)
    Extractor-->>API: [{id, content, metadata}, ...]<br/>(documents_payload)
    
    API->>VectorMgr: index_documents(documents_payload)
    VectorMgr->>VectorMgr: Build embeddings for each document
    
    VectorMgr->>Store: upsert(documents, ids, embeddings, metadatas)
    Store->>Store: Persist with kind filtering (table/relationship)
    Store-->>VectorMgr: Success
    
    Note over API,Store: Later: search_relevant_tables()
    VectorMgr->>Store: query(embedding, limit, kind="table")
    Store-->>VectorMgr: Table results
    
    VectorMgr->>Store: query(embedding, limit, kind="relationship")
    Store-->>VectorMgr: Relationship results
    
    VectorMgr->>VectorMgr: Merge & sort by distance
    VectorMgr-->>API: Ranked results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

PR #5: Introduces overlapping vector subsystem modifications (VectorStore/Chroma/SQLite upsert/query signatures, VectorManager index_documents flow, document payload patterns).
PR #8: Modifies the same prompt generation functions (build_prompt, prompt_query) with overlapping signature and behavior changes.
PR #2: Shares core flow changes in schema extraction and vector manager indexing (naturalsql/sql/sqlschema.py and naturalsql/controller/controllervector.py).

Poem

🐰 Hops through vectors with glee,
Documents and metadata dance free,
Tables and relationships entwined,
A structured schema, so refined—
The burrow's indexing, redesigned! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding FK-aware (foreign key) schema indexing with metadata support across vector stores, which is reflected in the schema extraction updates, metadata handling, and vector store enhancements throughout all modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/fk-aware-schema-indexing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

feat: add FK-aware schema indexing with metadata across vector stores

095dc20

Zay-M3 self-assigned this Apr 4, 2026

Zay-M3 merged commit 588cab3 into main Apr 4, 2026
1 check was pending

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add FK-aware schema indexing with metadata across vector stores#9

feat: add FK-aware schema indexing with metadata across vector stores#9
Zay-M3 merged 1 commit intomainfrom
feat/fk-aware-schema-indexing

Zay-M3 commented Apr 4, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 4, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Zay-M3 commented Apr 4, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Zay-M3 commented Apr 4, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 4, 2026 •

edited

Loading