VectorDB Service Refactor & Test Modernization with pinecone API integration #23

albertyosef · 2025-09-12T18:10:45Z

Overview

This PR delivers a major refactor of the vector_db_service.py module and a complete overhaul of its corresponding test suite. The goal is to provide a production-ready VectorDBClient implementation, aligned with modern practices, with clear interfaces, strict validations, and reliable automated tests.

Features

New VectorDBClient class
- Unified entry point for Pinecone operations.
- Interactive __main__ block for quick manual testing.
- Encapsulated logic for:
  - Index creation and configuration.
  - Vector upsert with metadata support.
  - Query with configurable parameters and filters.
Environment-driven configuration
- API key and region loaded from .env via settings.py.
- Promotes secure, portable, and environment-specific deployments.

Fixes

Eliminated obsolete references to the deprecated VectorDBService.
Corrected test suite issues:
- Removed reliance on legacy dummy code.
- Standardized on isolated logic with proper mocking.
- Ensured all tests execute reliably in CI without external dependencies.

Refactors

Code Structure
- Introduced strict dimension validation and consistent error handling.
- Improved function signatures for clarity and maintainability.
- Added docstrings and inline comments for better developer guidance.
Testing
- Migrated to modern test patterns.
- Added functional coverage for:
  - Index creation flow.
  - Vector upsert and retrieval.
  - Query behavior under success and failure scenarios.

Acceptance Criteria

VectorDBClient runs in isolation with clean configuration from .env.
Index creation is idempotent and robust.
Ingesting and querying vectors works end-to-end with validation checks.
Test suite fully covers critical workflows with mocks, no external API calls.
CI passes with no dependency on legacy service code.

Summary by CodeRabbit

New Features
- Flexible index creation with sensible defaults, reducing setup steps.
Improvements
- Centralized configuration streamlines setup and reduces manual configuration.
- Simplified initialization using default settings where available.
- More robust validation for vector dimensions with clearer errors.
- More reliable indexing flow by automatically selecting or creating the target index.
Tests
- Test suite updated to reflect the new client interface and simplified setup.

…ts for index creation and vector operations

coderabbitai · 2025-09-12T18:10:51Z

Walkthrough

Refactors VectorDBService into VectorDBClient with a revised constructor and index-creation API, centralizing configuration via settings. Updates tests to target the new client API using a DummyIndex. Adds a main block for manual invocation. No substantive changes to upsert/query logic beyond configuration handling.

Changes

Cohort / File(s)	Summary of Changes
Vector DB client refactor `services/vector_db_service.py`	Renames/exports VectorDBClient, removes api_key/cloud/region/environment params; reads config from `config.settings`. Constructor now `(dimension=None, index_name=None)`. `create_index` now optional args and sets `self.index`. Adds `__main__` usage block. Cleans imports/docstrings. Core upsert/query paths unchanged apart from config wiring.
Tests updated for new API `tests/test_vector_db_service.py`	Rewrites tests to use `VectorDBClient(dimension, index_name)` with a `DummyIndex`. Validates dimension checks, upsert, and query behaviors. Removes Pinecone-wide mocks; uses fixture to inject dummy index. Adjusts to 8-dim vectors and new API surface.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant App
    participant VectorDBClient as Client
    participant Settings
    participant Pinecone as Pinecone SDK
    participant Index

    Note over Client: Initialization
    App->>Client: new VectorDBClient(dimension?, index_name?)
    Client->>Settings: read pinecone_api_key, cloud, region, default index_name, embedding_dimension
    Client-->>App: instance

    rect rgba(220,240,255,0.4)
    Note over Client,Pinecone: Index creation
    App->>Client: create_index(index_name?, dimension?)
    Client->>Pinecone: list_indexes()
    alt index missing
        Client->>Pinecone: create_index(idx_name, dim, ServerlessSpec(cloud, region))
    end
    Client->>Pinecone: get_index(idx_name)
    Pinecone-->>Client: Index handle
    Client->>Index: bind handle
    end

    rect rgba(235,255,235,0.4)
    Note over Client,Index: Upsert/query
    App->>Client: upsert_vectors(vectors, ids)
    Client->>Client: validate dimensions
    Client->>Index: upsert(vectors, ids)

    App->>Client: query(vector)
    Client->>Client: validate dimension
    Client->>Index: query(vector)
    Index-->>Client: matches
    Client-->>App: matches
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Pinecone Client Refactor for v6+ API Compatibility #16 — Also refactors VectorDBClient constructor and index management, overlapping with configuration and dimension/validation logic.

Suggested reviewers

fehranbit

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "VectorDB Service Refactor & Test Modernization with pinecone API integration" accurately summarizes the PR's main changes—refactoring the VectorDB implementation, modernizing tests, and integrating with Pinecone—and is a concise, single-sentence summary that a reviewer can scan for context.

Poem

A rabbit taps keys with vector delight,
Indices sprout in serverless light.
Settings now whisper the where and the when,
Upserts hop in, queries hop again.
New client, same warren—tidy and bright! 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/vector-db-service-refactor-and-tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

services/vector_db_service.py (3)
37-50: Add metadata and namespace support to upserts; keep strict validation.

PR objectives mention “vector upsert with metadata”. Current call drops metadata and namespace.
-    def upsert_vectors(self, vectors: List[List[float]], ids: List[str]):
+    def upsert_vectors(self, vectors: List[List[float]], ids: List[str], metadata: Optional[List[Dict]] = None, namespace: Optional[str] = None):
@@
-        if len(vectors) != len(ids):
+        if len(vectors) != len(ids):
             raise ValueError("vectors and ids must have the same length.")
+        if metadata is not None and len(metadata) != len(ids):
+            raise ValueError("metadata length must match vectors/ids.")
@@
-        self.index.upsert(vectors=[(id, vec) for id, vec in zip(ids, vectors)])
+        payload = []
+        if metadata is None:
+            payload = [{"id": _id, "values": vec} for _id, vec in zip(ids, vectors)]
+        else:
+            payload = [{"id": _id, "values": vec, "metadata": md} for _id, vec, md in zip(ids, vectors, metadata)]
+        self.index.upsert(vectors=payload, namespace=namespace)
51-56: Expose namespace/filter and validate top_k.

Makes queries configurable and safer.
-    def query(self, vector: List[float], top_k: int = 5):
+    def query(self, vector: List[float], top_k: int = 5, namespace: Optional[str] = None, filter: Optional[Dict] = None):
         if not self.index:
             raise RuntimeError("Index is not initialized. Call create_index first.")
         if self.dimension is not None and len(vector) != self.dimension:
             raise ValueError("query vector dimensionality mismatch with index.")
-        return self.index.query(vector=vector, top_k=top_k)
+        if not isinstance(top_k, int) or top_k <= 0:
+            raise ValueError("top_k must be a positive integer.")
+        return self.index.query(vector=vector, top_k=top_k, namespace=namespace, filter=filter)
1-70: Add back-compat VectorDBService.upsert (critical)

api/indexing_router.py imports VectorDBService (line 7), instantiates it (line 28) and calls vdb.upsert(namespace=..., ids=..., vectors=..., metadata=...) (line 71). services/vector_db_service.py currently defines only VectorDBClient.upsert_vectors(vectors, ids) and does not export VectorDBService or an upsert method — this will raise at import/runtime.

Fix: add a temporary compatibility bridge in services/vector_db_service.py (near the bottom) that exposes:

class VectorDBService(VectorDBClient)

def upsert(self, *, namespace=None, ids=None, vectors=None, metadata=None): implement validation and call the underlying Pinecone upsert (or delegate to an updated upsert_vectors that accepts metadata/namespace).

Alternative: change api/indexing_router.py to call the new upsert_vectors API and adjust its call sites.

🧹 Nitpick comments (6)

tests/test_vector_db_service.py (5)
10-14: Silence unused-arg warnings in DummyIndex.query.

Prefix unused parameters to satisfy Ruff (ARG002).
-    def query(self, vector, top_k=5):
+    def query(self, _vector, _top_k=5):
         return {'matches': [{'id': 'id1', 'score': 0.99}]}
16-20: Consider DI to avoid importing real Pinecone in tests.

Right now, constructing VectorDBClient imports/instantiates Pinecone before you monkeypatch. Allow passing a fake pc into the client to fully decouple tests from the SDK.

22-29: Tighten mocks and cover “index exists” path.

Rename lambda args to underscores to appease Ruff (ARG005).

Add a second test for the “already exists” branch (no create call, still sets index).
-    monkeypatch.setattr(svc.pc, 'list_indexes', lambda: [])
-    monkeypatch.setattr(svc.pc, 'create_index', lambda **kwargs: None)
-    monkeypatch.setattr(svc.pc, 'Index', lambda name: DummyIndex(8))
+    monkeypatch.setattr(svc.pc, 'list_indexes', lambda: [])
+    monkeypatch.setattr(svc.pc, 'create_index', lambda **_: None)
+    monkeypatch.setattr(svc.pc, 'Index', lambda _name: DummyIndex(8))
If you want, I can add a test_create_index_when_exists that asserts no call to create_index.

30-37: Nice coverage; add a mismatch-length test.

Also assert ValueError when len(ids) != len(vectors) and non-list inputs to exercise validation branches.

39-45: Add a negative test for uninitialized index.

Set vdb.index = None and assert RuntimeError on query, to validate the guard.
services/vector_db_service.py (1)
7-7: Import Optional and friends for proper type hints.

Prevents RUF013 and improves clarity.
-from typing import List
+from typing import List, Optional, Any, Dict

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0261587 and 9fbaf8d.

📒 Files selected for processing (2)

services/vector_db_service.py (2 hunks)
tests/test_vector_db_service.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/test_vector_db_service.py (1)

services/vector_db_service.py (4)

VectorDBClient (10-56)

query (51-56)

create_index (24-35)

upsert_vectors (37-49)

services/vector_db_service.py (3)

api/indexing_router.py (1)

index (27-83)

tests/test_vector_db_service.py (1)

vdb (17-20)

config/settings.py (1)

Settings (18-26)

🪛 Ruff (0.12.2)

tests/test_vector_db_service.py

13-13: Unused method argument: vector

(ARG002)

13-13: Unused method argument: top_k

(ARG002)

25-25: Unused lambda argument: kwargs

(ARG005)

26-26: Unused lambda argument: name

(ARG005)

services/vector_db_service.py

11-11: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

11-11: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

24-24: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

24-24: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

🔇 Additional comments (1)

services/vector_db_service.py (1)

24-35: Verify existing index dimension for idempotency; fail fast on mismatch.

In services/vector_db_service.py -> create_index: when binding to an existing index, fetch its configured dimension and raise a ValueError if it differs from the requested dimension to avoid subtle runtime errors. Handle both IndexModel objects and dict shapes returned by pc.list_indexes(), or call pc.describe_index(name) / pc.has_index(name) if your pinned pinecone client returns only names. Confirm the pinned pinecone client version and adjust the shape-handling accordingly.

services/vector_db_service.py

fehranbit · 2025-09-14T12:43:58Z

tests/test_vector_db_service.py

we are getting rid of this file completely. its testing pinecone client which we dont need

Refactor VectorDBClient to use settings for configuration; update tes…

9fbaf8d

…ts for index creation and vector operations

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

services/vector_db_service.py Show resolved Hide resolved

fehranbit reviewed Sep 14, 2025

View reviewed changes

tests/test_vector_db_service.py

Copy link

Member

fehranbit Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are getting rid of this file completely. its testing pinecone client which we dont need

fehranbit approved these changes Sep 14, 2025

View reviewed changes

fehranbit merged commit af0338c into main Sep 14, 2025
1 check passed

This was referenced Sep 20, 2025

Implement ranking, synthesis, web fetching, and search services #25

Closed

Refactor: Improve Indexing, Environment, and Dependency Management #26

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

VectorDB Service Refactor & Test Modernization with pinecone API integration #23

VectorDB Service Refactor & Test Modernization with pinecone API integration #23

Uh oh!

albertyosef commented Sep 12, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 12, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

fehranbit Sep 14, 2025

Uh oh!

Uh oh!

Uh oh!

VectorDB Service Refactor & Test Modernization with pinecone API integration #23

VectorDB Service Refactor & Test Modernization with pinecone API integration #23

Uh oh!

Conversation

albertyosef commented Sep 12, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Features

Fixes

Refactors

Acceptance Criteria

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks (2 passed, 1 warning)

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fehranbit Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

albertyosef commented Sep 12, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 12, 2025 •

edited

Loading