-
Notifications
You must be signed in to change notification settings - Fork 0
VectorDB Service Refactor & Test Modernization with pinecone API integration #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ts for index creation and vector operations
WalkthroughRefactors VectorDBService into VectorDBClient with a revised constructor and index-creation API, centralizing configuration via settings. Updates tests to target the new client API using a DummyIndex. Adds a main block for manual invocation. No substantive changes to upsert/query logic beyond configuration handling. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App
participant VectorDBClient as Client
participant Settings
participant Pinecone as Pinecone SDK
participant Index
Note over Client: Initialization
App->>Client: new VectorDBClient(dimension?, index_name?)
Client->>Settings: read pinecone_api_key, cloud, region, default index_name, embedding_dimension
Client-->>App: instance
rect rgba(220,240,255,0.4)
Note over Client,Pinecone: Index creation
App->>Client: create_index(index_name?, dimension?)
Client->>Pinecone: list_indexes()
alt index missing
Client->>Pinecone: create_index(idx_name, dim, ServerlessSpec(cloud, region))
end
Client->>Pinecone: get_index(idx_name)
Pinecone-->>Client: Index handle
Client->>Index: bind handle
end
rect rgba(235,255,235,0.4)
Note over Client,Index: Upsert/query
App->>Client: upsert_vectors(vectors, ids)
Client->>Client: validate dimensions
Client->>Index: upsert(vectors, ids)
App->>Client: query(vector)
Client->>Client: validate dimension
Client->>Index: query(vector)
Index-->>Client: matches
Client-->>App: matches
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. ✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
services/vector_db_service.py (3)
37-50
: Add metadata and namespace support to upserts; keep strict validation.PR objectives mention “vector upsert with metadata”. Current call drops metadata and namespace.
- def upsert_vectors(self, vectors: List[List[float]], ids: List[str]): + def upsert_vectors(self, vectors: List[List[float]], ids: List[str], metadata: Optional[List[Dict]] = None, namespace: Optional[str] = None): @@ - if len(vectors) != len(ids): + if len(vectors) != len(ids): raise ValueError("vectors and ids must have the same length.") + if metadata is not None and len(metadata) != len(ids): + raise ValueError("metadata length must match vectors/ids.") @@ - self.index.upsert(vectors=[(id, vec) for id, vec in zip(ids, vectors)]) + payload = [] + if metadata is None: + payload = [{"id": _id, "values": vec} for _id, vec in zip(ids, vectors)] + else: + payload = [{"id": _id, "values": vec, "metadata": md} for _id, vec, md in zip(ids, vectors, metadata)] + self.index.upsert(vectors=payload, namespace=namespace)
51-56
: Expose namespace/filter and validate top_k.Makes queries configurable and safer.
- def query(self, vector: List[float], top_k: int = 5): + def query(self, vector: List[float], top_k: int = 5, namespace: Optional[str] = None, filter: Optional[Dict] = None): if not self.index: raise RuntimeError("Index is not initialized. Call create_index first.") if self.dimension is not None and len(vector) != self.dimension: raise ValueError("query vector dimensionality mismatch with index.") - return self.index.query(vector=vector, top_k=top_k) + if not isinstance(top_k, int) or top_k <= 0: + raise ValueError("top_k must be a positive integer.") + return self.index.query(vector=vector, top_k=top_k, namespace=namespace, filter=filter)
1-70
: Add back-compat VectorDBService.upsert (critical)api/indexing_router.py imports VectorDBService (line 7), instantiates it (line 28) and calls vdb.upsert(namespace=..., ids=..., vectors=..., metadata=...) (line 71). services/vector_db_service.py currently defines only VectorDBClient.upsert_vectors(vectors, ids) and does not export VectorDBService or an upsert method — this will raise at import/runtime.
- Fix: add a temporary compatibility bridge in services/vector_db_service.py (near the bottom) that exposes:
- class VectorDBService(VectorDBClient)
- def upsert(self, *, namespace=None, ids=None, vectors=None, metadata=None): implement validation and call the underlying Pinecone upsert (or delegate to an updated upsert_vectors that accepts metadata/namespace).
- Alternative: change api/indexing_router.py to call the new upsert_vectors API and adjust its call sites.
🧹 Nitpick comments (6)
tests/test_vector_db_service.py (5)
10-14
: Silence unused-arg warnings in DummyIndex.query.Prefix unused parameters to satisfy Ruff (ARG002).
- def query(self, vector, top_k=5): + def query(self, _vector, _top_k=5): return {'matches': [{'id': 'id1', 'score': 0.99}]}
16-20
: Consider DI to avoid importing real Pinecone in tests.Right now, constructing VectorDBClient imports/instantiates Pinecone before you monkeypatch. Allow passing a fake
pc
into the client to fully decouple tests from the SDK.
22-29
: Tighten mocks and cover “index exists” path.
- Rename lambda args to underscores to appease Ruff (ARG005).
- Add a second test for the “already exists” branch (no create call, still sets index).
- monkeypatch.setattr(svc.pc, 'list_indexes', lambda: []) - monkeypatch.setattr(svc.pc, 'create_index', lambda **kwargs: None) - monkeypatch.setattr(svc.pc, 'Index', lambda name: DummyIndex(8)) + monkeypatch.setattr(svc.pc, 'list_indexes', lambda: []) + monkeypatch.setattr(svc.pc, 'create_index', lambda **_: None) + monkeypatch.setattr(svc.pc, 'Index', lambda _name: DummyIndex(8))If you want, I can add a
test_create_index_when_exists
that asserts no call tocreate_index
.
30-37
: Nice coverage; add a mismatch-length test.Also assert
ValueError
whenlen(ids) != len(vectors)
and non-list inputs to exercise validation branches.
39-45
: Add a negative test for uninitialized index.Set
vdb.index = None
and assertRuntimeError
onquery
, to validate the guard.services/vector_db_service.py (1)
7-7
: Import Optional and friends for proper type hints.Prevents RUF013 and improves clarity.
-from typing import List +from typing import List, Optional, Any, Dict
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
services/vector_db_service.py
(2 hunks)tests/test_vector_db_service.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_vector_db_service.py (1)
services/vector_db_service.py (4)
VectorDBClient
(10-56)query
(51-56)create_index
(24-35)upsert_vectors
(37-49)
services/vector_db_service.py (3)
api/indexing_router.py (1)
index
(27-83)tests/test_vector_db_service.py (1)
vdb
(17-20)config/settings.py (1)
Settings
(18-26)
🪛 Ruff (0.12.2)
tests/test_vector_db_service.py
13-13: Unused method argument: vector
(ARG002)
13-13: Unused method argument: top_k
(ARG002)
25-25: Unused lambda argument: kwargs
(ARG005)
26-26: Unused lambda argument: name
(ARG005)
services/vector_db_service.py
11-11: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
11-11: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
24-24: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
24-24: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
🔇 Additional comments (1)
services/vector_db_service.py (1)
24-35
: Verify existing index dimension for idempotency; fail fast on mismatch.In services/vector_db_service.py -> create_index: when binding to an existing index, fetch its configured dimension and raise a ValueError if it differs from the requested dimension to avoid subtle runtime errors. Handle both IndexModel objects and dict shapes returned by pc.list_indexes(), or call pc.describe_index(name) / pc.has_index(name) if your pinned pinecone client returns only names. Confirm the pinned pinecone client version and adjust the shape-handling accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are getting rid of this file completely. its testing pinecone client which we dont need
Overview
This PR delivers a major refactor of the
vector_db_service.py
module and a complete overhaul of its corresponding test suite. The goal is to provide a production-readyVectorDBClient
implementation, aligned with modern practices, with clear interfaces, strict validations, and reliable automated tests.Features
New
VectorDBClient
classUnified entry point for Pinecone operations.
Interactive
__main__
block for quick manual testing.Encapsulated logic for:
Environment-driven configuration
.env
viasettings.py
.Fixes
Eliminated obsolete references to the deprecated
VectorDBService
.Corrected test suite issues:
Refactors
Code Structure
Testing
Migrated to modern test patterns.
Added functional coverage for:
Acceptance Criteria
VectorDBClient
runs in isolation with clean configuration from.env
.Summary by CodeRabbit