Update Vector Upsert Documentation #264
Walkthrough
Vector Storage Python docs updated: `upsert` now accepts a single list of documents (`List[VectorUpsertParams]`) instead of variadic args; `search` now takes explicit parameters (`query`, `limit`, `similarity`, `metadata`) instead of a params object. Examples were updated (including the async email handling and vector indexing examples) and minor wording edits were applied.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Context as context.vector
rect rgb(230,245,255)
Note over Client,Context: Upsert (new list-based API)
Client->>Context: upsert(name, documents: List[VectorUpsertParams])
Context-->>Client: list[str] (ids)
end
rect rgb(245,255,230)
Note over Client,Context: Search (explicit params)
Client->>Context: search(name, query=string, limit=int, similarity=float, metadata=dict)
Context-->>Client: list[VectorSearchResult]
end
rect rgb(255,250,230)
Note over Client,Server: Async email handling in examples
Client->>Server: request arrives
Server->>Server: email = await request.data.email()
Server-->>Client: process attachments / pass to indexing flow
end
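The two call shapes in the diagram can be tried without the SDK. The sketch below is a minimal illustration, assuming a hypothetical in-memory stand-in: `FakeVectorStore` is not part of the SDK, its word-overlap score is only a placeholder for real embedding similarity, and it returns plain dicts rather than `VectorSearchResult` objects. Only the parameter shapes, `upsert(name, documents)` and `search(name, query, limit, similarity, metadata)`, come from the walkthrough.

```python
import asyncio
from typing import Optional


class FakeVectorStore:
    """Hypothetical in-memory stand-in for context.vector (not the real SDK)."""

    def __init__(self) -> None:
        self._stores: dict[str, dict[str, dict]] = {}

    async def upsert(self, name: str, documents: list[dict]) -> list[str]:
        # New shape: one list of documents instead of variadic arguments.
        store = self._stores.setdefault(name, {})
        for doc in documents:
            store[doc["key"]] = doc
        # Assumption: the returned ids are simply the document keys.
        return [doc["key"] for doc in documents]

    async def search(
        self,
        name: str,
        query: str,
        limit: int = 10,
        similarity: float = 0.5,
        metadata: Optional[dict] = None,
    ) -> list[dict]:
        # Placeholder scoring: fraction of query words found in the document text.
        words = query.lower().split()
        results = []
        for doc in self._stores.get(name, {}).values():
            doc_meta = doc.get("metadata", {})
            # Exact-match metadata filter (an assumption about filter semantics).
            if any(doc_meta.get(k) != v for k, v in (metadata or {}).items()):
                continue
            text = doc.get("document", "").lower()
            score = sum(w in text for w in words) / len(words) if words else 0.0
            if score >= similarity:
                results.append({"key": doc["key"], "similarity": score})
        results.sort(key=lambda r: r["similarity"], reverse=True)
        return results[:limit]


async def demo() -> tuple[list[str], list[dict]]:
    vector = FakeVectorStore()
    ids = await vector.upsert("product-descriptions", [
        {"key": "chair-1", "document": "Ergonomic office chair", "metadata": {"category": "furniture"}},
        {"key": "phones-1", "document": "Wireless headphones", "metadata": {"category": "electronics"}},
    ])
    hits = await vector.search(
        "product-descriptions", "office chair",
        limit=5, similarity=0.5, metadata={"category": "furniture"},
    )
    return ids, hits


ids, hits = asyncio.run(demo())
```

The stand-in keeps `name` and `query` positional and the remaining arguments keyword-style, which mirrors the usage suggested later in the review.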
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | docs | d507875 | Commit Preview URL, Branch Preview URL | Aug 19 2025, 03:48 PM |
Actionable comments posted: 0
🔭 Outside diff range comments (3)
content/SDKs/python/api-reference.mdx (3)
371-379: Vector upsert signature still shows variadic args; update to list-based parameter
The signature retains `*documents` while the examples and parameter text reflect a list. This inconsistency will confuse users.
Apply this diff to align the signature with the new API:
-`async upsert(name: str, *documents: VectorUpsertParams) -> list[str]`
+`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
586-595: Invalid Python comment syntax in example (`bytes([/* image bytes */])`)
C-style comments inside a Python list are invalid and will mislead readers.
Apply this diff to make the snippet valid:
-image_data = bytes([/* image bytes */])
+image_data = bytes([0, 1, 2, 3]) # example bytes
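For reference, a few equivalent ways to build a valid placeholder `bytes` value in Python. The values are illustrative only and do not represent real image data.

```python
# Each line produces a valid bytes object; none uses C-style /* ... */ comments.
from_ints = bytes([0, 1, 2, 3])        # from a list of integers in range 0-255
from_literal = b"\x00\x01\x02\x03"     # equivalent bytes literal
from_hex = bytes.fromhex("00010203")   # equivalent, built from a hex string

# In Python, explanatory text goes after '#', never inside the expression.
image_data = from_ints  # example bytes standing in for image content
```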
369-441: Update remaining variadic upsert references in documentation
Please replace all occurrences of the old `*documents` signature and call site with the new list-based API:
• content/SDKs/python/api-reference.mdx (≈line 371)
– Change the signature from
async upsert(name: str, *documents: VectorUpsertParams) -> list[str]
to
async upsert(name: str, documents: List[VectorUpsertParams]) -> List[str]
– Update the “Parameters” section to document `documents` as a list.
• content/Guides/vector-db.mdx (line 91)
– Change the call from
ids = await context.vector.upsert("knowledge-base", *documents)
to
ids = await context.vector.upsert("knowledge-base", documents)
🧹 Nitpick comments (2)
content/SDKs/python/api-reference.mdx (2)
425-431: Examples look good; align metadata key naming for consistency
The list-based examples are correct. However, metadata keys mix `product_id` (snake_case) and `productId` (camelCase). Use a consistent style to avoid confusion.
Apply this diff to standardize on snake_case (or choose camelCase consistently across both examples):
- {"embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"productId": "123"}},
- {"embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"productId": "456"}}
+ {"embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"product_id": "123"}},
+ {"embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"product_id": "456"}}
Also applies to: 434-441
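If mixed key styles already exist in application data, they can be normalized before documents are built. The helpers below are a hypothetical sketch (`to_snake_case` and `normalize_metadata` are not SDK functions) and only handle top-level keys with simple camelCase names.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier such as 'productId' to 'product_id'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def normalize_metadata(metadata: dict) -> dict:
    """Return a copy of metadata with every top-level key in snake_case."""
    return {to_snake_case(key): value for key, value in metadata.items()}


docs = [
    {"embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"productId": "123"}},
    {"embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"product_id": "456"}},
]

# Rebuild each document with normalized metadata; other fields are untouched.
normalized = [{**doc, "metadata": normalize_metadata(doc["metadata"])} for doc in docs]
```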
380-384: Add a short migration note for the breaking change
A brief callout will help users upgrade existing code from variadic to list-based calls.
Apply this diff to add a migration note just before the example:
 **Return Value**
 Returns a list of string IDs for the upserted vectors.
+<Callout type="info" title="Breaking change (v1.x)">
+ Prior versions accepted variadic document arguments. The new signature now requires a single list of documents.
+ Update calls like: <code>upsert("name", doc1, doc2)</code> → <code>upsert("name", [doc1, doc2])</code>.
+</Callout>
+
 **Example**
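The difference between the two call shapes can be seen with plain Python functions. Both functions below are hypothetical stand-ins rather than SDK code; they only illustrate how `*documents` collects positional arguments while the list-based form receives a single list, and why an old-style call fails against a list-based signature like this one.

```python
def upsert_variadic(name: str, *documents: dict) -> int:
    """Old shape: each document is a separate positional argument."""
    return len(documents)


def upsert_list(name: str, documents: list[dict]) -> int:
    """New shape: all documents arrive in a single list."""
    return len(documents)


doc1 = {"key": "a", "document": "first"}
doc2 = {"key": "b", "document": "second"}

old_count = upsert_variadic("name", doc1, doc2)
new_count = upsert_list("name", [doc1, doc2])

# An old-style call against the list-based stand-in passes too many
# positional arguments, so Python raises TypeError before any work happens.
try:
    upsert_list("name", doc1, doc2)
    old_style_accepted = True
except TypeError:
    old_style_accepted = False
```

Existing call sites that already hold their documents in a list only need to drop the `*` unpacking.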
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
content/SDKs/python/api-reference.mdx
(19 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Workers Builds: docs
🔇 Additional comments (3)
content/SDKs/python/api-reference.mdx (3)
378-379: Parameter description correctly updated to array of documents
Good clarification that `documents` is an array. This aligns with the new API.
414-419: Example correctly uses list-based upsert invocation
Swapping the variadic call for a single `documents` list is correct and matches the new API.
176-182: Be consistent about awaiting `request.data.email()`
In some examples, `email = request.data.email()` is not awaited, while later `email = await request.data.email()` is used. Please standardize based on the actual API contract (sync vs async) and align the method signature under “Request Handling > email” accordingly.
Would you like me to prepare a patch updating all examples and the method signature to the correct async/sync form once you confirm the intended behavior?
Also applies to: 106-112
{"document": "Ergonomic office chair with lumbar support", "metadata": {"category": "furniture"}},
{"document": "Wireless noise-cancelling headphones", "metadata": {"category": "electronics"}}
[
{"document": "Ergonomic office chair with lumbar support", "metadata": {"category": "furniture"}},
We need a key for all of these, right?
"product-embeddings",
{"embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"productId": "123"}},
{"embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"productId": "456"}}
[
same here
Yeah the key field is required for
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
content/SDKs/python/api-reference.mdx (2)
371-379: Upsert signature still shows variadic args; update to list and document required “key”
The signature contradicts the PR objective and the examples below. Also, per reviewer feedback, each upsert document requires a key/id.
Apply the following edits:
-`async upsert(name: str, *documents: VectorUpsertParams) -> list[str]`
+`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
-- `documents`: An array of documents to upsert, each with either embeddings or text
+- `documents`: A list of documents to upsert. Each document must include a unique "key" (string) and either "embeddings" (list[float]) or "document" (str). Optional "metadata" (dict) can be provided.
451-453: Update parameter docs to reflect explicit search parameters (no params dict)
The parameters list still mentions a params object. Replace with explicit args.
 - `name`: The name of the vector storage
-- `params`: Search parameters including query, limit, similarity threshold, and metadata filters
+- `query`: The search text
+- `limit` (optional): Maximum number of results to return (default: 10)
+- `similarity` (optional): Minimum similarity threshold between 0 and 1 (default: 0.5)
+- `metadata` (optional): Dict of metadata filters to apply
♻️ Duplicate comments (3)
content/SDKs/python/api-reference.mdx (3)
401-417: Include required "key" on each document in the upsert example
Upserts require a key/id per document. Add it to the constructed documents.
 documents = []
 for product in products:
     documents.append({
+        "key": product["id"],
         "document": product["description"],
         "metadata": {
             "product_id": product["id"],
             "name": product["name"],
             "category": product["category"],
             "price": product["price"]
         }
     })
 # Upsert documents to vector storage
 try:
-    ids = await context.vector.upsert("product-descriptions", documents)
+    ids = await context.vector.upsert("product-descriptions", documents)
425-431: Add “key” fields to simple text upsert example
Make the quick-start example valid by supplying required keys.
 ids = await context.vector.upsert(
     "product-descriptions",
     [
-        {"document": "Ergonomic office chair with lumbar support", "metadata": {"category": "furniture"}},
-        {"document": "Wireless noise-cancelling headphones", "metadata": {"category": "electronics"}}
+        {"key": "id1", "document": "Ergonomic office chair with lumbar support", "metadata": {"category": "furniture"}},
+        {"key": "id2", "document": "Wireless noise-cancelling headphones", "metadata": {"category": "electronics"}}
     ]
 )
434-441: Add “key” fields to embeddings upsert example
Embeddings upsert should also include keys.
 ids2 = await context.vector.upsert(
     "product-embeddings",
     [
-        {"embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"productId": "123"}},
-        {"embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"productId": "456"}}
+        {"key": "123", "embeddings": [0.1, 0.2, 0.3, 0.4], "metadata": {"productId": "123"}},
+        {"key": "456", "embeddings": [0.5, 0.6, 0.7, 0.8], "metadata": {"productId": "456"}}
     ]
 )
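The document rules stated in this review (a unique string `key`, plus either `embeddings` or `document`) can be checked before calling upsert. The function below is a hypothetical client-side sketch; `validate_upsert_documents` is not part of the SDK, and the SDK's own validation may differ.

```python
def validate_upsert_documents(documents: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the batch looks valid."""
    problems = []
    seen_keys = set()
    for index, doc in enumerate(documents):
        key = doc.get("key")
        if not isinstance(key, str) or not key:
            problems.append(f"document {index}: missing or non-string 'key'")
        elif key in seen_keys:
            problems.append(f"document {index}: duplicate key {key!r}")
        else:
            seen_keys.add(key)
        # Each document needs something to index: raw embeddings or text.
        if "embeddings" not in doc and "document" not in doc:
            problems.append(f"document {index}: needs 'embeddings' or 'document'")
    return problems


good_batch = [
    {"key": "id1", "document": "Ergonomic office chair with lumbar support"},
    {"key": "id2", "embeddings": [0.1, 0.2, 0.3, 0.4]},
]
bad_batch = [
    {"document": "Wireless noise-cancelling headphones"},  # no key
    {"key": "id1"},                                         # nothing to index
    {"key": "id1", "document": "duplicate key"},            # key reused
]

good_problems = validate_upsert_documents(good_batch)
bad_problems = validate_upsert_documents(bad_batch)
```

Note that a numeric product id would fail the string check here, which is one reason a later suggestion in this thread wraps the id in `str(...)`.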
🧹 Nitpick comments (1)
content/SDKs/python/api-reference.mdx (1)
466-472
: Use positional args for name and query in search example (per reviewer guidance)This matches the current API and reviewer guidance; keep the rest as named args.
- results = await context.vector.search("product-descriptions", - query="comfortable office chair", - limit=5, - similarity=0.7, - metadata={"category": "furniture"} - ) + results = await context.vector.search( + "product-descriptions", + "comfortable office chair", + limit=5, + similarity=0.7, + metadata={"category": "furniture"}, + )
📜 Review details
📒 Files selected for processing (1)
content/SDKs/python/api-reference.mdx
(19 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Workers Builds: docs
🔇 Additional comments (2)
content/SDKs/python/api-reference.mdx (2)
576-583: Formatting-only changes acknowledged
These appear to be line-wrapping/formatting updates in Object Storage examples. No functional concerns.
Also applies to: 591-596, 601-610, 614-627, 686-690, 698-703, 705-710
769-771: Handoff callout clarification LGTM
Good to set expectations about project-local handoff and upcoming remote support.
#### search
`async search(name: str, params: VectorSearchParams) -> list[VectorSearchResult]`
`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5,metadata: Optional[dict] = {}, -> list[VectorSearchResult]`
Fix search signature syntax and avoid mutable default
There’s a stray comma before the return type, a missing parenthesis, and a missing space after a comma. Also prefer None over {} for default metadata in Python docs.
-`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5,metadata: Optional[dict] = {}, -> list[VectorSearchResult]`
+`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5, metadata: Optional[dict] = None) -> list[VectorSearchResult]`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5,metadata: Optional[dict] = {}, -> list[VectorSearchResult]`
`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5, metadata: Optional[dict] = None) -> list[VectorSearchResult]`
🤖 Prompt for AI Agents
In content/SDKs/python/api-reference.mdx around line 445, the search function
signature has a stray comma before the return type, a missing closing
parenthesis, no space after a comma, and uses a mutable default for metadata;
change the signature to close the parameter list, add the missing space, replace
the default metadata {} with None and type it as Optional[dict] = None, remove
the extra comma before the return arrow, and ensure the return annotation reads
-> list[VectorSearchResult].
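The preference for `None` over `{}` follows from how Python evaluates default arguments: a default is created once, when the function is defined, and then shared across calls. The functions below are hypothetical and unrelated to the SDK's implementation; they only demonstrate the behavior.

```python
from typing import Optional


def search_shared_default(query: str, metadata: dict = {}) -> dict:
    # The {} default is created once, at definition time, and shared by all calls.
    metadata.setdefault("queries", []).append(query)
    return metadata


def search_none_default(query: str, metadata: Optional[dict] = None) -> dict:
    # A None sentinel yields a fresh dict on every call that omits metadata.
    metadata = {} if metadata is None else metadata
    metadata.setdefault("queries", []).append(query)
    return metadata


first = search_shared_default("chair")
second = search_shared_default("desk")   # same dict object as `first`

third = search_none_default("chair")
fourth = search_none_default("desk")     # independent of `third`
```

With the shared default, state from the first call leaks into the second; with the `None` sentinel, each call starts clean.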
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
content/SDKs/python/api-reference.mdx (3)
371-371: Fix upsert signature to accept a list, not variadic args
The signature still shows a variadic parameter, which contradicts both the PR objective and the examples below. Change it to take a single list of documents.
-`async upsert(name: str, *documents: VectorUpsertParams) -> list[str]`
+`async upsert(name: str, documents: list[VectorUpsertParams]) -> list[str]`
402-412: Include required “key” in each upserted document in the example
Per the API requirement and other examples, each document needs a key. The index_products example currently omits it, making it inaccurate.
 for product in products:
     documents.append({
+        "key": product["id"],
         "document": product["description"],
         "metadata": {
             "product_id": product["id"],
             "name": product["name"],
             "category": product["category"],
             "price": product["price"]
         }
     })
451-453: Update search parameter docs to match the new signature
The Parameters section still references a params object. Replace with explicit parameters to avoid confusion.
 - `name`: The name of the vector storage
-- `params`: Search parameters including query, limit, similarity threshold, and metadata filters
+- `query`: The search query text
+- `limit`: Maximum number of results to return
+- `similarity`: Similarity threshold (0.0–1.0)
+- `metadata` (optional): Dictionary of metadata filters to apply
♻️ Duplicate comments (2)
content/SDKs/python/api-reference.mdx (2)
109-109: Await the async email() accessor in the example
Make this consistent with other examples and the async accessor contract.
-    email = request.data.email()
+    email = await request.data.email()
178-178: Await the async email() accessor in the attachments example
Matches prior guidance and usage elsewhere in the doc.
-    email = request.data.email()
+    email = await request.data.email()
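What goes wrong without `await` can be shown with the standard library alone. In the sketch below, `FakeRequestData` is a hypothetical stand-in for `request.data`, assuming `email()` is an async method as this review states; the returned dict is invented for illustration.

```python
import asyncio
import inspect


class FakeRequestData:
    """Hypothetical stand-in for request.data with an async email() accessor."""

    async def email(self) -> dict:
        return {"subject": "Hello", "attachments": []}


async def handler() -> tuple[bool, str]:
    data = FakeRequestData()

    not_awaited = data.email()                 # a coroutine object, not the email
    was_coroutine = inspect.iscoroutine(not_awaited)
    not_awaited.close()                        # avoid a "never awaited" RuntimeWarning

    email = await data.email()                 # the actual result
    return was_coroutine, email["subject"]


was_coroutine, subject = asyncio.run(handler())
```

Accessing `not_awaited["subject"]` would raise `TypeError`, because a coroutine object is not subscriptable.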
🧹 Nitpick comments (1)
content/SDKs/python/api-reference.mdx (1)
466-471: Prefer positional args for name and query in search example
Align the example with the recommended usage (positional for the first two arguments, keywords for the rest).
-    results = await context.vector.search("product-descriptions",
-        query="comfortable office chair",
-        limit=5,
-        similarity=0.7,
-        metadata={"category": "furniture"}
-    )
+    results = await context.vector.search(
+        "product-descriptions",
+        "comfortable office chair",
+        limit=5,
+        similarity=0.7,
+        metadata={"category": "furniture"},
+    )
📜 Review details
📒 Files selected for processing (1)
content/SDKs/python/api-reference.mdx
(20 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Workers Builds: docs
🔇 Additional comments (4)
content/SDKs/python/api-reference.mdx (4)
425-431: Good: text upsert example includes keys and list-based input
This aligns with the new contract (list of documents) and the requirement for a key on each document.
434-440: Good: embeddings upsert example includes keys and list-based input
Consistent with the updated API and key requirement.
495-496: Wording “key” is consistent with API usage
The parameter description correctly refers to “key” (not “ID”), matching the upsert examples and search semantics.
769-772: Clear handoff limitation callout
Accurately communicates current scope and near-term plans. No changes needed.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Nicholas Mirigliani <36427716+NobbyBop@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Nicholas Mirigliani <36427716+NobbyBop@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (3)
content/SDKs/python/api-reference.mdx (3)
371-379: Update upsert signature to take a list, and document that key is required
The signature still shows variadic args, which conflicts with the new API and the examples. Also, the parameter description should state that key is required for each document.
-`async upsert(name: str, *documents: VectorUpsertParams) -> list[str]`
+`async upsert(name: str, documents: list[VectorUpsertParams]) -> list[str]`
 **Parameters**
 - `name`: The name of the vector storage
-- `documents`: An array of documents to upsert, each with either embeddings or text
+- `documents`: A list of documents to upsert. Each document must include a unique `key` and either `embeddings` or `document` text, and may include `metadata`.
402-412: Include a required key for each document in index_products example
Examples below already include key, but this builder omits it. Make keys explicit and consistent.
 documents = []
 for product in products:
     documents.append({
+        "key": str(product["id"]),
         "document": product["description"],
         "metadata": {
             "product_id": product["id"],
             "name": product["name"],
             "category": product["category"],
             "price": product["price"]
         }
     })
445-453: Fix search signature and parameters section; add sane defaults and remove ‘params’
The signature is updated to explicit params (good), but:
- It lacks default values for limit/similarity/metadata commonly used in examples.
- The parameters list still refers to a params object.
-`async search(name: str, query: str, limit: int, similarity: float, metadata: Optional[dict]) -> list[VectorSearchResult]`
+`async search(name: str, query: str, limit: int = 10, similarity: float = 0.5, metadata: Optional[dict] = None) -> list[VectorSearchResult]`
 **Parameters**
 - `name`: The name of the vector storage
-- `params`: Search parameters including query, limit, similarity threshold, and metadata filters
+- `query`: The query text to search against
+- `limit` (optional): Maximum number of results to return. Defaults to 10
+- `similarity` (optional): Minimum similarity threshold (0.0–1.0). Defaults to 0.5
+- `metadata` (optional): Exact-match metadata filters applied at search time
🧹 Nitpick comments (2)
content/SDKs/python/api-reference.mdx (2)
466-472: Prefer positional for query argument in search example (matches reviewer guidance)
Aligns with the comment to use positional args rather than a params dict.
-results = await context.vector.search("product-descriptions",
-    query="comfortable office chair",
+results = await context.vector.search("product-descriptions",
+    "comfortable office chair",
     limit=5,
     similarity=0.7,
     metadata={"category": "furniture"}
 )
506-510: Use a key-like value in the delete example to match the parameter name
Minor consistency tweak to avoid confusion between legacy “id” and the now-documented “key”.
-deleted_count = await context.vector.delete("product-descriptions", "id1")
+deleted_count = await context.vector.delete("product-descriptions", "key_123")
📜 Review details
📒 Files selected for processing (1)
content/SDKs/python/api-reference.mdx
(20 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Workers Builds: docs
🔇 Additional comments (6)
content/SDKs/python/api-reference.mdx (6)
178-191: LGTM: attachments example correctly awaits email() and attachment.data()
This aligns with the async accessor contract and previous reviewer guidance.
416-419: LGTM: upsert now passes a single list of documents
Matches the new API shape.
427-431: LGTM: simple text upsert example includes key and list input
Consistent with the updated API.
436-440: LGTM: embeddings upsert example includes key and list input
Consistent with the updated API.
488-496: LGTM: delete now refers to ‘key’ instead of ‘ID’
Terminology matches the upsert examples and API behavior.
769-772: LGTM: callout text reads clearly and sets expectation
No action needed.
async def handler(request: AgentRequest, response: AgentResponse, context: AgentContext):
    # Get the incoming email
    email = request.data.email()
    email = await request.data.email()
    # Create an attachment
Remove the un-awaited email() call; keep only the awaited line
You now correctly await email(), but the previous un-awaited call remains and returns a coroutine. Drop the duplicate line to avoid confusion.
async def handler(request: AgentRequest, response: AgentResponse, context: AgentContext):
# Get the incoming email
- email = request.data.email()
email = await request.data.email()
📝 Committable suggestion
async def handler(request: AgentRequest, response: AgentResponse, context: AgentContext):
    # Get the incoming email
    email = request.data.email()
    email = await request.data.email()
    # Create an attachment
async def handler(request: AgentRequest, response: AgentResponse, context: AgentContext):
    # Get the incoming email
    email = await request.data.email()
    # Create an attachment
🤖 Prompt for AI Agents
In content/SDKs/python/api-reference.mdx around lines 106 to 110, there's a
duplicate call to request.data.email(): an un-awaited call followed by the
correct awaited call; remove the first un-awaited line so only "email = await
request.data.email()" remains to avoid leaving a stray coroutine and confusion.
Change docs to reflect change in API: upsert takes an array of documents, not each document as a separate argument.
Summary by CodeRabbit