fix: Python SDK and Cloud docs improvements #268
- Update vector storage (`upsert`, `search`) signatures
- Document required "key" field in vector storage
- Fix async/await consistency in email examples
- Add parameter defaults and improve vector storage examples
- Clarify email source trigger behavior in Agents console page
Walkthrough
Email handling moved from synchronous, blocking delivery via a Location header to asynchronous queued processing. The Python docs now await email retrieval, Vector Storage upsert accepts a list of documents with per-document keys, and the search APIs return similarity and have default parameter values.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Sender as Email Sender
participant Mail as Email Source
participant Queue as Processing Queue
participant Agent as Agent
participant Dest as Configured Destinations
Sender->>Mail: Send email
Mail->>Queue: Enqueue message (async)
note right of Queue #f8f3d4: No blocking Location header / no waiting URL
Queue-->>Agent: Dispatch job
Agent->>Agent: Process message
Agent-->>Dest: Deliver replies to configured destinations
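The flow in the diagram can be sketched with a plain `asyncio.Queue`. This is only an illustration of the enqueue-then-process pattern; `receive_email`, `agent_worker`, and the destination name are hypothetical and are not part of the platform or the SDK.

```python
import asyncio


async def agent_worker(queue, destinations, delivered):
    """Hypothetical agent: pulls queued emails and delivers replies."""
    while True:
        email = await queue.get()
        reply = f"Re: {email['subject']}"
        # Replies go only to configured destinations; with none, nothing is sent.
        for dest in destinations:
            delivered.append((dest, reply))
        queue.task_done()


async def receive_email(queue, email):
    """Hypothetical email source: enqueue and return without waiting for the agent."""
    await queue.put(email)
    return "accepted"


async def main():
    queue = asyncio.Queue()
    delivered = []
    worker = asyncio.create_task(
        agent_worker(queue, ["email-destination"], delivered)
    )
    status = await receive_email(queue, {"subject": "Hello"})
    await queue.join()  # wait until the queued message has been processed
    worker.cancel()
    return status, delivered


status, delivered = asyncio.run(main())
```

The sender gets its acknowledgement as soon as the message is queued, which is the behavior the walkthrough describes as replacing the blocking Location-header approach.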
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Deploying with

Status | Name | Latest Commit | Preview URL | Updated (UTC) |
---|---|---|---|---|
✅ Deployment successful! View logs | docs | 9781233 | Commit Preview URL, Branch Preview URL | Aug 20 2025, 12:54 AM |
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
content/SDKs/python/api-reference.mdx (2)
106-129: Doc correctness: mark sendReply as async in its signature.
The example uses `await email.sendReply(...)`, so the method should be documented as asynchronous.
Proposed correction (outside this exact range, at the method signature definition above):

    -`sendReply(request, context, subject=None, text=None, html=None, attachments=None, from_email=None, from_name=None)`
    +`async sendReply(request, context, subject=None, text=None, html=None, attachments=None, from_email=None, from_name=None)`

1067-1099: Doc correctness: make email() explicitly async in the signature.
You correctly updated examples to `await request.data.email()`, but the signature still shows a synchronous method. Adjust the signature to reflect its async nature:

    -`email() -> Email`
    +`async email() -> Email`

Also scan the “Request Handling” section for similar inconsistencies (e.g., `json`, `text`, `binary`) and align examples to either properties or async methods consistently (e.g., `await request.data.text()` vs `request.data.text`).
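Why the signature matters can be shown with a stand-in object. The sketch below does not use the SDK; `FakeData` and `handler` are hypothetical names, and the only point is that calling an `async` method without `await` yields a coroutine rather than the email.

```python
import asyncio
import inspect


class FakeData:
    """Hypothetical stand-in for request.data with an async email() method."""

    async def email(self):
        return {"subject": "Hello"}


async def handler(data):
    # Correct usage: await the async accessor before touching the result.
    email = await data.email()
    return email["subject"]


# Calling without await returns a coroutine object, not the email.
pending = FakeData().email()
is_coroutine = inspect.iscoroutine(pending)
pending.close()  # avoid a "never awaited" warning for the demo object

subject = asyncio.run(handler(FakeData()))
```

A reader who trusts a synchronous-looking signature would end up holding the coroutine, so documenting the method as `async` keeps the signature and the examples in agreement.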
🧹 Nitpick comments (4)
content/Cloud/agents.mdx (1)
159-159: Email source behavior clarified; consider adding one more sentence on delivery prerequisites.
The async processing note is good and aligns with the broader shift away from synchronous polling. To prevent confusion, explicitly state that replies require at least one outbound destination (e.g., Email Destination) to be configured; otherwise, the agent processes the email without sending a reply.
Apply this small edit:

    -After an email is received, the platform will enqueue processing and your agent will handle the message asynchronously. Any replies will be delivered via the configured destination(s).
    +After an email is received, the platform will enqueue processing and your agent will handle the message asynchronously. Any replies will be delivered via the configured destination(s). If no outbound destinations are configured, the agent will process the email but no reply will be sent.

content/SDKs/python/api-reference.mdx (3)
370-378: Vector upsert signature LGTM; clarify idempotency and return semantics.
The signature and “key” requirement look good. Two quick doc improvements:
- Clarify that upserting with an existing key updates that document (idempotent behavior).
- Clarify whether the returned list contains the provided keys or internally generated IDs.
Minimal edits within this block:

    -`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
    +`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
    ...
    -- `documents`: A list of documents to upsert. Each document must include a unique `key` and either `document` (text) or `embeddings`
    +- `documents`: A list of documents to upsert. Each document must include a unique `key` and either `document` (text) or `embeddings`. If a document with the same `key` already exists, it will be updated (idempotent).

And (outside this specific range) consider changing the return value line to:
- Return Value: “Returns a list of string keys (matching the provided `key` values) for the upserted vectors.” If the API truly returns internal IDs, specify that explicitly.
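The idempotency being asked about can be illustrated with an in-memory stand-in. This is a sketch only: `InMemoryVectorStore` is hypothetical, it stores no embeddings, and it assumes the returned values are internal IDs that are reused when a key repeats, which is one of the two possibilities the comment leaves open.

```python
import uuid


class InMemoryVectorStore:
    """Hypothetical stand-in for a keyed vector store; not the SDK."""

    def __init__(self):
        self._by_key = {}  # user-provided key -> stored record

    def upsert(self, documents):
        ids = []
        for doc in documents:
            key = doc["key"]
            existing = self._by_key.get(key)
            # Assumption: an existing key keeps its internal ID on update.
            vector_id = existing["id"] if existing else f"vector_{uuid.uuid4().hex[:16]}"
            self._by_key[key] = {"id": vector_id, "document": doc.get("document")}
            ids.append(vector_id)
        return ids


store = InMemoryVectorStore()
first = store.upsert([{"key": "key_123", "document": "Ergonomic chair"}])
second = store.upsert(
    [{"key": "key_123", "document": "Updated description for the ergonomic chair"}]
)
```

Upserting the same key twice leaves one record with the updated text, which is the behavior the docs would need to state explicitly.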
445-456: Search signature and defaults look good; fix “similarity” vs “distance” terminology mismatch.
Parameters and defaults are clear. However, elsewhere in this section you refer to a “distance score,” while the parameter is a “similarity” threshold. This can confuse users. Align the terms by either:
- Using “similarity” consistently (and expose/print `result.similarity` in examples), or
- Using neutral “score,” and clarify whether larger/smaller is better.
Also clarify whether the similarity threshold is inclusive and its exact range semantics (e.g., 0.0 ≤ similarity ≤ 1.0).
Suggested doc wording (outside this range):
- Return Value: “Returns a list of search results, each containing an ID, metadata, and a similarity score (0.0–1.0, higher is more similar).”
And update the example logging lines to match:

    # Replace this:
    print(f"Product ID: {result.id}, Similarity: {result.distance}")
    # With one of these, depending on actual field name:
    print(f"Product ID: {result.id}, Similarity: {result.similarity}")
    # or, if the SDK exposes distance instead of similarity:
    print(f"Product ID: {result.id}, Distance: {result.distance}")

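To make the similarity/distance distinction concrete, here is a small sketch that assumes a cosine metric. The source does not say which metric the service uses, so the formulas, the helper names, and the inclusive threshold check are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance under the same assumption (0.0 = perfect match)."""
    return 1.0 - cosine_similarity(a, b)


def passes_threshold(similarity, threshold=0.5):
    # Assumption: the threshold is inclusive; the docs should confirm this.
    return similarity >= threshold
```

Under this assumption a larger similarity and a smaller distance mean the same thing, which is why printing `result.distance` under a “Similarity” label would mislead readers.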
386-399: Minor: strengthen example by showing update behavior on duplicate keys.
Since upsert now requires a `key`, it’s useful to demonstrate that reusing a key updates an existing vector (idempotency). Consider adding a short follow-up snippet to the example:

    # Update an existing product by reusing the same key
    update_ids = await context.vector.upsert(
        "product-descriptions",
        [{"key": "key_123", "document": "Updated description for the ergonomic chair"}]
    )
    # update_ids should contain ["key_123"] (or the internal ID if that is what is returned)

📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- content/Cloud/agents.mdx (1 hunks)
- content/SDKs/python/api-reference.mdx (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Workers Builds: docs
🔇 Additional comments (3)
content/SDKs/python/api-reference.mdx (3)
404-404: Good example addition calling out required per-document key.
The inline “REQUIRED: unique key” note will help users avoid common mistakes. No changes requested.
469-475: Example usage updated to positional query — looks correct.
Passing the query text positionally after the name parameter matches the new signature. No changes requested.
512-512: Delete example updated to key-based deletion — LGTM.
The example now aligns with the per-document key approach introduced in upsert. No changes requested.
Actionable comments posted: 0
🧹 Nitpick comments (4)
content/SDKs/python/examples/index.mdx (4)
175-181: Confirm upsert return type; consider renaming ids → keys for clarity
If upsert returns the per-document keys (as the examples now require a "key" on input), consider renaming the variable and response field for consistency. If it truly returns generated IDs, ignore this.
Proposed rename if the SDK returns keys:

    - ids = await context.vector.upsert("products", documents)
    + keys = await context.vector.upsert("products", documents)
      return response.json({
    -     "message": f"Indexed {len(ids)} products successfully",
    -     "ids": ids
    +     "message": f"Indexed {len(keys)} products successfully",
    +     "keys": keys
      })

188-196: Rely on defaults to simplify the search call (optional)
If the SDK now provides sensible defaults for limit, similarity, and metadata, trimming the call improves readability in examples.
Apply if acceptable:

    - results = await context.vector.search(
    -     "products",
    -     query,
    -     limit=5,
    -     similarity=0.5,
    -     metadata={}
    - )
    + results = await context.vector.search("products", query)

213-227: Tighten delete example, fix message grammar, and clarify input shape
- The example reads from products[0]["id"] for deletion, which is slightly confusing in a delete action. If the API expects a key, consider reading data.get("key") (or "id") directly rather than a list of products.
- Message grammar: “Deleted {count} product successfully” should pluralize.
If you keep the current input shape, at least fix pluralization:

    - return response.json({
    -     "message": f"Deleted {count} product successfully",
    -     "id": product_id
    - })
    + noun = "product" if count == 1 else "products"
    + return response.json({
    +     "message": f"Deleted {count} {noun} successfully",
    +     "id": product_id
    + })

If the delete API supports multiple keys, consider accepting a list and passing it through; otherwise, ensure the docs clearly specify a single key.
160-173: Guard against missing fields in example input (optional)
The example indexes product["id"], ["name"], ["price"], ["category"], and ["description"] directly. For robustness in examples, consider using .get() or validating required fields to avoid KeyError if a caller passes partial objects.
Example minimal validation:

    - documents = [
    -     {
    -         "key": product["id"],
    -         "document": product["description"],
    -         "metadata": {
    -             "id": product["id"],
    -             "name": product["name"],
    -             "price": product["price"],
    -             "category": product["category"]
    -         }
    -     }
    -     for product in products
    - ]
    + required = ("id", "name", "price", "category", "description")
    + for p in products:
    +     missing = [k for k in required if k not in p]
    +     if missing:
    +         return response.json({"error": f"Missing fields in product: {', '.join(missing)}"})
    + documents = [
    +     {
    +         "key": p["id"],
    +         "document": p["description"],
    +         "metadata": {
    +             "id": p["id"],
    +             "name": p["name"],
    +             "price": p["price"],
    +             "category": p["category"]
    +         }
    +     }
    +     for p in products
    + ]

📒 Files selected for processing (1)
- content/SDKs/python/examples/index.mdx (1 hunks)
🔇 Additional comments (1)
content/SDKs/python/examples/index.mdx (1)
198-204: Reading similarity from result.similarity looks correct
This aligns with the updated Vector Storage search response shape. No issues spotted.
- Remove incorrect unpacking operator in Vector DB guide
- Document that Python SDK requires "key" field, JavaScript doesn't
- Clarify Python returns `similarity`, JavaScript returns `distance`
- Discuss idempotent behavior in Vector DB guide
- Update return value descriptions to note internal system IDs
Actionable comments posted: 0
🧹 Nitpick comments (5)
content/Guides/vector-db.mdx (3)
51-54: Clarify SDK differences (add delete semantics and upsert call shape).
Helpful to call out two more differences so readers don’t infer parity:
- JS delete uses internal vector IDs; Python delete uses your provided keys.
- JS upsert accepts variadic document args; Python upsert accepts a single list.
Apply within this block:

      **SDK Differences:**
      - **JavaScript SDK**: No key field required, search returns `distance` (0 = perfect match)
      - **Python SDK**: Requires `key` field for each document, search returns `similarity` (1.0 = perfect match)
    + - **Delete semantics**: JavaScript deletes by internal vector ID; Python deletes by your document `key`
    + - **Upsert call shape**: JavaScript accepts multiple document args; Python requires a single list of documents

55-57: Polish: use an em dash or colon instead of a hyphen.
Minor readability tweak.

    -**Idempotent Behavior:**
    -The upsert operation is idempotent - upserting with an existing key updates the existing vector rather than creating a duplicate. The same internal vector ID is reused, ensuring your vector storage remains clean and efficient.
    +**Idempotent Behavior:**
    +The upsert operation is idempotent — upserting with an existing key updates the existing vector rather than creating a duplicate. The same internal vector ID is reused, ensuring your vector storage remains clean and efficient.

89-112: Python examples look correct; add a quick note about returned IDs vs keys.
Code reflects the new list-based upsert and required keys. Consider adding a one-line note after each upsert to avoid confusion between returned internal IDs and user-provided keys.

    -ids = await context.vector.upsert("knowledge-base", documents)
    +ids = await context.vector.upsert("knowledge-base", documents)
    +# Note: Returned `ids` are internal system IDs. Use your `key` for future operations like delete.
    ...
    -ids = await context.vector.upsert("custom-embeddings", embedding_docs)
    +ids = await context.vector.upsert("custom-embeddings", embedding_docs)
    +# Note: Returned `ids` are internal system IDs. Use your `key` for future operations like delete.

content/SDKs/python/api-reference.mdx (2)
370-384: Tighten the upsert parameter contract (exactly one of document or embeddings).
Current text could be read as “either or both.” Recommend making exclusivity explicit.

    -`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
    +`async upsert(name: str, documents: List[VectorUpsertParams]) -> list[str]`
    ...
    -- `documents`: A list of documents to upsert. Each document must include a unique `key` and either `document` (text) or `embeddings`
    +- `documents`: A list of documents to upsert. Each document must include a unique `key` and exactly one of `document` (text) or `embeddings`.

Also, good call including the idempotency note and clarifying returned IDs are internal.
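The “exactly one of” contract can be expressed as a small client-side check. This is a hypothetical helper for illustration, not something the SDK is known to provide; whether the service itself rejects documents that carry both fields is not stated in the source.

```python
def validate_upsert_document(doc):
    """Return a list of problems with one upsert document; empty means it looks valid."""
    problems = []
    if not doc.get("key"):
        problems.append("missing required 'key'")
    has_document = doc.get("document") is not None
    has_embeddings = doc.get("embeddings") is not None
    # Exactly one of the two must be present: both or neither is an error.
    if has_document == has_embeddings:
        problems.append("provide exactly one of 'document' or 'embeddings'")
    return problems


ok = validate_upsert_document({"key": "a", "document": "text"})
both = validate_upsert_document(
    {"key": "a", "document": "text", "embeddings": [0.1, 0.2]}
)
neither = validate_upsert_document({"key": "a"})
no_key = validate_upsert_document({"embeddings": [0.1, 0.2]})
```

Comparing the two booleans for equality catches both failure modes in one test, which mirrors the wording change from “either” to “exactly one of.”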
461-466: Minor wording: simplify bullets for readability.
Replace “e.g.” with “for example” and remove parentheses to reduce punctuation noise in list items.

    -Returns a list of search results with the following attributes:
    -- `id`: Internal vector ID (e.g., "vector_94c98660c8afa3a2")
    -- `key`: The unique key for this vector
    -- `similarity`: Similarity score from 0.0 to 1.0 (1.0 = perfect match)
    -- `metadata`: The associated metadata dictionary
    +Returns a list of search results with the following attributes:
    +- `id`: Internal vector ID, for example "vector_94c98660c8afa3a2"
    +- `key`: The unique key for this vector
    +- `similarity`: Similarity score from 0.0 to 1.0; 1.0 is a perfect match
    +- `metadata`: The associated metadata dictionary

📒 Files selected for processing (2)
- content/Guides/vector-db.mdx (3 hunks)
- content/SDKs/python/api-reference.mdx (7 hunks)
🧰 Additional context used
🪛 LanguageTool
content/Guides/vector-db.mdx
[grammar] ~51-~51: There might be a mistake here.
Context: ...computed embeddings. SDK Differences: - JavaScript SDK: No key field required,...
(QB_NEW_EN)
[grammar] ~55-~55: There might be a mistake here.
Context: ...perfect match) Idempotent Behavior: The upsert operation is idempotent - ups...
(QB_NEW_EN)
content/SDKs/python/api-reference.mdx
[grammar] ~461-~461: There might be a mistake here.
Context: ...h results with the following attributes: - `id`: Internal vector ID (e.g., "vector_94c9...
(QB_NEW_EN)
[grammar] ~462-~462: There might be a mistake here.
Context: ...tor ID (e.g., "vector_94c98660c8afa3a2") - `key`: The unique key for this vector - `simi...
(QB_NEW_EN)
[grammar] ~463-~463: There might be a mistake here.
Context: ... - `key`: The unique key for this vector - `similarity`: Similarity score from 0.0 to 1.0 (1.0 ...
(QB_NEW_EN)
[grammar] ~464-~464: There might be a mistake here.
Context: ...re from 0.0 to 1.0 (1.0 = perfect match) - `metadata`: The associated metadata dictionary **...
(QB_NEW_EN)
🔇 Additional comments (5)
content/Guides/vector-db.mdx (1)
151-151: LGTM: direct similarity usage in Python.
Using `result.similarity` aligns with the Python SDK behavior documented below.
content/SDKs/python/api-reference.mdx (4)
406-406: LGTM: emphasize required key in example.
“REQUIRED” annotation is clear and matches the new contract.
447-458: LGTM: updated search signature with defaults.
Explicit parameters and defaults are clear and match usage in examples.
475-481: LGTM: example uses positional query and prints similarity.
Matches the new signature and the result shape.
Also applies to: 487-487
509-509: LGTM: delete returns count and example uses key.
Consistent with the keys-as-external-IDs model in Python.
Also applies to: 518-518
The `upsert` operation inserts new documents or updates existing ones. You can provide either text (which gets automatically converted to embeddings) or pre-computed embeddings.

**SDK Differences:**
we should fix this fwiw
Actionable comments posted: 0
♻️ Duplicate comments (1)
content/Guides/vector-db.mdx (1)
51-54: Tighten wording and fix minor grammar in “SDK Differences”; verify JS upsert varargs example.
- Prefer precise phrasing and numeric formatting for metrics.
- Past reviewer flagged this area; applying a concrete fix below.
Apply:

    -**SDK Differences:**
    -- **JavaScript SDK**: No key field required, search returns `distance` (0 = perfect match)
    -- **Python SDK**: Requires `key` field for each document, search returns `similarity` (1.0 = perfect match)
    +**SDK differences**
    +- **JavaScript SDK**: No key field required; search returns `distance` (0.0 = perfect match)
    +- **Python SDK**: Requires a `key` per document; search returns `similarity` (1.0 = perfect match)

To ensure consistency across the docs with the updated signatures, please run:

    #!/bin/bash
    # Verify upsert/search usage shapes across the docs
    set -euo pipefail
    echo "Occurrences of vector.upsert:"
    rg -nC2 '\bvector\.upsert\s*\('
    echo
    echo "Occurrences of vector.search:"
    rg -nC2 '\bvector\.search\s*\('

🧹 Nitpick comments (4)
content/Guides/vector-db.mdx (4)
55-57: Replace hyphen with em dash (or colon) and tighten phrasing in “Idempotent Behavior.”
Hyphen reads as a minus; use an em dash or a colon.

    -**Idempotent Behavior:**
    -The upsert operation is idempotent - upserting with an existing key updates the existing vector rather than creating a duplicate. The same internal vector ID is reused, ensuring your vector storage remains clean and efficient.
    +**Idempotent behavior**
    +The upsert operation is idempotent—upserting with an existing key updates the existing vector rather than creating a duplicate. The same internal vector ID is reused, keeping your vector storage clean and efficient.

84-101: Python upsert sample looks correct; consider renaming return variable for clarity.
Great to see per-document keys and list-based upsert. Minor clarity nit: these are internal vector IDs, not the document keys.

    -ids = await context.vector.upsert("knowledge-base", documents)
    +vector_ids = await context.vector.upsert("knowledge-base", documents)

103-112: Embedding upsert sample is aligned; mirror the naming nit for consistency.
Same rationale as above: avoid conflating returned IDs with user-provided keys.

    -ids = await context.vector.upsert("custom-embeddings", embedding_docs)
    +vector_ids = await context.vector.upsert("custom-embeddings", embedding_docs)

163-164: Clarify which identifiers are used for deletion across SDKs.
Slightly more explicit to help readers map IDs vs. keys.

    -Remove specific vectors from storage using their IDs (JavaScript) or keys (Python).
    +Remove specific vectors using their vector IDs (JavaScript) or the document keys used at upsert (Python).

📒 Files selected for processing (1)
- content/Guides/vector-db.mdx (5 hunks)
🔇 Additional comments (2)
content/Guides/vector-db.mdx (2)
148-152: Python search: using result.similarity is correct.
The example correctly reflects the Python SDK’s similarity return shape.
179-185: Python delete example aligns with key-based deletion and single-item support.
Looks good and matches the stated behavior.