
Conversation

paytoniga
Collaborator

@paytoniga paytoniga commented Sep 20, 2025

This PR introduces a safe, no-key fallback for web search and loosens config requirements to improve local/dev workflows.

What Changed

  • config/settings.py

    • API keys for search providers are now optional.
    • Default search engine set to dummy when unspecified.
    • Result: app can start and function in dev/fallback scenarios without external keys.
  • services/web_search_service.py

    • Added DummySearchProvider returning mock results when no real provider is configured/available.
    • WebSearchService now automatically falls back to the dummy provider to keep the system responsive (a rough sketch of this fallback follows below).
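
For illustration, a minimal sketch of the described fallback provider; class and field names here are assumptions based on this description, not the PR's exact code:

import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str

class DummySearchProvider:
    """Returns synthetic results so the app stays usable without external API keys."""

    async def search(self, query: str, k: int = 5) -> list[SearchResult]:
        # Log that the fallback is active, matching the troubleshooting note below.
        logger.info("DummySearchProvider active: returning %d mock results", min(k, 5))
        return [
            SearchResult(
                title=f"Mock result {i + 1} for '{query}'",
                url=f"https://example.com/mock/{i + 1}",
                snippet="Placeholder snippet; no real search provider is configured.",
            )
            for i in range(min(k, 5))
        ]

if __name__ == "__main__":
    for r in asyncio.run(DummySearchProvider().search("what is http/2", k=3)):
        print(r.title, r.url)

In production the real provider (Tavily in this PR) takes over once a key is configured.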

Why It Matters

  • Better developer experience: run the app without provisioning keys.
  • Higher resilience: graceful degradation when a provider is misconfigured or unavailable.

Notes / Usage

  • In production, set a real provider and keys to restore full search behavior.
  • Logs indicate when the dummy provider is active for easier troubleshooting.

Verification

  • Start app with no search keys → dummy provider returns mock results.
  • Start app with valid keys → real provider is used; no functional regression observed.

Summary by CodeRabbit

  • New Features

    • Added web search capability with provider selection (Tavily + dummy), result tuning, optional in-memory caching, and fallback behavior.
    • Improved vector storage upserts to support storing associated metadata.
  • API

    • Mounted internal websearch endpoints under /internal.
  • Configuration

    • New example environment settings for web search engine, API keys, max fetch concurrency, and default top-k results.
  • Performance/Reliability

    • HTTP/2-enabled client and improved retry/backoff for web requests.
  • Dependencies

    • Added web content extraction (trafilatura) and numeric processing (numpy) libraries; a small extraction sketch follows below.
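
As a rough illustration of the extraction dependency, a tiny trafilatura usage sketch; the PR only adds the library, so how it is wired into the service here is an assumption:

import trafilatura

downloaded = trafilatura.fetch_url("https://example.com")  # fetch raw HTML (None on failure)
if downloaded is not None:
    text = trafilatura.extract(downloaded)  # main article text, or None if nothing extractable
    print(text)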

enfayz and others added 2 commits September 20, 2025 21:18
- Added web search configuration variables to .env.example
- Updated .gitignore to include a new line for clarity
- Refactored import statement in indexing_router.py for consistency
- Included web answering router in main.py for improved routing structure
- Updated requirements.txt to include trafilatura and ensure httpx uses HTTP/2

coderabbitai bot commented Sep 20, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a provider-agnostic asynchronous web search service (Tavily and Dummy providers) with optional in-memory caching, exposes web-answering endpoints under /internal, extends settings and .env.example for web search, updates requirements, and adds VectorDBClient.upsert plus an aliasing change in api/indexing_router.py.

Changes

Cohort / File(s) Summary of changes
Web Search Service
services/web_search_service.py
New module: SearchResult dataclass, SearchProvider ABC, DummySearchProvider, TavilySearchProvider (HTTP requests, retry/backoff, parsing), and WebSearchService with provider registry, async search(...), and optional TTL LRU-like cache (a skeletal sketch of this layout follows the table).
API Integration
api/main.py, api/endpoints/web_answering.py
Registers/mounts the web_answering router under the /internal prefix with tag websearch; imports added in api/main.py.
Settings & Env
.env.example, config/settings.py
Adds WEB_SEARCH_ENGINE, TAVILY_API_KEY, MAX_FETCH_CONCURRENCY, DEFAULT_TOP_K_RESULTS to example env; Settings gains web_search_engine, tavily_api_key, bing_api_key, max_fetch_concurrency, default_top_k_results.
Vector DB
services/vector_db_service.py, api/indexing_router.py
Adds VectorDBClient.upsert(namespace, ids, vectors, metadata=None) which ensures index exists, validates lengths, and upserts items; api/indexing_router.py now aliases VectorDBClient as VectorDBService.
Dependencies
requirements.txt
Replaces httpx spec with httpx[http2] and adds trafilatura and numpy.
Git Hygiene
.gitignore
Adds blank line after .env.
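
To make the module layout above concrete, a skeletal sketch of the provider abstraction and registry; this is an approximation of the described structure, and the real Tavily HTTP/retry logic is stubbed out:

import abc
import logging
from typing import ClassVar, Mapping, Type

logger = logging.getLogger(__name__)

class SearchProvider(abc.ABC):
    """Common interface every search backend implements."""

    @abc.abstractmethod
    async def search(self, query: str, k: int) -> list[dict]:
        ...

class DummySearchProvider(SearchProvider):
    async def search(self, query: str, k: int) -> list[dict]:
        return [{"title": f"Mock {i}", "url": f"https://example.com/{i}"} for i in range(k)]

class TavilySearchProvider(SearchProvider):
    """Stub: the real version issues HTTP requests with retry/backoff and parses the response."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    async def search(self, query: str, k: int) -> list[dict]:
        raise NotImplementedError("HTTP call omitted in this sketch")

class WebSearchService:
    _providers: ClassVar[Mapping[str, Type[SearchProvider]]] = {
        "dummy": DummySearchProvider,
        "tavily": TavilySearchProvider,
    }

    def __init__(self, engine: str, api_key: str | None = None) -> None:
        if engine == "tavily" and api_key:
            self.provider: SearchProvider = TavilySearchProvider(api_key)
        else:
            if engine != "dummy":
                logger.warning("Provider %r not usable; falling back to DummySearchProvider.", engine)
            self.provider = DummySearchProvider()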

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant API as FastAPI (/internal/websearch)
    participant Svc as WebSearchService
    participant Prov as SearchProvider (Tavily / Dummy)
    participant Ext as External Search API

    Client->>API: GET /internal/search?query=...
    API->>Svc: search(query, k, language, region, timeout, use_cache)
    alt Cache hit
        Svc-->>API: cached List<SearchResult>
    else Cache miss
        Svc->>Prov: search(query, k, region, language, timeout)
        alt Tavily selected
            Prov->>Ext: HTTP request (with API key)
            Ext-->>Prov: JSON response
            Prov-->>Svc: parsed List<SearchResult>
        else Dummy selected / fallback
            Prov-->>Svc: synthesized List<SearchResult>
        end
        Svc-->>API: results (caches response)
    end
    API-->>Client: JSON response

    note over Prov: Tavily provider applies retries with exponential backoff on transient errors (e.g., 429)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I twitch my whiskers, sniff the net,
New tunnels map where queries set.
Tavily hums, dummy hops in play,
Caches tuck crumbs to lead the way.
Hop—answers sprout at break of day. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title "Add DummySearchProvider Fallback and Settings Improvements" is concise and directly describes the primary changes in the PR: adding a DummySearchProvider fallback and updating settings to support that fallback and optional API keys. It clearly reflects the main intent shown in the changed files (services/web_search_service.py and config/settings.py) without extraneous detail. A teammate scanning history would understand the primary change from this title.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7cc4d3 and 16a8693.

📒 Files selected for processing (2)
  • api/indexing_router.py (1 hunks)
  • services/web_search_service.py (1 hunks)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (12)
.env.example (1)

9-13: Default to dummy provider in example and add trailing newline

To match settings’ default/fallback behavior and avoid noisy fallback logs when no key is set, prefer WEB_SEARCH_ENGINE=dummy in the example and keep the API‑specific line commented until a key is available. Also add a trailing newline.

Apply this diff:

-# Web search configuration
-WEB_SEARCH_ENGINE=tavily
-TAVILY_API_KEY=your_tavily_api_key_here
-MAX_FETCH_CONCURRENCY=4
-DEFAULT_TOP_K_RESULTS=8
+# Web search configuration
+# Set to "dummy" for local/dev without keys; switch to "tavily" when you add TAVILY_API_KEY.
+WEB_SEARCH_ENGINE=dummy
+# TAVILY_API_KEY=your_tavily_api_key_here
+MAX_FETCH_CONCURRENCY=4
+DEFAULT_TOP_K_RESULTS=8
+

Also applies to: 7-7

api/indexing_router.py (2)

37-39: Size check uses characters, not bytes

len(src["text"]) counts characters, not bytes. Enforce the megabyte limit against the UTF-8 encoded byte length so it matches max_upload_mb.

Apply this diff:

-        if "text" in src and len(src["text"]) > settings.max_upload_mb * 1024 * 1024:
+        if "text" in src and len(src["text"].encode("utf-8")) > settings.max_upload_mb * 1024 * 1024:

50-53: 415 here may mislead

A chunking failure for plain text isn’t a media-type problem. Consider 422 (unprocessable) or 500 with a more specific detail.

-        except Exception:
-            raise HTTPException(status_code=415, detail="UNSUPPORTED_FILE_TYPE")
+        except Exception:
+            raise HTTPException(status_code=422, detail="CHUNKING_FAILED")
config/settings.py (2)

8-9: Remove unused imports

Field isn’t used.

-from pydantic import SecretStr, ValidationError, Field
+from pydantic import SecretStr, ValidationError

30-34: Let BaseSettings load env; avoid direct os.environ in defaults

Declare defaults and let Pydantic map env (e.g., WEB_SEARCH_ENGINE). The current os.environ.get(...) duplicates config loading and complicates precedence.

-    web_search_engine: str = os.environ.get("WEB_SEARCH_ENGINE", "dummy")  # Default to dummy provider if not specified
+    web_search_engine: str = "dummy"  # overridden by WEB_SEARCH_ENGINE if set

Optional: validate concurrency bounds:

# e.g., max_fetch_concurrency: int = Field(default=4, ge=1, le=64)
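
A minimal sketch of that approach, assuming pydantic-settings v2; field names map case-insensitively to environment variables, so WEB_SEARCH_ENGINE overrides the default:

from typing import Optional

from pydantic import Field, SecretStr
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    web_search_engine: str = "dummy"            # overridden by WEB_SEARCH_ENGINE if set
    tavily_api_key: Optional[SecretStr] = None  # optional: TAVILY_API_KEY
    max_fetch_concurrency: int = Field(default=4, ge=1, le=64)
    default_top_k_results: int = 8

settings = Settings()  # values come from the environment (and .env if model_config points at it)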
services/web_search_service.py (7)

125-127: Align k clamp with Tavily’s documented range

Docs show max_results range 0..20 (default 5). Clamp to 1..20 (or 0..20 if you want to allow zero).

-        # Clamp k between 3 and 15
-        k = max(3, min(15, k))
+        # Clamp k to Tavily's documented range
+        k = max(1, min(20, k))

Reference: Tavily max_results parameter. (docs.tavily.com)


170-176: Use logger.exception for failures

Keep stack traces on failures; helps ops and debugging.

-                        logger.error(f"Tavily search failed after {max_retries} attempts: rate limited")
+                        logger.exception("Tavily search failed after %s attempts: rate limited", max_retries)
@@
-                        logger.error(f"Tavily search failed with status {e.response.status_code}: {e.response.text}")
+                        logger.exception("Tavily search failed with status %s: %s", e.response.status_code, e.response.text)
@@
-                    logger.error(f"Tavily request failed: {str(e)}")
+                    logger.exception("Tavily request failed: %s", e)
@@
-            logger.error(f"Failed to initialize {self.provider_name} search provider: {str(e)}")
+            logger.exception("Failed to initialize %s search provider: %s", self.provider_name, e)
@@
-            logger.error(f"Search failed: {str(e)}")
+            logger.exception("Search failed: %s", e)

Also applies to: 164-168, 237-240, 297-297


52-65: Mark unused parameters in Dummy provider to satisfy linters

Prefix with underscores or use them to mirror real provider behavior.

-    async def search(self, query: str, k: int, region: str, language: str, timeout_seconds: int) -> List[SearchResult]:
+    async def search(self, query: str, k: int, region: str, language: str, timeout_seconds: int) -> List[SearchResult]:
@@
-        for i in range(min(k, 5)):  # Cap at 5 results
+        for i in range(min(k, 20)):  # Cap to a sane upper bound

Or:

-    async def search(self, query: str, k: int, region: str, language: str, timeout_seconds: int) -> List[SearchResult]:
+    async def search(self, query: str, k: int, _region: str, _language: str, _timeout_seconds: int) -> List[SearchResult]:

214-233: Fallback to dummy for unknown provider strings (match PR intent)

Currently an unknown provider raises ValueError; better to warn and fall back to dummy, same as init‑failure path.

-        if self.provider_name not in self._providers:
-            raise ValueError(f"Unsupported search provider: {self.provider_name}")
+        if self.provider_name not in self._providers:
+            logger.warning("Unsupported search provider '%s'. Falling back to DummySearchProvider.", self.provider_name)
+            self.provider_name = "dummy"
+            self.provider = DummySearchProvider()
+            return

242-245: Type Optional for k and keep signature tidy

Satisfy Ruff RUF013 and clarify optional arg.

-    async def search(self, query: str, k: int = None, region: str = "auto", 
+    async def search(self, query: str, k: Optional[int] = None, region: str = "auto", 
@@
-        k = k or settings.default_top_k_results
+        k = k or settings.default_top_k_results

(Import Optional already present.)

Also applies to: 263-264


14-16: Prune unused imports

Protocol isn’t used.

-from typing import Dict, List, Optional, Protocol, Type, ClassVar, Mapping
+from typing import Dict, List, Optional, Type, ClassVar, Mapping

209-213: Cache is process‑local and not concurrency‑safe across workers

Fine for dev; in production behind multiple workers or nodes the cache is neither shared nor locked. Document this, or consider a TTL cache with locks or an external cache if this becomes a hot path.
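
If that becomes relevant, one possible shape for a process-local but asyncio-safe TTL cache; this is a sketch under assumed names, the PR's cache may differ, and a shared store such as Redis would still be needed across workers:

import asyncio
import time
from typing import Any, Optional

class TTLCache:
    """Tiny TTL cache guarded by an asyncio.Lock; still local to one process."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 256) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._data: dict[str, tuple[float, Any]] = {}
        self._lock = asyncio.Lock()

    async def get(self, key: str) -> Optional[Any]:
        async with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() > expires_at:
                del self._data[key]  # expired entry: drop it and report a miss
                return None
            return value

    async def set(self, key: str, value: Any) -> None:
        async with self._lock:
            if len(self._data) >= self._max:
                # crude eviction: drop the entry that expires soonest
                oldest = min(self._data, key=lambda k: self._data[k][0])
                del self._data[oldest]
            self._data[key] = (time.monotonic() + self._ttl, value)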

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af0338c and 6a0e924.

📒 Files selected for processing (7)
  • .env.example (1 hunks)
  • .gitignore (1 hunks)
  • api/indexing_router.py (1 hunks)
  • api/main.py (1 hunks)
  • config/settings.py (2 hunks)
  • requirements.txt (2 hunks)
  • services/web_search_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api/indexing_router.py (1)
services/vector_db_service.py (1)
  • VectorDBClient (10-56)
🪛 dotenv-linter (3.3.0)
.env.example

[warning] 7-7: [UnorderedKey] The CORS_ALLOW_ORIGINS key should go before the POSTGRES_URI key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The TAVILY_API_KEY key should go before the WEB_SEARCH_ENGINE key

(UnorderedKey)


[warning] 12-12: [UnorderedKey] The MAX_FETCH_CONCURRENCY key should go before the TAVILY_API_KEY key

(UnorderedKey)


[warning] 13-13: [EndingBlankLine] No blank line at the end of the file

(EndingBlankLine)


[warning] 13-13: [UnorderedKey] The DEFAULT_TOP_K_RESULTS key should go before the MAX_FETCH_CONCURRENCY key

(UnorderedKey)

🪛 Ruff (0.13.1)
services/web_search_service.py

55-55: Unused method argument: region

(ARG002)


55-55: Unused method argument: language

(ARG002)


55-55: Unused method argument: timeout_seconds

(ARG002)


103-103: Avoid specifying long messages outside the exception class

(TRY003)


108-108: Unused method argument: region

(ARG002)


123-123: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


164-164: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


167-167: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


170-170: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


170-170: Use explicit conversion flag

Replace with conversion flag

(RUF010)


172-172: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


231-231: Avoid specifying long messages outside the exception class

(TRY003)


236-236: Do not catch blind exception: Exception

(BLE001)


237-237: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


237-237: Use explicit conversion flag

Replace with conversion flag

(RUF010)


242-242: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


295-295: Consider moving this statement to an else block

(TRY300)


297-297: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


297-297: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (4)
.gitignore (1)

4-4: Whitespace tweak is fine

The blank line after .env improves readability between sections.

requirements.txt (2)

7-7: Good switch to httpx HTTP/2 extra

Enabling httpx[http2] is appropriate given external API calls.


16-18: Remove duplicate pydantic and verify heavy deps (numpy/trafilatura)

Verification script returned no matches; confirm whether requirements.txt (lines 16–18) contains a duplicate pydantic entry and whether numpy/trafilatura are required on the request path (cold-start/size impact). If duplicate pydantic is present, apply this diff to dedupe:

- pydantic>=2.6.0,<3.0.0
 trafilatura>=1.6.0,<2.0.0
 numpy>=1.26.0,<2.0.0
api/main.py (1)

49-49: Resolve missing web_answering router and duplicate "/internal" mounts

  • api/endpoints/web_answering.py is missing; main.py imports it (line ~49) and mounts it with prefix="/internal" (line ~52). Add/restore that module or remove/fix the import and include_router call.
  • api/indexing_router.py and api/endpoints/internal.py both declare APIRouter(prefix="/internal") and are both included in main.py — this results in two routers mounted at /internal; consolidate or change prefixes as appropriate.
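
One way to avoid the doubled "/internal" mounts, sketched here with assumed router and endpoint names rather than the repo's exact code:

from fastapi import APIRouter, FastAPI

indexing_router = APIRouter()        # no prefix on the router itself
web_answering_router = APIRouter()   # prefix is assigned once at include time

@indexing_router.post("/index")
async def index_documents() -> dict:
    return {"status": "ok"}

@web_answering_router.get("/search")
async def web_search(query: str) -> dict:
    return {"query": query, "results": []}

app = FastAPI()
app.include_router(indexing_router, prefix="/internal/indexing", tags=["indexing"])
app.include_router(web_answering_router, prefix="/internal", tags=["websearch"])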

@fehranbit fehranbit self-requested a review September 21, 2025 12:48
Member

@fehranbit fehranbit left a comment


@paytoniga fixed the reviews

@fehranbit fehranbit merged commit 268dbff into main Sep 21, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
services/vector_db_service.py (4)

51-54: Type the public API (and add batch_size kw-only).

Tighten the contract and make batching configurable.

Apply this diff within the shown range:

-    def upsert(self, namespace, ids, vectors, metadata=None):
+    def upsert(
+        self,
+        namespace: str,
+        ids: list[str],
+        vectors: list[list[float]],
+        metadata: "list[dict] | None" = None,
+        *,
+        batch_size: int = 100,
+    ):

Additionally (supporting change outside this range), update typing imports:

-from typing import List
+from typing import List
+# Python 3.10+: using built-in generics in signature above; no extra imports needed.

70-70: Batch upserts to avoid payload limits and improve resilience.

Large single calls can exceed request limits; batching is safer.

-        self.index.upsert(vectors=items, namespace=namespace)
+        for start in range(0, len(items), batch_size):
+            self.index.upsert(vectors=items[start:start + batch_size], namespace=namespace)

37-49: Unify upsert paths; avoid tuple-style vectors with the v3 Pinecone client.

Use the new upsert to keep one codepath and consistent payload shape.

     def upsert_vectors(self, vectors: List[List[float]], ids: List[str]):
-        # Input validation
-        if vectors is None or ids is None:
-            raise ValueError("vectors and ids must not be None.")
-        if not isinstance(vectors, list) or not isinstance(ids, list):
-            raise TypeError("vectors and ids must be lists.")
-        if len(vectors) != len(ids):
-            raise ValueError("vectors and ids must have the same length.")
-        if self.dimension is not None and any(len(v) != self.dimension for v in vectors):
-            raise ValueError("vector dimensionality mismatch with index.")
-        if not self.index:
-            raise RuntimeError("Index is not initialized. Call create_index first.")
-        self.index.upsert(vectors=[(id, vec) for id, vec in zip(ids, vectors)])
+        # Delegate to unified implementation; default namespace ("") is Pinecone's default.
+        self.upsert(namespace="", ids=ids, vectors=vectors)

51-70: Optional: return the Pinecone response for observability/testing.

Returning the last UpsertResponse (or a list when batched) helps with metrics and tests.
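
For reference, a hypothetical call to the new upsert signature discussed above; the constructor arguments and values are made up for the example:

from services.vector_db_service import VectorDBClient

client = VectorDBClient()  # assumed default construction; real init args may differ
client.upsert(
    namespace="docs",
    ids=["doc-1", "doc-2"],
    vectors=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadata=[{"source": "faq.md"}, {"source": "guide.md"}],
)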

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a0e924 and d7cc4d3.

📒 Files selected for processing (1)
  • services/vector_db_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
services/vector_db_service.py (2)
tests/test_vector_db_service.py (1)
  • upsert (10-12)
api/indexing_router.py (1)
  • index (27-83)
🪛 Ruff (0.13.1)
services/vector_db_service.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
services/vector_db_service.py (1)

51-61: Verify module path and upsert call‑site compatibility.

  • Import path OK — repo contains config/settings.py (/home/jailuser/git/config/settings.py) and multiple files import from config.settings (services/vector_db_service.py, services/web_search_service.py, services/embedding_service.py, api/indexing_router.py, tests/test_embedding_batch.py). No references to core.config.settings found.
  • Call-site check incomplete: the ripgrep scan returned "No files were searched", so I cannot confirm callers use the new upsert(namespace, ids, vectors, metadata=...) signature. Re-run a full repo search (example: rg -nP '\bupsert\s*\(' -S || true) or allow an AST-based scan to enumerate upsert call sites and flag mismatches.

Comment on lines +55 to +57
        if self.index is None:
            self.create_index()
        if not (len(ids) == len(vectors)):


⚠️ Potential issue

Guard against missing embedding dimension before auto-creating the index.

Without a dimension, create_index will fail at runtime.

-        if self.index is None:
-            self.create_index()
+        if self.index is None:
+            if self.dimension is None:
+                raise RuntimeError(
+                    "embedding dimension is not configured; set settings.embedding_dimension or pass dimension."
+                )
+            self.create_index()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        if self.index is None:
-            self.create_index()
-        if not (len(ids) == len(vectors)):
+        if self.index is None:
+            if self.dimension is None:
+                raise RuntimeError(
+                    "embedding dimension is not configured; set settings.embedding_dimension or pass dimension."
+                )
+            self.create_index()
+        if not (len(ids) == len(vectors)):
🤖 Prompt for AI Agents
In services/vector_db_service.py around lines 55-57, the code auto-calls
create_index() without ensuring an embedding dimension exists which will cause a
runtime failure; before calling create_index(), validate that self.dimension is
set or infer it from the provided vectors (e.g., if vectors is non-empty set
self.dimension = len(vectors[0]) after validating all vectors share that
length), and if neither self.dimension nor inferable from vectors, raise a clear
ValueError asking the caller to provide an embedding dimension; adjust
create_index to accept/use the dimension if it currently does not.

Comment on lines +57 to +70
        if not (len(ids) == len(vectors)):
            raise ValueError("ids and vectors must have the same length")
        if metadata is not None and len(metadata) != len(ids):
            raise ValueError("metadata length must match ids/vectors length")
        items = []
        for i, (id_, vector) in enumerate(zip(ids, vectors)):
            item = {
                "id": id_,
                "values": vector
            }
            if metadata is not None:
                item["metadata"] = metadata[i]
            items.append(item)
        self.index.upsert(vectors=items, namespace=namespace)


⚠️ Potential issue

Add dimension checks, vector normalization, and metadata type validation.

Parity with upsert_vectors and fewer Pinecone-side errors.

-        if not (len(ids) == len(vectors)):
+        if len(ids) != len(vectors):
             raise ValueError("ids and vectors must have the same length")
-        if metadata is not None and len(metadata) != len(ids):
-            raise ValueError("metadata length must match ids/vectors length")
-        items = []
-        for i, (id_, vector) in enumerate(zip(ids, vectors)):
-            item = {
-                "id": id_,
-                "values": vector
-            }
-            if metadata is not None:
-                item["metadata"] = metadata[i]
-            items.append(item)
-        self.index.upsert(vectors=items, namespace=namespace)
+        if metadata is not None and len(metadata) != len(ids):
+            raise ValueError("metadata length must match ids/vectors length")
+        items: list[dict] = []
+        for i, (id_, vector) in enumerate(zip(ids, vectors)):
+            # Dimension validation + normalization (supports numpy arrays)
+            if self.dimension is not None and len(vector) != self.dimension:
+                raise ValueError(
+                    f"vector[{i}] dimensionality {len(vector)} != expected {self.dimension}"
+                )
+            values = vector.tolist() if hasattr(vector, "tolist") else list(vector)
+            if not all(isinstance(x, (int, float)) for x in values):
+                raise TypeError(f"vector[{i}] must be a sequence of numbers")
+            item = {"id": id_, "values": values}
+            if metadata is not None:
+                md = metadata[i]
+                if not isinstance(md, dict):
+                    raise TypeError("each metadata entry must be a dict")
+                item["metadata"] = md
+            items.append(item)
+        self.index.upsert(vectors=items, namespace=namespace)
🧰 Tools
🪛 Ruff (0.13.1)

58-58: Avoid specifying long messages outside the exception class

(TRY003)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In services/vector_db_service.py around lines 57 to 70, add parity with
upsert_vectors by validating each vector's dimension, normalizing vectors, and
validating metadata types before calling upsert: check that each vector is a
sequence of numeric values and its length equals the configured embedding
dimension (e.g., self.dimension) and raise ValueError if not; if normalization
is required, convert each vector to a unit vector (divide by its L2 norm,
guarding against zero norm); ensure metadata (when provided) is the expected
type (e.g., dict or list) and raise TypeError for invalid entries; perform these
checks/normalization inside the loop that builds items so you send only
validated, normalized vectors to self.index.upsert(vectors=items,
namespace=namespace).
