@albertyosef (Collaborator) commented Sep 12, 2025

Overview

This PR refactors the /internal/index endpoint to fully adopt the new VectorDBClient for Pinecone integration. It introduces stricter index management, better error handling, and improved observability, ensuring the indexing pipeline is reliable and production-ready.


Features

  • VectorDBClient Integration

    • Replaced legacy service calls with the new VectorDBClient.
    • Unified index lifecycle: ensure index exists before upserting vectors.
  • Index Name Sanitization

    • Enforced Pinecone-compliant naming:

      • Names are converted to lowercase
      • Only alphanumeric characters and hyphens are allowed
    • Prevents runtime failures due to invalid index names.

  • Enhanced Observability

    • Added structured logging for service calls, success, and error states.
    • Provides clearer insights for debugging and monitoring.
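A minimal sketch of the sanitization rule described above. The PR does not show the body of sanitize_index_name, so this version is an assumption: it lowercases the input, collapses every run of characters outside a-z and 0-9 into a single hyphen, trims hyphens from both ends, and caps the length. The 45-character cap and the choice to raise ValueError when nothing usable survives are also assumptions; check Pinecone's current index-naming rules before relying on them.

```python
import re

# Assumed maximum length; confirm against Pinecone's current index-naming rules.
MAX_INDEX_NAME_LEN = 45


def sanitize_index_name(raw: str, max_len: int = MAX_INDEX_NAME_LEN) -> str:
    """Turn an arbitrary project ID into a lowercase, hyphen-separated index name.

    Hypothetical sketch; the real helper in api/indexing_router.py may differ.
    """
    # Lowercase first, then replace each run of disallowed characters with one hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", raw.lower())
    # Drop leading/trailing hyphens, truncate, then trim any hyphen exposed by truncation.
    name = name.strip("-")[:max_len].rstrip("-")
    if not name:
        # Nothing usable survived; raising ValueError lets callers treat it as bad input.
        raise ValueError(f"cannot derive an index name from {raw!r}")
    return name


print(sanitize_index_name("My_Project #42"))  # my-project-42
```

Sanitizing rather than rejecting keeps project IDs with special characters indexable, which matches the bug fix listed in the summary; the trade-off is that distinct IDs such as "a_b" and "a-b" map to the same index name.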

Fixes

  • Made index creation idempotent, so requests for an index that already exists no longer fail.
  • Prevented silent failures on invalid configurations by surfacing explicit error logs.
  • Eliminated dependency on any legacy index handling code.

Refactors

  • Simplified endpoint workflow:

    1. Sanitize index name.
    2. Ensure index exists.
    3. Upsert vectors with metadata.
  • Streamlined error-handling logic to reduce duplication.

  • Updated internal documentation and inline comments for future maintainers.
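The three-step workflow above can be sketched as a plain function. The VectorStore protocol, index_project, and FakeClient are hypothetical: the method names create_index and upsert_vectors(vectors, ids) come from the review walkthrough, but the real VectorDBClient signatures, and how it picks the target index during an upsert, are not shown in the PR.

```python
import logging
import re
from typing import List, Protocol, Sequence, Tuple

logger = logging.getLogger("indexing_router")


class VectorStore(Protocol):
    """Hypothetical interface; only the method names are taken from the PR."""

    def create_index(self, name: str) -> None: ...

    def upsert_vectors(self, vectors: Sequence[Sequence[float]], ids: Sequence[str]) -> None: ...


def sanitize_index_name(raw: str) -> str:
    # Condensed sanitizer: lowercase, hyphenate disallowed runs, trim hyphens.
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


def index_project(vdb: VectorStore, project_id: str,
                  vectors: Sequence[Sequence[float]], ids: Sequence[str]) -> str:
    index_name = sanitize_index_name(project_id)  # 1. sanitize the index name
    vdb.create_index(index_name)                  # 2. ensure the index exists (must be idempotent)
    vdb.upsert_vectors(vectors, ids)              # 3. upsert the vectors
    logger.info("Upserted %d vectors into index %s", len(ids), index_name)
    return index_name


class FakeClient:
    """Records calls in order so the workflow can be checked without Pinecone."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, object]] = []

    def create_index(self, name: str) -> None:
        self.calls.append(("create_index", name))

    def upsert_vectors(self, vectors, ids) -> None:
        self.calls.append(("upsert_vectors", list(ids)))


client = FakeClient()
index_project(client, "Acme/Search", [[0.1, 0.2], [0.3, 0.4]], ["a", "b"])
print(client.calls)  # [('create_index', 'acme-search'), ('upsert_vectors', ['a', 'b'])]
```

Passing the client in as a parameter, instead of constructing it inside the handler, is a design choice of this sketch: it lets a recording fake stand in for Pinecone in tests.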


Acceptance Criteria

  • Index creation is automatic and idempotent.
  • /internal/index works reliably with VectorDBClient.
  • All vector upserts succeed when given valid input.
  • Invalid index names are sanitized before submission.
  • Logs provide actionable detail for failures and service calls.
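One way to make "index creation is automatic and idempotent" concrete is a check-then-create helper that also tolerates losing a race with a concurrent request. This sketch runs against an in-memory backend; InMemoryBackend, IndexAlreadyExists, and ensure_index are hypothetical names, and a real Pinecone client has its own listing and creation calls and error types.

```python
from typing import Set


class IndexAlreadyExists(Exception):
    """Hypothetical error a backend raises when asked to create an existing index."""


class InMemoryBackend:
    """Stand-in for the vector database's index-management API."""

    def __init__(self) -> None:
        self.indexes: Set[str] = set()

    def list_index_names(self) -> Set[str]:
        return set(self.indexes)

    def create_index(self, name: str) -> None:
        if name in self.indexes:
            raise IndexAlreadyExists(name)
        self.indexes.add(name)


def ensure_index(backend: InMemoryBackend, name: str) -> bool:
    """Create `name` if missing. Returns True if this call created it, False otherwise."""
    if name in backend.list_index_names():
        return False
    try:
        backend.create_index(name)
    except IndexAlreadyExists:
        # Another request created the index between our check and our create call.
        return False
    return True


backend = InMemoryBackend()
print(ensure_index(backend, "acme-search"))  # True
print(ensure_index(backend, "acme-search"))  # False
```

The listing check alone is not enough for idempotency, because two requests can both see the index as missing; treating "already exists" as success covers that window.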

Summary by CodeRabbit

  • Bug Fixes
    • Fixed indexing failures for project IDs containing special characters.
    • Standardized HTTP responses for indexing errors to be more consistent and informative.
  • Refactor
    • Streamlined the indexing flow to automatically prepare indexes before data is added, improving stability.
  • Chores
    • Enhanced logging around indexing operations for better observability and troubleshooting.


coderabbitai bot commented Sep 12, 2025

Walkthrough

Migrates api/indexing_router.py from VectorDBService to VectorDBClient, adds index preparation (a sanitized index name plus create_index), changes the upsert call to upsert_vectors, and introduces structured logging and refined error handling.

Changes

Cohort / File(s) | Summary
--- | ---
Vector DB client migration and indexing flow (api/indexing_router.py) | Replace VectorDBService with VectorDBClient; add a sanitize_index_name helper; create the index before upserting; change the upsert call to vdb.upsert_vectors(vectors, ids); add logging for client instantiation, index creation, upsert attempts, and errors; keep 422 for ValueError and map all other exceptions to HTTP 500.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Router as Indexing Router
  participant VDB as VectorDBClient
  participant Util as sanitize_index_name

  Client->>Router: POST /index { project_id, vectors, ids }
  note over Router: Initialize logger and VectorDBClient

  Router->>Util: sanitize(project_id)
  Util-->>Router: index_name

  Router->>VDB: create_index(index_name)
  alt create_index fails
    Router-->>Client: 500 VECTOR_DB_UPSERT_FAILED
  else create_index ok
    Router->>VDB: upsert_vectors(vectors, ids)
    alt upsert raises ValueError
      Router-->>Client: 422 Unprocessable Entity
    else upsert raises other Exception
      Router-->>Client: 500 VECTOR_DB_UPSERT_FAILED
    else upsert ok
      Router-->>Client: 200 OK
    end
  end
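The branches in the diagram amount to a small exception-to-status mapping. The sketch below reproduces them with plain callables; HTTPError is a hypothetical stand-in for whatever HTTP exception the router's web framework provides, and the 422 detail text is an assumption, since the diagram only names the status.

```python
from typing import Callable, Sequence


class HTTPError(Exception):
    """Stand-in for the web framework's HTTP exception (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def handle_index(
    create_index: Callable[[str], None],
    upsert_vectors: Callable[[Sequence, Sequence], None],
    index_name: str,
    vectors: Sequence,
    ids: Sequence,
) -> int:
    """Walk the diagram's branches and return the success status code."""
    try:
        create_index(index_name)
    except Exception as err:
        # create_index failed: 500, and upsert_vectors is never attempted.
        raise HTTPError(500, "VECTOR_DB_UPSERT_FAILED") from err
    try:
        upsert_vectors(vectors, ids)
    except ValueError as err:
        # Invalid input: 422 Unprocessable Entity (detail text is an assumption).
        raise HTTPError(422, str(err)) from err
    except Exception as err:
        # Any other upsert failure: 500 with the same error code.
        raise HTTPError(500, "VECTOR_DB_UPSERT_FAILED") from err
    return 200


def ok(*args) -> None:
    return None


def bad_input(*args) -> None:
    raise ValueError("vectors and ids differ in length")


try:
    handle_index(ok, bad_input, "acme-search", [[0.1]], ["a", "b"])
except HTTPError as exc:
    print(exc.status_code, exc.detail)  # 422 vectors and ids differ in length
```

The ValueError clause must come before the generic one; otherwise every input error would be reported as a 500.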

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
--- | --- | --- | ---
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required 80.00% threshold. | You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title clearly reflects the primary change (refactoring the indexing router to use VectorDBClient and add Pinecone integration), so it matches the PR objectives and is not misleading; it is slightly wordy, though, and repeats "integration," which reduces conciseness.

Poem

I thump my paws—new paths align,
Indices named, all safe and fine.
Vectors hop in tidy pairs,
Logs like stars in midnight airs.
If errors loom, we won’t despair—
We sniff, we trace, we swiftly repair. 🐇✨


✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/indexing-router-pinecone-integration



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
api/indexing_router.py (1)

28-31: Remove inner import logging; define a module-level logger to avoid F811 and duplicate setup.

Re-importing logging inside the handler shadows the module import and triggers Ruff F811. Prefer a module-level logger and reuse it.

Apply:

-    import logging
-    logger = logging.getLogger("indexing_router")
     vdb = VectorDBClient()
-    logger.info("Pinecone VectorDBClient instantiated.")

Then (outside this hunk), add once near the imports:

# at module top, near other imports
logger = logging.getLogger("indexing_router")
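A minimal sketch of the layout this nitpick asks for: logging is imported once, the logger is created once at module scope, and handlers reuse it. The handler name index_endpoint and the log message are hypothetical. Because logging.getLogger returns the same object for the same name, any later lookup of "indexing_router" refers to this one instance; the `extra` mapping attaches fields to the log record, one standard-library way to get structured logs.

```python
import logging

# Created once at import time; every handler in the module reuses it.
logger = logging.getLogger("indexing_router")


def index_endpoint(project_id: str) -> str:
    # No local `import logging` here, so nothing shadows the module-level import (F811).
    logger.info("Indexing request received", extra={"project_id": project_id})
    return project_id
```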
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fbaf8d and bc56a74.

📒 Files selected for processing (1)
  • api/indexing_router.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api/indexing_router.py (2)
services/vector_db_service.py (4)
  • VectorDBClient (10-56)
  • create_index (24-35)
  • upsert_vectors (37-49)
  • __init__ (10-25)
tests/test_vector_db_service.py (1)
  • vdb (17-20)
🪛 Ruff (0.12.2)
api/indexing_router.py

28-28: Redefinition of unused logging from line 11

Remove definition: logging

(F811)


89-89: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


90-90: Do not catch blind exception: Exception

(BLE001)


91-91: Use logging.exception instead of logging.error

Replace with exception

(TRY400)
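The three Ruff findings have standard fixes, sketched here on a hypothetical wrapper: catch a specific error type instead of a blind Exception (BLE001), log with logger.exception so the traceback is recorded (TRY400), and chain the re-raised error with `from err` (B904). VectorDBError and UpsertFailed are assumed names; the PR does not say which exception types VectorDBClient raises, so if it only raises generic errors, a deliberate catch-all marked `# noqa: BLE001` is the alternative.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("indexing_router")


class VectorDBError(Exception):
    """Hypothetical base class for errors raised by the vector DB client."""


class UpsertFailed(Exception):
    """Hypothetical error the router would translate into HTTP 500."""


def upsert_checked(upsert: Callable[[Sequence, Sequence], None],
                   vectors: Sequence, ids: Sequence[str]) -> None:
    try:
        upsert(vectors, ids)
    except VectorDBError as err:  # BLE001: a specific type, not a blind `except Exception`
        # TRY400: logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Vector upsert failed for %d ids", len(ids))
        # B904: `from err` keeps the original error as __cause__.
        raise UpsertFailed("VECTOR_DB_UPSERT_FAILED") from err
```

Errors outside VectorDBError, such as a ValueError for bad input, pass through unchanged, so the caller can still map them to 422.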

🔇 Additional comments (1)
api/indexing_router.py (1)

7-7: Good swap to the new client.

Switching to VectorDBClient aligns with the refactor direction.

@fehranbit fehranbit merged commit 85b0cb3 into feature/vector-db-service-refactor-and-tests Sep 14, 2025
1 check passed