Skip to content

add openAI embedding support#186

Merged
ishaanxgupta merged 1 commit into
mainfrom
develop
May 19, 2026
Merged

add openAI embedding support#186
ishaanxgupta merged 1 commit into
mainfrom
develop

Conversation

@ishaanxgupta
Copy link
Copy Markdown
Member

No description provided.

@github-actions
Copy link
Copy Markdown

Fails
🚫

🔐 This PR modifies sensitive files: src/config/settings.py. These require review by a core maintainer (@ishaanxgupta or @ved015) before merging.

🚫

📋 PR description is too short. Please describe:

  • What changed and Why
  • Any relevant issue links (Closes #NNN)
  • Steps to test manually
Warnings
⚠️

🧪 Source files in src/ were modified but no test files changed. Please add or update tests to cover your changes.

Messages
📖

📝 No CHANGELOG.md update detected. If this PR introduces a user-visible change, please add an entry.

📖

✅ Targeting main. Please squash commits before merging to keep the git history clean.

Generated by 🚫 dangerJS against aa86886

@github-actions
Copy link
Copy Markdown

✅ Staging Deployment Report

Item Value
Branch develop
Commit 7e959f6
Environment Staging
Health http://13.232.74.176:8001/health
API Docs http://13.232.74.176:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates OpenAI embedding models into the ingestion pipeline, adding support for models like text-embedding-3-small and large. Key changes include a new lazy-loaded OpenAI client, auto-detection of OpenAI models, and logic to handle native dimension reduction and analytics tracking. Review feedback identified a potential AttributeError when tracking token usage if the API response lacks usage data, suggesting a safer way to access these attributes to ensure robustness.

Comment thread src/pipelines/ingest.py
embedding = response.data[0].embedding

# Track embedding call for cost analytics
input_tokens = getattr(response.usage, "total_tokens", 0) or len(text.split())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Accessing response.usage.total_tokens directly via getattr on response.usage will raise an AttributeError if response.usage is None. While the OpenAI API typically returns usage information, it's safer to handle cases where it might be missing, especially when using OpenAI-compatible proxies or different library versions that might not guarantee the presence of the usage object.

    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "total_tokens", 0) if usage else len(text.split())

@ishaanxgupta ishaanxgupta merged commit 7512003 into main May 19, 2026
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant