Add production RAG embeddings and clear mypy debt #13
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00607129b5
```python
if not texts:
    return []

request: dict[str, Any] = {"model": self.model, "input": texts}
```
When a corpus produces more inputs than the embeddings endpoint accepts in one request (OpenAI caps arrays at 2048 inputs and 300k total tokens per request), this code sends the entire `texts` list in a single API call via `index_documents` → Chroma upsert. Larger indexes are a common case, and there the OpenAI-compatible provider rejects the request and the whole indexing operation fails. Split the input into bounded batches and concatenate the returned vectors in their original order.
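A minimal sketch of the suggested batching, assuming the provider returns one vector per input in input order. `embed_batch` is a hypothetical callable standing in for a single provider request (the code that builds `request` in the diff); `iter_batches`, `embed_in_batches` and `approx_tokens` are illustrative names, and `approx_tokens` is a crude character-based estimate rather than a real tokenizer. The limits are the OpenAI figures quoted in the review and should stay configurable for other providers.

```python
from typing import Callable, Iterator, Sequence

# Per-request limits quoted in the review for OpenAI; assumed configurable.
MAX_INPUTS_PER_REQUEST = 2048
MAX_TOKENS_PER_REQUEST = 300_000


def approx_tokens(text: str) -> int:
    """Rough estimate (~4 chars per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


def iter_batches(
    texts: Sequence[str],
    max_inputs: int = MAX_INPUTS_PER_REQUEST,
    max_tokens: int = MAX_TOKENS_PER_REQUEST,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts` that respect both per-request limits."""
    batch: list[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        # Flush before adding a text that would push either limit over budget.
        # A single oversized text still goes out alone; per-input limits are
        # a separate concern not handled here.
        if batch and (len(batch) >= max_inputs or batch_tokens + n > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    max_inputs: int = MAX_INPUTS_PER_REQUEST,
    max_tokens: int = MAX_TOKENS_PER_REQUEST,
) -> list[list[float]]:
    """Embed `texts` one bounded request at a time, keeping input order."""
    if not texts:
        return []
    vectors: list[list[float]] = []
    for batch in iter_batches(texts, max_inputs, max_tokens):
        result = embed_batch(batch)  # hypothetical: one API call per batch
        if len(result) != len(batch):
            raise ValueError(f"expected {len(batch)} vectors, got {len(result)}")
        vectors.extend(result)  # appending batch by batch preserves order
    return vectors


if __name__ == "__main__":
    calls: list[int] = []

    def fake_embed(batch: list[str]) -> list[list[float]]:
        calls.append(len(batch))
        return [[float(t)] for t in batch]

    vecs = embed_in_batches([str(i) for i in range(5000)], fake_embed)
    print(calls)  # [2048, 2048, 904]
```

Checking the vector count per batch guards against a provider silently dropping inputs, which would otherwise misalign vectors with the document IDs passed to the Chroma upsert.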
Summary
- `python -m mypy --explicit-package-bases src` is clean locally

Verification
- `python -m pytest -q` -> 277 passed
- `python -m ruff check .`
- `python -m mypy --explicit-package-bases src`
- `detect-secrets scan --disable-plugin KeywordDetector --exclude-files '.*\.(pyc|jpg|png|zip)$' .github README.md docs src tests tools config templates roles changelog assets .env.example`
- `python tools\package_project_helper.py . NUL 0`
- `git diff --check`