Add production RAG embeddings and clear mypy debt by 2002yy · Pull Request #13 · 2002yy/study-agent

2002yy · 2026-06-05T10:39:42Z

Summary

clear the existing mypy soft-check debt so python -m mypy --explicit-package-bases src is clean locally
add configurable RAG embedding providers with a default local_hash path and an OpenAI-compatible provider for production retrieval
wire Chroma upsert/query through the configured embedding provider, with fake-client contract tests
update README/docs/env templates to distinguish default local-first RAG from explicit production embedding/Chroma configuration
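The provider split described in the summary could look roughly like the sketch below. It is illustrative only: the PR page does not show the actual implementation, so the `EmbeddingProvider` protocol, the `LocalHashEmbedding` class, and the feature-hashing scheme are all assumptions, not the project's code.

```python
import hashlib
import math
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical interface shared by local and remote embedding providers."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class LocalHashEmbedding:
    """Deterministic, dependency-free embeddings via feature hashing.

    This is an assumed stand-in for a `local_hash` style default; it needs
    no network access and no API key.
    """

    def __init__(self, dimensions: int = 64) -> None:
        if dimensions <= 0:
            raise ValueError("dimensions must be positive")
        self.dimensions = dimensions

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            # The first four bytes pick the bucket; the fifth picks the sign.
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vector[index] += sign
        norm = math.sqrt(sum(value * value for value in vector))
        if norm == 0.0:
            # Empty text (or fully cancelled buckets) stays a zero vector.
            return vector
        return [value / norm for value in vector]
```

A hash-based default keeps local runs and tests reproducible, while a provider that calls an OpenAI-compatible endpoint could satisfy the same protocol for production retrieval.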

Verification

python -m pytest -q -> 277 passed
python -m ruff check .
python -m mypy --explicit-package-bases src
detect-secrets scan --disable-plugin KeywordDetector --exclude-files '.*\.(pyc|jpg|png|zip)$' .github README.md docs src tests tools config templates roles changelog assets .env.example
python tools\package_project_helper.py . NUL 0
git diff --check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00607129b5

chatgpt-codex-connector · 2026-06-05T10:42:27Z

+        if not texts:
+            return []
+
+        request: dict[str, Any] = {"model": self.model, "input": texts}


Batch OpenAI embedding inputs

This sends the entire texts list in a single API call via index_documents → Chroma upsert. When indexing a corpus that produces more inputs than the embeddings endpoint accepts in one request (for OpenAI, an input array is limited to 2048 entries and 300k total tokens), the OpenAI-compatible provider rejects the request and the whole indexing operation fails. Larger indexes are a common scenario, so the provider should split the input into bounded batches and concatenate the returned vectors in order.
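A minimal sketch of the suggested batching follows. It assumes a hypothetical `embed_batch` callable that performs one API request and returns one vector per input; `iter_batches`, `embed_in_batches`, and `rough_token_count` are illustrative names, not part of the PR. The token count is a crude character-based estimate, so a real tokenizer would be more accurate.

```python
from typing import Callable, Iterator, Sequence


def rough_token_count(text: str) -> int:
    # Assumption: roughly four characters per token. This is a heuristic only.
    return len(text) // 4 + 1


def iter_batches(
    texts: Sequence[str],
    max_items: int,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> Iterator[list[str]]:
    """Yield consecutive batches that respect both the item and token limits."""
    if max_items <= 0 or max_tokens <= 0:
        raise ValueError("limits must be positive")
    batch: list[str] = []
    batch_tokens = 0
    for text in texts:
        tokens = count_tokens(text)
        # Close the current batch if adding this text would exceed a limit.
        if batch and (len(batch) >= max_items or batch_tokens + tokens > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        # A single oversized text is still sent alone; it cannot be split here.
        batch.append(text)
        batch_tokens += tokens
    if batch:
        yield batch


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    max_items: int = 2048,
    max_tokens: int = 300_000,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> list[list[float]]:
    """Embed texts batch by batch and concatenate the vectors in input order."""
    vectors: list[list[float]] = []
    for batch in iter_batches(texts, max_items, max_tokens, count_tokens):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors
```

Because batches are consecutive slices of the input and results are appended in the order they are returned, the output vectors line up with the original texts, which is what a Chroma upsert keyed by document ID would need.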

Add production RAG embeddings and clear mypy debt

0060712

chatgpt-codex-connector Bot reviewed Jun 5, 2026

2002yy merged commit b877145 into main Jun 5, 2026
2 checks passed

2002yy deleted the codex/rag-production-embeddings-mypy branch June 5, 2026 10:43

2002yy merged 1 commit into main from codex/rag-production-embeddings-mypy
