fix(rag): register pgvector adapter on get_connection() #64

Merged: cipher813 merged 2 commits into main from fix/rag-db-register-vector on Apr 20, 2026

@cipher813
Owner

Summary

  • Fixes a step that has been silently broken since day one; found during tonight's 2026-04-19 RAG recovery run.
  • `rag.pipelines.filing_change_detection` reads `c.embedding` (a pgvector `vector` column) and passes it to `np.array(dtype=float32)`. Without `pgvector.psycopg2.register_vector(conn)`, psycopg2 returns the vector as a stringified list, and the numpy call raises `ValueError: could not convert string to float`. The adapter is now registered per connection (psycopg2 scopes type adapters to the connection, not globally).
  • Steps 1-4 (ingest_sec_filings, ingest_8k_filings, ingest_earnings_finnhub, ingest_theses) only write vectors, and writes work without the adapter because Python lists serialize cleanly to pgvector's string format. Step 5 is the first reader of vector columns, which is why the bug stayed latent until tonight's real run reached it.
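
The read-side failure can be reproduced without a database. The sketch below is illustrative only: plain `float()` stands in for the numpy conversion (its error message has the same form as the one in the traceback), and `parse_vector_text` is a hypothetical helper, not code from this PR. It shows why the stringified value fails and roughly what a registered adapter has to do before the value is usable as numbers.

```python
import json
from typing import List


def parse_vector_text(raw: str) -> List[float]:
    """Hypothetical helper: turn pgvector's text form '[0.1,0.2]' into floats.

    The text form is a bracketed, comma-separated list, which parses as
    JSON for finite values.
    """
    return [float(x) for x in json.loads(raw)]


# Shape of the value the driver returns when no adapter is registered.
raw = "[-0.064141005,0.095089,0.07256067]"

# Converting the whole string to a number fails, which is the step-5 crash.
try:
    float(raw)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Once the text is parsed, the values are ordinary floats.
vector = parse_vector_text(raw)
```

The fix in this PR does not hand-parse like this; it relies on `register_vector(conn)` so the driver returns a numeric type directly.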

Impact

  • Every Saturday SF RAG step has been exiting non-zero at step 5 since filing_change_detection was added. Downstream Research was blocked because RAGIngestion SF step failed, even though steps 1-4 had written the embeddings Research actually queries.
  • Output `rag/filing_changes/latest.json` is not consumed by any code in alpha-engine-research or alpha-engine-data (grepped both). Pure analytic signal for lazy-filing detection. Bug impact = SF state fire, not trading-decision impact.
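
For context on what the step-5 signal does with the vectors it reads, here is a minimal sketch of lazy-filing flagging. The 0.05 threshold comes from the commit message; everything else is an assumption. In particular, the PR does not say how `change_score` is computed, so cosine distance is used here purely for illustration, and `flag_lazy` is a hypothetical name.

```python
import math
from typing import List, Sequence

LAZY_THRESHOLD = 0.05  # threshold quoted in the commit message


def change_score(prev: Sequence[float], curr: Sequence[float]) -> float:
    """ASSUMPTION: cosine distance between two consecutive filing embeddings."""
    dot = sum(a * b for a, b in zip(prev, curr))
    norm = math.sqrt(sum(a * a for a in prev)) * math.sqrt(sum(b * b for b in curr))
    if norm == 0.0:
        raise ValueError("cannot score a zero-length embedding")
    return 1.0 - dot / norm


def flag_lazy(embeddings: Sequence[Sequence[float]]) -> List[int]:
    """Return each index i whose filing barely differs from filing i - 1."""
    return [
        i
        for i in range(1, len(embeddings))
        if change_score(embeddings[i - 1], embeddings[i]) < LAZY_THRESHOLD
    ]
```

Either way, the scoring needs numeric vectors, which is why the unregistered string values stopped the step before any score was produced.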

Changes

  • `rag/db.py`: call `register_vector(conn)` after `psycopg2.connect()` in the `get_connection()` context manager. Adds a one-line import and a 5-line comment explaining the per-connection scoping.
  • `requirements.txt`: add `pgvector>=0.2` (now declared explicitly with a lower bound; it was transitively available but not declared).
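
A minimal sketch of the shape of the `rag/db.py` change, under stated assumptions: the real `get_connection()` is not shown on this page, so the `dsn` argument and the injectable `connect` / `register` parameters are hypothetical and exist only so the sketch can be exercised without a database. The calls `psycopg2.connect` and `pgvector.psycopg2.register_vector` are the ones named in this PR.

```python
from contextlib import contextmanager


@contextmanager
def get_connection(dsn, connect=None, register=None):
    """Yield a connection that already has the pgvector adapter registered."""
    if connect is None:
        import psycopg2  # imported lazily so the sketch loads without the driver

        connect = psycopg2.connect
    if register is None:
        from pgvector.psycopg2 import register_vector

        register = register_vector

    conn = connect(dsn)
    try:
        # Register on every new connection so vector columns come back as
        # numeric values instead of text.
        register(conn)
        yield conn
    finally:
        # Runs whether the caller's block succeeds or raises.
        conn.close()
```

Registering inside the context manager means every caller of `get_connection()` picks up the adapter, so later readers of vector columns would not need their own registration call.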

Test plan

  • `pytest tests/` — 109 passed
  • Post-merge: re-run RAG ingestion end-to-end, verify step 5 (filing_change_detection) completes and writes `s3://alpha-engine-research/rag/filing_changes/latest.json`
  • Verify heartbeat `AlphaEngine/Heartbeat{Process=rag-ingestion}` emits (was suppressed by the step-5 crash under `set -euo pipefail`)

🤖 Generated with Claude Code

cipher813 and others added 2 commits April 19, 2026 19:37
rag.pipelines.filing_change_detection reads the c.embedding column from
Neon pgvector and feeds it to np.array(dtype=float32). Without the
pgvector.psycopg2 register_vector call on the connection, psycopg2 returns
the vector column as the raw stringified list ('[0.1, 0.2, ...]') and
np.array can't parse it, causing ValueError mid-ingestion.

Surfaced tonight during the 2026-04-19 RAG recovery run (--rag-only real
ingestion). Steps 1-4 wrote successfully (SEC filings, 8-Ks, Finnhub,
theses) but step 5 crashed on the first vector read:

    ValueError: could not convert string to float:
    '[-0.064141005,0.095089,0.07256067,...]'

Fix registers the pgvector type codecs per-connection (psycopg2 scopes
adapters to the connection, not globally). Adds pgvector>=0.2 to
requirements.txt — already a transitive dep of voyageai's ecosystem but
pinning explicitly now that we import from it.

Non-blocking fix: filing_change_detection output (rag/filing_changes/
latest.json) is not consumed by research Lambda or any other pipeline
today — it's an analytic output that flags "lazy filings" (consecutive
10-K/10-Q embeddings with change_score < 0.05). Fix closes a
silent-broken-since-day-one step-5 failure that's been returning non-zero
on every Saturday SF RAG run without breaking anything downstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 merged commit 5167f7a into main on Apr 20, 2026
1 check passed
@cipher813 deleted the fix/rag-db-register-vector branch on April 20, 2026 02:46