-
Notifications
You must be signed in to change notification settings - Fork 7
test: add stale index regression fixture #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # Mark test fixtures as generated to exclude from language stats | ||
| tests/fixtures/*.json linguist-generated=true |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # Bugbot Review: PR feat/p2b-fixture-corpus | ||
|
|
||
| ## Summary | ||
| This PR adds a comprehensive test fixture for FTS5 ranking regression detection. The hermetic test fix addresses a **critical network dependency issue** found by @codex. | ||
|
|
||
| --- | ||
|
|
||
| ## 🐛 Critical Bugs Fixed | ||
|
|
||
| ### 1. **Network Dependency in FTS Test (CRITICAL)** | ||
| **Severity: HIGH** ⚠️ **Found by @codex** | ||
| **Location**: `tests/stale_index_query.test.ts` | ||
|
|
||
| **Issue**: The test originally used `uvx --from sqlite-utils` which requires network access to PyPI on every run, breaking in offline/CI environments. | ||
|
|
||
| **Fix**: ✅ **FIXED** - Refactored to use native Bun SQLite (`db.query()`) for FTS assertions. The FTS ranking test is now fully hermetic. The embedding drift test still legitimately requires `uv run python3` for the embedding model. | ||
|
|
||
| --- | ||
|
|
||
| ### 2. **Missing Dependency Check** | ||
| **Severity: MEDIUM** | ||
| **Location**: `scripts/generate-fixtures.sh` | ||
|
|
||
| **Issue**: Script assumes `uvx` is available without checking. | ||
|
|
||
| **Fix**: ✅ **FIXED** - Added preflight check with helpful error message. | ||
|
|
||
| --- | ||
|
|
||
| ### 3. **Missing `.gitattributes`** | ||
| **Severity: LOW** | ||
|
|
||
| **Fix**: ✅ **FIXED** - Added entry to mark fixtures as linguist-generated for better GitHub stats. | ||
|
|
||
| --- | ||
|
|
||
| ## ⚠️ Design Notes | ||
|
|
||
| ### Cosine Similarity Threshold | ||
| The fixture uses 0.999 threshold. Consider relaxing to 0.995 if you see false positives across different CPU architectures (x86 vs ARM) or BLAS implementations. | ||
|
|
||
| ### Test Timeout | ||
| Current 120s timeout may be tight on slow CI. Consider making it configurable via env var if needed. | ||
|
|
||
| --- | ||
|
|
||
| ## ✅ What's Excellent | ||
|
|
||
| 1. **Hermetic FTS Test**: Core ranking assertions now require zero network access | ||
| 2. **Fixture Provenance**: README and embedded metadata make regeneration transparent | ||
| 3. **Dual Language Coverage**: Bun + pytest ensure the fixture works across ecosystems | ||
| 4. **Control Document**: `orchard-ml-004` is a proper negative control | ||
|
|
||
| --- | ||
|
|
||
| ## Test Results | ||
|
|
||
| - ✅ **pytest**: `tests/test_stale_index_fixture.py` passes | ||
| - ✅ **Bun FTS test**: Now hermetic (no network required) | ||
| - ⚠️ **Bun embedding test**: Requires `uv` (acceptable - needs embedding model) | ||
|
|
||
| --- | ||
|
|
||
| ## Verdict | ||
|
|
||
| **✅ READY TO MERGE** - Critical network dependency fixed, test infrastructure is solid. | ||
|
|
||
| --- | ||
|
|
||
| **Reviewed by**: Bugbot (Claude Agent) + @codex | ||
| **Date**: 2026-04-27 | ||
| **Commits**: ea23dcf (+ bugbot fixes) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,193 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
|
|
||
| if ! command -v uvx &> /dev/null; then | ||
| echo "ERROR: uvx (from uv package manager) is required but not found in PATH" >&2 | ||
| echo "Install with: curl -LsSf https://astral.sh/uv/install.sh | sh" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" | ||
| FIXTURE_PATH="${1:-$ROOT_DIR/tests/fixtures/stale_index_query.json}" | ||
|
|
||
| mkdir -p "$(dirname "$FIXTURE_PATH")" | ||
|
|
||
| cd "$ROOT_DIR" | ||
| export FIXTURE_PATH | ||
|
|
||
| uv run python3 - <<'PY' | ||
| import json | ||
| import os | ||
| import subprocess | ||
| import tempfile | ||
| from datetime import datetime, timezone | ||
| from pathlib import Path | ||
|
|
||
| from brainlayer._helpers import serialize_f32 | ||
| from brainlayer.embeddings import EMBEDDING_DIM, get_embedding_model | ||
| from brainlayer.vector_store import VectorStore | ||
|
|
||
| fixture_path = Path(os.environ["FIXTURE_PATH"]) | ||
| query_match = "apple AND machine" | ||
| sample_text = "apple machine retrieval baseline for deterministic fixture verification" | ||
|
|
||
| seed_chunks = [ | ||
| { | ||
| "id": "orchard-ml-001", | ||
| "content": ( | ||
| "apple machine apple machine retrieval baseline for orchard robotics. " | ||
| "apple machine notes focus on ranking stability and deterministic fixtures." | ||
| ), | ||
| "summary": "apple machine retrieval baseline", | ||
| "tags": ["search", "fixture", "apple-machine"], | ||
| "resolved_query": "apple machine retrieval baseline", | ||
| "key_facts": [ | ||
| "apple and machine both appear multiple times in the content", | ||
| "document is intentionally concise to rank strongly", | ||
| ], | ||
| "resolved_queries": ["apple machine", "deterministic retrieval baseline"], | ||
| }, | ||
| { | ||
| "id": "orchard-ml-002", | ||
| "content": ( | ||
| "machine learning notes for apple sorting lines and orchard quality control. " | ||
| "This memory mentions apple once and machine once with less dense overlap." | ||
| ), | ||
| "summary": "machine learning for apple sorting", | ||
| "tags": ["orchard", "ml"], | ||
| "resolved_query": "apple sorting machine learning", | ||
| "key_facts": ["lower keyword density than orchard-ml-001"], | ||
| "resolved_queries": ["apple machine"], | ||
| }, | ||
| { | ||
| "id": "orchard-ml-003", | ||
| "content": ( | ||
| "machine maintenance checklist for conveyors, sensors, and cooling fans. " | ||
| "An apple crate jam triggered the maintenance runbook once." | ||
| ), | ||
| "summary": "machine maintenance with one apple mention", | ||
| "tags": ["maintenance"], | ||
| "resolved_query": "machine maintenance apple crate", | ||
| "key_facts": ["contains both query terms but weaker topical focus"], | ||
| "resolved_queries": ["machine maintenance"], | ||
| }, | ||
| { | ||
| "id": "orchard-ml-004", | ||
| "content": ( | ||
| "apple orchard tasting notes and seasonal harvest planning without robotics keywords." | ||
| ), | ||
| "summary": "apple only control document", | ||
| "tags": ["orchard", "taste"], | ||
| "resolved_query": "apple harvest planning", | ||
| "key_facts": ["control row should not satisfy the two-term boolean search"], | ||
| "resolved_queries": ["apple harvest"], | ||
| }, | ||
| ] | ||
|
|
||
| with tempfile.TemporaryDirectory(prefix="brainlayer-fixture-") as tmpdir: | ||
| db_path = Path(tmpdir) / "stale-index.db" | ||
| store = VectorStore(db_path) | ||
| try: | ||
| model = get_embedding_model() | ||
| encoder = model._load_model() | ||
| chunk_embeddings = encoder.encode( | ||
| [chunk["content"] for chunk in seed_chunks], | ||
| convert_to_numpy=True, | ||
| show_progress_bar=False, | ||
| ).tolist() | ||
|
|
||
| cursor = store.conn.cursor() | ||
| for chunk, embedding in zip(seed_chunks, chunk_embeddings): | ||
| cursor.execute( | ||
| """ | ||
| INSERT INTO chunks ( | ||
| id, content, metadata, source_file, project, content_type, value_type, | ||
| char_count, source, summary, tags, resolved_query, key_facts, | ||
| resolved_queries, created_at | ||
| ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) | ||
| """, | ||
| ( | ||
| chunk["id"], | ||
| chunk["content"], | ||
| "{}", | ||
| "tests/fixtures/stale_index_query.json", | ||
| "ecosystem-regression-harness", | ||
| "assistant_text", | ||
| "MEDIUM", | ||
| len(chunk["content"]), | ||
| "fixture-generator", | ||
| chunk["summary"], | ||
| json.dumps(chunk["tags"]), | ||
| chunk["resolved_query"], | ||
| json.dumps(chunk["key_facts"]), | ||
| json.dumps(chunk["resolved_queries"]), | ||
| "2026-04-27T00:00:00Z", | ||
| ), | ||
| ) | ||
| cursor.execute( | ||
| "INSERT INTO chunk_vectors (chunk_id, embedding) VALUES (?, ?)", | ||
| (chunk["id"], serialize_f32([float(value) for value in embedding])), | ||
| ) | ||
|
|
||
| fts_sql = ( | ||
| "SELECT chunk_id, bm25(chunks_fts) AS rank " | ||
| "FROM chunks_fts " | ||
| f"WHERE chunks_fts MATCH '{query_match}' " | ||
| "ORDER BY bm25(chunks_fts), chunk_id" | ||
| ) | ||
| proc = subprocess.run( | ||
| [ | ||
| "uvx", | ||
| "--from", | ||
| "sqlite-utils", | ||
| "sqlite-utils", | ||
| "query", | ||
| str(db_path), | ||
| fts_sql, | ||
| ], | ||
| check=True, | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| ranked_rows = json.loads(proc.stdout) | ||
|
|
||
| baseline_embedding = [float(value) for value in model.embed_query(sample_text)] | ||
|
|
||
| payload = { | ||
| "generated_at": datetime.now(timezone.utc).isoformat(), | ||
| "generator": "scripts/generate-fixtures.sh", | ||
| "sqlite_snapshot": { | ||
| "db_backend": "sqlite", | ||
| "fts_table": "chunks_fts", | ||
| "seed_chunk_count": len(seed_chunks), | ||
| "query_sql": fts_sql, | ||
| }, | ||
| "embedding_model": { | ||
| "name": model.model_name, | ||
| "dimension": EMBEDDING_DIM, | ||
| "dtype": "float32", | ||
| }, | ||
| "query": { | ||
| "match": query_match, | ||
| "expected_ids": [row["chunk_id"] for row in ranked_rows], | ||
| "baseline_rows": ranked_rows, | ||
| }, | ||
| "sample_text": { | ||
| "text": sample_text, | ||
| "baseline_embedding": baseline_embedding, | ||
| "min_cosine_similarity": 0.999, | ||
| }, | ||
| "chunks": [ | ||
| { | ||
| **chunk, | ||
| "embedding": [float(value) for value in embedding], | ||
| } | ||
| for chunk, embedding in zip(seed_chunks, chunk_embeddings) | ||
| ], | ||
| } | ||
| finally: | ||
| store.close() | ||
|
|
||
| fixture_path.write_text(json.dumps(payload, indent=2) + "\n") | ||
| print(f"Wrote fixture to {fixture_path}") | ||
| PY | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| # Fixture Provenance | ||
|
|
||
| - `stale_index_query.json` is generated by `scripts/generate-fixtures.sh`. | ||
| - The generator creates a fresh temporary BrainLayer SQLite DB, seeds `chunks` plus `chunk_vectors`, and derives the FTS5 baseline with `sqlite-utils query`. | ||
| - Embedding baselines come from BrainLayer's configured query embedding model (`BAAI/bge-large-en-v1.5`, 1024-d float32 at generation time). | ||
| - Regenerate with: `scripts/generate-fixtures.sh` |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing commit before subprocess reads database rows
Medium Severity
After inserting rows via
cursor.execute()onstore.conn, the script callssubprocess.runwithsqlite-utilsto query the same database without first callingstore.conn.commit(). Python'ssqlite3module uses implicit transactions by default, so the subprocess (which opens its own connection) may not see uncommitted rows, potentially producing an empty result forranked_rows.Reviewed by Cursor Bugbot for commit df2d2d2. Configure here.