Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Mark test fixtures as generated to exclude from language stats
tests/fixtures/*.json linguist-generated=true
72 changes: 72 additions & 0 deletions BUGBOT_REVIEW.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# Bugbot Review: PR feat/p2b-fixture-corpus

## Summary
This PR adds a comprehensive test fixture for FTS5 ranking regression detection. The hermetic test fix addresses a **critical network dependency issue** found by @codex.

---

## 🐛 Critical Bugs Fixed

### 1. **Network Dependency in FTS Test (CRITICAL)**
**Severity: HIGH** ⚠️ **Found by @codex**
**Location**: `tests/stale_index_query.test.ts`

**Issue**: The test originally used `uvx --from sqlite-utils` which requires network access to PyPI on every run, breaking in offline/CI environments.

**Fix**: ✅ **FIXED** - Refactored to use native Bun SQLite (`db.query()`) for FTS assertions. The FTS ranking test is now fully hermetic. The embedding drift test still legitimately requires `uv run python3` for the embedding model.

---

### 2. **Missing Dependency Check**
**Severity: MEDIUM**
**Location**: `scripts/generate-fixtures.sh`

**Issue**: Script assumes `uvx` is available without checking.

**Fix**: ✅ **FIXED** - Added preflight check with helpful error message.

---

### 3. **Missing `.gitattributes`**
**Severity: LOW**

**Fix**: ✅ **FIXED** - Added entry to mark fixtures as linguist-generated for better GitHub stats.

---

## ⚠️ Design Notes

### Cosine Similarity Threshold
The fixture uses 0.999 threshold. Consider relaxing to 0.995 if you see false positives across different CPU architectures (x86 vs ARM) or BLAS implementations.

### Test Timeout
Current 120s timeout may be tight on slow CI. Consider making it configurable via env var if needed.

---

## ✅ What's Excellent

1. **Hermetic FTS Test**: Core ranking assertions now require zero network access
2. **Fixture Provenance**: README and embedded metadata make regeneration transparent
3. **Dual Language Coverage**: Bun + pytest ensure the fixture works across ecosystems
4. **Control Document**: `orchard-ml-004` is a proper negative control

---

## Test Results

- ✅ **pytest**: `tests/test_stale_index_fixture.py` passes
- ✅ **Bun FTS test**: Now hermetic (no network required)
- ⚠️ **Bun embedding test**: Requires `uv` (acceptable - needs embedding model)

---

## Verdict

**✅ READY TO MERGE** - Critical network dependency fixed, test infrastructure is solid.

---

**Reviewed by**: Bugbot (Claude Agent) + @codex
**Date**: 2026-04-27
**Commits**: ea23dcf (+ bugbot fixes)
193 changes: 193 additions & 0 deletions scripts/generate-fixtures.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
#!/usr/bin/env bash
set -euo pipefail

if ! command -v uvx &> /dev/null; then
echo "ERROR: uvx (from uv package manager) is required but not found in PATH" >&2
echo "Install with: curl -LsSf https://astral.sh/uv/install.sh | sh" >&2
exit 1
fi

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
FIXTURE_PATH="${1:-$ROOT_DIR/tests/fixtures/stale_index_query.json}"

mkdir -p "$(dirname "$FIXTURE_PATH")"

cd "$ROOT_DIR"
export FIXTURE_PATH

uv run python3 - <<'PY'
import json
import os
import subprocess
import tempfile
from datetime import datetime, timezone
from pathlib import Path

from brainlayer._helpers import serialize_f32
from brainlayer.embeddings import EMBEDDING_DIM, get_embedding_model
from brainlayer.vector_store import VectorStore

fixture_path = Path(os.environ["FIXTURE_PATH"])
query_match = "apple AND machine"
sample_text = "apple machine retrieval baseline for deterministic fixture verification"

seed_chunks = [
{
"id": "orchard-ml-001",
"content": (
"apple machine apple machine retrieval baseline for orchard robotics. "
"apple machine notes focus on ranking stability and deterministic fixtures."
),
"summary": "apple machine retrieval baseline",
"tags": ["search", "fixture", "apple-machine"],
"resolved_query": "apple machine retrieval baseline",
"key_facts": [
"apple and machine both appear multiple times in the content",
"document is intentionally concise to rank strongly",
],
"resolved_queries": ["apple machine", "deterministic retrieval baseline"],
},
{
"id": "orchard-ml-002",
"content": (
"machine learning notes for apple sorting lines and orchard quality control. "
"This memory mentions apple once and machine once with less dense overlap."
),
"summary": "machine learning for apple sorting",
"tags": ["orchard", "ml"],
"resolved_query": "apple sorting machine learning",
"key_facts": ["lower keyword density than orchard-ml-001"],
"resolved_queries": ["apple machine"],
},
{
"id": "orchard-ml-003",
"content": (
"machine maintenance checklist for conveyors, sensors, and cooling fans. "
"An apple crate jam triggered the maintenance runbook once."
),
"summary": "machine maintenance with one apple mention",
"tags": ["maintenance"],
"resolved_query": "machine maintenance apple crate",
"key_facts": ["contains both query terms but weaker topical focus"],
"resolved_queries": ["machine maintenance"],
},
{
"id": "orchard-ml-004",
"content": (
"apple orchard tasting notes and seasonal harvest planning without robotics keywords."
),
"summary": "apple only control document",
"tags": ["orchard", "taste"],
"resolved_query": "apple harvest planning",
"key_facts": ["control row should not satisfy the two-term boolean search"],
"resolved_queries": ["apple harvest"],
},
]

with tempfile.TemporaryDirectory(prefix="brainlayer-fixture-") as tmpdir:
db_path = Path(tmpdir) / "stale-index.db"
store = VectorStore(db_path)
try:
model = get_embedding_model()
encoder = model._load_model()
chunk_embeddings = encoder.encode(
[chunk["content"] for chunk in seed_chunks],
convert_to_numpy=True,
show_progress_bar=False,
).tolist()

cursor = store.conn.cursor()
for chunk, embedding in zip(seed_chunks, chunk_embeddings):
cursor.execute(
"""
INSERT INTO chunks (
id, content, metadata, source_file, project, content_type, value_type,
char_count, source, summary, tags, resolved_query, key_facts,
resolved_queries, created_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
chunk["id"],
chunk["content"],
"{}",
"tests/fixtures/stale_index_query.json",
"ecosystem-regression-harness",
"assistant_text",
"MEDIUM",
len(chunk["content"]),
"fixture-generator",
chunk["summary"],
json.dumps(chunk["tags"]),
chunk["resolved_query"],
json.dumps(chunk["key_facts"]),
json.dumps(chunk["resolved_queries"]),
"2026-04-27T00:00:00Z",
),
)
cursor.execute(
"INSERT INTO chunk_vectors (chunk_id, embedding) VALUES (?, ?)",
(chunk["id"], serialize_f32([float(value) for value in embedding])),
)

fts_sql = (
"SELECT chunk_id, bm25(chunks_fts) AS rank "
"FROM chunks_fts "
f"WHERE chunks_fts MATCH '{query_match}' "
"ORDER BY bm25(chunks_fts), chunk_id"
)
proc = subprocess.run(
[
"uvx",
"--from",
"sqlite-utils",
"sqlite-utils",
"query",
str(db_path),
fts_sql,
],
check=True,
capture_output=True,
text=True,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing commit before subprocess reads database rows

Medium Severity

After inserting rows via cursor.execute() on store.conn, the script calls subprocess.run with sqlite-utils to query the same database without first calling store.conn.commit(). Python's sqlite3 module uses implicit transactions by default, so the subprocess (which opens its own connection) may not see uncommitted rows, potentially producing an empty result for ranked_rows.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit df2d2d2. Configure here.

ranked_rows = json.loads(proc.stdout)

baseline_embedding = [float(value) for value in model.embed_query(sample_text)]

payload = {
"generated_at": datetime.now(timezone.utc).isoformat(),
"generator": "scripts/generate-fixtures.sh",
"sqlite_snapshot": {
"db_backend": "sqlite",
"fts_table": "chunks_fts",
"seed_chunk_count": len(seed_chunks),
"query_sql": fts_sql,
},
"embedding_model": {
"name": model.model_name,
"dimension": EMBEDDING_DIM,
"dtype": "float32",
},
"query": {
"match": query_match,
"expected_ids": [row["chunk_id"] for row in ranked_rows],
"baseline_rows": ranked_rows,
},
"sample_text": {
"text": sample_text,
"baseline_embedding": baseline_embedding,
"min_cosine_similarity": 0.999,
},
"chunks": [
{
**chunk,
"embedding": [float(value) for value in embedding],
}
for chunk, embedding in zip(seed_chunks, chunk_embeddings)
],
}
finally:
store.close()

fixture_path.write_text(json.dumps(payload, indent=2) + "\n")
print(f"Wrote fixture to {fixture_path}")
PY
6 changes: 6 additions & 0 deletions tests/fixtures/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Fixture Provenance

- `stale_index_query.json` is generated by `scripts/generate-fixtures.sh`.
- The generator creates a fresh temporary BrainLayer SQLite DB, seeds `chunks` plus `chunk_vectors`, and derives the FTS5 baseline with `sqlite-utils query`.
- Embedding baselines come from BrainLayer's configured query embedding model (`BAAI/bge-large-en-v1.5`, 1024-d float32 at generation time).
- Regenerate with: `scripts/generate-fixtures.sh`
Loading
Loading