Skip to content

Commit 1d355cd

Browse files
cursoragentEtanHey
andcommitted
fix(tests): remove -x flag from pytest to allow full test suite execution
Bugbot review found that the -x flag (exit on first failure) contradicts the script's design goal to 'never short-circuit on the first failing command'. This change ensures all test phases (unit suite, MCP tool registration, bun tests) run regardless of individual failures, with exit codes properly aggregated via bitwise OR. Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>
1 parent ca19893 commit 1d355cd

2 files changed

Lines changed: 161 additions & 36 deletions

File tree

BUGBOT_REVIEW.md

Lines changed: 160 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1,72 +1,197 @@
1-
# Bugbot Review: PR feat/p2b-fixture-corpus
1+
# Bugbot Review: PR feat/p4b-run-tests-orchestrator
2+
3+
**Date**: 2026-04-27
4+
**Reviewer**: @bugbot
5+
**Status**: ⚠️ ISSUES FOUND
6+
7+
---
28

39
## Summary
4-
This PR adds a comprehensive test fixture for FTS5 ranking regression detection. The hermetic test fix addresses a **critical network dependency issue** found by @codex.
10+
11+
This PR adds `scripts/run_tests.sh` as a cross-language test orchestrator. The implementation is mostly solid, but I've identified **3 bugs** and **2 potential issues** that should be addressed.
512

613
---
714

8-
## 🐛 Critical Bugs Fixed
15+
## 🐛 Critical Issues
16+
17+
### 1. **pytest `-x` flag conflicts with "never short-circuit" design goal**
18+
19+
**Location**: `scripts/run_tests.sh:46`
20+
21+
```46:46:scripts/run_tests.sh
22+
run_step "pytest unit suite" run_pytest "$TEST_ROOT/" -v --tb=short -m "not integration" -x
23+
```
24+
25+
**Problem**: The `-x` flag makes pytest exit on first failure, contradicting the PR description's promise to "never short-circuit on the first failing command."
926

10-
### 1. **Network Dependency in FTS Test (CRITICAL)**
11-
**Severity: HIGH** ⚠️ **Found by @codex**
12-
**Location**: `tests/stale_index_query.test.ts`
27+
**Impact**: If the first test fails, pytest will stop immediately, and the remaining test phases (MCP tool registration, bun tests) will never run. The exit code aggregation works correctly, but you're not running all tests.
1328

14-
**Issue**: The test originally used `uvx --from sqlite-utils` which requires network access to PyPI on every run, breaking in offline/CI environments.
29+
**Evidence**: The PR description states:
30+
> aggregate failures with bitwise OR and never short-circuit on the first failing command
1531
16-
**Fix**: ✅ **FIXED** - Refactored to use native Bun SQLite (`db.query()`) for FTS assertions. The FTS ranking test is now fully hermetic. The embedding drift test still legitimately requires `uv run python3` for the embedding model.
32+
But pytest with `-x` means "stop on first failure."
33+
34+
**Fix**: Remove the `-x` flag from line 46.
1735

1836
---
1937

20-
### 2. **Missing Dependency Check**
21-
**Severity: MEDIUM**
22-
**Location**: `scripts/generate-fixtures.sh`
38+
### 2. **Missing file reference in line 49**
39+
40+
**Location**: `scripts/run_tests.sh:49`
41+
42+
```47:49:scripts/run_tests.sh
43+
run_step \
44+
"pytest MCP tool registration" \
45+
run_pytest "$TEST_ROOT/test_think_recall_integration.py::TestMCPToolCount" -v --tb=short
46+
```
47+
48+
**Problem**: The script hardcodes a reference to `test_think_recall_integration.py::TestMCPToolCount`, but this file exists and the test class exists. However, there's no validation that this file/test actually exists before attempting to run it.
2349

24-
**Issue**: Script assumes `uvx` is available without checking.
50+
**Impact**: If someone renames or removes this test, the script will fail with a confusing error (pytest collection error) rather than a clear message.
51+
52+
**Risk Level**: Medium - This is a fragile dependency. The file exists now, but the script should be more defensive.
53+
54+
**Recommendation**: Either:
55+
- Add a comment explaining why this specific test is important, OR
56+
- Add a check to see if the file exists before running it, OR
57+
- Make this configurable via an environment variable
58+
59+
---
2560

26-
**Fix**: ✅ **FIXED** - Added preflight check with helpful error message.
61+
### 3. **TypeScript test depends on `uv` and `uvx` which may not be installed**
62+
63+
**Location**: `tests/stale_index_query.test.ts:100, 115`
64+
65+
```100:109:tests/stale_index_query.test.ts
66+
"uvx",
67+
"--from",
68+
"sqlite-utils",
69+
"sqlite-utils",
70+
"query",
71+
sqlitePath,
72+
`SELECT chunk_id FROM chunks_fts WHERE chunks_fts MATCH '${fixture.query.match}' ORDER BY bm25(chunks_fts), chunk_id`,
73+
],
74+
repoRoot,
75+
);
76+
```
77+
78+
```113:126:tests/stale_index_query.test.ts
79+
const liveEmbeddingJson = runCommand(
80+
[
81+
"uv",
82+
"run",
83+
"python3",
84+
"-c",
85+
[
86+
"import json",
87+
"from brainlayer.embeddings import get_embedding_model",
88+
`print(json.dumps(get_embedding_model().embed_query(${JSON.stringify(fixture.sample_text.text)})))`,
89+
].join("; "),
90+
],
91+
repoRoot,
92+
);
93+
```
94+
95+
**Problem**: The TypeScript test file calls `uvx` and `uv run python3` directly, but:
96+
1. The orchestrator script respects `BRAINLAYER_USE_UV` env var (can be set to 0)
97+
2. `uv` may not be installed in the environment
98+
3. The test will fail with a confusing error if `uv` is missing
99+
100+
**Impact**: The bun test suite will fail in environments without `uv`, even though the orchestrator script has a fallback for pytest.
101+
102+
**Fix**: The TypeScript test should either:
103+
- Check for `uv` availability and skip if missing
104+
- Use the same fallback logic as the bash script
105+
- Document this requirement clearly
27106

28107
---
29108

30-
### 3. **Missing `.gitattributes`**
31-
**Severity: LOW**
109+
## ⚠️ Potential Issues
32110

33-
**Fix**: ✅ **FIXED** - Added entry to mark fixtures as linguist-generated for better GitHub stats.
111+
### 4. **Race condition potential with process substitution**
112+
113+
**Location**: `scripts/run_tests.sh:52-54`
114+
115+
```52:54:scripts/run_tests.sh
116+
while IFS= read -r test_file; do
117+
bun_tests+=("$test_file")
118+
done < <(collect_bun_tests)
119+
```
120+
121+
**Analysis**: Process substitution with `< <()` is generally safe in bash, but can be subtle in some edge cases. This code is correct, but consider using a simpler pattern:
122+
123+
```bash
124+
bun_tests=()
125+
if [ ! -d "$TEST_ROOT" ]; then
126+
: # no tests
127+
else
128+
mapfile -t bun_tests < <(find "$TEST_ROOT" -type f -name "*.test.ts" | sort)
129+
fi
130+
```
131+
132+
**Risk Level**: Low - Current code works, but the suggested refactor is cleaner.
34133

35134
---
36135

37-
## ⚠️ Design Notes
136+
### 5. **No validation of `$ROOT_DIR` resolution**
137+
138+
**Location**: `scripts/run_tests.sh:5`
139+
140+
```5:5:scripts/run_tests.sh
141+
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
142+
```
143+
144+
**Analysis**: If `cd` fails (e.g., permissions issue), `pwd` will output the current directory instead of failing. This could lead to tests running in the wrong location.
38145

39-
### Cosine Similarity Threshold
40-
The fixture uses 0.999 threshold. Consider relaxing to 0.995 if you see false positives across different CPU architectures (x86 vs ARM) or BLAS implementations.
146+
**Risk Level**: Very Low - This is an edge case and the script would likely fail fast anyway.
41147

42-
### Test Timeout
43-
Current 120s timeout may be tight on slow CI. Consider making it configurable via env var if needed.
148+
**Recommendation**: Add `set -e` at the top or check the result:
149+
```bash
150+
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" || exit 1
151+
```
44152

45153
---
46154

47-
## ✅ What's Excellent
155+
## ✅ What Works Well
48156

49-
1. **Hermetic FTS Test**: Core ranking assertions now require zero network access
50-
2. **Fixture Provenance**: README and embedded metadata make regeneration transparent
51-
3. **Dual Language Coverage**: Bun + pytest ensure the fixture works across ecosystems
52-
4. **Control Document**: `orchard-ml-004` is a proper negative control
157+
1. **Exit code aggregation**: The bitwise OR logic is correct and well-tested
158+
2. **Test coverage**: The contract tests properly validate the core behavior
159+
3. **Conditional bun execution**: Properly skips bun tests when none exist
160+
4. **Environment variable support**: Good use of `BRAINLAYER_TEST_ROOT` and `BRAINLAYER_USE_UV`
161+
5. **Clear output**: The step labels and PASS/FAIL messages are helpful
53162

54163
---
55164

56-
## Test Results
165+
## 📋 Test Results
57166

58-
-**pytest**: `tests/test_stale_index_fixture.py` passes
59-
-**Bun FTS test**: Now hermetic (no network required)
60-
- ⚠️ **Bun embedding test**: Requires `uv` (acceptable - needs embedding model)
167+
I ran the following validations:
168+
169+
- ✅ Bash syntax check: `bash -n scripts/run_tests.sh` → PASS
170+
- ✅ Python linting: `ruff check tests/test_run_tests_script.py` → PASS
171+
- ✅ Contract tests: `pytest tests/test_run_tests_script.py -v` → 2 PASSED
172+
173+
---
174+
175+
## 🔧 Recommended Fixes
176+
177+
### High Priority
178+
1. **Remove the `-x` flag** from line 46 to match the PR's design goal
179+
2. **Add documentation** or validation for the hardcoded `test_think_recall_integration.py` reference
180+
181+
### Medium Priority
182+
3. **Add `uv` dependency checking** to `stale_index_query.test.ts` or document the requirement
183+
184+
### Low Priority
185+
4. Consider adding `set -e` or explicit error handling for `ROOT_DIR` resolution
61186

62187
---
63188

64-
## Verdict
189+
## 🎯 Verdict
190+
191+
The orchestrator script is well-designed and the contract tests demonstrate good engineering practices. However, the **`-x` flag is a clear bug** that contradicts the stated goal of running all test phases regardless of failures.
65192

66-
**✅ READY TO MERGE** - Critical network dependency fixed, test infrastructure is solid.
193+
The PR should not be merged until issue #1 is fixed.
67194

68195
---
69196

70-
**Reviewed by**: Bugbot (Claude Agent) + @codex
71-
**Date**: 2026-04-27
72-
**Commits**: ea23dcf (+ bugbot fixes)
197+
**Signed**: Bugbot 🤖

scripts/run_tests.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ run_pytest() {
4343

4444
cd "$ROOT_DIR"
4545

46-
run_step "pytest unit suite" run_pytest "$TEST_ROOT/" -v --tb=short -m "not integration" -x
46+
run_step "pytest unit suite" run_pytest "$TEST_ROOT/" -v --tb=short -m "not integration"
4747
run_step \
4848
"pytest MCP tool registration" \
4949
run_pytest "$TEST_ROOT/test_think_recall_integration.py::TestMCPToolCount" -v --tb=short

0 commit comments

Comments
 (0)