Harden Gemini 404 model fallback and persist pipeline output incrementally#58
Conversation
…entally Agent-Logs-Url: https://github.com/davidamitchell/Latest-developments-/sessions/6ee60102-2fec-425c-a800-f640126e4124 Co-authored-by: davidamitchell <635733+davidamitchell@users.noreply.github.com>
Agent-Logs-Url: https://github.com/davidamitchell/Latest-developments-/sessions/6ee60102-2fec-425c-a800-f640126e4124 Co-authored-by: davidamitchell <635733+davidamitchell@users.noreply.github.com>
davidamitchell
left a comment
There was a problem hiding this comment.
Code Review — PR #58: Harden Gemini 404 model fallback + incremental persistence
The intent is sound and the three changes work together coherently. There are, however, a few correctness and resilience gaps that should be addressed before merge.
Overview
Three independent concerns addressed in one PR:
is_model_not_found_error()extended to detect text-payload 404sprocess()gains a per-item callback for incremental persistence- Workflow commit step now runs with
if: always()
Each is the right call. The issues below are in the execution, not the direction.
Issues requiring changes
1. "not_found" text match is too broad — false-positive risk (_quota.py)
text = str(exc).lower()
return (
"not_found" in text # ← this matches ANY error containing "not_found"
or ...
)"not_found" will match quota errors, transport errors, or any SDK error whose message happens to include that string. A false positive here advances the model cascade when it should not — that is a silent correctness failure, not a logged one.
The more specific patterns below it ("model is not found", "models/" + "not found") already cover the real Gemini model-not-found shape. The broad "not_found" check should be tightened to something like "not_found" in text and "model" in text, or removed in favour of the already-present specific patterns.
The '"code": 404' / "'code': 404" checks have the same issue — they match any error whose serialised payload contains a 404 code, not necessarily a model-not-found 404. At minimum add "model" in text as a guard.
2. Callback exception is unhandled — a write failure aborts the whole batch (run.py)
results.append(processed)
if on_item_processed is not None:
on_item_processed(processed) # raises → kills the loopIf _merge_and_write fails (disk full, permissions, etc.) the exception propagates and kills the remaining items in the batch. This contradicts the project rule: "Fetcher failures for a single source must not abort the entire run." The same resilience should apply here — catch, log at WARNING, and continue.
if on_item_processed is not None:
try:
on_item_processed(processed)
except Exception as exc: # noqa: BLE001 — callback is caller-supplied; must not abort batch
logger.warning("on_item_processed callback failed for item %s: %s", processed.id, exc)3. Split imports from the same module will fail make check (run.py)
from src.pipeline._gemini import (_ModelCascade,)
from src.pipeline._gemini import (make_gemini_client as _make_gemini_client,)
from src.pipeline._gemini import (resolve_cascade as _resolve_cascade,)Ruff collapses repeated imports from the same module into one block. This will be rejected by make check. Consolidate into a single from src.pipeline._gemini import (...).
4. Workflow push step not verified for if: always() (checklist item 6)
The diff only shows the commit step gaining if: always(). Per the workflow validation checklist in the project instructions:
If a Python script commits via subprocess and a later step is the only place that pushes: the later step must always push — otherwise locally-committed-but-not-pushed commits are silently discarded when the job ends.
The push step must also carry if: always() (or if: always() && steps.<commit-step>.outcome != 'skipped'). If it doesn't, partial commits made after a process failure will be lost when the job exits — exactly the scenario this PR is trying to fix.
Moderate concerns
5. Final _merge_and_write after the loop is redundant (run.py)
# In main():
on_item_processed=lambda item: _merge_and_write(out_path, [item]), # writes N times
...
merged = _merge_and_write(out_path, processed) # then writes all N againIf the callback ran for every item, the final call re-reads the file, re-merges all items that are already present, and re-writes. This is O(n) redundant I/O per run and will compound on large batches. Either skip the final call when a callback is set, or make the callback accumulate in memory and only flush at the end. The current behaviour is correct but wasteful.
Minor / test coverage
6. Test partitions are thin (test_pipeline_quota.py)
Per the project TDD mandate — "every function with conditional logic requires partitions for: typical value, boundary, empty/zero, and invalid/malformed" — the two tests cover only code == 404 (attribute) and one text-payload format. Missing partitions:
status_code == 404attribute path- Negative case: a quota-exhausted 429 must not be detected as model-not-found
- Negative case: a generic transport error must not be detected as model-not-found
- Each text-pattern variant individually (
"model is not found",models/...not found, dict-serialised"code": 404)
The negative cases are especially important given the false-positive risk in issue #1 above.
7. Callback test missing the None and raise paths (test_pipeline_run.py)
The new test confirms the happy path. Two additional partitions are needed once issue #2 is fixed:
on_item_processed=None— must not raise- Callback that raises — must log and continue, not abort the batch
What's good
- The text-based fallback detection is the right approach — SDK error shapes vary and relying solely on
.code/.status_codehas already proven fragile. Callable[[ProcessedItem], None] | Nonetyping is clean and correct.- CHANGELOG, PROGRESS, and Mini-Retro all updated per the project mandates — good hygiene.
- The workflow
if: always()fix is the correct lever for commit-on-failure.
Summary
| Issue | Severity |
|---|---|
"not_found" / '"code": 404' false-positive risk |
High |
| Callback exception unhandled — batch abort on write failure | High |
Split imports fail make check |
High |
Push step not verified for if: always() |
High |
Redundant final _merge_and_write |
Moderate |
| Thin test partitions (missing negatives) | Moderate |
Please address issues 1–4 before merge; 5–6 are strongly recommended.
Generated by Claude Code
Agent-Logs-Url: https://github.com/davidamitchell/Latest-developments-/sessions/4db7f054-7de2-4c60-a3c1-1d7429327fc9 Co-authored-by: davidamitchell <635733+davidamitchell@users.noreply.github.com>
Addressed in commit a8197bd. Changes made:
|
Ruff flagged the I001 import-sort issue and its formatter wrapped four lines that exceeded the 100-char project limit. No logic changes. https://claude.ai/code/session_01QVcqWDPsVT7EJ6LYF6bzzq
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
davidamitchell
left a comment
There was a problem hiding this comment.
Follow-up: Logic-against-intention verification + ruff fix
Checked out the branch, read the actual files (which include a8197bd — the review-feedback commit), and traced each change against its stated intent.
Fix applied: ruff line-length (commit 5733819)
ruff check --fix && ruff format wrapped four lines that exceeded the 100-char project limit inside run.py. No logic changes. All 9 test_pipeline_quota.py tests pass. The test_pipeline_run.py failures are pre-existing baseline issues (httpx, google.genai not installed in this environment) — confirmed unrelated to this PR.
Logic trace: is_model_not_found_error() ✅
has_model_hint = "model" in text or "models/" in text
return (
("not_found" in text and has_model_hint) # A: gRPC NOT_FOUND + model mention
or "model is not found" in text # B: natural-language phrase
or ("models/" in text and "not found" in text) # C: model path format
or (...'"code": 404'... and has_model_hint) # D: JSON payload + model mention
)Each pattern covers a distinct Gemini error shape:
- A catches the gRPC
NOT_FOUNDstatus string (lowercasednot_foundwith underscore, distinct from natural-language "not found" with a space).has_model_hintprevents matching generic resource-not-found errors. - B catches natural-language phrases like "model is not found" — not redundant with A because "not found" (space) ≠ "not_found" (underscore).
- C catches the Gemini model-path format
models/xxx is not found— specific and correct. - D catches serialised error payloads where the SDK stringifies the full JSON response into the exception message. Guarded by
has_model_hint.
One residual edge case: an error like "API config for model 'x' returned code: 404" could trigger D. Theoretical false positive; in practice Gemini SDK only surfaces model-ID 404s through this path. Acceptable.
Logic trace: on_item_processed callback resilience ✅
In process(): item is appended to results before the callback fires — it is never lost from the in-memory batch even if the callback raises. The exception is caught, logged at WARNING, and swallowed — the loop continues to the next item.
In main() — the _persist_item/callback_failed design: the _persist_item closure sets callback_failed = True then re-raises; process() catches and swallows that raise. callback_failed bridges the swallowed exception back to main() to trigger the fallback _merge_and_write. Three scenarios all handled correctly:
| Scenario | callback_failed | how merged |
|---|---|---|
| All writes succeed | False | read_processed_jsonl — no extra I/O |
| Write fails for item N, recovers later | True | _merge_and_write with full in-memory batch |
| All writes fail | True | _merge_and_write with full in-memory batch |
Every processed item ends up on disk in all cases. The flag elegantly avoids a redundant write on the happy path — the concern about redundant I/O from the initial review is resolved.
Logic trace: workflow if: always() ✅
git push origin HEAD was already outside the if git diff --cached --quiet; fi block before this PR — it always pushes unconditionally. The PR adds if: always() to the step itself so it runs even when the process job fails. Together they satisfy workflow checklist item 6. My original concern that the push might be missing if: always() was unfounded — the push was already unconditional inside the script.
All original issues resolved
| Issue | Status |
|---|---|
"not_found" / '"code": 404' false-positive risk |
Fixed — has_model_hint guard (a8197bd) |
| Callback exception unhandled — batch abort on write failure | Fixed — try/except in process() + callback_failed fallback (a8197bd) |
| Split imports failing ruff | Fixed — ruff auto-format, one-alias-per-block is its canonical form (5733819) |
Push step if: always() not verified |
Non-issue — push was already unconditional |
Redundant final _merge_and_write |
Resolved — callback_failed skips it on the happy path |
| Thin test partitions (missing negatives) | Fixed — 6 tests covering positives, negatives, boundary; None and raise paths for callback (a8197bd) |
Logic matches intention across all three stated goals. Ready to merge.
Generated by Claude Code
Pipeline failures were intermittently caused by Gemini
404 NOT_FOUNDmodel errors that were not always classified as model-not-found, so cascade fallback did not reliably advance. The processing flow also behaved too all-or-nothing under interruption, increasing reprocessing risk.Gemini 404 handling
_quota.is_model_not_found_error()to detect model-not-found from textual error payloads (NOT_FOUND, embeddedcode: 404, model-path not-found strings), not only structuredcode/status_codeattributes.Incremental processed-data persistence
process(...)and wiredmain()to merge/write each completedProcessedItemimmediately.WARNINGand processing continues, so persistence errors do not abort the batch.Workflow commit behavior on failure paths
fetch-and-process.ymlso the processed-data commit step runs withif: always().git push origin HEADalways executes in that step, so previously created local commits are not stranded when the run fails.Regression coverage
code == 404andstatus_code == 404not_foundtext)on_item_processed=None