Skip to content

fix: HttpLedgerBackend caches server's canonical model_hash#8

Merged
vigneshnarayanaswamy merged 2 commits intomainfrom
fix/http-backend-canonical-hash
Apr 17, 2026
Merged

fix: HttpLedgerBackend caches server's canonical model_hash#8
vigneshnarayanaswamy merged 2 commits intomainfrom
fix/http-backend-canonical-hash

Conversation

@vigneshnarayanaswamy
Copy link
Copy Markdown
Collaborator

Summary

Follow-up to #7 addressing a P1 Codex review comment on the hash-to-name cache.

HttpLedgerBackend.save_model was caching the client-side model_hash from the incoming ModelRef, but ModelRef.model_hash is derived from created_at, which is set locally when the client calls Ledger.register() — and the server computes its own hash from its own created_at when processing /record. The two values diverge, so the cache ended up mapping a stale hash to the model name.

Downstream hash-based flows (get_model, list_snapshots, set_tag) would then either fail to resolve in a fresh backend instance or silently resolve to a ModelRef carrying the server's hash — inconsistent with what the caller just "saved."

Fix

  • Schema: RecordOutput gains a required model_hash field so the server communicates its canonical identity back to the caller.
  • Tool: record_fn populates model_hash from model.model_hash in both the registered and non-registration branches.
  • Backend: HttpLedgerBackend.save_model reads the server's hash from the response, reassigns it onto the incoming ModelRef (so callers who retain the reference see the authoritative identity), and caches only the server hash.

Tests

  • Updated existing TestRecordOutput fixtures to supply model_hash + assert the field is required in JSON Schema.
  • New TestSaveModelCanonicalHash suite covers:
    • save_model mutates the incoming ref to the server's hash (client hash differs intentionally, using a stale created_at)
    • The cache contains only the server hash; the stale client hash is absent
    • A fresh HttpLedgerBackend instance (empty cache) can still tag by name and list the tag back — the exact round-trip scenario Codex described

Test plan

  • .venv/bin/python -m pytest tests/697 pass, 4 skip (+4 new tests, +1 required-field assertion)
  • .venv/bin/python -m ruff check src/ tests/ — clean
  • .venv/bin/python -m mypy src/ — clean

🤖 Generated with Claude Code

Codex review on PR #7 (#7)
flagged that save_model was caching the client-side model_hash from
the incoming ModelRef, but the server computes its own hash from a
fresh created_at. The client hash and server hash therefore diverge,
and the hash-to-name cache would map a stale client hash to the name.

Any downstream hash-based flow (get_model, list_snapshots, set_tag)
would either fail to resolve in a fresh backend instance or silently
resolve to a ModelRef carrying the server's hash, making returned
model_hash / snapshot_hash values inconsistent with what the caller
just "saved."

Fix:
- RecordOutput now carries `model_hash` so /record returns the
  server's canonical identity
- HttpLedgerBackend.save_model reads the server's hash from the
  response, reassigns it onto the incoming ModelRef in place (so
  retained references see the authoritative identity), and caches
  only the server hash

Tests:
- Schema tests updated to supply model_hash in RecordOutput
  fixtures + assertion that the field is required
- TestSaveModelCanonicalHash suite covers ref mutation, cache
  contents, and the fresh-backend tag round-trip scenario Codex
  described

697 pass, 4 skip; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes HttpLedgerBackend’s hash→name cache by ensuring it stores and propagates the server-canonical model_hash returned from /record, avoiding stale client-side hashes derived from local created_at.

Changes:

  • Add required model_hash to RecordOutput and populate it in the record tool.
  • Update HttpLedgerBackend.save_model() to adopt the server-returned model_hash and cache only that value.
  • Extend tests to assert model_hash is required in schema and to cover canonical-hash reconciliation and cache behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/model_ledger/tools/schemas.py Adds required RecordOutput.model_hash and documents canonical server identity semantics.
src/model_ledger/tools/record.py Populates model_hash in RecordOutput for both registration and non-registration paths.
src/model_ledger/backends/http.py Updates save_model() to read server model_hash, mutate the passed ModelRef, and cache server hash.
tests/test_tools/test_schemas.py Updates fixtures and asserts model_hash is required in JSON Schema.
tests/test_backends/test_http_backend.py Adds coverage for canonical-hash mutation, cache contents, and fresh-backend tagging round-trip.
CHANGELOG.md Documents the new model_hash field and the canonical-hash caching fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/model_ledger/backends/http.py Outdated
Comment thread src/model_ledger/backends/http.py Outdated
Comment thread tests/test_backends/test_http_backend.py
Addresses all 3 review comments on PR #8 from copilot-pull-request-reviewer:

1. save_model now calls resp.raise_for_status() before mutating
   cache state, so a 4xx/5xx response no longer leaves the
   hash-to-name cache pointing at a model that was never persisted.

2. save_model now raises ValueError when a 2xx response lacks a
   valid model_hash string, instead of silently falling back to the
   client-computed hash. That fallback would reintroduce the exact
   stale-hash bug this PR was fixing, which could trigger during a
   client/server version mismatch where the server predates the
   required RecordOutput.model_hash field.

3. test_tag_flow_survives_fresh_backend_instance now asserts
   model is not None before dereferencing, for a clearer failure
   message.

Added defensive-path coverage:
- test_http_error_raises_and_does_not_cache — 500 response raises
  HTTPStatusError and leaves the cache empty
- test_success_without_model_hash_raises — 200 response missing
  model_hash raises ValueError and leaves the cache empty

Both use httpx.MockTransport to exercise the failure paths without
needing a live server.

699 pass, 4 skip; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vigneshnarayanaswamy
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@vigneshnarayanaswamy vigneshnarayanaswamy merged commit cf9ab50 into main Apr 17, 2026
5 checks passed
@vigneshnarayanaswamy vigneshnarayanaswamy deleted the fix/http-backend-canonical-hash branch April 17, 2026 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants