
feat: ground chat agent in a cached, auto-generated PolicyEngine API reference #36

Merged

vahid-ahmadi merged 5 commits into main from feat/cached-api-reference on May 6, 2026

Conversation

@vahid-ahmadi (Contributor) commented Apr 21, 2026

Summary

Replaces drift-prone hardcoded API examples in the system prompt with a freshly generated, prompt-cached reference extracted from the installed library.

Commit 1 — inject cached reference (2084bf5)

  • New backend/scripts/build_reference.py walks the installed policyengine_uk_compiled, dumps docstrings, capabilities() output, and the full Parameters JSON schema to backend/reference.md (~30k tokens).
  • backend/routes/chatbot.py loads it at import and sends it as a second cache_control=ephemeral system block alongside SYSTEM_PROMPT.
  • _select_chat_model counts the reference in the Haiku→Sonnet routing estimate.
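The wiring described above can be sketched as follows. This is not the PR's actual code; names like `REFERENCE_PATH` and the `SYSTEM_PROMPT` text are assumptions, but the two-block `system` shape with `cache_control` markers matches the Anthropic Messages API convention for prompt caching:

```python
# Sketch (assumed names, not the repo's actual chatbot.py): load the
# generated reference at import time and attach it as a second
# prompt-cached system block alongside the behavioral prompt.
from pathlib import Path

SYSTEM_PROMPT = "You are the PolicyEngine chat agent. Always compute with Python."
REFERENCE_PATH = Path("backend/reference.md")  # assumed location


def build_system_blocks(system_prompt: str, reference_md: str) -> list[dict]:
    """Two text blocks, each tagged cache_control so both get prompt-cached."""
    return [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": "# PolicyEngine API reference (auto-generated)\n\n" + reference_md,
            "cache_control": {"type": "ephemeral"},
        },
    ]


# At import time the route module would then do something like:
#   reference_md = REFERENCE_PATH.read_text()
#   client.messages.create(model=..., max_tokens=...,
#                          system=build_system_blocks(SYSTEM_PROMPT, reference_md),
#                          messages=history)
```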

Commit 2 — drop stale prompt blocks (0214aa4)

  • Deletes COMMON WORKFLOWS, MODELLING SCOPE, and DATASETS sections from SYSTEM_PROMPT — all hardcoded examples now covered by the reference.
  • Replaces them with a short paragraph pointing the agent at the attached reference.
  • Every behavioral guardrail preserved (always compute with Python, reproducibility rules, analytical notes, user-facing style).

Why

Claude's training-data memory of the PolicyEngine API is stale the moment the library ships a release. The hardcoded examples in SYSTEM_PROMPT drift the same way and need a human edit to sync. Together these two changes eliminate both drift sources: the version-sensitive facts move to a file regenerated from the live library, and the prompt keeps only the behavioral rules. Prompt caching keeps the per-turn cost of the ~30k-token reference at ~10% of sending it fresh every time.
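The "~10% of sending it fresh" figure follows from Anthropic's published multipliers: 5-minute cache reads bill at 0.1x the base input-token rate, cache writes at 1.25x. A back-of-envelope check, with a purely illustrative $1/MTok base rate:

```python
# Illustrative arithmetic only; the base rate is made up, the 0.10 / 1.25
# multipliers are Anthropic's documented cache-read / cache-write factors.
REF_TOKENS = 30_000
BASE_RATE = 1.00 / 1_000_000  # $ per input token (illustrative)

fresh_per_turn = REF_TOKENS * BASE_RATE          # resend the reference every turn
cached_per_turn = fresh_per_turn * 0.10          # cache read on later turns
first_turn_write = fresh_per_turn * 1.25         # one-off cache write

print(f"fresh:  ${fresh_per_turn:.4f}/turn")
print(f"cached: ${cached_per_turn:.4f}/turn after a ${first_turn_write:.4f} write")
```

So after the first turn's one-off write premium, every subsequent turn within the cache TTL pays a tenth of the uncached reference cost.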

Measured impact

Local smoke test on Haiku 4.5:

  • Fresh question in a new session → cache_creation_input_tokens ≈ 70k (reference + prompt + tools cached).
  • Follow-up within 5 min → cache_read_input_tokens ≈ 70k, cache_creation_input_tokens = 0.
  • Agent correctly enumerates datasets and programmes using the cached capabilities() snapshot without the deleted DATASETS / MODELLING SCOPE blocks.
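The smoke-test readings above can be classified mechanically from the two usage counters the Messages API reports. A minimal sketch (the dicts below are illustrative stand-ins, not real API responses):

```python
# Interpreting the usage fields named in the smoke test. A hit and a write
# are mutually informative: a fresh session writes the cache, a follow-up
# within the TTL reads it back at the discounted rate.
def cache_status(usage: dict) -> str:
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "cache hit"
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "cache write (first turn, or TTL expired)"
    return "no caching in effect"


print(cache_status({"cache_creation_input_tokens": 70_000}))  # fresh session
print(cache_status({"cache_read_input_tokens": 70_000,
                    "cache_creation_input_tokens": 0}))       # follow-up < 5 min
```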

Refreshing the reference

Re-run after any policyengine-uk-compiled bump:

docker-compose exec backend python scripts/build_reference.py

A follow-up PR should wire this into pr-beta-deploy.yml so it runs automatically on every library upgrade.

Test plan

  • docker-compose up → send one chat message → second message should show cache_read_input_tokens > 0
  • Ask "What datasets are available?" → answer lists FRS, EFRS, LCFS, SPI, WAS from the reference / live capabilities() call, not from the deleted DATASETS block
  • Ask "Raise the personal allowance to £15k and show the distributional effect" → agent constructs Parameters.model_validate({...}) correctly against the current schema without the hardcoded example
  • Regression: quantitative answers still compute via Python, not memory
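The third test case exercises the Pydantic `model_validate` pattern. The real `Parameters` model ships with `policyengine_uk_compiled` and its schema is in the generated reference; the miniature stand-in below only illustrates the validation pattern, and its field name is invented:

```python
# Hypothetical stand-in for the library's Parameters model; field names
# here are invented for illustration, not the real schema.
from pydantic import BaseModel


class TaxParameters(BaseModel):
    personal_allowance: float = 12_570.0


# Pydantic v2 validates a plain dict against the current schema, which is
# what the agent is expected to do from the cached reference.
params = TaxParameters.model_validate({"personal_allowance": 15_000.0})
print(params.personal_allowance)
```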

🤖 Generated with Claude Code

Adds scripts/build_reference.py which walks the installed
policyengine_uk_compiled library and dumps docstrings, capabilities(),
and the full Parameters JSON schema to backend/reference.md (~30k
tokens). The chat loop sends it as a second cache_control=ephemeral
system block so the agent grounds code generation against the actually
installed API instead of training-data memory, at ~10% of the normal
input cost thanks to prompt caching.

Re-run scripts/build_reference.py after bumping policyengine-uk-compiled
to refresh the reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vercel Bot commented Apr 21, 2026

The latest updates on your projects:

  • policyengine-uk-chat — Ready (Preview) — updated May 6, 2026 10:57am (UTC)

github-actions Bot commented Apr 21, 2026

Beta preview has been cleaned up because this PR was closed.

Those sections were hardcoded examples that drift every time
policyengine-uk-compiled ships a release. The cached reference added in
#36 — docstrings, capabilities() snapshot, full Parameters JSON schema —
is already the authoritative source, and the agent reaches the same
answers by calling capabilities() dynamically. Keep only the behavioral
guardrails and point the model at the reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi changed the title from "feat: cached PolicyEngine API reference in chat system prompt" to "feat: ground chat agent in a cached, auto-generated PolicyEngine API reference" on Apr 21, 2026

@SakshiKekre left a comment

I like the idea here. One thought I had is that right now the app is still consuming an already-generated reference.md. If the follow-up PR wires regeneration into the build / startup path so the backend always comes up serving a freshly generated reference, then I think this concern mostly goes away.

Otherwise, it feels like we may just be recreating the same drift issue in a different form: not stale handwritten prompt examples, but a stale generated reference artifact.
Looking forward to the follow-up PR for that piece!

One separate concern: the generator currently seems to include inherited/generic docs as well as real package docs. In the generated reference I can already see built-in container docstrings and large chunks of generic Pydantic ModelMetaclass boilerplate. That increases prompt size/cost and lowers signal.
Could we tighten the generator so it prefers actual PolicyEngine API surface and skips framework/builtin boilerplate?

- Skip re-exports from outside policyengine_uk_compiled (stdlib, pydantic).
- Drop inherited docstrings (Pydantic BaseModel boilerplate was emitted on
  every model, adding ~30 lines of noise per entry).
- For module-level data constants, emit the value repr instead of the
  container class's stdlib docstring (DATASETS now shows the actual tuple,
  HOUSEHOLD_DEFAULTS shows the actual dict).
- Regenerate reference.md during `docker build` so deployed images always
  ship a reference matching the installed library version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi (Contributor, Author) commented

Thanks @SakshiKekre — both points fair, pushed 8fb456d on this branch:

Drift (build/startup regeneration). Added RUN python scripts/build_reference.py to backend/Dockerfile after the COPY / pip install steps. Every image build now regenerates reference.md against the installed policyengine-uk-compiled version, so production always ships a fresh reference. The checked-in reference.md is from here on just a dev-time artifact for reviewing generator output without a docker build.

Generator noise. Tightened build_reference.py:

  • _from_package() — classes/functions whose __module__ is outside policyengine_uk_compiled.* are skipped entirely (drops stdlib / pydantic re-exports).
  • _own_doc() — compares the object's docstring to pydantic.BaseModel.__doc__ and to each base class's doc; discards matches. This is what was emitting ~30 lines of ModelMetaclass boilerplate on every Pydantic model.
  • Module-level constants (DATASETS, HOUSEHOLD_DEFAULTS, BENUNIT_DEFAULTS, PERSON_DEFAULTS) now emit the value repr instead of tuple.__doc__ / dict.__doc__ — the actual contents are the signal.

Will share the post-rebuild size delta once the preview redeploys. Expected shape: the reference section drops from ~3100 lines to roughly a third of that, dominated now by real signatures + the Parameters JSON schema.

@SakshiKekre left a comment

Thanks for the follow-up. I think this is moving in the right direction.
Question: Do we need to do this for Modal as well since our deployed backend path is Modal?

Also, the committed backend/reference.md still contains the builtin dict docs and Pydantic BaseModel boilerplate; I'm not sure the tightened generator was run before committing.

Two follow-ups to the second-round review:

1. Modal is the production deploy path, but its image only had
   `add_local_dir("backend", ...)` after pip_install — so it shipped
   whatever reference.md was checked into git, not a fresh build.
   Add `.run_commands("python scripts/build_reference.py")` after
   the local dir is copied so Modal regenerates against its own
   installed policyengine-uk-compiled, mirroring the Dockerfile.

2. Stop committing backend/reference.md. Both deploy paths now
   regenerate it at image build time, so the only role of a checked-in
   copy is to drift — which is exactly what the previous reviewer
   flagged (builtin dict / Pydantic ModelMetaclass docs surviving from
   the pre-tightened generator). Add it to .gitignore; local dev
   regenerates it via `docker-compose exec backend python scripts/build_reference.py`
   or running the script directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi (Contributor, Author) commented

Thanks @SakshiKekre — both fair. Pushed 2d92139:

Modal regeneration. You're right, Modal was the gap. modal_app.py only had add_local_dir("backend", …), which copies the local checked-in tree into the image — so Modal was shipping the stale committed reference.md regardless of what the Dockerfile did. Added .run_commands("cd /app/backend && python scripts/build_reference.py") after the local-dir step, so Modal now regenerates against its own pip-installed policyengine-uk-compiled. The Dockerfile already did this; both deploy paths are now symmetrical.

Stale committed reference. You diagnosed it correctly — the committed file was generated before the tightened generator landed, which is why it still has the ModelMetaclass boilerplate. Rather than re-running and re-committing (which would just reset the drift timer), I removed backend/reference.md from git entirely and added it to .gitignore. Both deploy paths now build it at image-build time, so there's no production reason to ship a static copy, and no more drifting artifact for reviewers to spot. Local dev regenerates with docker-compose exec backend python scripts/build_reference.py or by running the script directly.
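A config-fragment sketch of the Modal fix described above; the package list and paths are assumptions mirrored from this thread, not the repo's actual modal_app.py:

```python
# Sketch only. Note copy=True on add_local_dir: Modal mounts local dirs
# lazily by default, so the files must be baked into the image for a
# subsequent run_commands build step to see them.
import modal

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("policyengine-uk-compiled", "fastapi", "anthropic")  # assumed deps
    .add_local_dir("backend", "/app/backend", copy=True)
    # regenerate against the image's own installed library, mirroring the Dockerfile
    .run_commands("cd /app/backend && python scripts/build_reference.py")
)
```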

@vahid-ahmadi merged commit 4519811 into main on May 6, 2026
4 checks passed