feat: ground chat agent in a cached, auto-generated PolicyEngine API reference #36

vahid-ahmadi merged 5 commits into main
Conversation
Adds scripts/build_reference.py which walks the installed policyengine_uk_compiled library and dumps docstrings, capabilities(), and the full Parameters JSON schema to backend/reference.md (~30k tokens). The chat loop sends it as a second cache_control=ephemeral system block so the agent grounds code generation against the actually installed API instead of training-data memory, at ~10% of the normal input cost thanks to prompt caching. Re-run scripts/build_reference.py after bumping policyengine-uk-compiled to refresh the reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
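For context, a minimal sketch of the two-block caching pattern described above, using the Anthropic Python SDK; the variable names, model id, and loading location are assumptions, not the actual `backend/routes/chatbot.py` code:

```python
# Minimal sketch: SYSTEM_PROMPT plus the generated reference sent as two
# system blocks, the second marked cacheable. Names and model id are
# assumptions; the real wiring lives in backend/routes/chatbot.py.
from pathlib import Path

import anthropic

REFERENCE_MD = Path("backend/reference.md").read_text()   # built by scripts/build_reference.py
SYSTEM_PROMPT = "You are PolicyEngine's chat agent. ..."   # behavioural rules only

client = anthropic.Anthropic()

def chat(messages: list[dict]):
    return client.messages.create(
        model="claude-haiku-4-5",          # assumed; routing may pick Sonnet instead
        max_tokens=2048,
        system=[
            # Block 1: behavioural guardrails kept in the prompt.
            {"type": "text", "text": SYSTEM_PROMPT},
            # Block 2: the ~30k-token generated API reference. The
            # cache_control marker caches the whole prefix up to this point,
            # so later turns read it at the reduced cached-input rate.
            {
                "type": "text",
                "text": REFERENCE_MD,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=messages,
    )
```

Because the cache breakpoint covers the full prefix, the tool definitions and both system blocks are all read from cache on subsequent turns, which is where the ~10% input-cost figure comes from.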
Those sections were hardcoded examples that drift every time policyengine-uk-compiled ships a release. The cached reference added in #36 — docstrings, capabilities() snapshot, full Parameters JSON schema — is already the authoritative source, and the agent reaches the same answers by calling capabilities() dynamically. Keep only the behavioral guardrails and point the model at the reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SakshiKekre
left a comment
I like the idea here. One thought I had is that right now the app is still consuming an already-generated reference.md. If the follow-up PR wires regeneration into the build / startup path so the backend always comes up serving a freshly generated reference, then I think this concern mostly goes away.
Otherwise, it feels like we may just be recreating the same drift issue in a different form: not stale handwritten prompt examples, but a stale generated reference artifact.
Looking forward to the follow-up PR for that piece!
One separate concern: the generator currently seems to include inherited/generic docs as well as real package docs. In the generated reference I can already see built-in container docstrings and large chunks of generic Pydantic ModelMetaclass boilerplate. That increases prompt size/cost and lowers signal.
Could we tighten the generator so it prefers actual PolicyEngine API surface and skips framework/builtin boilerplate?
- Skip re-exports from outside policyengine_uk_compiled (stdlib, pydantic).
- Drop inherited docstrings (Pydantic BaseModel boilerplate was emitted on every model, adding ~30 lines of noise per entry).
- For module-level data constants, emit the value repr instead of the container class's stdlib docstring (DATASETS now shows the actual tuple, HOUSEHOLD_DEFAULTS shows the actual dict).
- Regenerate reference.md during `docker build` so deployed images always ship a reference matching the installed library version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
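A sketch of those filtering rules, with illustrative helper names (the actual generator lives in scripts/build_reference.py):

```python
# Sketch of the tightened collection rules; helper names are illustrative,
# the real logic is in scripts/build_reference.py.
import inspect

PACKAGE = "policyengine_uk_compiled"

def is_own_object(obj) -> bool:
    """Skip re-exports: only keep objects defined inside the package itself."""
    module = getattr(obj, "__module__", "") or ""
    return module == PACKAGE or module.startswith(PACKAGE + ".")

def own_docstring(cls: type) -> str | None:
    """Drop inherited docstrings (Pydantic BaseModel boilerplate): only a
    __doc__ defined on the class itself gets emitted."""
    doc = vars(cls).get("__doc__")
    return inspect.cleandoc(doc) if doc else None

def describe_constant(name: str, value: object) -> str:
    """For module-level data constants, emit the value repr rather than the
    container type's stdlib docstring (tuple/dict docs are pure noise)."""
    return f"{name} = {value!r}"
```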
Thanks @SakshiKekre — both points fair, pushed 8fb456d on this branch:

- Drift (build/startup regeneration): reference.md is now regenerated during `docker build`, so deployed images ship a reference matching the installed library version.
- Generator noise: tightened the generator to skip re-exports from outside policyengine_uk_compiled, drop inherited docstrings, and emit value reprs for module-level constants.

Will share the post-rebuild size delta once the preview redeploys. Expected shape: the reference section drops from ~3100 lines to roughly a third of that, dominated now by real signatures + the Parameters JSON schema.
SakshiKekre
left a comment
Thanks for the follow-up. I think this is moving in the right direction.
Question: Do we need to do this for Modal as well since our deployed backend path is Modal?
Also, the committed backend/reference.md still contains the builtin dict docs and Pydantic BaseModel boilerplate, so I'm not sure the tightened generator was run before committing.
Two follow-ups to the second-round review:

1. Modal is the production deploy path, but its image only had `add_local_dir("backend", ...)` after pip_install — so it shipped whatever reference.md was checked into git, not a fresh build. Add `.run_commands("python scripts/build_reference.py")` after the local dir is copied so Modal regenerates against its own installed policyengine-uk-compiled, mirroring the Dockerfile (a sketch follows this note).
2. Stop committing backend/reference.md. Both deploy paths now regenerate it at image build time, so the only role of a checked-in copy is to drift — which is exactly what the previous reviewer flagged (builtin dict / Pydantic ModelMetaclass docs surviving from the pre-tightened generator). Add it to .gitignore; local dev regenerates it via `docker-compose exec backend python scripts/build_reference.py` or by running the script directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
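A rough sketch of the Modal change suggested in point 1; the image chain, package list, paths, and app name are assumptions based on the comment above, not the repo's actual Modal app definition:

```python
# Sketch of the suggested Modal image; packages, paths, and the app name are
# assumptions, not the repo's actual Modal config.
import modal

image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("policyengine-uk-compiled", "fastapi", "anthropic")
    # copy=True bakes the backend into the image at build time; a runtime
    # mount would not exist yet when run_commands executes.
    .add_local_dir("backend", "/root/backend", copy=True)
    # Regenerate the reference against the library installed in this image,
    # mirroring the Dockerfile, instead of shipping the checked-in copy.
    .run_commands("cd /root/backend && python scripts/build_reference.py")
)

app = modal.App("policyengine-chat-backend", image=image)
```

The ordering matters: the regeneration step has to run after the backend directory is copied into the image, otherwise it would rebuild against nothing or against the checked-in file.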
Thanks @SakshiKekre — both fair. Pushed 2d92139:

- Modal regeneration: you're right, Modal was the gap. The image now regenerates reference.md after the backend directory is copied, mirroring the Dockerfile.
- Stale committed reference: you diagnosed it correctly — the committed file was generated before the tightened generator landed, which is why it still had the builtin dict docs and Pydantic boilerplate. The file is now regenerated at image build time instead of committed.
# Conflicts:
#	backend/routes/chatbot.py
Summary
Replaces drift-prone hardcoded API examples in the system prompt with a freshly generated, prompt-cached reference extracted from the installed library.
Commit 1 — inject cached reference (2084bf5)

- `backend/scripts/build_reference.py` walks the installed `policyengine_uk_compiled`, dumps docstrings, `capabilities()` output, and the full `Parameters` JSON schema to `backend/reference.md` (~30k tokens).
- `backend/routes/chatbot.py` loads it at import and sends it as a second `cache_control=ephemeral` system block alongside `SYSTEM_PROMPT`.
- `_select_chat_model` counts the reference in the Haiku→Sonnet routing estimate (sketched below, after the commit breakdown).

Commit 2 — drop stale prompt blocks (0214aa4)

- Drops the `COMMON WORKFLOWS`, `MODELLING SCOPE`, and `DATASETS` sections from `SYSTEM_PROMPT` — all hardcoded examples now covered by the reference.
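The routing tweak could look roughly like the following; the estimator, threshold, and model ids are illustrative, and only the idea of counting the reference in the estimate comes from this PR:

```python
# Illustrative routing sketch: the threshold and token estimator are made up;
# only "count the reference in the Haiku -> Sonnet estimate" comes from the PR.
HAIKU = "claude-haiku-4-5"        # assumed model ids
SONNET = "claude-sonnet-4-5"

def _estimate_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; good enough for routing, not billing.
    return len(text) // 4

def _select_chat_model(user_message: str, system_prompt: str, reference_md: str) -> str:
    # The reference rides along on every request (cached or not), so it has
    # to be part of the size estimate that decides Haiku vs Sonnet.
    estimated = (
        _estimate_tokens(system_prompt)
        + _estimate_tokens(reference_md)
        + _estimate_tokens(user_message)
    )
    return SONNET if estimated > 150_000 else HAIKU
```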
Why

Claude's training-data memory of the PolicyEngine API is stale the moment the library ships a release. The hardcoded examples in `SYSTEM_PROMPT` drift the same way and need a human edit to sync. Together these two changes eliminate both drift sources: the version-sensitive facts move to a file regenerated from the live library, and the prompt keeps only the behavioral rules. Prompt caching keeps the per-turn cost of the ~30k-token reference at ~10% of sending it fresh every time.
Measured impact

Local smoke test on Haiku 4.5:

- First message: `cache_creation_input_tokens ≈ 70k` (reference + prompt + tools cached).
- Follow-up messages: `cache_read_input_tokens ≈ 70k`, `cache_creation_input_tokens = 0`.
- The agent answers dataset questions from the `capabilities()` snapshot without the deleted `DATASETS` / `MODELLING SCOPE` blocks.
Refreshing the reference

Re-run `python scripts/build_reference.py` after any `policyengine-uk-compiled` bump. A follow-up PR should wire this into `pr-beta-deploy.yml` so it runs automatically on every library upgrade.

Test plan

- `docker-compose up` → send one chat message → the second message should show `cache_read_input_tokens > 0`
- Ask which datasets are available → the agent answers from a `capabilities()` call, not from the deleted `DATASETS` block
- The agent builds `Parameters.model_validate({...})` correctly against the current schema without the hardcoded example

🤖 Generated with Claude Code
docker-compose up→ send one chat message → second message should showcache_read_input_tokens > 0capabilities()call, not from the deletedDATASETSblockParameters.model_validate({...})correctly against the current schema without the hardcoded example🤖 Generated with Claude Code