agent(prompts): move per-step metadata out of <agent_state> into a tail block #4891
Merged
…il block

The step counter (Step N maximum:M) and datetime.now() were rendered inside <agent_state>, ahead of <browser_state> in the user message. The cache miss already happens at the <agent_state> boundary today, so this isn't a live cache regression. However, the layout meant that any future move of more-stable agent_state fields into the system prompt would still leave per-step varying bytes in the middle of the prefix, silently capping how far the cache could extend.

Pull both fields into a new _get_step_meta_description() and append its output at the very end of get_user_message(), after <agent_state>, <browser_state>, <read_state>, <page_specific_actions>, and the unavailable-skills info. Everything above this tail block is now eligible to be treated as the cacheable region.

Adds regression tests that lock the layout:
- <step_info> must appear after <agent_state> and <browser_state>
- <step_info> must not leak back into <agent_state>
- bytes before <step_info> must be identical across two different step numbers (the step counter must not be in the prefix)
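A byte-prefix cache can only reuse a request up to the first byte that differs from an earlier one. The minimal sketch below uses simplified, hypothetical renderings of the old and new layouts. The tag bodies are placeholders, not the real prompt text. It shows where that first difference lands when only the step counter changes. In a real agent loop `<browser_state>` also changes between steps, which is why the miss already starts at `<agent_state>` today. The sketch isolates the step counter's effect.

```python
import os


def render_old(step: int, max_steps: int) -> str:
    # Hypothetical, simplified old layout: <step_info> sits inside <agent_state>,
    # ahead of <browser_state>.
    return (
        "<agent_history>...</agent_history>\n"
        "<agent_state>\n<user_request>task</user_request>\n"
        f"<step_info>Step{step} maximum:{max_steps}</step_info>\n"
        "</agent_state>\n"
        "<browser_state>...</browser_state>\n"
    )


def render_new(step: int, max_steps: int) -> str:
    # Hypothetical, simplified new layout: <step_info> is the last block of the message.
    return (
        "<agent_history>...</agent_history>\n"
        "<agent_state>\n<user_request>task</user_request>\n</agent_state>\n"
        "<browser_state>...</browser_state>\n"
        f"<step_info>Step{step} maximum:{max_steps}</step_info>\n"
    )


def shared_prefix_len(a: str, b: str) -> int:
    # Length of the longest common prefix (characters, equal to bytes for this ASCII text):
    # the most a byte-prefix cache could reuse between the two requests.
    return len(os.path.commonprefix([a, b]))


old_shared = shared_prefix_len(render_old(1, 10), render_old(2, 10))
new_shared = shared_prefix_len(render_new(1, 10), render_new(2, 10))
# Old layout: the shared prefix stops inside <agent_state>.
# New layout: it runs through <browser_state> up to the step digit.
print(old_shared, new_shared)
```

Moving the varying bytes to the tail does not make the message cheaper by itself. It only guarantees that whatever precedes `<step_info>` is never cut short by the step counter or the date.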
Agent Task Evaluation Results: 2/2 (100%)

Check the evaluate-tasks job for detailed task execution logs.
r266-tech pushed a commit to r266-tech/browser-use that referenced this pull request on May 26, 2026:
…il block (browser-use#4891)

## Context

Closes part of browser-use#4887 (item #3: strip per-step metadata from anything prefix-stable).

`AgentMessagePrompt._get_agent_state_description()` was rendering two per-step-varying values inside `<agent_state>`:

- `Step{N+1} maximum:{M}`: changes every step.
- `datetime.now().strftime('%Y-%m-%d')`: changes daily.

The user message currently looks like:

```
<agent_history>...</agent_history>                     ← grows append-only (prefix-stable if HistoryItem is stable)
<agent_state>...<step_info>...</...> </agent_state>    ← cache miss starts here today
<browser_state>...</browser_state>
<read_state>...</read_state>
```

So the cache boundary already lands at `<agent_state>`, and the step counter inside it isn't actively busting a live cache. **But** the layout meant that any future move of more-stable `<agent_state>` fields (user_request, file_system, todo_contents) into the system prompt, or anywhere else we'd want to cache them, would still leave per-step varying bytes sitting inside the would-be prefix. That silently caps how far the cache can ever extend.

## Change

Pull the step counter and date into a new helper `_get_step_meta_description()` and append its output at the very tail of `get_user_message()`, after `<agent_state>`, `<browser_state>`, `<read_state>`, `<page_specific_actions>`, and the unavailable-skills info block. The new layout:

```
<agent_history>...</agent_history>
<agent_state>...</agent_state>                    ← no more <step_info> inside
<browser_state>...</browser_state>
<read_state>...</read_state>
<page_specific_actions>...</page_specific_actions>
[unavailable_skills_info]
<step_info>Step{N} maximum:{M}\nToday:{YYYY-MM-DD}</step_info>   ← suffix, explicitly per-step
```

Everything above `<step_info>` is now eligible to be treated as the cacheable region. When or if we want to push that boundary further out, no per-step varying bytes are in the way.
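The reordering can be sketched as follows. This sketch assumes a heavily simplified, hypothetical stand-in for `AgentMessagePrompt`: the class name `PromptSketch`, its fields, and the block bodies are illustrative and are not the real browser-use API. Only the method names `_get_agent_state_description()`, `_get_step_meta_description()`, and `get_user_message()` come from this PR.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass
class PromptSketch:
    # Hypothetical stand-in for AgentMessagePrompt; field names are illustrative.
    user_request: str
    browser_state: str
    step_number: int  # hypothetical: rendered as given (no +1 offset asserted here)
    max_steps: int
    today: date  # injected instead of calling datetime.now(), so renders are reproducible
    agent_history: str = ""
    read_state: str = ""
    page_specific_actions: str = ""
    unavailable_skills_info: str = ""

    def _get_agent_state_description(self) -> str:
        # Only step-invariant fields remain here: no <step_info> inside <agent_state>.
        return f"<agent_state>\n<user_request>{self.user_request}</user_request>\n</agent_state>"

    def _get_step_meta_description(self) -> str:
        # Per-step metadata, rendered as its own tail block.
        return (
            f"<step_info>Step{self.step_number} maximum:{self.max_steps}\n"
            f"Today:{self.today:%Y-%m-%d}</step_info>"
        )

    def get_user_message(self) -> str:
        parts = []
        if self.agent_history:
            parts.append(f"<agent_history>\n{self.agent_history}\n</agent_history>")
        parts.append(self._get_agent_state_description())
        parts.append(f"<browser_state>\n{self.browser_state}\n</browser_state>")
        if self.read_state:
            parts.append(f"<read_state>\n{self.read_state}\n</read_state>")
        if self.page_specific_actions:
            parts.append(f"<page_specific_actions>\n{self.page_specific_actions}\n</page_specific_actions>")
        if self.unavailable_skills_info:
            parts.append(self.unavailable_skills_info)
        # Step metadata goes last, so every byte above it is free of per-step values.
        parts.append(self._get_step_meta_description())
        return "\n".join(parts)


base = PromptSketch(
    user_request="find the docs",
    browser_state="url: https://example.com",
    step_number=3,
    max_steps=50,
    today=date(2026, 5, 26),
)
later = replace(base, step_number=4)  # same inputs, next step
```

Because `today` is passed in rather than read from the clock, two renders that differ only in `step_number` can be compared byte for byte. The real helper calls `datetime.now()` directly.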
## Tests

New regression tests at `tests/ci/test_prompt_step_meta_suffix.py`:

- `<step_info>` appears after both `<agent_state>` and `<browser_state>`.
- `<step_info>` does not leak back into `<agent_state>`.
- All bytes before `<step_info>` are identical across two different step numbers (proves the step counter isn't in the prefix).
- The `<agent_state>` block is byte-identical across step numbers.

## Test plan

- [x] New tests pass.
- [x] Existing prompt / message_manager tests still pass (`pytest tests/ci -k 'prompt or message_manager or agent_message'`).
- [x] pyright + ruff clean via pre-commit.
- [ ] Eyeball one real agent loop to confirm the model still parses `<step_info>` correctly at the tail (no change in behavior is expected; the model should not depend on the block's position).

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Moved per-step metadata (step counter and date) out of `<agent_state>` into a trailing `<step_info>` block so the user-message prefix is stable for caching. Preps the prompt layout for deeper caching and covers part of browser-use#4887.

- **Refactors**
  - Added `_get_step_meta_description()` and appended its output at the end of `get_user_message()`, after the agent, browser, read, page-actions, and unavailable-skills blocks.
  - Removed per-step `<step_info>` from `<agent_state>` so all bytes before `<step_info>` are stable across steps.
  - Added tests to lock ordering, prevent leakage into `<agent_state>`, and verify a byte-identical prefix and `<agent_state>` across step numbers.

<sup>Written for commit b06b47a. Summary will update on new commits. <a href="https://cubic.dev/pr/browser-use/browser-use/pull/4891?utm_source=github">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
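The regression checks listed under Tests can be sketched as plain pytest-style functions. This illustration runs against a hypothetical `build_user_message()` helper. It is not the contents of `tests/ci/test_prompt_step_meta_suffix.py`, and the block bodies are placeholders.

```python
from datetime import date


def build_user_message(step: int, max_steps: int, today: date) -> str:
    # Hypothetical stand-in for get_user_message(): stable blocks first, step metadata last.
    agent_state = "<agent_state>\n<user_request>find the docs</user_request>\n</agent_state>"
    browser_state = "<browser_state>\nurl: https://example.com\n</browser_state>"
    step_info = f"<step_info>Step{step} maximum:{max_steps}\nToday:{today:%Y-%m-%d}</step_info>"
    return "\n".join([agent_state, browser_state, step_info])


def _block(message: str, tag: str) -> str:
    # Return the first <tag>...</tag> span, inclusive; str.index raises ValueError if absent.
    start = message.index(f"<{tag}>")
    end = message.index(f"</{tag}>") + len(f"</{tag}>")
    return message[start:end]


TODAY = date(2026, 5, 26)  # fixed date so both renders agree on the Today: line


def test_step_info_is_after_agent_and_browser_state() -> None:
    msg = build_user_message(3, 50, TODAY)
    step_pos = msg.index("<step_info>")
    assert msg.index("<agent_state>") < step_pos
    assert msg.index("<browser_state>") < step_pos


def test_step_info_does_not_leak_into_agent_state() -> None:
    msg = build_user_message(3, 50, TODAY)
    assert "<step_info>" not in _block(msg, "agent_state")


def test_prefix_is_identical_across_steps() -> None:
    a = build_user_message(3, 50, TODAY)
    b = build_user_message(4, 50, TODAY)
    assert a != b  # sanity check: the step counter does change the message
    assert a.split("<step_info>")[0] == b.split("<step_info>")[0]
    assert _block(a, "agent_state") == _block(b, "agent_state")
```

Comparing everything before `<step_info>` is stricter than checking tag order alone. It fails if any per-step value, not just the step counter, creeps back into the prefix.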