Step 3: extract-structured-data + capability dispatch refactor#4
Merged
rockfordlhotka merged 2 commits intomainfrom Apr 21, 2026
Merged
Step 3: extract-structured-data + capability dispatch refactor#4rockfordlhotka merged 2 commits intomainfrom
rockfordlhotka merged 2 commits intomainfrom
Conversation
…atch refactor
Adds a second A2A skill that navigates to a URL with Playwright, pulls an
aria-snapshot of the page, and asks an LLM to extract fields matching a
natural-language description. Also refactors the single-handler shape into a
per-capability dispatch so growing the capability surface doesn't bloat the
dispatcher. Milestone 3 from the project spec (§9.1).
- ICapability interface: each capability owns its AgentSkill metadata and its
own ExecuteAsync. FetchPageTitleCapability extracted from the old handler;
ExtractStructuredDataCapability is new.
- ForagentTaskHandler becomes a pure dispatcher that resolves
IEnumerable<ICapability> from DI and routes on SkillId.
- ForagentCapabilities.Skills is the single source of truth for advertised
skills — both A2AOptions.Card.Skills (bus) and GatewayOptions.Skills (HTTP)
read from it, killing the duplicate appsettings.json list flagged in step 1.
- Foragent.Browser.IBrowserSession gets CapturePageSnapshotAsync. Uses
Locator.AriaSnapshotAsync (IPage.Accessibility.SnapshotAsync is obsolete in
Playwright 1.50), falls back to InnerText if the aria snapshot is empty.
Returns a PageSnapshot record with the final URL, title, content, and source.
- Real LLM wired via Microsoft.Extensions.AI + OpenAI SDK. Config under
ForagentLlm (Endpoint/ModelId/ApiKey) so Foragent can use a different model
than any rockbot host. Program.cs fails fast at startup if it's missing.
EchoChatClient removed.
- CapabilityInput.Parse is the input shim for rockbot#281 — accepts either a
bare URL or {"url":"...","description":"..."} in the single text part.
When the framework gains metadata pass-through, swap this helper and
capability contracts stay stable.
- Tests: 4 dispatcher unit tests + 6 FetchPageTitleCapability tests + 8
ExtractStructuredDataCapability tests with stubbed browser and LLM.
Integration: 3 PageSnapshot tests against the Kestrel fixture, plus one
SkippableFact that drives the real Azure Foundry model when
FORAGENT_LLM_* env is set — fake shop page, extracts {name, price_usd}.
- docker-compose passes FORAGENT_LLM_* through to the foragent service and
fails-fast-at-up if unset. .env / .env.example gain FORAGENT_LLM_* alongside
the existing RockBot LLM_* entries.
- framework-feedback.md appended with step-3 observations; rockbot#281 filed
for metadata pass-through.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RockBot.A2A.Abstractions 0.8.5 adds Metadata to AgentMessage and
AgentTaskRequest, and the gateway bridge propagates both directions. Swap
CapabilityInput.Parse to read those first; keep the JSON / bare-URL / embedded-
URL paths as fallbacks for callers that don't (or can't) populate metadata.
Also accept a URL embedded in free-form text ("fetch the title of
https://example.com") so LLM-authored requests from RockBot's invoke_agent
tool parse through without requiring prompt gymnastics.
- CapabilityInput.Parse now reads request.Message.Metadata["url"]/["description"]
first, then request.Metadata, then falls back to the text-part paths.
- Tests cover all four paths for both capabilities; TestContext.RequestWithMetadata
helper builds requests that populate either level of the metadata chain.
- framework-feedback.md updated to mark rockbot#281 resolved.
- Verified end-to-end: curl with URL in params.message.metadata and empty text
part returns the real page title from Chromium.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a second A2A skill —
extract-structured-data— that navigates to a URL with Playwright, captures an aria-snapshot of the page, and asks an LLM to extract fields matching a natural-language description. Also refactors the monolithic task handler into a per-capability dispatch pattern so growing the surface doesn't bloat the handler. Milestone 3 from the project spec (§9.1).ICapabilityinterface. Each capability owns itsAgentSkillmetadata (staticSkillDefinition) and its ownExecuteAsync.FetchPageTitleCapabilityextracted from the old handler;ExtractStructuredDataCapabilityis new.ForagentTaskHandleris now a pure dispatcher that resolvesIEnumerable<ICapability>from DI and routes onSkillId. Unknown skills return a user-facing error rather than throwing.ForagentCapabilities.Skillsis the single source of truth — bothA2AOptions.Card.Skills(bus) andGatewayOptions.Skills(HTTP) read from it. Kills theappsettings.json:Gateway:Skillsduplication flagged in step 1.Foragent.Browser.IBrowserSession.CapturePageSnapshotAsync. UsesLocator.AriaSnapshotAsync(the Chromium accessibility tree API;IPage.Accessibility.SnapshotAsyncis obsolete in Playwright 1.50). Falls back to<body>inner text if the aria snapshot is empty. Returns aPageSnapshotrecord.Microsoft.Extensions.AI+ the official OpenAI SDK. Config underForagentLlm:Endpoint/:ModelId/:ApiKey(envFORAGENT_LLM_*) — namespaced so Foragent can use a different model than the host. Program.cs fails fast if missing.EchoChatClientstub removed.CapabilityInput.Parseshim — accepts either a bare URL or{"url":"...","description":"..."}in the single text part. See rockbot#281: the RockBot A2A bridge currently drops request and message metadata, so there's no way to shipextract-structured-data's real two-input contract (URL in metadata, description in text) end-to-end today. When the framework change lands, swap the helper and capability contracts stay stable.Test plan
dotnet build --configuration Release— clean, 0 warningsdotnet test --configuration Release— 30/30 pass (18 capability/dispatcher unit tests + 11 browser/Playwright integration tests + 1 placeholder). Includes aSkippableFactLLM end-to-end test that drives Azure Foundry against a local Kestrel shop page and asserts extracted{name, price_usd}.docker compose build foragentsucceeds withFORAGENT_LLM_*propagatedLaunching Playwright Chromium+Chromium launched (version 133.x)curl /.well-known/agent-card.jsonadvertises bothfetch-page-titleandextract-structured-dataPOST /withfetch-page-titleagainsthttps://example.comreturnsExample DomainPOST /withextract-structured-dataand description"the page heading and the text of any link, as fields called heading and link_text"returns{"heading":"Example Domain","link_text":"Learn more"}Known limitations
🤖 Generated with Claude Code