Step 3: extract-structured-data + capability dispatch refactor #4

Merged
rockfordlhotka merged 2 commits into main from step-3-extract-data on Apr 21, 2026
@rockfordlhotka
Member

Summary

Adds a second A2A skill — extract-structured-data — that navigates to a URL with Playwright, captures an aria-snapshot of the page, and asks an LLM to extract fields matching a natural-language description. Also refactors the monolithic task handler into a per-capability dispatch pattern so growing the surface doesn't bloat the handler. Milestone 3 from the project spec (§9.1).

  • New ICapability interface. Each capability owns its AgentSkill metadata (static SkillDefinition) and its own ExecuteAsync. FetchPageTitleCapability extracted from the old handler; ExtractStructuredDataCapability is new.
  • ForagentTaskHandler is now a pure dispatcher that resolves IEnumerable<ICapability> from DI and routes on SkillId. Unknown skills return a user-facing error rather than throwing.
  • ForagentCapabilities.Skills is the single source of truth — both A2AOptions.Card.Skills (bus) and GatewayOptions.Skills (HTTP) read from it. Kills the appsettings.json:Gateway:Skills duplication flagged in step 1.
  • Foragent.Browser.IBrowserSession.CapturePageSnapshotAsync. Uses Locator.AriaSnapshotAsync (Playwright's ARIA snapshot of the page's accessibility tree; IPage.Accessibility.SnapshotAsync is obsolete in Playwright 1.50). Falls back to <body> inner text if the aria snapshot is empty. Returns a PageSnapshot record with the final URL, title, content, and source.
  • Real LLM wired via Microsoft.Extensions.AI + the official OpenAI SDK. Config under ForagentLlm:Endpoint / :ModelId / :ApiKey (env FORAGENT_LLM_*) — namespaced so Foragent can use a different model than the host. Program.cs fails fast if missing. EchoChatClient stub removed.
  • CapabilityInput.Parse shim — accepts either a bare URL or {"url":"...","description":"..."} in the single text part. See rockbot#281: the RockBot A2A bridge currently drops request and message metadata, so there's no way to ship extract-structured-data's real two-input contract (URL in metadata, description in text) end-to-end today. When the framework change lands, swap the helper and capability contracts stay stable.
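
The per-capability dispatch described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `SkillInfo`, `ICapabilitySketch`, `FetchPageTitleSketch`, and `DispatcherSketch` are hypothetical stand-ins, the real `ICapability` and `AgentSkill` types are richer, and the skill metadata is an instance property here rather than the static `SkillDefinition` the PR uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified skill metadata (stands in for AgentSkill).
public sealed record SkillInfo(string Id, string Description);

// Hypothetical capability contract: each capability owns its metadata
// and its own execution logic.
public interface ICapabilitySketch
{
    SkillInfo Skill { get; }
    Task<string> ExecuteAsync(string input, CancellationToken ct);
}

// Stub capability; a real one would drive a browser session.
public sealed class FetchPageTitleSketch : ICapabilitySketch
{
    public SkillInfo Skill { get; } =
        new("fetch-page-title", "Returns the title of a page.");

    public Task<string> ExecuteAsync(string input, CancellationToken ct) =>
        Task.FromResult($"(title of {input})");
}

// Pure dispatcher: indexes the injected capabilities by skill id and routes.
public sealed class DispatcherSketch
{
    private readonly Dictionary<string, ICapabilitySketch> _bySkillId;

    // In a DI container, IEnumerable<ICapabilitySketch> would resolve to
    // every registered implementation.
    public DispatcherSketch(IEnumerable<ICapabilitySketch> capabilities) =>
        _bySkillId = capabilities.ToDictionary(
            c => c.Skill.Id, StringComparer.OrdinalIgnoreCase);

    // Unknown skills produce a user-facing message instead of an exception.
    public Task<string> HandleAsync(
        string skillId, string input, CancellationToken ct = default) =>
        _bySkillId.TryGetValue(skillId, out var capability)
            ? capability.ExecuteAsync(input, ct)
            : Task.FromResult($"Unknown skill '{skillId}'.");
}
```

Under this shape, adding a skill means registering one more implementation; the dispatcher itself does not change. The advertised skill list can be derived from the same collection (for example `capabilities.Select(c => c.Skill)`), which is one way to keep a single source of truth.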

Test plan

  • dotnet build --configuration Release — clean, 0 warnings
  • dotnet test --configuration Release — 30/30 pass (18 capability/dispatcher unit tests + 11 browser/Playwright integration tests + 1 placeholder). Includes a SkippableFact LLM end-to-end test that drives Azure Foundry against a local Kestrel shop page and asserts extracted {name, price_usd}.
  • docker compose build foragent succeeds with FORAGENT_LLM_* propagated
  • Foragent container starts and logs Launching Playwright Chromium + Chromium launched (version 133.x)
  • curl /.well-known/agent-card.json advertises both fetch-page-title and extract-structured-data
  • POST / with fetch-page-title against https://example.com returns Example Domain
  • POST / with extract-structured-data and description "the page heading and the text of any link, as fields called heading and link_text" returns {"heading":"Example Domain","link_text":"Learn more"}

Known limitations

  • A2A metadata still can't carry structured input end-to-end (rockbot#281). Foragent uses the JSON-in-text shim today.
  • Still no per-task credential handling — that's milestone 4.

🤖 Generated with Claude Code

rockfordlhotka and others added 2 commits April 21, 2026 17:16
…atch refactor

Adds a second A2A skill that navigates to a URL with Playwright, pulls an
aria-snapshot of the page, and asks an LLM to extract fields matching a
natural-language description. Also refactors the single-handler shape into a
per-capability dispatch so growing the capability surface doesn't bloat the
dispatcher. Milestone 3 from the project spec (§9.1).

- ICapability interface: each capability owns its AgentSkill metadata and its
  own ExecuteAsync. FetchPageTitleCapability extracted from the old handler;
  ExtractStructuredDataCapability is new.
- ForagentTaskHandler becomes a pure dispatcher that resolves
  IEnumerable<ICapability> from DI and routes on SkillId.
- ForagentCapabilities.Skills is the single source of truth for advertised
  skills — both A2AOptions.Card.Skills (bus) and GatewayOptions.Skills (HTTP)
  read from it, killing the duplicate appsettings.json list flagged in step 1.
- Foragent.Browser.IBrowserSession gets CapturePageSnapshotAsync. Uses
  Locator.AriaSnapshotAsync (IPage.Accessibility.SnapshotAsync is obsolete in
  Playwright 1.50), falls back to InnerText if the aria snapshot is empty.
  Returns a PageSnapshot record with the final URL, title, content, and source.
- Real LLM wired via Microsoft.Extensions.AI + OpenAI SDK. Config under
  ForagentLlm (Endpoint/ModelId/ApiKey) so Foragent can use a different model
  than any rockbot host. Program.cs fails fast at startup if it's missing.
  EchoChatClient removed.
- CapabilityInput.Parse is the input shim for rockbot#281 — accepts either a
  bare URL or {"url":"...","description":"..."} in the single text part.
  When the framework gains metadata pass-through, swap this helper; the
  capability contracts stay stable.
- Tests: 4 dispatcher unit tests + 6 FetchPageTitleCapability tests + 8
  ExtractStructuredDataCapability tests with stubbed browser and LLM.
  Integration: 3 PageSnapshot tests against the Kestrel fixture, plus one
  SkippableFact that drives the real Azure Foundry model when
  FORAGENT_LLM_* env is set — fake shop page, extracts {name, price_usd}.
- docker-compose passes FORAGENT_LLM_* through to the foragent service and
  fails fast at `up` if they are unset. .env / .env.example gain FORAGENT_LLM_*
  alongside the existing RockBot LLM_* entries.
- framework-feedback.md appended with step-3 observations; rockbot#281 filed
  for metadata pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RockBot.A2A.Abstractions 0.8.5 adds Metadata to AgentMessage and
AgentTaskRequest, and the gateway bridge propagates both directions. Swap
CapabilityInput.Parse to read those first; keep the JSON / bare-URL / embedded-
URL paths as fallbacks for callers that don't (or can't) populate metadata.
Also accept a URL embedded in free-form text ("fetch the title of
https://example.com") so LLM-authored requests from RockBot's invoke_agent
tool parse through without requiring prompt gymnastics.

- CapabilityInput.Parse now reads request.Message.Metadata["url"]/["description"]
  first, then request.Metadata, then falls back to the text-part paths.
- Tests cover all four paths for both capabilities; TestContext.RequestWithMetadata
  helper builds requests that populate either level of the metadata chain.
- framework-feedback.md updated to mark rockbot#281 resolved.
- Verified end-to-end: curl with URL in params.message.metadata and empty text
  part returns the real page title from Chromium.
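
The lookup order described in this commit (message metadata, then request metadata, then the text-part fallbacks) can be sketched as below. This is an illustration under assumptions, not the actual `CapabilityInput.Parse`: `ParsedInputSketch` and `CapabilityInputSketch` are hypothetical names, metadata is modelled as plain string dictionaries, and using the text part as the description when metadata carries only a URL is a guess at the intended contract.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical result type for the sketch.
public sealed record ParsedInputSketch(string? Url, string? Description);

public static class CapabilityInputSketch
{
    // Deliberately naive URL matcher; it can capture trailing punctuation.
    private static readonly Regex UrlPattern =
        new(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    public static ParsedInputSketch Parse(
        IReadOnlyDictionary<string, string>? messageMetadata,
        IReadOnlyDictionary<string, string>? requestMetadata,
        string? text)
    {
        // Paths 1 and 2: message-level metadata first, then request-level.
        foreach (var metadata in new[] { messageMetadata, requestMetadata })
        {
            if (metadata is not null
                && metadata.TryGetValue("url", out var url)
                && !string.IsNullOrWhiteSpace(url))
            {
                metadata.TryGetValue("description", out var description);
                // Assumption: fall back to the text part as the description.
                return new ParsedInputSketch(
                    url,
                    description ?? (string.IsNullOrWhiteSpace(text) ? null : text.Trim()));
            }
        }

        var trimmed = text?.Trim();
        if (string.IsNullOrEmpty(trimmed)) return new ParsedInputSketch(null, null);

        // Path 3: JSON object of the form {"url":"...","description":"..."}.
        if (trimmed.StartsWith('{'))
        {
            try
            {
                using var doc = JsonDocument.Parse(trimmed);
                var root = doc.RootElement;
                if (root.TryGetProperty("url", out var u)
                    && u.ValueKind == JsonValueKind.String)
                {
                    var desc = root.TryGetProperty("description", out var d)
                        && d.ValueKind == JsonValueKind.String ? d.GetString() : null;
                    return new ParsedInputSketch(u.GetString(), desc);
                }
            }
            catch (JsonException)
            {
                // Not valid JSON; continue with the plain-text paths.
            }
        }

        // Path 4: the whole text part is a bare http(s) URL.
        if (Uri.TryCreate(trimmed, UriKind.Absolute, out var bare)
            && (bare.Scheme == Uri.UriSchemeHttp || bare.Scheme == Uri.UriSchemeHttps))
        {
            return new ParsedInputSketch(trimmed, null);
        }

        // Path 5: a URL embedded in free-form text; keep the text as description.
        var match = UrlPattern.Match(trimmed);
        return match.Success
            ? new ParsedInputSketch(match.Value, trimmed)
            : new ParsedInputSketch(null, trimmed);
    }
}
```

Checking metadata before the text part lets callers that can populate metadata take the structured path, while LLM-authored free-form requests still resolve to a URL.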

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 6e941b8 into main Apr 21, 2026
1 check passed
@rockfordlhotka rockfordlhotka deleted the step-3-extract-data branch April 21, 2026 23:14