feat(examples): WhatsApp installment negotiation (multi-turn session.send + Asaas preview)#34

Merged
dangazineu merged 6 commits into main from feature/whatsapp-installment-negotiation-example on May 19, 2026

Conversation

@dangazineu dangazineu commented May 17, 2026

Adds a multi-turn buyer-vs-merchant agent demo at
examples/whatsapp-installment-negotiation/. The buyer asks for a
payment option that was not on the merchant's pre-authored menu
("what about 6x?"), the agent computes the variant in real time by
calling the Asaas installment MCP in preview mode, presents
R$800/month back, the buyer confirms, and the agent commits the
payment, issues the NF-e, and sends the WhatsApp confirmation.

The skeleton test runs end-to-end against the OSS MCP bridge with
MCP_DEMO=true on every server and @copilotkit/aimock standing in
for api.anthropic.com — no real credentials needed. A live.test.ts
file is included for the live-LLM smoke gate (gated on
CODESPAR_LIVE_SMOKE=1) and is required to pass locally before
pushing, per the codespar-core CLAUDE.md workflow rule. Live smoke
against api.anthropic.com (claude-sonnet-4-6) was run separately
and exercised the full four-turn flow.

Why this matters

A BSP flow-builder (Blip, Zenvia, Take) breaks when a buyer asks for
a payment option the merchant did not pre-author. "What about 6x?"
is not on the menu; the flow-builder has no branch for it. The agent
computes the variant from a real MCP tool call, not from prose
hardcoded in the prompt. The runtime drives three buyer messages
through three separate session.send() calls; narrative continuity
across those sends lives in the fixture authoring and the agent's
prompt, not in shared message state (see "Per-send fixture
semantics" below).

Three judgment points the agent navigates:

  1. Which options to present first — Pix with discount vs. 12x as
    openers. Picking the wrong opener costs deals; no rule fires here.
  2. How to answer a non-enumerated variant — call Asaas, get
    R$800/month back, present it. A script that anticipates only
    pre-enumerated installment counts misroutes silently.
  3. When to stop exploring and close — after "confirma, pode
    fechar" the agent commits the payment and issues the NF-e instead
    of proposing another variant.

Per-send fixture semantics (read before reviewing the fixture)

The OSS chat loop resets the messages array on every
session.send() call. turnIndex (which aimock derives from the
number of assistant messages in the current LLM request) therefore
restarts at 0 on each new send rather than running continuously 0-4
across all three buyer turns. The five fixture entries are organised
per-send, and the two "after tool result" continuations sit at
turnIndex 1 within their own send with userMessage
discriminators so they do not collide. Substring matching on
userMessage is case-sensitive — the turn-3 match is Confirma,
not confirm.
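
The per-send reset can be illustrated with a small simulation. This is a sketch only: the `Message` shape, the `tool` role, the `simulateSend` helper, and the opener text are hypothetical simplifications, not the runtime's or aimock's actual types. The only behavior taken from the description above is that turnIndex is the count of assistant messages in the current request and that each send starts from a fresh messages array.

```typescript
// Hypothetical, simplified message shape; the real runtime types may differ.
type Role = "user" | "assistant" | "tool";
interface Message {
  role: Role;
  content: string;
}

// turnIndex as described above: the number of assistant messages
// already present in the current LLM request.
function turnIndexOf(request: Message[]): number {
  return request.filter((m) => m.role === "assistant").length;
}

// Simulates one session.send(): a fresh messages array per call.
// `completions` is how many LLM requests that send makes (one per
// completion, with a tool result between consecutive completions).
function simulateSend(userText: string, completions: number): number[] {
  const messages: Message[] = [{ role: "user", content: userText }];
  const seen: number[] = [];
  for (let i = 0; i < completions; i++) {
    seen.push(turnIndexOf(messages));
    messages.push({ role: "assistant", content: `completion ${i}` });
    if (i < completions - 1) {
      messages.push({ role: "tool", content: "tool result" });
    }
  }
  return seen;
}

// Three sends, five completions in total (1 + 2 + 2).
const perSend: number[][] = [
  simulateSend("Quais sao as opcoes?", 1), // hypothetical opener text
  simulateSend("what about 6x?", 2),
  simulateSend("Confirma, pode fechar", 2),
];
// perSend is [[0], [0, 1], [0, 1]], not a continuous 0-4 run.
```

Under these assumptions both tool-result continuations land on turnIndex 1, which is why the fixture needs an extra discriminator to tell them apart.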

The README's "How to extend the fixture for your own multi-turn
flow" subsection walks through the entry-per-completion mapping and
the turnIndex + hasToolResult match-key pattern.
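
A minimal sketch of how such a match key could behave is given here. The `MatchKey`, `RequestView`, and `pick` names are hypothetical and do not reproduce aimock's actual fixture schema or matching code; only the three key names (turnIndex, hasToolResult, userMessage) and the case-sensitive substring rule come from the description above. The "6x" discriminator is an assumption made for illustration.

```typescript
// Hypothetical match-key and request shapes for illustration only.
interface MatchKey {
  turnIndex: number;
  hasToolResult?: boolean;
  userMessage?: string; // case-sensitive substring of the user's message
}

interface FixtureEntry {
  match: MatchKey;
  label: string;
}

interface RequestView {
  turnIndex: number;
  hasToolResult: boolean;
  lastUserMessage: string;
}

function matches(key: MatchKey, req: RequestView): boolean {
  if (key.turnIndex !== req.turnIndex) return false;
  if (key.hasToolResult !== undefined && key.hasToolResult !== req.hasToolResult) {
    return false;
  }
  // String.prototype.includes is case-sensitive, mirroring the rule above.
  if (key.userMessage !== undefined && !req.lastUserMessage.includes(key.userMessage)) {
    return false;
  }
  return true;
}

// The two "after tool result" continuations share turnIndex 1, so the
// userMessage discriminator is what keeps them apart.
const continuations: FixtureEntry[] = [
  { match: { turnIndex: 1, hasToolResult: true, userMessage: "6x" }, label: "preview reply text" },
  { match: { turnIndex: 1, hasToolResult: true, userMessage: "Confirma" }, label: "final confirmation text" },
];

function pick(req: RequestView): string | undefined {
  return continuations.find((entry) => matches(entry.match, req))?.label;
}
```

With this sketch, a request whose user message is "confirma, pode fechar" (lowercase c) matches neither entry, which is the failure mode the casing fix addresses.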

Scaffold inherited from the natural-language NFS-e example

The boilerplate is copied verbatim from
examples/nfse-from-natural-language/:
the three runtime modes in validate.sh (Docker default /
CODESPAR_BASE_URL / CODESPAR_RUNTIME_DIR), the aimock lifecycle,
live.test.ts gated on CODESPAR_LIVE_SMOKE=1, the "three
mockability layers" section of the README, exact MCP pins as
devDependencies, and --demo flags in mcp-servers.json (the source of
truth for demo mode).

What is new here:

  • Three session.send() calls instead of one — the test drives
    the buyer's three messages explicitly.
  • Five-entry aimock fixture instead of three — opener text →
    preview tool_use → preview reply text → close tool_uses × 3 →
    final confirmation text, organised per-send (see above).
  • The Asaas MCP get_installments preview path — the agent
    calls get_installments(value: 4800, installments: 6) without an
    id to get a hypothetical schedule before committing. The
    response shape carries preview: true and status: "PREVIEW" per
    installment so the test (and the agent) can distinguish a preview
    from a real payment schedule.
  • package-lock.json committed: npm ci in the
    validate-example-whatsapp-installment-negotiation CI job
    requires it, and it is pinned against the published
    @codespar/mcp-asaas@0.2.0.
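
To make the preview response shape concrete, here is a rough sketch of a flat-math preview. It is not the Asaas MCP demo handler: the `previewInstallments` function and the per-installment `installmentNumber` and `value` field names are assumptions. The `preview: true` flag, the per-installment `status: "PREVIEW"`, `installmentCount`, `installmentValue`, and the flat value / installments split are taken from this description.

```typescript
// Hypothetical per-installment entry; only `status` is named in the PR text.
interface InstallmentPreview {
  installmentNumber: number;
  value: number;
  status: "PREVIEW";
}

interface PreviewResponse {
  preview: true;
  installmentCount: number;
  installmentValue: number;
  installments: InstallmentPreview[];
}

// Flat split with no interest, rounded to cents.
function previewInstallments(value: number, installments: number): PreviewResponse {
  if (!Number.isInteger(installments) || installments < 1) {
    throw new RangeError("installments must be a positive integer");
  }
  const installmentValue = Math.round((value / installments) * 100) / 100;
  const schedule: InstallmentPreview[] = [];
  for (let i = 1; i <= installments; i++) {
    schedule.push({ installmentNumber: i, value: installmentValue, status: "PREVIEW" });
  }
  return {
    preview: true,
    installmentCount: installments,
    installmentValue,
    installments: schedule,
  };
}

// The case used in this example: R$4800 in 6 installments.
const sixTimes = previewInstallments(4800, 6);
```

Because every entry carries the PREVIEW status, a caller can tell this schedule apart from a committed one without relying on the absence of an id.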

Files

| File | Purpose |
| --- | --- |
| package.json | Exact pins: @codespar/mcp-asaas@0.2.0, @codespar/mcp-nuvem-fiscal@0.3.0, @codespar/mcp-z-api@0.2.1, @codespar/sdk@^0.9.0 |
| package-lock.json | Lockfile against the published @codespar/mcp-asaas@0.2.0; required by npm ci in CI |
| mcp-servers.json | Three stdio servers (asaas, nuvem-fiscal, z-api), all with --demo |
| fixtures/aimock-fixtures.json | Five fixture entries organised per-send (turnIndex restarts at 0 each send, userMessage discriminators on the two turnIndex 1 continuations); turn 2's "R$800,00" text is hardcoded to match the Asaas demo handler's deterministic installmentValue: 800 for value: 4800 / installments: 6 |
| skeleton.test.ts | Three-send() test asserting the Asaas preview, the create_payment with installments: 6, the NF-e issuance, and the WhatsApp confirmation |
| live.test.ts | Same flow against real Claude, gated on CODESPAR_LIVE_SMOKE=1, coarse assertions tolerant of LLM probabilism |
| scripts/validate.sh | Same three runtime modes as the NFS-e example, container name updated to codespar-example-installments-$$ |
| scripts/validate-live.sh | Same modes minus aimock, requires ANTHROPIC_API_KEY |
| tsconfig.json / vitest.config.ts / .npmrc / .gitignore | Identical to the NFS-e example |
| .github/workflows/ci.yml | New validate-example-whatsapp-installment-negotiation job, same shape as the existing validate-example-nfse-from-natural-language job |

Acceptance criteria

The skeleton.test.ts spec asserts:

  1. Turn 1 — string message, no tool calls (opener is conversation only).
  2. Turn 2 — exactly one asaas__get_installments call with
    input.value === 4800 and input.installments === 6, output carrying
    preview: true, installmentCount: 6, installmentValue: 800, and a
    six-entry installments array.
  3. Turn 3 — exactly one asaas__create_payment call with
    billingType: "CREDIT_CARD", value: 4800, installments: 6, output
    carrying id matching /^pay_demo_/, installments: 6,
    installmentValue: 800.
  4. Turn 3 — exactly one nuvem-fiscal__create_nfe call returning
    id matching /^nfe_demo_/ and status === "autorizada".
  5. Turn 3 — at least one z-api__send_text call whose message
    matches /confirm/i.
  6. Cross-turn — total iterations >= 3 across the three calls.
  7. Cross-turn — every dispatched tool call records
    status === "success".
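
The turn-3 and cross-turn criteria above can be sketched as a plain checker over recorded tool calls. The `ToolCall` record shape, the `checkTurn3` helper, and the `message` input field on z-api__send_text are assumptions made for illustration; the actual skeleton.test.ts and SDK result types may look different. The tool names, literal values, and regexes are the ones listed above.

```typescript
// Hypothetical record of one dispatched tool call.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
  output: Record<string, unknown>;
  status: string;
}

function matchesPattern(value: unknown, pattern: RegExp): boolean {
  return typeof value === "string" && pattern.test(value);
}

// Returns the single call with this name, or undefined if there are 0 or 2+.
function only(calls: ToolCall[], name: string): ToolCall | undefined {
  const hits = calls.filter((c) => c.name === name);
  return hits.length === 1 ? hits[0] : undefined;
}

// Collects failure descriptions instead of throwing, so all gaps show at once.
function checkTurn3(calls: ToolCall[]): string[] {
  const failures: string[] = [];

  const payment = only(calls, "asaas__create_payment");
  if (!payment) {
    failures.push("expected exactly one asaas__create_payment call");
  } else {
    const { input, output } = payment;
    if (input["billingType"] !== "CREDIT_CARD" || input["value"] !== 4800 || input["installments"] !== 6) {
      failures.push("create_payment input mismatch");
    }
    if (!matchesPattern(output["id"], /^pay_demo_/) || output["installmentValue"] !== 800) {
      failures.push("create_payment output mismatch");
    }
  }

  const nfe = only(calls, "nuvem-fiscal__create_nfe");
  if (!nfe) {
    failures.push("expected exactly one nuvem-fiscal__create_nfe call");
  } else if (!matchesPattern(nfe.output["id"], /^nfe_demo_/) || nfe.output["status"] !== "autorizada") {
    failures.push("create_nfe output mismatch");
  }

  const texts = calls.filter((c) => c.name === "z-api__send_text");
  if (!texts.some((c) => matchesPattern(c.input["message"], /confirm/i))) {
    failures.push("no z-api__send_text call with a confirmation message");
  }

  if (!calls.every((c) => c.status === "success")) {
    failures.push("a tool call did not record status success");
  }
  return failures;
}
```

An empty result means every turn-3 criterion held; in a vitest spec the same conditions would normally be expressed as individual expect calls.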

Out of scope

  • Live test does not yet assert z-api__send_text presence. The
    skeleton test does, but the live test's coarse assertions skip
    this. Filed as a follow-up; not a blocker for merging this example.
  • Installment-interest impact on NF-e taxable amount. Brazilian
    credit-card installments often carry juros parcelado that
    increases the NF-e taxable amount. The Asaas demo handler computes
    value / installments flat; the NF-e is issued for the original
    sticker price. Documented in the README as a known gap.
  • A delete_payment cleanup path if the buyer backs out after the
    preview. The preview path now exists precisely to avoid creating
    payments tentatively, so the cleanup case is no longer needed. The
    demo never creates a payment until the buyer confirms.
  • The Nuvem Fiscal pagamento echo for create_nfe. Not
    required for this example — the existing canned demo response is
    sufficient. Can land as a focused follow-on if a future demo needs
    to assert installment terms round-trip into the NF-e response.
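
To illustrate the installment-interest gap noted above, the following sketch contrasts the demo's flat split with a fixed-installment amortization (the Price-table formula). The 2% monthly rate and the choice of this formula are purely hypothetical illustration values; they do not describe how Asaas or any card issuer actually prices installments.

```typescript
// Flat split, as the demo handler is described: value / installments.
function flatInstallment(principal: number, n: number): number {
  return principal / n;
}

// Fixed-installment amortization: PMT = PV * i / (1 - (1 + i)^-n).
// Assumption: used here only to show that interest raises the total.
function priceInstallment(principal: number, n: number, monthlyRate: number): number {
  if (monthlyRate === 0) return principal / n;
  return (principal * monthlyRate) / (1 - Math.pow(1 + monthlyRate, -n));
}

const principal = 4800;
const count = 6;
const hypotheticalRate = 0.02; // hypothetical monthly rate, not a real tariff

const flat = flatInstallment(principal, count);
const withInterest = priceInstallment(principal, count, hypotheticalRate);
const totalWithInterest = withInterest * count;

// The difference between totalWithInterest and principal is the amount the
// flat-math demo leaves out of the NF-e discussion.
const omittedInterest = totalWithInterest - principal;
```

Under any positive rate the financed total exceeds the sticker price, which is why the flat demo math and the NF-e issued for the original amount are called out as a known gap rather than as a model of real pricing.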

…send + Asaas preview)

Adds a multi-turn buyer-vs-merchant agent demo where the buyer asks
for a payment option that wasn't on the pre-authored menu ("what
about 6x?"), the agent computes the variant in real time by calling
the Asaas installment MCP in preview mode, the buyer confirms, and
the agent commits the payment, issues the NF-e, and sends the
WhatsApp confirmation.

Builds on the same scaffold the natural-language NFS-e example
shipped (Docker / CODESPAR_BASE_URL / CODESPAR_RUNTIME_DIR runtime
modes, aimock lifecycle, three mockability layers, live-LLM smoke
gate). The new piece is multi-turn session.send() — three buyer
messages drive three separate session.send() calls that share
conversation history.

The example pins exact versions of the MCP catalog:
  - @codespar/mcp-asaas@0.2.0 (introduces stateful installment
    fixtures + the get_installments preview path)
  - @codespar/mcp-nuvem-fiscal@0.3.0 (already published)
  - @codespar/mcp-z-api@0.2.1 (already published)

The Asaas 0.2.0 version ships in a paired mcp-dev-latam PR; this
example's npm install will fail until that ships. CI job added but
will go red until the dependency resolves.

The live-LLM smoke (npm run validate:live) is required to pass
locally before pushing — per the workflow rule in CLAUDE.md.
Aimock-driven skeleton.test.ts cannot catch Anthropic tool-name
regex, invalid model id, or system-prompt regressions that only
surface against real api.anthropic.com.
…from demo customer id

Rename cus_demo_buyer_d2 → cus_demo_buyer_001 in the aimock fixture
and the live-test prompt. The previous suffix referenced the private
demo codename, which must not appear in public-repo artifacts.
…schema + drop emoji

Two findings from the multi-reviewer panel:

1. The create_nfe fixture used a service-style {servico, valor}
   payload copied from the natural-language NFS-e example. The
   product NF-e tool actually requires ambiente, natureza_operacao,
   emitente, destinatario, itens, pagamento (all six are flagged
   required in its inputSchema). The --demo handler accepts the
   wrong shape silently, but the fixture would mislead anyone
   reading it as a template for a real NF-e call. Updated the
   aimock fixture to use the correct NF-e shape and updated the
   live-test turn-3 prompt to match.

2. The aimock fixture closing text carried a trailing furniture
   emoji from the early draft. Workspace convention forbids emojis
   in code or docs. Removed.
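
A presence check for the six required NF-e fields named in finding 1 could look like the sketch below. The `missingNfeFields` helper is hypothetical and only inspects top-level keys; it does not validate nested structure and is not the tool's actual inputSchema validation.

```typescript
// The six top-level fields the commit message says create_nfe requires.
const REQUIRED_NFE_FIELDS = [
  "ambiente",
  "natureza_operacao",
  "emitente",
  "destinatario",
  "itens",
  "pagamento",
] as const;

// Returns the required top-level fields that are absent from the payload.
function missingNfeFields(payload: Record<string, unknown>): string[] {
  return REQUIRED_NFE_FIELDS.filter((field) => payload[field] === undefined);
}

// A service-style payload of the kind the fixture originally used.
const serviceStyle = missingNfeFields({ servico: "consultoria", valor: 4800 });
// serviceStyle lists all six required fields as missing.
```

A check like this would flag the wrong shape even when a permissive demo handler accepts it silently.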
…fixture pattern + flat-math choice

Round 2 aggregate review surfaced three README issues:

- 'four LLM completion turns' was wrong; the fixture has five entries
  because each round of tool execution adds one extra completion
  request. Corrected the opening paragraph.

- The mockability section explained the fixture data flow but didn't
  give a copying customer enough guidance to extend it to their own
  multi-turn demos. Added 'How to extend the fixture for your own
  multi-turn flow' subsection with the entry-per-completion mapping
  and the turnIndex + hasToolResult match-key pattern.

- Made the flat-math (no juros) demo choice explicit so a reader
  doesn't assume the absence of interest is an oversight. Linked to
  the existing 'Known platform gaps' section for the deeper
  taxable-amount discussion.
…on — mcp-asaas@0.2.0 is now published

@codespar/mcp-asaas@0.2.0 is live on the npm registry, so the
pinned devDependencies resolve cleanly. Add the lockfile that
CI's npm ci step expects, generated against the published
version. Unblocks the validate-example-whatsapp-installment-negotiation
job.
@dangazineu dangazineu marked this pull request as ready for review May 18, 2026 23:06
… on per-send turnIndex semantics

The OSS runtime's chat loop starts a fresh messages array on every
session.send() call, so turnIndex (which aimock derives from the
number of assistant messages in the current LLM request) restarts
at 0 on each new send rather than running continuously 0-4 across
all three buyer turns. Restructure the five fixture entries so the
two 'after tool result' continuations (entries 2 and 4) sit at
turnIndex 1 within their own send and add userMessage discriminators
so they don't collide. Also fix the casing on the turn-3 user-message
match ('confirm' -> 'Confirma') since aimock substring matching is
case-sensitive.

Updates the README's fixture-pattern table to reflect per-send
turnIndex resetting, and corrects the wording that claimed the
session carries conversation history across sends — it does not in
the current runtime; the narrative continuity lives in the fixture
authoring, not in shared message state.
@dangazineu dangazineu merged commit 5fbe7f6 into main May 19, 2026
4 checks passed
dangazineu added a commit that referenced this pull request May 19, 2026
…negotiation example (#41)

Raises the existing Pix + NFS-e walking skeleton to the bar set by
[#34](#34)
(whatsapp-installment-negotiation). That PR established a set of
OSS-demo conventions — exact MCP pins, deep per-call assertions, and a
README shape that walks a copying customer from the hook → why-an-agent
→ run paths → per-turn acceptance criteria → known gaps. This PR aligns
the walking-skeleton example with the assertions-and-README parts of
that bar.

This demo intentionally is not an agent-thesis demo (no LLM, no aimock,
no multi-turn `session.send()`), so the W2 items that target the aimock
/ fixture / iterations machinery do not apply here. The applicable parts
of the bar — exact MCP pins, deep per-call assertions, README structure
with regex literals and exact values — are what this PR addresses.
dangazineu added a commit that referenced this pull request May 19, 2026
…tallment-negotiation example (#40)

Raises the existing NFS-e-from-natural-language demo to the bar set by
[#34](#34)
(whatsapp-installment-negotiation). That PR established a set of
OSS-demo conventions — exact MCP pins, multi-key aimock matchers, deep
per-call assertions, and a README shape that walks a copying customer
from the hook → why-an-agent → run paths → per-turn acceptance criteria
→ known gaps. This PR aligns the earlier nfse-from-natural-language demo
with all of that.