feat(examples): WhatsApp installment negotiation (multi-turn session.send + Asaas preview) #34
Merged
dangazineu merged 6 commits on May 19, 2026
…send + Asaas preview)
Adds a multi-turn buyer-vs-merchant agent demo where the buyer asks
for a payment option that wasn't on the pre-authored menu ("what
about 6x?"), the agent computes the variant in real time by calling
the Asaas installment MCP in preview mode, the buyer confirms, and
the agent commits the payment, issues the NF-e, and sends the
WhatsApp confirmation.
Builds on the same scaffold the natural-language NFS-e example
shipped (Docker / CODESPAR_BASE_URL / CODESPAR_RUNTIME_DIR runtime
modes, aimock lifecycle, three mockability layers, live-LLM smoke
gate). The new piece is multi-turn session.send() — three buyer
messages drive three separate session.send() calls that share
conversation history.
The example pins exact versions of the MCP catalog:
- @codespar/mcp-asaas@0.2.0 (introduces stateful installment
fixtures + the get_installments preview path)
- @codespar/mcp-nuvem-fiscal@0.3.0 (already published)
- @codespar/mcp-z-api@0.2.1 (already published)
The Asaas 0.2.0 version ships in a paired mcp-dev-latam PR; this
example's npm install will fail until that ships. CI job added but
will go red until the dependency resolves.
The live-LLM smoke (npm run validate:live) is required to pass
locally before pushing — per the workflow rule in CLAUDE.md.
The aimock-driven skeleton.test.ts cannot catch Anthropic tool-name
regex violations, an invalid model id, or system-prompt regressions;
those only surface against the real api.anthropic.com.
…from demo customer id
Rename cus_demo_buyer_d2 → cus_demo_buyer_001 in the aimock fixture and the live-test prompt. The previous suffix referenced the private demo codename, which must not appear in public-repo artifacts.
…schema + drop emoji
Two findings from the multi-reviewer panel:
1. The create_nfe fixture used a service-style {servico, valor}
payload copied from the natural-language NFS-e example. The
product NF-e tool actually requires ambiente, natureza_operacao,
emitente, destinatario, itens, pagamento (all six are flagged
required in its inputSchema). The --demo handler accepts the
wrong shape silently, but the fixture would mislead anyone
reading it as a template for a real NF-e call. Updated the
aimock fixture to use the correct NF-e shape and updated the
live-test turn-3 prompt to match.
2. The aimock fixture closing text carried a trailing furniture
emoji from the early draft. Workspace convention forbids emojis
in code or docs. Removed.
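As a rough illustration of the first finding, the check below lists which of the six required NF-e fields a payload is missing. It is a minimal sketch: `missingNfeFields` is a hypothetical helper, not part of `@codespar/mcp-nuvem-fiscal`, and the sample values are placeholders rather than a valid NF-e.

```typescript
// Hypothetical helper for illustration; the real tool validates against its own inputSchema.
// The six names are the fields the commit message lists as required.
const NFE_REQUIRED_FIELDS = [
  "ambiente",
  "natureza_operacao",
  "emitente",
  "destinatario",
  "itens",
  "pagamento",
] as const;

// Returns the required field names that are absent (or null/undefined) in the payload.
function missingNfeFields(payload: Record<string, unknown>): string[] {
  return NFE_REQUIRED_FIELDS.filter((field) => payload[field] == null);
}

// Service-style shape copied from the NFS-e example: every required field is missing.
const serviceStylePayload: Record<string, unknown> = { servico: "placeholder", valor: 4800 };

// Product-style shape with placeholder values: nothing is missing.
const productStylePayload: Record<string, unknown> = {
  ambiente: "placeholder",
  natureza_operacao: "placeholder",
  emitente: {},
  destinatario: {},
  itens: [],
  pagamento: {},
};

console.log(missingNfeFields(serviceStylePayload)); // all six field names
console.log(missingNfeFields(productStylePayload)); // empty array
```

A check like this in the fixture's own test would have flagged the service-style payload even though the `--demo` handler accepts it silently.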
…fixture pattern + flat-math choice
Round 2 aggregate review surfaced two README accuracy issues:
- 'four LLM completion turns' was wrong; the fixture has five entries because each round of tool execution adds one extra completion request. Corrected the opening paragraph.
- The mockability section explained the fixture data flow but didn't give a copying customer enough guidance to extend it to their own multi-turn demos. Added 'How to extend the fixture for your own multi-turn flow' subsection with the entry-per-completion mapping and the turnIndex + hasToolResult match-key pattern.
Also made the flat-math (no juros) demo choice explicit so a reader doesn't assume the absence of interest is an oversight. Linked to the existing 'Known platform gaps' section for the deeper taxable-amount discussion.
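The five-entry count can be checked with simple arithmetic: each send needs one completion for the final text plus one per round of tool execution. The sketch below assumes the three sends use 0, 1, and 1 tool rounds respectively (the three closing tool_uses counted as a single round); the type and function names are illustrative.

```typescript
// Illustrative arithmetic only; names are hypothetical.
interface SendPlan {
  label: string;
  toolRounds: number; // rounds of tool execution inside this send
}

// One completion per tool round, plus the completion that produces the final text.
function completionsForSend(toolRounds: number): number {
  return toolRounds + 1;
}

const sendPlans: SendPlan[] = [
  { label: "turn 1: opener, text only", toolRounds: 0 },
  { label: "turn 2: preview tool_use, then reply text", toolRounds: 1 },
  { label: "turn 3: three closing tool_uses in one round, then confirmation", toolRounds: 1 },
];

const totalFixtureEntries = sendPlans.reduce(
  (sum, plan) => sum + completionsForSend(plan.toolRounds),
  0,
);

console.log(totalFixtureEntries); // 1 + 2 + 2 = 5
```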
…on - mcp-asaas@0.2.0 is now published
@codespar/mcp-asaas@0.2.0 is live on the npm registry, so the pinned devDependencies resolve cleanly. Add the lockfile that CI's npm ci step expects, generated against the published version. Unblocks the validate-example-whatsapp-installment-negotiation job.
… on per-send turnIndex semantics
The OSS runtime's chat loop starts a fresh messages array on every
session.send() call, so turnIndex (which aimock derives from the
number of assistant messages in the current LLM request) restarts
at 0 on each new send rather than running continuously 0-4 across
all three buyer turns. Restructure the five fixture entries so the
two 'after tool result' continuations (entries 2 and 4) sit at
turnIndex 1 within their own send and add userMessage discriminators
so they don't collide. Also fix the casing on the turn-3 user-message
match ('confirm' -> 'Confirma') since aimock substring matching is
case-sensitive.
Updates the README's fixture-pattern table to reflect per-send
turnIndex resetting, and corrects the wording that claimed the
session carries conversation history across sends — it does not in
the current runtime; the narrative continuity lives in the fixture
authoring, not in shared message state.
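The per-send reset and the collision it causes can be sketched as follows. This is not aimock's implementation: `deriveTurnIndex` and `matchesRequest` are hypothetical stand-ins that only mirror the behaviour described above (turnIndex counted from assistant messages, case-sensitive substring match on userMessage), and the buyer texts are made-up examples.

```typescript
// Illustrative sketch only; NOT aimock's code. Match-key names follow the fixture fields.
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  text: string;
  isToolResult?: boolean; // tool results travel back as user-role messages
}

interface MatchKey {
  turnIndex: number;
  userMessage?: string;
  hasToolResult?: boolean;
}

// turnIndex = number of assistant messages in the current request.
function deriveTurnIndex(messages: ChatMessage[]): number {
  return messages.filter((m) => m.role === "assistant").length;
}

function matchesRequest(key: MatchKey, messages: ChatMessage[]): boolean {
  if (deriveTurnIndex(messages) !== key.turnIndex) return false;
  if (key.hasToolResult !== undefined) {
    const seen = messages.some((m) => m.isToolResult === true);
    if (seen !== key.hasToolResult) return false;
  }
  if (key.userMessage !== undefined) {
    const buyerText = messages.find((m) => m.role === "user" && !m.isToolResult)?.text ?? "";
    // String.prototype.includes is case-sensitive, like the matching described above.
    if (!buyerText.includes(key.userMessage)) return false;
  }
  return true;
}

// Each send starts from a fresh array, so both continuations land on turnIndex 1.
const send2Continuation: ChatMessage[] = [
  { role: "user", text: "E em 6x?" },
  { role: "assistant", text: "[tool_use: get_installments]" },
  { role: "user", text: "[tool_result]", isToolResult: true },
];
const send3Continuation: ChatMessage[] = [
  { role: "user", text: "Confirma, pode fechar" },
  { role: "assistant", text: "[tool_use x3]" },
  { role: "user", text: "[tool_result x3]", isToolResult: true },
];

const closeKey: MatchKey = { turnIndex: 1, userMessage: "Confirma", hasToolResult: true };
console.log(matchesRequest(closeKey, send3Continuation)); // true
console.log(matchesRequest(closeKey, send2Continuation)); // false: discriminator differs
console.log(matchesRequest({ ...closeKey, userMessage: "confirm" }, send3Continuation)); // false: casing
```

Without the `userMessage` discriminator, both continuations would satisfy the same `turnIndex: 1` + `hasToolResult: true` key, which is the collision the restructured fixture avoids.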
dangazineu added a commit that referenced this pull request on May 19, 2026
…negotiation example (#41)
Raises the existing Pix + NFS-e walking skeleton to the bar set by #34 (whatsapp-installment-negotiation). That PR established a set of OSS-demo conventions: exact MCP pins, deep per-call assertions, and a README shape that walks a copying customer from the hook → why-an-agent → run paths → per-turn acceptance criteria → known gaps. This PR aligns the walking-skeleton example with the assertions-and-README parts of that bar. This demo is intentionally not an agent-thesis demo (no LLM, no aimock, no multi-turn `session.send()`), so the W2 items that target the aimock / fixture / iterations machinery do not apply here. The applicable parts of the bar (exact MCP pins, deep per-call assertions, README structure with regex literals and exact values) are what this PR addresses.
dangazineu added a commit that referenced this pull request on May 19, 2026
…tallment-negotiation example (#40)
Raises the existing NFS-e-from-natural-language demo to the bar set by #34 (whatsapp-installment-negotiation). That PR established a set of OSS-demo conventions: exact MCP pins, multi-key aimock matchers, deep per-call assertions, and a README shape that walks a copying customer from the hook → why-an-agent → run paths → per-turn acceptance criteria → known gaps. This PR aligns the earlier nfse-from-natural-language demo with all of that.
This was referenced May 19, 2026
Adds a multi-turn buyer-vs-merchant agent demo at `examples/whatsapp-installment-negotiation/`. The buyer asks for a payment option that was not on the merchant's pre-authored menu ("what about 6x?"), the agent computes the variant in real time by calling the Asaas installment MCP in preview mode, presents R$800/month back, the buyer confirms, and the agent commits the payment, issues the NF-e, and sends the WhatsApp confirmation.

The skeleton test runs end-to-end against the OSS MCP bridge with `MCP_DEMO=true` on every server and `@copilotkit/aimock` standing in for `api.anthropic.com`, so no real credentials are needed. A `live.test.ts` file is included for the live-LLM smoke gate (gated on `CODESPAR_LIVE_SMOKE=1`) and is required to pass locally before pushing, per the codespar-core CLAUDE.md workflow rule. Live smoke against `api.anthropic.com` (`claude-sonnet-4-6`) was run separately and exercised the full four-turn flow.
Why this matters
A BSP flow-builder (Blip, Zenvia, Take) breaks when a buyer asks for
a payment option the merchant did not pre-author. "What about 6x?"
is not on the menu; the flow-builder has no branch for it. The agent
computes the variant from a real MCP tool call, not from prose
hardcoded in the prompt. The runtime drives three buyer messages
through three separate `session.send()` calls; narrative continuity
across those sends lives in the fixture authoring and the agent's
prompt, not in shared message state (see "Per-send fixture
semantics" below).
Three judgment points the agent navigates:
- … openers. Picking the wrong opener costs deals; no rule fires here.
- … R$800/month back, present it. A script that anticipates only pre-enumerated installment counts misroutes silently.
- … fechar" the agent commits the payment and issues the NF-e instead of proposing another variant.
Per-send fixture semantics (read before reviewing the fixture)
The OSS chat loop resets the `messages` array on every `session.send()` call. `turnIndex` (which aimock derives from the number of assistant messages in the current LLM request) therefore restarts at 0 on each new send rather than running continuously 0-4 across all three buyer turns. The five fixture entries are organised per-send, and the two "after tool result" continuations sit at `turnIndex 1` within their own send with `userMessage` discriminators so they do not collide. Substring matching on `userMessage` is case-sensitive: the turn-3 match is `Confirma`, not `confirm`.

The README's "How to extend the fixture for your own multi-turn flow" subsection walks through the entry-per-completion mapping and the `turnIndex` + `hasToolResult` match-key pattern.

Scaffold inherited from the natural-language NFS-e example
The boilerplate is verbatim from `examples/nfse-from-natural-language/`: three runtime modes in `validate.sh` (Docker default / `CODESPAR_BASE_URL` / `CODESPAR_RUNTIME_DIR`), aimock lifecycle, `live.test.ts` gated on `CODESPAR_LIVE_SMOKE=1`, the three mockability layers boilerplate in the README, exact MCP pins as devDeps, and `--demo` flags in `mcp-servers.json` (source of truth for demo mode).
What is new here:
- … `session.send()` calls instead of one: the test drives the buyer's three messages explicitly.
- … preview tool_use → preview reply text → close tool_uses × 3 → final confirmation text, organised per-send (see above).
- … `get_installments` preview path: the agent calls `get_installments(value: 4800, installments: 6)` without an `id` to get a hypothetical schedule before committing. The response shape carries `preview: true` and `status: "PREVIEW"` per installment so the test (and the agent) can distinguish a preview from a real payment schedule.
- `package-lock.json` committed: `npm ci` in the `validate-example-whatsapp-installment-negotiation` CI job requires it, pinned against the published `@codespar/mcp-asaas@0.2.0`.

Files
| File | Notes |
|---|---|
| `package.json` | … `@codespar/mcp-asaas@0.2.0`, `@codespar/mcp-nuvem-fiscal@0.3.0`, `@codespar/mcp-z-api@0.2.1`, `@codespar/sdk@^0.9.0` |
| `package-lock.json` | … `@codespar/mcp-asaas@0.2.0`; required by `npm ci` in CI |
| `mcp-servers.json` | … `asaas`, `nuvem-fiscal`, `z-api`, all with `--demo` |
| `fixtures/aimock-fixtures.json` | … (`turnIndex` restarts at 0 each send, `userMessage` discriminators on the two `turnIndex 1` continuations); turn 2's "R$800,00" text is hardcoded to match the Asaas demo handler's deterministic `installmentValue: 800` for `value: 4800` / `installments: 6` |
| `skeleton.test.ts` | … `send()` test asserting the Asaas preview, the `create_payment` with `installments: 6`, the NF-e issuance, and the WhatsApp confirmation |
| `live.test.ts` | … `CODESPAR_LIVE_SMOKE=1`, coarse assertions tolerant of LLM probabilism |
| `scripts/validate.sh` | … `codespar-example-installments-$$` |
| `scripts/validate-live.sh` | … `ANTHROPIC_API_KEY` |
| `tsconfig.json` / `vitest.config.ts` / `.npmrc` / `.gitignore` | … |
| `.github/workflows/ci.yml` | … `validate-example-whatsapp-installment-negotiation` job, same shape as the existing `validate-example-nfse-from-natural-language` job |

Acceptance criteria
The `skeleton.test.ts` spec asserts:
- … message, no tool calls (opener is conversation only).
- … `asaas__get_installments` call with `input.value === 4800` and `input.installments === 6`, output carrying `preview: true`, `installmentCount: 6`, `installmentValue: 800`, and a six-entry `installments` array.
- … `asaas__create_payment` call with `billingType: "CREDIT_CARD"`, `value: 4800`, `installments: 6`, output carrying `id` matching `/^pay_demo_/`, `installments: 6`, `installmentValue: 800`.
- … `nuvem-fiscal__create_nfe` call returning `id` matching `/^nfe_demo_/` and `status === "autorizada"`.
- … `z-api__send_text` call whose message matches `/confirm/i`.
- … `iterations >= 3` across the three calls.
- … `status === "success"`.

Out of scope
- … `z-api__send_text` presence. The skeleton test does, but the live test's coarse assertions skip this. Filed as a follow-up; not a blocker for merging this example.
- … credit-card installments often carry *juros parcelado* that increases the NF-e taxable amount. The Asaas demo handler computes `value / installments` flat; the NF-e is issued for the original sticker price. Documented in the README as a known gap.
- … `delete_payment` cleanup path if the buyer backs out after the preview. The preview path now exists precisely to avoid creating payments tentatively, so the cleanup case is no longer needed. The demo never creates a payment until the buyer confirms.
- … `pagamento` echo for `create_nfe`. Not required for this example; the existing canned demo response is sufficient. Can land as a focused follow-on if a future demo needs to assert installment terms round-trip into the NF-e response.
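
The flat `value / installments` split and the preview response shape described in this PR can be sketched as below. This is a minimal illustration, not the Asaas demo handler's code: `previewInstallments` is a hypothetical function, the per-entry `number` and `value` fields are assumed, and rounding to cents is an assumption. Only `preview`, `installmentCount`, `installmentValue`, `installments`, and `status: "PREVIEW"` come from the description above.

```typescript
// Hypothetical sketch of a flat (no juros) installment preview.
interface PreviewInstallment {
  number: number; // assumed field: 1-based position in the schedule
  value: number; // assumed field: amount due for this installment
  status: "PREVIEW";
}

interface InstallmentPreview {
  preview: true;
  installmentCount: number;
  installmentValue: number;
  installments: PreviewInstallment[];
}

function previewInstallments(value: number, installments: number): InstallmentPreview {
  if (!Number.isInteger(installments) || installments < 1) {
    throw new RangeError("installments must be a positive integer");
  }
  // Flat split with no interest; rounding to cents is an assumption of this sketch.
  const installmentValue = Math.round((value / installments) * 100) / 100;
  return {
    preview: true,
    installmentCount: installments,
    installmentValue,
    installments: Array.from({ length: installments }, (_, i) => ({
      number: i + 1,
      value: installmentValue,
      status: "PREVIEW" as const,
    })),
  };
}

const sixTimes = previewInstallments(4800, 6);
console.log(sixTimes.installmentValue); // 800
console.log(sixTimes.installments.length); // 6
```

Because nothing is persisted in a preview, no cleanup path is needed when the buyer walks away, which is the design point made in the `delete_payment` item above.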