Context
- Phase 13 kicks off conversational tooling for the Echoes LLM service, but we currently do not have a developer-facing harness to exercise `/parse_intent` and `/narrate` outside of automated tests.
- Engineers need a lightweight way to chat with the running `echoes_llm_service` (stub, OpenAI, Anthropic, or Foundry providers) to validate prompt changes, observe token usage, and debug latency before wiring up any gameplay endpoints.
- Providing a simple CLI chat loop will also let PMs and designers run scripted demos against remote environments without digging into FastAPI clients.
Goals
- Provide a repeatable command (e.g., `uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001`) that opens an interactive prompt, accepts user text, and relays it to the configured `echoes_llm_service`.
- Maintain basic multi-turn history on the client side so each request can optionally send the prior exchanges as the context payload.
- Surface useful debugging metadata (status, latency, provider/model, token counts) after each response and allow exporting transcripts.
- Ship minimal documentation so teammates can run the tool locally or point it at a remote base URL.
Implementation Guidance
- Add a reusable HTTP client helper (e.g., `src/gengine/echoes/llm/chat_client.py`) that wraps `httpx.AsyncClient` and knows how to hit `/parse_intent` (default) and `/narrate` when a `--mode narrate` flag is set. Accept a base URL, timeout, and optional API key headers (a hedged sketch follows this list).
- Create a CLI entry point under `scripts/` (for example `scripts/echoes_llm_chat.py`) that (see the REPL sketch below):
  - uses `argparse` to capture `--service-url`, `--context-file` (JSON), `--mode` (`parse`|`narrate`), `--history-limit`, and `--export transcript.json`.
  - supports slash commands like `/clear`, `/save <path>`, and `/quit` for convenience.
  - keeps an in-memory `List[Dict[str, str]]` history that is serialized into the `context` payload for `/parse_intent` (e.g., `{ "history": [...], "metadata": {...} }`).
  - prints structured output: intents (pretty JSON) for parse mode, generated narrative for narrate mode, plus latency/token metrics extracted from response metadata if available.
- Add unit tests in `tests/echoes` (e.g., `test_llm_chat_cli.py`) that mock the HTTP layer (`httpx.MockTransport` or `respx`) to verify (see the test sketch below):
  - requests are formed with history/context and mode-specific payloads
  - `/clear` resets the local buffer and `/save` writes JSON transcripts
  - error responses surface readable messages without crashing the REPL.
- Extend the README "LLM Service" coverage (or add a short "LLM Chat Harness" subsection) documenting prerequisites, commands, and sample session transcripts. Include guidance for pointing at stub vs. OpenAI/Anthropic providers and how to supply API keys via `ECHOES_LLM_*` env vars.
- Provide a short troubleshooting section covering TLS errors, authentication failures, and how to run against `docker compose` (`http://localhost:8001`).
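
A minimal sketch of the client helper. The request body shape (`{"text": ..., "context": {"history": [...], "metadata": {...}}}`), the bearer-token auth header, and the `transport` hook are illustrative assumptions, not the service's confirmed contract:

```python
# Hypothetical sketch of src/gengine/echoes/llm/chat_client.py.
from __future__ import annotations

import time
from typing import Any, Dict, List, Optional

import httpx


class EchoesChatClient:
    """Thin async wrapper around the echoes_llm_service HTTP API (assumed contract)."""

    def __init__(
        self,
        base_url: str,
        timeout: float = 30.0,
        api_key: Optional[str] = None,
        transport: Optional[httpx.AsyncBaseTransport] = None,
    ) -> None:
        # Assumed auth scheme: bearer token; the real service may expect something else.
        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
        self._client = httpx.AsyncClient(
            base_url=base_url, timeout=timeout, headers=headers, transport=transport
        )

    async def send(
        self,
        text: str,
        history: List[Dict[str, str]],
        mode: str = "parse",
        metadata: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """POST to /parse_intent (default) or /narrate and time the round trip."""
        path = "/narrate" if mode == "narrate" else "/parse_intent"
        payload = {
            "text": text,
            "context": {"history": history, "metadata": metadata or {}},
        }
        started = time.perf_counter()
        response = await self._client.post(path, json=payload)
        response.raise_for_status()
        return {
            "data": response.json(),
            "status": response.status_code,
            "latency_s": time.perf_counter() - started,
        }

    async def aclose(self) -> None:
        await self._client.aclose()
```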
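And a rough shape for the REPL itself. The flag names come from this issue; the `EchoesChatClient` import path and response fields are carried over from the sketch above, so treat them as assumptions:

```python
# Hypothetical sketch of scripts/echoes_llm_chat.py.
import argparse
import asyncio
import json

from gengine.echoes.llm.chat_client import EchoesChatClient  # assumed module path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chat with echoes_llm_service")
    parser.add_argument("--service-url", required=True)
    parser.add_argument("--context-file", help="JSON file merged into request metadata")
    parser.add_argument("--mode", choices=["parse", "narrate"], default="parse")
    parser.add_argument("--history-limit", type=int, default=20)
    parser.add_argument("--export", help="Write the transcript here on exit, e.g. transcript.json")
    return parser


async def repl(args: argparse.Namespace) -> None:
    client = EchoesChatClient(base_url=args.service_url)
    history: list[dict[str, str]] = []
    try:
        while True:
            line = input("you> ").strip()
            if line == "/quit":
                break
            if line == "/clear":
                history.clear()
                continue
            if line.startswith("/save "):
                with open(line.split(" ", 1)[1], "w") as fh:
                    json.dump(history, fh, indent=2)
                continue
            # Only the most recent --history-limit turns ride along as context.
            result = await client.send(line, history[-args.history_limit:], mode=args.mode)
            print(json.dumps(result["data"], indent=2))
            print(f"[{result['status']}] {result['latency_s']:.2f}s")
            history.append({"role": "user", "content": line})
            history.append({"role": "assistant", "content": json.dumps(result["data"])})
    finally:
        if args.export:
            with open(args.export, "w") as fh:
                json.dump(history, fh, indent=2)
        await client.aclose()


if __name__ == "__main__":
    asyncio.run(repl(build_parser().parse_args()))
```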
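For the unit tests, one possible `httpx.MockTransport` arrangement, assuming `pytest-asyncio` is available and that the client helper exposes a `transport` parameter as sketched above:

```python
# Hypothetical sketch for tests/echoes/test_llm_chat_cli.py.
import json

import httpx
import pytest

from gengine.echoes.llm.chat_client import EchoesChatClient  # assumed module path


@pytest.mark.asyncio  # assumes pytest-asyncio is installed
async def test_parse_request_includes_history() -> None:
    captured: dict = {}

    def handler(request: httpx.Request) -> httpx.Response:
        # Record what the client sent so we can assert on payload shape.
        captured["path"] = request.url.path
        captured["body"] = json.loads(request.content)
        return httpx.Response(200, json={"intents": []})

    client = EchoesChatClient(
        base_url="http://testserver",
        transport=httpx.MockTransport(handler),
    )
    history = [{"role": "user", "content": "hello"}]
    result = await client.send("look around", history, mode="parse")

    assert captured["path"] == "/parse_intent"
    assert captured["body"]["context"]["history"] == history
    assert result["status"] == 200
    await client.aclose()
```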
Acceptance Criteria
- Running `uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001` opens an interactive prompt that can exchange messages with the stub provider out of the box.
- Users can switch between `parse` (intent JSON output) and `narrate` (story text) modes via a CLI flag without restarting the service.
- Conversation history is included in subsequent requests and can be cleared/exported via commands.
- Errors from the service are handled gracefully with descriptive output and non-zero exit codes where appropriate.
- Documentation (README or linked doc) explains setup, command options, and sample usage for local + remote endpoints.
- Automated tests cover request formation, history management, and error handling.
Risks & Mitigations
- Provider authentication differences: Document environment variables and default to the stub provider so running without API keys still works.
- Long-running chats may surface latency problems: Add per-request timing and token metrics to highlight slowness, and document guidance on switching providers.
- Transcript storage: Limit history size (`--history-limit`) and redact API keys when exporting transcripts (a possible redaction pass is sketched below).
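
A possible redaction pass for transcript export, assuming secrets reach the process only via the `ECHOES_LLM_*` env vars mentioned above; the helper name is hypothetical:

```python
# Hypothetical helper for scrubbing secrets before /save or --export writes JSON.
import os
from typing import Dict, List


def redact_transcript(history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Replace any ECHOES_LLM_* secret values that leaked into message text."""
    secrets = [v for k, v in os.environ.items() if k.startswith("ECHOES_LLM_") and v]
    redacted = []
    for turn in history:
        content = turn["content"]
        for secret in secrets:
            content = content.replace(secret, "[REDACTED]")
        redacted.append({**turn, "content": content})
    return redacted
```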
Tracker Reference
See .pm/tracker.md > Phase 13 > Task 13.1.1.