13.1.1 Build test chat interface #89

@SorraTheOrc

Context

  • Phase 13 kicks off conversational tooling for the Echoes LLM service, but we currently do not have a developer-facing harness to exercise /parse_intent and /narrate outside of automated tests.
  • Engineers need a lightweight way to chat with the running echoes_llm_service (stub, OpenAI, Anthropic, or Foundry providers) to validate prompt changes, observe token usage, and debug latency before wiring any gameplay endpoints.
  • Providing a simple CLI chat loop will also let PMs and designers run scripted demos against remote environments without digging into FastAPI clients.

Goals

  • Provide a repeatable command (e.g., uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001) that opens an interactive prompt, accepts user text, and relays it to the configured echoes_llm_service.
  • Maintain basic multi-turn history on the client side so each request can optionally send the prior exchanges as a context payload.
  • Surface useful debugging metadata (status, latency, provider/model, token counts) after each response and allow exporting transcripts.
  • Ship minimal documentation so teammates can run the tool locally or point it at a remote base URL.
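
To make the client-side history goal concrete, here is a minimal sketch of a capped buffer that serializes into a context payload. The `ChatHistory` class name, the `role`/`content` field names, and the default limit of 20 are assumptions for illustration; the `history`/`metadata` keys follow the payload example given under Implementation Guidance.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


class ChatHistory:
    """Client-side buffer of prior exchanges, capped at `limit` entries (hypothetical helper)."""

    def __init__(self, limit: int = 20) -> None:
        # deque(maxlen=...) drops the oldest entry automatically once the cap is reached.
        self._turns: Deque[Dict[str, str]] = deque(maxlen=limit)

    def add(self, role: str, content: str) -> None:
        # "role" and "content" are assumed field names, not a confirmed service schema.
        self._turns.append({"role": role, "content": content})

    def clear(self) -> None:
        self._turns.clear()

    def as_context(self, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Copy each turn so callers cannot mutate the buffer through the payload.
        return {
            "history": [dict(turn) for turn in self._turns],
            "metadata": dict(metadata or {}),
        }
```

Using a bounded deque keeps the `--history-limit` behavior in one place: the caller appends every exchange and never has to trim the list by hand.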

Implementation Guidance

  1. Add a reusable HTTP client helper (e.g., src/gengine/echoes/llm/chat_client.py) that wraps httpx.AsyncClient and knows how to hit /parse_intent (default) and /narrate when a --mode narrate flag is set. Accept base URL, timeout, and optional API key headers.
  2. Create a CLI entry point under scripts/ (for example scripts/echoes_llm_chat.py) that:
    • uses argparse to capture --service-url, --context-file (JSON), --mode (parse|narrate), --history-limit, and --export transcript.json.
    • supports slash commands like /clear, /save <path>, and /quit for convenience.
    • keeps an in-memory List[Dict[str, str]] history that is serialized into the context payload for /parse_intent (e.g., { "history": [...], "metadata": {...} }).
    • prints structured output: intents (pretty JSON) for parse mode, generated narrative for narrate mode, plus latency/token metrics extracted from response metadata if available.
  3. Add unit tests in tests/echoes (e.g., test_llm_chat_cli.py) that mock the HTTP layer (httpx.MockTransport or respx) to verify:
    • requests are formed with history/context and mode-specific payloads
    • /clear resets the local buffer and /save writes JSON transcripts
    • error responses surface readable messages without crashing the REPL.
  4. Extend README "LLM Service" coverage (or add a short "LLM Chat Harness" subsection) documenting prerequisites, commands, and sample session transcripts. Include guidance for pointing at stub vs. OpenAI/Anthropic providers and how to supply API keys via ECHOES_LLM_* env vars.
  5. Provide a short troubleshooting section covering TLS errors, authentication failures, and how to run against docker compose (http://localhost:8001).
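
The CLI surface described in step 2 could be sketched roughly as follows, using only the standard library. The function names `build_parser` and `handle_slash_command`, the default values, and the return-value convention are assumptions; only the flags and slash commands themselves come from this issue.

```python
import argparse
import json
from pathlib import Path
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chat with echoes_llm_service")
    # Defaulting the URL to the docker compose address is an assumption.
    parser.add_argument("--service-url", default="http://localhost:8001")
    parser.add_argument("--context-file", type=Path, help="JSON file with extra context")
    parser.add_argument("--mode", choices=["parse", "narrate"], default="parse")
    parser.add_argument("--history-limit", type=int, default=20)  # assumed default
    parser.add_argument("--export", type=Path, help="transcript path, e.g. transcript.json")
    return parser


def handle_slash_command(line: str, history: List[Dict[str, str]]) -> Optional[str]:
    """Return 'quit', 'handled', or None when `line` is ordinary chat text."""
    line = line.strip()
    if not line.startswith("/"):
        return None
    command, _, argument = line.partition(" ")
    argument = argument.strip()
    if command == "/quit":
        return "quit"
    if command == "/clear":
        history.clear()
    elif command == "/save" and argument:
        Path(argument).write_text(json.dumps(history, indent=2), encoding="utf-8")
    else:
        print(f"Unknown or incomplete command: {line}")
    return "handled"
```

Keeping command handling in a pure function that receives the history list makes the `/clear` and `/save` behavior testable without driving an interactive prompt.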

Acceptance Criteria

  • Running uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001 opens an interactive prompt that can exchange messages with the stub provider out of the box.
  • Users can switch between parse (intent JSON output) and narrate (story text) modes via the --mode CLI flag without restarting the service.
  • Conversation history is included in subsequent requests and can be cleared/exported via commands.
  • Errors from the service are handled gracefully with descriptive output and non-zero exit codes where appropriate.
  • Documentation (README or linked doc) explains setup, command options, and sample usage for local + remote endpoints.
  • Automated tests cover request formation, history management, and error handling.
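
One possible shape for the graceful error handling and metrics output required above is sketched here. It assumes the response body carries an optional `metadata` object with keys such as `provider`, `model`, `input_tokens`, and `output_tokens`; those key names, the `summarize_response` function, and the exit-code constants are hypothetical. Reading `detail` from error bodies relies on FastAPI's default error format.

```python
import json
from typing import Any, Tuple

EXIT_OK, EXIT_SERVICE_ERROR = 0, 1  # hypothetical exit-code convention
METRIC_KEYS = ("provider", "model", "input_tokens", "output_tokens")  # assumed names


def summarize_response(status: int, body: str, latency_ms: float) -> Tuple[str, int]:
    """Turn a raw HTTP result into printable text plus a suggested exit code."""
    try:
        data: Any = json.loads(body)
    except json.JSONDecodeError:
        data = None
    header = f"[status={status} latency={latency_ms:.0f}ms]"
    if status >= 400:
        # FastAPI error bodies carry a "detail" field; fall back to the raw text.
        detail = data.get("detail") if isinstance(data, dict) else None
        detail = detail or body.strip() or "no response body"
        return f"{header} request failed: {detail}", EXIT_SERVICE_ERROR
    if not isinstance(data, dict):
        return f"{header} unexpected response: expected a JSON object", EXIT_SERVICE_ERROR
    meta = data.get("metadata")
    if not isinstance(meta, dict):
        meta = {}
    metrics = " ".join(f"{key}={meta[key]}" for key in METRIC_KEYS if key in meta)
    if metrics:
        header = f"{header} {metrics}"
    # Print everything except the metadata as pretty JSON (intents or narrative).
    payload = {key: value for key, value in data.items() if key != "metadata"}
    return f"{header}\n{json.dumps(payload, indent=2)}", EXIT_OK
```

Because the function never raises on malformed bodies, the REPL can print the returned text and continue, and it can use the returned code only when the session ends.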

Risks & Mitigations

  • Provider authentication differences: Document environment variables and default to stub provider, so running without API keys still works.
  • Long-running chats may reveal latency: Add per-request timing and token metrics to highlight slow responses, and provide guidance on switching providers.
  • Transcript storage: Limit history size (--history-limit) and redact API keys when exporting transcripts.
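
As a rough illustration of the redaction mitigation, the sketch below scrubs a transcript before it is written to disk. The `redact` and `export_transcript` names and both regular expressions are assumptions: the key-name pattern is a simple heuristic, and the value pattern targets the common `sk-...` key shape and `Bearer` headers, so it will not catch every secret format.

```python
import json
import re
from typing import Any

# Hypothetical heuristics; extend them to match the credentials actually in use.
SENSITIVE_KEY = re.compile(r"api[_-]?key|authorization|secret", re.IGNORECASE)
SECRET_VALUE = re.compile(r"sk-[A-Za-z0-9_-]{8,}|Bearer\s+\S+")
PLACEHOLDER = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of `value` with likely secrets replaced by a placeholder."""
    if isinstance(value, dict):
        return {
            key: PLACEHOLDER if SENSITIVE_KEY.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return SECRET_VALUE.sub(PLACEHOLDER, value)
    return value  # numbers, booleans, and None pass through unchanged


def export_transcript(transcript: Any, path: str) -> None:
    # Redact on a copy so the in-memory session is left untouched.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(redact(transcript), handle, indent=2)
```

Redacting at export time rather than at capture time keeps the live session fully usable while ensuring saved files do not contain credentials.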

Tracker Reference

See .pm/tracker.md > Phase 13 > Task 13.1.1.
