13.1.1 Build test chat interface

## Context
- Phase 13 kicks off conversational tooling for the Echoes LLM service, but we currently do not have a developer-facing harness to exercise `/parse_intent` and `/narrate` outside of automated tests.
- Engineers need a lightweight way to chat with the running `echoes_llm_service` (stub, OpenAI, Anthropic, or Foundry providers) to validate prompt changes, observe token usage, and debug latency before wiring any gameplay endpoints.
- Providing a simple CLI chat loop will also let PMs and designers run scripted demos against remote environments without digging into FastAPI clients.

## Goals
- Provide a repeatable command (e.g., `uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001`) that opens an interactive prompt, accepts user text, and relays it to the configured `echoes_llm_service`.
- Maintain basic multi-turn history on the client side so each request can optionally send the prior exchanges as context payload.
- Surface useful debugging metadata (status, latency, provider/model, token counts) after each response and allow exporting transcripts.
- Ship minimal documentation so teammates can run the tool locally or point it at a remote base URL.

## Implementation Guidance
1. Add a reusable HTTP client helper (e.g., `src/gengine/echoes/llm/chat_client.py`) that wraps `httpx.AsyncClient` and knows how to hit `/parse_intent` (default) and `/narrate` when a `--mode narrate` flag is set. Accept base URL, timeout, and optional API key headers.
2. Create a CLI entry point under `scripts/` (for example `scripts/echoes_llm_chat.py`) that:
   - uses `argparse` to capture `--service-url`, `--context-file` (JSON), `--mode (parse|narrate)`, `--history-limit`, and `--export transcript.json`.
   - supports slash commands like `/clear`, `/save <path>`, and `/quit` for convenience.
   - keeps an in-memory `List[Dict[str, str]]` history that is serialized into the `context` payload for `/parse_intent` (e.g., `{ "history": [...], "metadata": {...} }`).
   - prints structured output: intents (pretty JSON) for parse mode, generated narrative for narrate mode, plus latency/token metrics extracted from response metadata if available.
3. Add unit tests in `tests/echoes` (e.g., `test_llm_chat_cli.py`) that mock the HTTP layer (`httpx.MockTransport` or `respx`) to verify:
   - requests are formed with history/context and mode-specific payloads
   - `/clear` resets the local buffer and `/save` writes JSON transcripts
   - error responses surface readable messages without crashing the REPL.
4. Extend README "LLM Service" coverage (or add a short "LLM Chat Harness" subsection) documenting prerequisites, commands, and sample session transcripts. Include guidance for pointing at stub vs. OpenAI/Anthropic providers and how to supply API keys via `ECHOES_LLM_*` env vars.
5. Provide a short troubleshooting section covering TLS errors, authentication failures, and how to run against `docker compose` (`http://localhost:8001`).

## Acceptance Criteria
- Running `uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001` opens an interactive prompt that can exchange messages with the stub provider out of the box.
- Users can switch between `parse` (intent JSON output) and `narrate` (story text) modes via CLI flag without restarting the service.
- Conversation history is included in subsequent requests and can be cleared/exported via commands.
- Errors from the service are handled gracefully with descriptive output and non-zero exit codes where appropriate.
- Documentation (README or linked doc) explains setup, command options, and sample usage for local + remote endpoints.
- Automated tests cover request formation, history management, and error handling.

## Risks & Mitigations
- **Provider authentication differences**: Document environment variables and default to stub provider, so running without API keys still works.
- **Long-running chats may reveal latency**: Add per-request timing + token metrics to highlight slowness and provide guidance to switch providers.
- **Transcript storage**: Limit history size (`--history-limit`) and redact API keys when exporting transcripts.

## Tracker Reference
See `.pm/tracker.md` > Phase 13 > Task 13.1.1.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

13.1.1 Build test chat interface #89

Context

Goals

Implementation Guidance

Acceptance Criteria

Risks & Mitigations

Tracker Reference

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

13.1.1 Build test chat interface #89

Description

Context

Goals

Implementation Guidance

Acceptance Criteria

Risks & Mitigations

Tracker Reference

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions