Track scoped NPC voice sessions and playback implementation lane (#288)

## Purpose

Track the implementation and evidence lane for scoped NPC voice sessions, TTS playback, and future facial/viseme hooks after the text-first focused portrait slice is stable.

## Background

Issue #262 closed the design/tracking baseline for the custom SECOND SPAWN voice and Ida Faber facial-animation lane. This successor issue tracks the implementation and validation work, which should remain separate from the #139 text-first portrait and speaking-animation work.

## Source Docs

- `docs/design/82-alpha-npc-voice-and-facial-animation-lane.md`
- `docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md`
- `docs/design/141-alpha-focused-npc-portrait-speaking-builder-brief.md`
- `docs/design/147-alpha-ida-faber-import-verification-builder-brief.md`
- `docs/design/148-open-design-decision-backlog.md`
- `docs/design/52-llm-role-play-provider-evaluation.md`
- `docs/playtests/_templates/npc-voice-session-playback/README.md`

## Scope

- Add or validate a Nakama/API path that mints short-lived, actor-scoped `api.dos.ai` voice session material without exposing provider keys to Unity.
- Add Unity NPC TTS playback only through scoped sessions.
- Keep OpenAI Realtime, ElevenLabs, Convai-style products, or self-hosted TTS as optional provider spikes behind `api.dos.ai`, not client dependencies.
- Preserve text-first dialogue fallback when voice is unavailable, too slow, too expensive, interrupted, or blocked.
- Prove text/audio parity: played audio, speaking state, and text line must map to the same accepted `line_id`, or be clearly labeled as fallback.
- Add future hooks for provider viseme data or Ida Faber blendshape profiles only after imported blendshape names are verified in Unity.
- Capture latency, cost, reliability, interruption, fallback, replay/stale-session behavior, and per-NPC voice identity evidence.
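
The first scope item (short-lived, actor-scoped session material minted server-side) can be illustrated with a minimal sketch. This is not the Nakama or `api.dos.ai` implementation: the function names, the claim fields, the 60-second lifetime, and the HMAC signing scheme are all assumptions chosen for illustration, and Python is used only as a neutral sketch language.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical server-only secret; it must never ship in the Unity client.
SERVER_SECRET = b"example-server-secret"
# Assumed lifetime; the issue only requires "short-lived".
SESSION_TTL_SECONDS = 60


def _sign(payload: bytes) -> str:
    """HMAC-SHA256 signature over the encoded claims."""
    return hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()


def mint_voice_session(actor_id, conversation_id, voice_profile_id, now=None):
    """Return a signed token bound to one actor and one conversation."""
    issued_at = int(now if now is not None else time.time())
    claims = {
        "actor_id": actor_id,
        "conversation_id": conversation_id,
        "voice_profile_id": voice_profile_id,
        "exp": issued_at + SESSION_TTL_SECONDS,
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)


def verify_voice_session(token, actor_id, conversation_id, now=None):
    """Accept the token only if it is authentic, unexpired, and in scope."""
    payload, _, signature = token.partition(".")
    expected = _sign(payload.encode())
    # Constant-time comparison; a tampered token fails here.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    current = int(now if now is not None else time.time())
    return (
        claims["exp"] > current
        and claims["actor_id"] == actor_id
        and claims["conversation_id"] == conversation_id
    )
```

In this sketch the client only ever holds the opaque token, so a provider key has no reason to reach Unity, and a token presented for a different actor or after expiry is rejected.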

## Acceptance Criteria

- [ ] Unity receives no provider API keys.
- [ ] Nakama or the approved server path mints scoped session or playback material.
- [ ] Session material is short-lived, actor-scoped, conversation-scoped, and redacted in evidence.
- [ ] Text-only fallback still works.
- [ ] Voice playback maps to the same accepted text `line_id`, or mismatch/fallback is labeled honestly.
- [ ] Replay, skip, interruption, failure, or stale playback does not create a new accepted dialogue turn.
- [ ] One playback path is demonstrated or honestly marked unavailable.
- [ ] Latency, cost, failure, interruption, and fallback behavior are recorded.
- [ ] Per-NPC `voice_profile_id` and voice identity result are recorded.
- [ ] Ida Faber facial mapping is blocked until real blendshape names are verified.
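
The parity and replay criteria above could be checked with something like the following sketch. The `DialogueLedger` class, its method names, and the four labels are hypothetical and not taken from the source docs; the point is only that playback events are labeled against accepted `line_id` values and never add a dialogue turn.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueLedger:
    """Hypothetical server-side record of accepted lines and playback events."""
    accepted_line_ids: list = field(default_factory=list)
    playback_log: list = field(default_factory=list)

    def accept_line(self, line_id):
        # Only the authoritative dialogue path is expected to call this.
        if line_id not in self.accepted_line_ids:
            self.accepted_line_ids.append(line_id)

    def record_playback(self, text_line_id, audio_line_id, event="play"):
        """Label a playback event; accepted turns are never modified here."""
        if text_line_id not in self.accepted_line_ids:
            label = "stale"      # playback refers to a line that was never accepted
        elif audio_line_id is None:
            label = "fallback"   # text shown without matching audio
        elif audio_line_id == text_line_id:
            label = "parity"     # audio and text map to the same line_id
        else:
            label = "mismatch"   # audio belongs to a different line
        self.playback_log.append(
            {"line_id": text_line_id, "event": event, "label": label}
        )
        return label
```

A replay or skip of `line-7` would append another entry to `playback_log`, while `accepted_line_ids` keeps a single `line-7`, which is the behavior the replay criterion asks for.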

## Out Of Scope

- Full voice for all NPCs in alpha.
- Provider API keys in Unity.
- Client-owned dialogue, quest, reward, TIME, SECOND, inventory, combat, or body mutations.
- Making any vendor SDK the durable NPC memory or dialogue backbone.
- Closing #288 on a local OS speech fallback alone.

## Evidence Expectations

- server-minted session or token flow evidence, redacted
- Unity playback screenshot or clip when implementation exists
- text-only fallback evidence
- text/voice parity evidence from `text-voice-parity-note.md`
- evidence that no provider key is present in the client
- latency and failure-mode notes
- verified blendshape-name report before any Ida Faber facial mapping is accepted
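
Redacting session material before it lands in evidence might look like the sketch below. The list of sensitive key fragments and the `[REDACTED]` placeholder are assumptions, not a project convention, and the sample field names are made up for illustration.

```python
# Assumed fragments that mark a field as sensitive; extend as needed.
SENSITIVE_KEY_PARTS = ("token", "secret", "api_key", "authorization")


def redact(value, key=""):
    """Return a copy of value with sensitive fields masked, leaving the input untouched."""
    if any(part in str(key).lower() for part in SENSITIVE_KEY_PARTS):
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # List items inherit the key of the field that holds the list.
        return [redact(item, key) for item in value]
    return value
```

Non-sensitive fields such as `line_id` or a latency measurement pass through unchanged, so the redacted record remains usable as latency and parity evidence.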

Refs #139, #251, #260, #262, #263.
