Track scoped NPC voice sessions and playback implementation lane #288

@JOY

Purpose

Track the implementation and evidence lane for scoped NPC voice sessions, TTS playback, and future facial/viseme hooks after the text-first focused portrait slice is stable.

Background

Issue #262 closed the design/tracking baseline for the custom SECOND SPAWN voice and Ida Faber facial-animation lane. This successor issue tracks implementation and validation work that should remain separate from the text-first portrait and speaking-animation work in #139.

Source Docs

  • docs/design/82-alpha-npc-voice-and-facial-animation-lane.md
  • docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md
  • docs/design/141-alpha-focused-npc-portrait-speaking-builder-brief.md
  • docs/design/147-alpha-ida-faber-import-verification-builder-brief.md
  • docs/design/148-open-design-decision-backlog.md
  • docs/design/52-llm-role-play-provider-evaluation.md
  • docs/playtests/_templates/npc-voice-session-playback/README.md

Scope

  • Add or validate a Nakama/API path that mints short-lived, actor-scoped api.dos.ai voice session material without exposing provider keys to Unity.
  • Add Unity NPC TTS playback only through scoped sessions.
  • Keep OpenAI Realtime, ElevenLabs, Convai-style products, or self-hosted TTS as optional provider spikes behind api.dos.ai, not client dependencies.
  • Preserve text-first dialogue fallback when voice is unavailable, too slow, too expensive, interrupted, or blocked.
  • Prove text/audio parity: played audio, speaking state, and text line must map to the same accepted line_id, or be clearly labeled as fallback.
  • Add future hooks for provider viseme data or Ida Faber blendshape profiles only after imported blendshape names are verified in Unity.
  • Capture latency, cost, reliability, interruption, fallback, replay/stale-session behavior, and per-NPC voice identity evidence.
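
The first scope item, a server-minted, short-lived, actor-scoped session, can be sketched as follows. This is a minimal illustration in Python, not Nakama runtime code and not the api.dos.ai contract: the function names `mint_voice_session` and `verify_voice_session`, the claim fields, the token layout, and the HMAC signing scheme are all assumptions made for the example. The point it shows is that the signing secret stays on the server and the client only ever holds an expiring token bound to one actor and one conversation.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-only secret. It must never ship in the Unity client.
SERVER_SECRET = b"example-only-secret"


def _sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the serialized claims."""
    return hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()


def mint_voice_session(actor_id: str, conversation_id: str,
                       now: float, ttl_seconds: int = 60) -> str:
    """Mint a token scoped to one actor and one conversation (assumed layout)."""
    claims = {
        "actor_id": actor_id,
        "conversation_id": conversation_id,
        "expires_at": now + ttl_seconds,
    }
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    body = base64.urlsafe_b64encode(payload).decode("ascii")
    return body + "." + _sign(payload)


def verify_voice_session(token: str, actor_id: str,
                         conversation_id: str, now: float) -> bool:
    """Accept the token only if it is authentic, in scope, and not expired."""
    try:
        body, signature = token.split(".", 1)
        payload = base64.urlsafe_b64decode(body.encode("ascii"))
    except ValueError:
        # Malformed token: missing separator or invalid base64.
        return False
    if not hmac.compare_digest(signature, _sign(payload)):
        return False
    claims = json.loads(payload)
    return (
        claims["actor_id"] == actor_id
        and claims["conversation_id"] == conversation_id
        and now < claims["expires_at"]
    )
```

Passing `now` explicitly keeps the sketch deterministic, which also makes stale-session behavior easy to exercise: a token minted at `now=1000.0` with a 60-second TTL verifies at 1030.0 and is rejected at 1061.0.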

Acceptance Criteria

  • Unity receives no provider API keys.
  • Nakama or the approved server path mints scoped session or playback material.
  • Session material is short-lived, actor-scoped, conversation-scoped, and redacted in evidence.
  • Text-only fallback still works.
  • Voice playback maps to the same accepted text line_id, or mismatch/fallback is labeled honestly.
  • Replay, skip, interruption, failure, or stale playback does not create a new accepted dialogue turn.
  • One playback path is demonstrated or honestly marked unavailable.
  • Latency, cost, failure, interruption, and fallback behavior are recorded.
  • Per-NPC voice_profile_id and voice identity result are recorded.
  • Ida Faber facial mapping is blocked until real blendshape names are verified.
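
The parity and no-new-turn criteria above can be sketched as a small ledger. This is a hedged illustration only: `DialogueLedger`, its method names, the event names, and the `"voice"`/`"fallback"` labels are hypothetical and do not come from the source docs. It shows one way to make playback events purely observational, so that replay, skip, interruption, failure, or stale playback can be logged without ever creating an accepted dialogue turn.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical event names; only these count as audio actually reaching the player.
AUDIBLE_EVENTS = {"played", "replayed"}


@dataclass
class DialogueLedger:
    accepted_lines: dict = field(default_factory=dict)  # line_id -> text
    playback_log: list = field(default_factory=list)

    def accept_line(self, line_id: str, text: str) -> None:
        """The only method that creates an accepted dialogue turn."""
        self.accepted_lines[line_id] = text

    def record_playback(self, shown_line_id: str,
                        audio_line_id: Optional[str],
                        speaking_line_id: Optional[str],
                        event: str) -> str:
        """Log a playback event and return its parity label.

        The label is "voice" only when text, audio, and speaking state all
        point at the same accepted line_id; anything else is "fallback".
        """
        in_parity = (
            shown_line_id in self.accepted_lines
            and audio_line_id == shown_line_id
            and speaking_line_id == shown_line_id
            and event in AUDIBLE_EVENTS
        )
        label = "voice" if in_parity else "fallback"
        self.playback_log.append({
            "shown_line_id": shown_line_id,
            "audio_line_id": audio_line_id,
            "speaking_line_id": speaking_line_id,
            "event": event,
            "label": label,
        })
        return label
```

Because `record_playback` never writes to `accepted_lines`, any number of replayed, skipped, interrupted, failed, or stale events leaves the accepted turn count unchanged, and every mismatch is recorded with an explicit `"fallback"` label rather than silently passing as voice.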

Out Of Scope

Evidence Expectations

  • server-minted session or token flow evidence, redacted
  • Unity playback screenshot or clip when implementation exists
  • text-only fallback evidence
  • text/voice parity evidence from text-voice-parity-note.md
  • no provider key in client evidence
  • latency and failure-mode notes
  • verified blendshape-name report before any Ida Faber facial mapping is accepted
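
The redaction and "no provider key in client evidence" expectations above could be checked mechanically before evidence is attached. The sketch below is an assumption-laden example: the field names in `SENSITIVE_FIELDS`, the `[REDACTED]` marker, and both function names are hypothetical, and a real check would need to match whatever field names the session flow actually uses.

```python
from typing import Any

# Hypothetical field names; adjust to the real session and request schema.
SENSITIVE_FIELDS = {"session_token", "provider_api_key", "authorization"}
REDACTED = "[REDACTED]"


def redact_evidence(value: Any) -> Any:
    """Return a copy of the evidence with sensitive field values masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_FIELDS else redact_evidence(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_evidence(item) for item in value]
    return value


def find_sensitive_fields(value: Any, path: str = "") -> list:
    """List the paths of sensitive fields that still hold unredacted values."""
    hits = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            if key in SENSITIVE_FIELDS and item != REDACTED:
                hits.append(child)
            else:
                hits.extend(find_sensitive_fields(item, child))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits.extend(find_sensitive_fields(item, f"{path}[{index}]"))
    return hits
```

A capture would be run through `redact_evidence` before it is saved, and `find_sensitive_fields` would be expected to return an empty list for anything collected from the Unity client. Note that this only catches secrets stored under known field names; it does not detect a key pasted into free text.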

Refs #139, #251, #260, #262, #263.

Metadata

    Labels

    • area:ai-agent (Offline player agent, NPC intelligence, and agent observability)
    • area:design (Game design, economy rules, lore, and GDD work)
    • area:nakama (Nakama runtime, storage, auth, social, or backend modules)
    • area:unity (Unity client, scenes, assets, or editor workflow)
    • priority:p2 (Important but not blocking current milestone)
    • size:l (Large task)

    Projects

    Status

    Backlog
