docs: add progressive disclosure AI documentation#2132

Open
BenWeekes wants to merge 10 commits into main from docs/progressive-disclosure


@BenWeekes commented Apr 7, 2026

Summary

Progressive disclosure AI documentation for the TEN Framework repo, structured as L0 (repo card) → L1 (8 topic files) → L2 (6 deep dives) so that AI agents and developers load a compact overview first and open deeper files only when needed.

What's included

  • L0: Repo identity card with one-line description blockquote and L1 index
  • L1: Setup, architecture, code map, conventions, workflows, interfaces, gotchas, security
  • L2 deep dives: Deployment, extension development, graph configuration, operations/restarts, server architecture, testing
  • AGENTS.md: Aligned to upstream 4.7 standard — How to Load, Levels, Git Conventions (commit messages, branch names, general rules), Doc Commands
  • CLAUDE.md: Thin @AGENTS.md redirect per standard template
  • ai_agents/AGENTS.md: Reduced from 539 lines to a 3-line redirect; all content now lives in docs/ai/

Accuracy pass

All docs verified against actual repo code and corrected:

  • Fixed 30+ factual errors including wrong function signatures, incorrect Go struct types, wrong port numbers, stale process names, missing manifest.json nesting
  • Base class abstract methods tables now match actual source (ASR, TTS WS, TTS HTTP, LLM)
  • Server architecture: corrected startPropMap type/mappings, worker spawn command, API routes
  • Graph configuration: added missing nodes, fixed extension_group names, added names array pattern
  • Extension development: fixed request_tts signature, added update_params() pattern, corrected manifest schema
  • Unified nuclear restart procedure, replaced bin/main with bin/worker throughout
  • Test docs: correct guarder test counts, optional vs required configs, vendor-specific coverage

Codex review fixes

  • tman install → task install in graph_configuration.md (avoids wiping bin/worker)
  • Manifest dependency examples now show path-based format matching actual repo manifests
  • Architecture sample graph wrapped in required "ten" object
  • Added config log prefix convention (config:) for QA automation regex
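A minimal sketch of the `config:` log-prefix convention, assuming the QA check anchors on a line that starts with `config:` followed by a JSON object. `CONFIG_LOG_RE`, `format_config_log`, and `parse_config_log` are hypothetical; this PR does not show the actual QA regex. The config dict is assumed to be redacted before it reaches the log line.

```python
import json
import logging
import re
from typing import Optional

# Hypothetical pattern; the real QA automation regex is not shown in this PR.
CONFIG_LOG_RE = re.compile(r"^config:\s*(\{.*\})$")

logger = logging.getLogger("example_tts_python")


def format_config_log(config: dict) -> str:
    """Render a config log line carrying the 'config:' prefix."""
    return "config: " + json.dumps(config, sort_keys=True)


def parse_config_log(line: str) -> Optional[dict]:
    """Return the logged config, or None when the prefix is missing."""
    match = CONFIG_LOG_RE.match(line)
    return json.loads(match.group(1)) if match else None


# The config dict is assumed to be redacted already (no api_key).
line = format_config_log({"model": "example-model", "sample_rate": 24000})
logger.info(line)
```

Keeping the prefix in one helper means every extension emits the same shape, which is what lets a single regex validate vendor configs across extensions.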

Portal cross-references

Added links to relevant portal docs: interrupt flow tutorial, graph concepts, message system, deploy guide, API events/schemas

Related

Test plan

  • Cross-referenced all code examples against actual source files
  • Verified paths, commands, and config structures exist in repo
  • No broken internal links between L0/L1/L2 files
  • AGENTS.md matches upstream 4.7 template structure

add structured documentation hierarchy for AI coding agents:

- L0 repo card: identity, L1 index with audience column
- L1 summaries (8 files): setup, architecture, code map, conventions,
  workflows, interfaces, gotchas, security
- L2 deep dives (6 files): extension development, graph configuration,
  testing, deployment, server architecture, operations/restarts
- root AGENTS.md and CLAUDE.md as entry points

documentation follows the progressive disclosure standard:
- L0+L1 under 5000 tokens for efficient agent context loading
- L2 loaded on-demand via links when L1 detail is insufficient
- each L1 file under 200 lines, L2 files have no ceiling
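The two L0/L1 budgets above can be checked mechanically. This is a sketch, assuming a rough 4-characters-per-token estimate (not a real tokenizer) and that the caller passes the L0 file plus the L1 files; `check_budgets` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import List

MAX_L1_LINES = 200        # per-file ceiling for L1 summaries
MAX_L0_L1_TOKENS = 5000   # combined budget for L0 + all L1 files


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return (len(text) + 3) // 4


def check_budgets(l0: Path, l1_files: List[Path]) -> List[str]:
    """Return human-readable budget violations; an empty list means OK."""
    problems: List[str] = []
    total = estimate_tokens(l0.read_text(encoding="utf-8"))
    for path in l1_files:
        text = path.read_text(encoding="utf-8")
        n_lines = len(text.splitlines())
        if n_lines > MAX_L1_LINES:
            problems.append(f"{path.name}: {n_lines} lines > {MAX_L1_LINES}")
        total += estimate_tokens(text)
    if total > MAX_L0_L1_TOKENS:
        problems.append(f"L0+L1 ~{total} tokens > {MAX_L0_L1_TOKENS}")
    return problems
```

L2 files are deliberately not passed in, since they have no ceiling and are loaded on demand. The exact file names under docs/ai/ are not listed in this PR, so any paths you pass are placeholders.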

key operational content includes container restart procedures,
port 3000 debugging, docker cp cache pitfalls, and the tar-based
sync method for extension code deployment.
BenWeekes pushed a commit that referenced this pull request Apr 7, 2026
moved to separate PR #2132 (docs/progressive-disclosure branch).
these repo-wide AI documentation files are a cross-cutting concern
independent of the deepgram tts extension.
Ubuntu added 2 commits April 8, 2026 12:42
L0:
- fix repo name TEN-Agent → ten-framework, extension count 93 → 90

L1 core:
- fix addon registration param addon_name → name
- fix base class abstract methods table (ASR missing 3 methods,
  HTTP TTS showed request_tts as abstract, WS TTS listed cancel_tts
  as abstract)
- fix main extension variant names (main_cascade_python, main_realtime_python,
  main_nodejs)
- add LOG_PATH to required env vars
- unify nuclear restart procedure across 01_setup and 05_workflows
- add TTS HTTP base class section to 06_interfaces
- fix manifest.json api.property nesting (add inner properties key)
- fix connection schema to show names array pattern
- replace bin/main with bin/worker everywhere
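The api.property nesting fix can be illustrated with a small manifest fragment and a check that rejects the flat layout. The extension name and fields are illustrative (the int32 sample_rate mirrors a later commit in this PR), and `declared_properties` is a hypothetical helper, not part of the TEN runtime.

```python
import json

# Illustrative manifest fragment: fields sit under api.property.properties.
MANIFEST = json.loads("""
{
  "type": "extension",
  "name": "example_tts_python",
  "api": {
    "property": {
      "properties": {
        "params": {
          "type": "object",
          "properties": {
            "api_key": {"type": "string"},
            "sample_rate": {"type": "int32"}
          }
        }
      }
    }
  }
}
""")


def declared_properties(manifest: dict) -> dict:
    """Return the property schema, rejecting the old flat layout."""
    prop = manifest.get("api", {}).get("property", {})
    if "properties" not in prop:
        raise ValueError("api.property must nest its fields under 'properties'")
    return prop["properties"]
```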

deep dives:
- fix nginx port 453 → 443 in deployment
- fix request_tts signature from generator to None-returning method
- fix HTTP TTS to show create_config/create_client pattern
- fix ten_runtime_python version 0.8 → 0.11
- fix startPropMap type, field mappings, and worker spawn command
- fix addon default properties route
- add missing graph nodes (main_control, message_collector, streamid_adapter)
- fix extension_group names to match real values
- clarify graph example as voice_assistant-style skeleton with internal routing
- treat property.json as optional in test configs
- add update_params() passthrough pattern to config examples
- add openai_tts2_python to ENABLE_SAMPLE_RATE=False list
- clarify test_long_duration_stream as excluded, not skipped
plutoless pushed a commit that referenced this pull request Apr 9, 2026

* feat: add deepgram tts extension with voice-assistant integration

add Deepgram TTS extension using WebSocket streaming API with Aura-2
voices. Wire into voice-assistant example as voice_assistant_deepgram_tts
graph variant. Include progressive disclosure AI documentation.

addresses PR #2128 review feedback:
- remove raw config logging that exposed API keys
- extract _finalize_request() helper (consolidate 5 duplicate patterns)
- await client.start() instead of fire-and-forget asyncio.create_task()
- add _reconnect_client() for immediate reconnect after errors
- consume EVENT_TTS_FLUSH internally (don't leak to caller)
- add early text validation in get() for empty/whitespace text
- reduce websocket recv timeout from 10s to 5s
- drop audio chunks received after cancellation flag is set
- reconnect websocket after cancel for clean state on next request
- change manifest.json sample_rate type from int64 to int32
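The early text validation in get() can be sketched as follows, assuming get() is an async generator of (event, payload) tuples. `SketchClient` and the event constant values are hypothetical stand-ins for the extension's client. The point is that empty or whitespace-only text ends the request without opening a websocket.

```python
import asyncio
from typing import AsyncIterator, Tuple

# Hypothetical event constants; the extension's real values may differ.
EVENT_TTS_RESPONSE = 1
EVENT_TTS_END = 2


class SketchClient:
    """Minimal stand-in for a TTS client with an async-generator get()."""

    def __init__(self) -> None:
        self.connect_calls = 0

    async def _connect(self) -> None:
        self.connect_calls += 1  # a real client would open the websocket here

    async def get(self, text: str) -> AsyncIterator[Tuple[int, bytes]]:
        # Early validation: empty or whitespace-only text ends immediately,
        # with no connection and no Speak/Flush sent.
        if not text.strip():
            yield EVENT_TTS_END, b""
            return
        await self._connect()
        yield EVENT_TTS_RESPONSE, b"audio-bytes"  # placeholder audio
        yield EVENT_TTS_END, b""


async def collect(client: SketchClient, text: str) -> list:
    return [event async for event in client.get(text)]
```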

test results:
- standalone: 13/13 passed
- guarder: 14/16 passed
  - test_flush: PASS (was failing — fixed cancel race condition)
  - test_invalid_text_handling: PASS (was skipped — fixed with text
    validation + timeout reduction)
  - test_interleaved_requests: FAIL — websocket state from previous
    request causes timeout on request 8/8. needs duplex websocket
    architecture (separate send/receive tasks) to fully resolve.
  - test_subtitle_alignment: FAIL — feature gap, deepgram tts api
    does not provide word-level timing data. config file not present.

* refactor: rewrite deepgram tts client with duplex websocket pattern

rewrite DeepgramTTSClient with separate send and receive async tasks
on a single websocket, matching the cartesia_tts architecture. this
replaces the serial send-then-receive pattern that caused state leaks
between interleaved requests.

key changes:
- _send_loop(): reads from _text_queue, sends Speak+Flush to WS
- _receive_loop(): reads from WS, puts events into _output_queue
- _connection_loop(): auto-reconnect with exponential backoff
- cancel drops audio in receive loop, Flushed always signals END
- update docs/ai gotchas with deployment lessons learned
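A compact sketch of the duplex pattern described above: one task sends Speak and Flush messages from a text queue, another reads the socket and pushes events to an output queue. `FakeWebSocket` is an in-memory stand-in so the example runs offline. It assumes audio arrives as bytes and a `{"type": "Flushed"}` message marks the end of each flushed segment. Reconnect and backoff are omitted, and a later commit in this PR reverted to a serial model.

```python
import asyncio
import json
from typing import Union


class FakeWebSocket:
    """Offline stand-in: one audio chunk per Speak, one Flushed per Flush."""

    def __init__(self) -> None:
        self._incoming: asyncio.Queue = asyncio.Queue()

    async def send(self, message: str) -> None:
        msg = json.loads(message)
        if msg["type"] == "Speak":
            await self._incoming.put(msg["text"].encode())
        elif msg["type"] == "Flush":
            await self._incoming.put(json.dumps({"type": "Flushed"}))

    async def recv(self) -> Union[str, bytes]:
        return await self._incoming.get()


class DuplexSketch:
    def __init__(self, ws) -> None:
        self.ws = ws
        self.text_queue: asyncio.Queue = asyncio.Queue()
        self.output_queue: asyncio.Queue = asyncio.Queue()

    async def _send_loop(self) -> None:
        while True:
            text = await self.text_queue.get()
            await self.ws.send(json.dumps({"type": "Speak", "text": text}))
            await self.ws.send(json.dumps({"type": "Flush"}))

    async def _receive_loop(self) -> None:
        while True:
            message = await self.ws.recv()
            if isinstance(message, bytes):
                await self.output_queue.put(("audio", message))
            elif json.loads(message).get("type") == "Flushed":
                await self.output_queue.put(("end", b""))


async def demo() -> list:
    client = DuplexSketch(FakeWebSocket())
    tasks = [asyncio.create_task(client._send_loop()),
             asyncio.create_task(client._receive_loop())]
    await client.text_queue.put("hello")
    events = [await client.output_queue.get() for _ in range(2)]
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return events
```

Note that every END lands in one shared output queue; that is exactly how a stale END from a cancelled request can reach the next request, as the test notes above describe.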

test results unchanged at 14/16 guarder passed:
- test_interleaved_requests: still fails — request 8/8 gets timeout
  because output queue has stale END from cancelled request. needs
  per-request-id event routing (next iteration).
- test_subtitle_alignment: feature gap (no word-level timing)

* fix: reconnect websocket per request_id to fix interleaved requests

revert from duplex pattern to clean serial model with key improvement:
reconnect websocket when request_id changes. this prevents deepgram's
connection from going stale after many rapid Speak+Flush cycles.

cancel() now drains until Flushed before returning so the connection
is clean for subsequent requests. mark_needs_reconnect() called by
extension on request_id change triggers fresh connection.

test_interleaved_requests now passes (was timing out on request 8/8
because deepgram stopped responding on a long-lived connection).

* fix: remove reconnect-per-request, rely on cancel drain instead

reconnecting on every request_id change caused test_append_input_stress
to timeout (100 requests = 100 reconnections). the cancel() drain is
sufficient: it waits for Flushed before returning, keeping the
connection clean for the next request. reconnect only on error/timeout.

both test_interleaved_requests and test_append_input_stress now pass.
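The cancel drain can be sketched like this, assuming the in-flight request already sent Flush (every request sends Speak+Flush) and that the server answers with `{"type": "Flushed"}`. `drain_until_flushed` and `ScriptedWebSocket` are hypothetical names. A timeout propagates to the caller, matching the "reconnect only on error/timeout" rule.

```python
import asyncio
import json
from typing import List, Union

Message = Union[str, bytes]


async def drain_until_flushed(ws, timeout: float = 5.0) -> int:
    """Discard messages from a cancelled request until Flushed arrives.

    Returns how many stale messages were dropped. Raises asyncio.TimeoutError
    if nothing arrives within `timeout`; a caller would then reconnect.
    """
    dropped = 0
    while True:
        message = await asyncio.wait_for(ws.recv(), timeout)
        if isinstance(message, str) and json.loads(message).get("type") == "Flushed":
            return dropped
        dropped += 1  # stale audio or metadata belonging to the cancelled request


class ScriptedWebSocket:
    """Offline stand-in that replays a fixed list of server messages."""

    def __init__(self, messages: List[Message]) -> None:
        self.messages = list(messages)

    async def recv(self) -> Message:
        return self.messages.pop(0)
```

Because the drain stops at Flushed, anything after it (here, the next request's audio) stays on the connection, which is why a reconnect per request_id is unnecessary.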

* refactor: align progressive disclosure docs with PD standard

L0 repo card:
- remove descriptive blockquote
- remove L2 section (L2 is reached via L1 links)
- rename Type to Repo Type, use enum value distributed-system
- add Description row to identity block
- rename L1 Summaries to L1 Index with Audience column
- update Last Reviewed to 2026-04-07

07_gotchas.md:
- cut from 236 to 117 lines (under 200-line ceiling)
- remove full restart recipe and operational runbook material
- keep actual gotchas: property tuples, signal handlers, task run,
  zombies, .env, next.js lock, tman wipe, graph cache, port 3000
- add pointer to new L2 deep dive

new L2 deep dive operations_restarts.md:
- full restart procedure with next-server kill
- zombie worker cleanup
- stale lock and cache cleanup
- port 3000 conflict debugging with /proc forensics
- .env and container restart recovery
- docker cp extension code workflow
- after-container-restart checklist

cross-links:
- add operations_restarts to deep_dives/_index.md
- add to related deep dives in 07_gotchas.md and 05_workflows.md
- trim 05_workflows.md docker cp section to pointer

* fix: address codex review — connect fail-fast and error handling

issue 1 (high): _connect() now always raises after calling the error
callback on 401. previously it returned control to the caller with
self._ws == None, causing a secondary AttributeError that masked the
real auth failure.

issue 2 (high): EVENT_TTS_ERROR on non-final chunks is logged as a
warning but not sent as a data event. sending error data for transient
partial-stream failures confuses the test harness and the base class
state machine. errors are only surfaced via _finalize_request() on
the final chunk (text_input_end=True), which is the correct contract.
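The error contract in issue 2 can be expressed as a small dispatch function. This is a sketch: `handle_chunk_error` is hypothetical, and `finalize` stands in for the extension's `_finalize_request()`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("tts_sketch")


def handle_chunk_error(
    error: str,
    text_input_end: bool,
    finalize: Callable[[Optional[str]], None],
) -> None:
    """Only the final chunk surfaces an error; earlier chunks just log."""
    if not text_input_end:
        # Transient failure on a partial stream: log only, emit no data event.
        logger.warning("tts error on non-final chunk: %s", error)
        return
    finalize(error)  # single public error through the request finalizer
```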

open question: request state fields (current_request_id, sent_ts,
_audio_start_sent) are shared mutable state. however, the base class
AsyncTTS2BaseExtension serializes request_tts() calls — it does not
overlap them. this is confirmed by the interleaved_requests test
passing, which exercises rapid request_id switching.

* test: add state machine, recovery, and redaction tests

address codex review gaps vs cartesia_tts test coverage:

- test_sequential_requests: 3 requests with different IDs,
  validates request_id in audio_start and audio_end events
- test_reconnect_after_error: first request errors mid-stream,
  second request completes successfully (recovery)
- test_config_redacts_api_key: to_str(sensitive_handling=True)
  does not leak the API key
- test_client_empty_text_yields_end: unit test on client.get()
  for empty text — yields END immediately, no WS connection
- test_client_whitespace_text_yields_end: same for whitespace
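The redaction test checks that `to_str(sensitive_handling=True)` never prints the key. A minimal sketch of such a method, assuming a dataclass config; `TTSConfigSketch` and the `***` mask are illustrative, not the extension's actual config class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TTSConfigSketch:
    api_key: str = ""
    model: str = ""
    params: Dict[str, Any] = field(default_factory=dict)

    def to_str(self, sensitive_handling: bool = True) -> str:
        # Replace the whole key; the mask format is an assumption.
        key = ("***" if self.api_key else "") if sensitive_handling else self.api_key
        return (f"TTSConfigSketch(api_key={key!r}, model={self.model!r}, "
                f"params={self.params!r})")
```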

standalone tests: 18/18 passed (was 13)

* fix: eliminate double error emission on auth failure, add targeted tests

remove error callbacks from DeepgramTTSClient._connect() — error
reporting is now solely the caller's responsibility. this eliminates
the double-report where _connect() called send_fatal_tts_error and
then raised, causing _handle_connection_error to send a second error.

consolidate error handlers to use _finalize_request() which emits
exactly one error via finish_request(error=...).
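The single-emission rule can be sketched by making finalization clear the request id before it emits, so a second call is a no-op. `FinalizeSketch` is hypothetical, and `finish_request` here just records events instead of calling the real base class.

```python
from typing import List, Optional, Tuple


class FinalizeSketch:
    """Hypothetical request bookkeeping: one terminal event per request."""

    def __init__(self) -> None:
        self.current_request_id: Optional[str] = None
        self.events: List[Tuple[str, Optional[str]]] = []

    def finish_request(self, request_id: str, error: Optional[str] = None) -> None:
        # Stand-in for the base class call that emits the terminal event.
        self.events.append((request_id, error))

    def _finalize_request(self, error: Optional[str] = None) -> None:
        if self.current_request_id is None:
            return  # already finalized: a second call must not emit again
        request_id, self.current_request_id = self.current_request_id, None
        self.finish_request(request_id, error=error)

    def _handle_connection_error(self, exc: Exception) -> None:
        # The client only raises; the caller reports, exactly once.
        self._finalize_request(error=str(exc))
```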

new tests:
- test_auth_error_single_emission: 401 produces exactly 1 error event
- test_nonfinal_error_not_surfaced: error on non-final chunk is logged
  but not sent as public data event (documented contract)

standalone tests: 20/20 passed

* docs: add tar sync method, cache cleanup, fix guarder test count

- operations_restarts.md: add tar-based container sync that excludes
  __pycache__ and .pytest_cache (recommended over docker cp). add
  cleanup command for stale cache artifacts in container.
- testing.md: fix TTS guarder count from 15 to 16. add container
  sync guidance before running tests.
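The exclusion rule behind the tar-based sync can be sketched with Python's `tarfile`, whose `add()` accepts a `filter` callback; returning `None` drops a member, and for a directory it also skips everything beneath it. The directory name in the usage is illustrative. The resulting bytes could then be streamed into a container with `docker exec -i <container> tar -x -C <target-dir>`, where both placeholders are yours to fill.

```python
import io
import tarfile
from pathlib import Path
from typing import Optional

EXCLUDED_DIRS = {"__pycache__", ".pytest_cache"}


def _skip_caches(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    # Returning None excludes the member (and, for directories, its contents).
    parts = Path(info.name).parts
    return None if any(p in EXCLUDED_DIRS for p in parts) else info


def build_sync_archive(extension_dir: Path) -> bytes:
    """Pack an extension directory without Python cache artifacts."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(extension_dir, arcname=extension_dir.name, filter=_skip_caches)
    return buf.getvalue()
```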

* fix: address code review — 401 detection, dead code, dump writes

- use websockets.exceptions.InvalidStatus for typed 401 detection
  with string-match fallback for non-websockets exceptions
- remove dead send_fatal/non_fatal_tts_error methods (unused after
  client callback removal)
- remove redundant "LOG_CATEGORY_KEY_POINT: " log prefix
- await _write_dump() and _setup_recorder() directly instead of
  fire-and-forget asyncio.create_task (errors were silently lost)
- remove unused asyncio import
- remove duplicate pathlib import in test_basic.py
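The typed-then-fallback 401 check can be sketched without importing websockets. In recent websockets releases, `InvalidStatus` carries the rejected handshake as `exc.response`, whose `status_code` holds the HTTP status; this sketch reads that attribute duck-typed, then falls back to matching the message text. `is_auth_failure` is a hypothetical name.

```python
from typing import Any


def is_auth_failure(exc: BaseException) -> bool:
    """Detect an HTTP 401 during the websocket handshake."""
    response: Any = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    if status is not None:
        return status == 401
    # Fallback for exception types that carry no structured response.
    text = str(exc)
    return "401" in text or "unauthorized" in text.lower()
```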

graph connections verified: voice_assistant_deepgram_tts has the same
3 connection blocks as the working voice_assistant graph. the
main_python extension handles LLM/TTS routing internally.

* fix: resolve pylint W1404 implicit string concatenation warnings

* fix: reconnect on server errors, break after finalize, cleanup

- set _needs_reconnect on Deepgram server-side Error messages,
  not just Python exceptions. a protocol-level error leaves the
  websocket in an unknown state.
- add break after _finalize_request() in empty-payload branch
  to stop processing after request is finalized.
- remove dead mark_needs_reconnect() method and test mock refs.
- replace inline 8.0 timeout with WS_RECV_TIMEOUT constant.
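A sketch of the receive step after these changes, assuming `WS_RECV_TIMEOUT` keeps the 8.0 value of the literal it replaced and that Deepgram reports protocol errors as a JSON message with `"type": "Error"`. `ReceiveSketch` is hypothetical. Both a failed receive and a server-side Error set `_needs_reconnect`.

```python
import asyncio
import json

WS_RECV_TIMEOUT = 8.0  # seconds; assumed to match the replaced inline literal


class ReceiveSketch:
    def __init__(self, ws) -> None:
        self.ws = ws
        self._needs_reconnect = False

    async def next_event(self):
        try:
            message = await asyncio.wait_for(self.ws.recv(), WS_RECV_TIMEOUT)
        except Exception:
            # Timeouts and socket errors leave the stream in an unknown state.
            self._needs_reconnect = True
            raise
        if isinstance(message, bytes):
            return ("audio", message)
        payload = json.loads(message)
        if payload.get("type") == "Error":
            # Protocol-level error: socket still open, but no longer trusted.
            self._needs_reconnect = True
            return ("error", payload)
        return ("control", payload)
```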

* fix: cancel finalization, exception cleanup, test bootstrap

- cancel_tts() now always calls _finalize_request() when
  current_request_id is set, regardless of sent_ts. prevents
  downstream consumers hanging when cancel arrives before first
  text is processed.
- simplify redundant except (asyncio.TimeoutError, Exception)
  to except Exception.
- move sys.path bootstrap to conftest.py, remove from all 6
  test files. license headers now appear first, per repo style.
- remove unused import (copy) from test_state_machine.py.
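The cancel_tts() change amounts to gating finalization on the open request rather than on `sent_ts`. A minimal sketch with hypothetical names, where `_finalize_request` records the terminal event instead of calling the base class.

```python
from typing import List, Optional


class CancelSketch:
    def __init__(self) -> None:
        self.current_request_id: Optional[str] = None
        self.sent_ts: Optional[float] = None  # set once the first text is sent
        self.finished: List[str] = []

    def _finalize_request(self, reason: str) -> None:
        # Stand-in for finish_request(); records the terminal event.
        self.finished.append(f"{self.current_request_id}:{reason}")
        self.current_request_id = None
        self.sent_ts = None

    def cancel_tts(self) -> None:
        # Finalize whenever a request is open, even if sent_ts is still None;
        # gating on sent_ts would leave downstream consumers waiting.
        if self.current_request_id is not None:
            self._finalize_request(reason="cancelled")
```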

* chore: remove progressive disclosure docs from deepgram tts PR scope

moved to separate PR #2132 (docs/progressive-disclosure branch).
these repo-wide AI documentation files are a cross-cutting concern
independent of the deepgram tts extension.

* fix: move cancel flag reset to just before ws.send

clear _is_cancelled just before sending Speak+Flush, not at
method entry. prevents a concurrent cancel() from being lost
if it races with get() starting up.

* fix: remove dual finalization path, dead config code, simplify

- remove duplicate _finalize_request on empty EVENT_TTS_RESPONSE.
  rely solely on EVENT_TTS_END to close requests, avoiding risk
  of double-finalization.
- remove dead to_str() branch checking params['api_key'] after
  update_params() already deletes it.
- simplify _ensure_dict to only handle dict and fallback to empty.

* feat: add vendor params passthrough to deepgram websocket URL

forward additional deepgram query parameters from config.params through
to the websocket connection string. known keys (api_key, base_url, model,
encoding, sample_rate) are normalized onto the config object; remaining
scalar keys are appended to the query string via urlencode.

- replace f-string URL building with urlencode for correctness
- improve TTS_END logging to distinguish final vs intermediate events
- add test_params_passthrough unit test for URL construction
- bump version to 0.1.1
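The params passthrough can be sketched with `urllib.parse.urlencode`. The default base URL, model, encoding, and sample rate below are assumptions for the sketch, as is keeping `api_key` out of the URL (for example, sending it in an Authorization header instead). `build_ws_url` is a hypothetical name.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# Keys normalized onto the config object rather than forwarded verbatim.
KNOWN_KEYS = {"api_key", "base_url", "model", "encoding", "sample_rate"}
DEFAULT_BASE_URL = "wss://api.deepgram.com/v1/speak"


def build_ws_url(params: Dict[str, Any]) -> str:
    base_url = params.get("base_url", DEFAULT_BASE_URL)
    # Default values here are assumptions for the sketch.
    query: Dict[str, Any] = {
        "model": params.get("model", "aura-2-thalia-en"),
        "encoding": params.get("encoding", "linear16"),
        "sample_rate": params.get("sample_rate", 24000),
    }
    for key, value in params.items():
        if key in KNOWN_KEYS:
            continue  # handled above; api_key never goes into the URL
        if isinstance(value, bool):
            query[key] = str(value).lower()  # true/false rather than True/False
        elif isinstance(value, (str, int, float)):
            query[key] = value
        # Non-scalar values (lists, dicts) are skipped.
    return f"{base_url}?{urlencode(query)}"
```

Using `urlencode` instead of an f-string also percent-encodes any reserved characters in forwarded values.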

* fix: add clarifying comments for event constant gap and sent_ts overwrite

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-252.us-west-2.compute.internal>
Ubuntu added 7 commits April 9, 2026 07:38
Restore Levels section and full How to Load instructions per the
standard template. Add one-line description blockquote to L0.

…closure

All content now lives in docs/ai/ (L0/L1/L2). Remove duplicate
guidance and point to the single PD entrypoint.

Add three-subsection structure: commit messages (conventional commits
with types and scoped variants), branch names (type/short-description),
and general rules. Update CLAUDE.md reference text to match template.

Lowercase headings, remove backtick formatting on commands, add
pointer to progressive-disclosure-standard.md sections 6 and 7.

Replace bare tman install with task install to avoid wiping bin/worker.

Show path-based dependencies matching actual repo manifests. Add missing
ten wrapper to architecture sample graph.

QA regex expects config: prefix on config log messages.
Without it, new extensions fail vendor_config validation.