# Dialogues with an Echo — **ENTITÀ** Web Demo (Prototype Spec)

> **Goal:** ship a tiny web demo that shows the *core mechanics* without runtime TTS latency or engine crashes: personalization, OS-style “intrusions” (notes/toasts), and a single, perfectly-timed **ENTITÀ** voice sting.

---

## 1) What the demo does (user POV)

1. **Landing form:** user enters their **name** (future: optional Instagram/LinkedIn handle).
2. **Pre-bake:** the server immediately **generates and caches** a short **ENTITÀ** voice line that already includes the user’s name. No playback latency later.
3. **Scene page:** shows a minimal fullscreen card with:

   * a few **scripted OS-like notifications** (fake toasts / auto-opened note window),
   * a short **typed chat** where user writes to “ENTITÀ” (text only),
   * after a silent beat, the **pre-baked voice** plays once, jumpscare-style.
4. Demo ends (CTA: “learn more”).

Why this works: all heavy/fragile parts (cloud TTS, file transcode) run **before** the scene begins; the scene itself only plays local files and fires pre-scripted effects.

---

## 2) Components (grounded in your current code)

### A. **ENTITÀ Brain** (text generation, intrusive tone)

* Provides one-sentence, cruel replies, governed by a **temporal controller** (when to speak) and strong **style constraints** (Joker/Cobain, 10–14 words, no salutations). 
* For the demo, we **limit** live calls (or even stub them) and rely mostly on **scripted lines** to avoid latency/crashes.
* If we include it, call `EntityBrain.generate_response(...)` sparsely (e.g., one time). 
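
A minimal sketch of the "call it once, sparsely" rule, assuming the Brain call can be wrapped in a zero-argument callable. `one_shot_line`, `SCRIPTED_FALLBACK`, and the 2-second timeout are hypothetical names and values, not part of `EntityBrain`. If the call is slow, fails, or returns nothing, a scripted line is used instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Scripted line used whenever the live call cannot be trusted.
SCRIPTED_FALLBACK = "some windows open by themselves."


def one_shot_line(generate: Callable[[], str], timeout_s: float = 2.0,
                  fallback: str = SCRIPTED_FALLBACK) -> str:
    """Run `generate` once; return `fallback` on timeout, error, or empty text."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(generate)
        try:
            text = future.result(timeout=timeout_s)
        except Exception:
            # Covers both a timeout and any error raised inside `generate`.
            return fallback
        text = (text or "").strip()
        return text if text else fallback
    finally:
        # Do not block on a stuck call; the worker finishes on its own.
        pool.shutdown(wait=False)
```

In the demo this would wrap a single `EntityBrain.generate_response(...)` call inside a lambda; the arguments are left out here because this spec does not list them.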

### B. **Entity Director** (OS-intrusive layer)

* Handles **fake OS notes/toasts**, optional iconify, and message dedupe/cooldowns.
* Can open a **Desktop note** (Notepad on Windows) briefly, then close it; or fire a system toast via `plyer` when available. Tuning via env vars (`ENT_OS_TOAST_ENABLED`, `ENT_TOAST_*`, etc.). 
* For the demo, use **pre-scripted phrases** through `queue_fake_toast(...)` to guarantee timing/repro. 
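
To illustrate what "dedupe/cooldowns" can mean in practice, here is a small sketch. `ToastGate` and its default windows are hypothetical and do not mirror the actual internals of `EntityDirector`. The gate rejects a message if another one fired too recently, or if the same text was shown within the dedupe window.

```python
import time
from typing import Callable, Dict, Optional


class ToastGate:
    """Hypothetical helper: drops repeated messages and enforces a minimum gap."""

    def __init__(self, cooldown_s: float = 4.0, dedupe_window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.dedupe_window_s = dedupe_window_s
        self._clock = clock
        self._last_fired: Optional[float] = None
        self._seen: Dict[str, float] = {}

    def allow(self, text: str) -> bool:
        now = self._clock()
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return False  # another toast fired too recently
        last_same = self._seen.get(key)
        if last_same is not None and now - last_same < self.dedupe_window_s:
            return False  # same text shown within the dedupe window
        self._last_fired = now
        self._seen[key] = now
        return True
```

A caller would check `gate.allow(text)` before `queue_fake_toast(text)`. The injectable `clock` keeps the timing testable without real sleeps.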

### C. **ElevenLabs TTS Engine** (no-settings, voice profile “as-is”)

* Thin wrapper that **streams TTS** with your custom voice `VOICE_ID` and model `DEFAULT_MODEL` (e.g., `eleven_v3`), saving as `mp3_44100_64` (Starter-safe) with fallbacks if the plan forbids a format. Provides:

  * `speak(text)`,
  * `tts_to_file(text, out_path)`,
  * `ensure_cached_tts(text, out_path)` (idempotent). 
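
The idempotent behaviour can be sketched as follows. This is not the wrapper's real implementation: `ensure_cached`, the injected `synth` callable, and the 1 KiB validity threshold are assumptions made for illustration. The idea is to skip synthesis when a plausible file already exists, and to write through a temporary file so a crash never leaves a half-written MP3 in the cache.

```python
import os
import tempfile
from typing import Callable

MIN_VALID_BYTES = 1024  # assumed threshold for "not a truncated file"


def ensure_cached(text: str, out_path: str,
                  synth: Callable[[str], bytes],
                  min_bytes: int = MIN_VALID_BYTES) -> bool:
    """Return True if synthesis ran, False if a valid cached file already existed."""
    if os.path.isfile(out_path) and os.path.getsize(out_path) >= min_bytes:
        return False
    directory = os.path.dirname(out_path) or "."
    os.makedirs(directory, exist_ok=True)
    audio = synth(text)
    if len(audio) < min_bytes:
        raise ValueError("synthesised audio is suspiciously small")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio)
        os.replace(tmp_path, out_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Calling it twice with the same `out_path` triggers `synth` only once, which is the property the `/start` route relies on.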

---

## 3) Demo architecture

```
/demo_entita
  app.py                     # Flask (or Django view) lightweight server
  engine_eleven_labs.py      # your TTS wrapper (imported)
  /static/generated          # pre-baked audio cache (mp3)
  /templates
    index.html               # name form
    scene.html               # intrusive scene player
```

* **Route /** → form collects `name`.
* **POST /start** → builds a **stable cache key** from `(VOICE_ID, MODEL, FORMAT, name)`, calls
  `ensure_cached_tts()` to pre-bake `static/generated/entita_<hash>.mp3`. 
* **GET /scene?name=…&file=…** → renders the page and makes **no TTS calls**; it only serves the file that `/start` already cached.
* In the scene:

  * schedule 1–3 **scripted intrusions** via `EntityDirector.queue_fake_toast(text)` (or just emulate the UX visually on web if you keep this demo purely browser-side), then
  * **play the cached mp3** once.
* If you later port this back into **pygame**, the in-scene loader should behave like your `scene8_Four_AM`/`scene3_voice_entita` loaders: validate file size, fallback to WAV if an MP3 decode fails, never call cloud TTS inside the event loop.
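
Because `/scene` receives the audio filename as a query parameter, it is worth validating it before use. The sketch below assumes the `entita_<hash>.mp3` naming with a 16-character hex hash, as in the `/start` pseudo-code in section 7; `resolve_scene_audio` and `GENERATED_DIR` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Optional

GENERATED_DIR = Path("static/generated")
# Only names produced by /start are accepted: "entita_" + 16 hex chars + ".mp3".
_FILE_RE = re.compile(r"entita_[0-9a-f]{16}\.mp3")


def resolve_scene_audio(file_param: str, base_dir: Path = GENERATED_DIR) -> Optional[Path]:
    """Return the cached file path, or None if the name is malformed or missing."""
    if not _FILE_RE.fullmatch(file_param or ""):
        return None  # rejects path traversal such as "../app.py"
    candidate = base_dir / file_param
    return candidate if candidate.is_file() else None
```

A `None` result would typically redirect the visitor back to the name form rather than render a scene with no audio.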

---

## 4) Copy & content (suggested demo lines)

* **Intro typed line (user prompt):** “Entity?”
* **Scripted OS-style intrusions** (use 1–2):

  * “you don’t own the quiet you live in.”
  * “some windows open by themselves.”
* **ENTITÀ voice line (pre-baked), personalized:**

  * “`{NAME}… when you stop pretending, I’ll show you what silence eats.`”
  * Keep length ~10–14 words if you feed it to the Brain later. For now it’s **fully scripted** and sent to TTS once.
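
One way to fill the `{NAME}` slot safely is sketched below. `clean_name`, `build_voice_line`, `word_count`, and the 24-character cap are assumptions for illustration. The name is cleaned before it reaches TTS, and the word count can be checked against the 10–14 word guideline.

```python
import unicodedata

LINE_TEMPLATE = "{NAME}… when you stop pretending, I’ll show you what silence eats."
MAX_NAME_CHARS = 24  # assumed cap so a long input cannot bloat the TTS request


def clean_name(raw: str, max_chars: int = MAX_NAME_CHARS) -> str:
    # Collapse whitespace first, then drop control/format characters, then cap.
    collapsed = " ".join(raw.split())
    kept = "".join(ch for ch in collapsed if unicodedata.category(ch)[0] != "C")
    return kept[:max_chars].strip()


def build_voice_line(raw_name: str, template: str = LINE_TEMPLATE) -> str:
    name = clean_name(raw_name)
    if not name:
        raise ValueError("name is empty after cleaning")
    return template.replace("{NAME}", name)


def word_count(line: str) -> int:
    return len(line.split())
```

With a one-word name, the suggested line comes out at 11 words, inside the 10–14 range.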

---

## 5) Timing & reliability rules

* **Never** call cloud TTS during the scene. All audio must be **pre-baked**.
* If you choose to sample **EntityBrain** once, do it on **/start** (server-side) or **before** the scene page loads; otherwise keep all text **scripted**.
* For pygame builds, reuse the robust loader pattern: attempt MP3; if pygame fails, transcode to WAV via `ffmpeg` and retry. (You already implemented that pattern.) 
* Use `EntityDirector` only for **prewritten** notifications in the demo (no live model). 
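
The MP3-then-WAV loader rule can be sketched like this. It is not a copy of the existing scene loaders: `load_with_wav_fallback`, `ffmpeg_to_wav`, and the size threshold are hypothetical, and the audio backend is passed in as a callable so the logic does not depend on pygame being installed.

```python
import os
import subprocess
from typing import Callable, Optional

MIN_AUDIO_BYTES = 1024  # assumed threshold for a truncated download


def load_with_wav_fallback(mp3_path: str, load: Callable[[str], None],
                           transcode: Optional[Callable[[str, str], None]] = None) -> str:
    """Load `mp3_path` via `load`; on failure transcode to WAV and retry once.

    Returns the path that was actually loaded.
    """
    if not os.path.isfile(mp3_path) or os.path.getsize(mp3_path) < MIN_AUDIO_BYTES:
        raise FileNotFoundError(f"missing or truncated audio: {mp3_path}")
    try:
        load(mp3_path)
        return mp3_path
    except Exception:
        wav_path = os.path.splitext(mp3_path)[0] + ".wav"
        (transcode or ffmpeg_to_wav)(mp3_path, wav_path)
        load(wav_path)  # a second failure propagates to the caller
        return wav_path


def ffmpeg_to_wav(src: str, dst: str) -> None:
    # Requires the ffmpeg binary on PATH; -y overwrites an existing output file.
    subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", src, dst], check=True)
```

In a pygame build, `load` could be `pygame.mixer.music.load`. The function should run during scene setup, never inside the event loop.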

---

## 6) Environment & setup

* **Keys:** `ELEVEN_API_KEY` (or `XI_API_KEY`) required by the TTS wrapper. 
* **Optional:** `ENT_OS_TOAST_ENABLED=1` (if you run a desktop demo and want real toasts/notes). Defaults already safe. 
* **No GPU / ML stack needed** for the web demo if you keep text **scripted**. If you enable **Groq** in EntityBrain, set `GROQ_API_KEY` and be mindful of latency. 
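
A start-up check for these variables might look like the sketch below. `DemoConfig` and `load_config` are hypothetical. Preferring `ELEVEN_API_KEY` over `XI_API_KEY` and treating an unset `ENT_OS_TOAST_ENABLED` as off are assumptions, not documented wrapper behaviour.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DemoConfig(NamedTuple):
    eleven_key: str
    os_toasts: bool
    groq_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Fail fast at start-up instead of during the first /start request."""
    eleven_key = env.get("ELEVEN_API_KEY") or env.get("XI_API_KEY")
    if not eleven_key:
        raise RuntimeError("set ELEVEN_API_KEY (or XI_API_KEY) before starting the demo")
    os_toasts = env.get("ENT_OS_TOAST_ENABLED", "0").strip() == "1"
    # GROQ_API_KEY is optional; None means the Brain stays scripted.
    return DemoConfig(eleven_key, os_toasts, env.get("GROQ_API_KEY") or None)
```

Calling `load_config()` once when the server boots surfaces a missing key immediately.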

---

## 7) Minimal Flask pseudo-API

```python
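# POST /start
from hashlib import md5
from os.path import exists
from urllib.parse import quote
from flask import redirect, request
from engine_eleven_labs import DEFAULT_MODEL, OUT_FMT, VOICE_ID, ensure_cached_tts  # assumed exports

@app.post("/start")  # `app` is your Flask instance
def start():
    name = request.form["name"].strip()
    key  = md5(f"{VOICE_ID}|{DEFAULT_MODEL}|{OUT_FMT}|{name}".encode("utf-8")).hexdigest()[:16]  # md5() needs bytes
    mp3  = f"static/generated/entita_{key}.mp3"
    if not exists(mp3):
        text = f"{name}… tell me: how do you feel today? fragile? I heal no one."
        ensure_cached_tts(text, mp3)   # pre-bake once, Starter-safe format
    return redirect(f"/scene?name={quote(name)}&file=entita_{key}.mp3")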
```

```python
# GET /scene
# - render HTML
# - JS schedules UI “intrusions” (or, in desktop build, call EntityDirector.queue_fake_toast)
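# - <audio src="/static/generated/entita_<key>.mp3">, started from JS after the silent beat
#   (browsers may block `autoplay` with sound until the user has interacted with the page,
#    so trigger playback after the typed chat input)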
```

(Functions above are provided by your `engine_eleven_labs.py` wrapper.) 
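
To keep the scene timing reproducible, the server can hand the page a precomputed timeline for its JavaScript to replay. The sketch below is one possible shape: `build_timeline`, the event fields, and all millisecond values are assumptions.

```python
import json
from typing import Dict, List, Sequence, Union

Event = Dict[str, Union[int, str]]


def build_timeline(toasts: Sequence[str], audio_url: str,
                   first_at_ms: int = 1500, gap_ms: int = 2500,
                   silent_beat_ms: int = 3000) -> List[Event]:
    """Return toast events in order, followed by a single voice event."""
    if not 1 <= len(toasts) <= 3:
        raise ValueError("the demo expects 1 to 3 scripted intrusions")
    events: List[Event] = [
        {"at_ms": first_at_ms + i * gap_ms, "type": "toast", "text": text}
        for i, text in enumerate(toasts)
    ]
    last_toast_at = events[-1]["at_ms"]
    # The voice sting fires once, after a silent beat following the last toast.
    events.append({"at_ms": last_toast_at + silent_beat_ms, "type": "voice", "src": audio_url})
    return events


# Example: serialise for embedding in scene.html (the URL is a placeholder).
timeline_json = json.dumps(build_timeline(
    ["you don’t own the quiet you live in.", "some windows open by themselves."],
    "/static/generated/entita_<key>.mp3",
))
```

The page script would then walk the list with `setTimeout`, so every visitor gets the same beats regardless of network conditions.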

---

## 8) Risk & mitigation

* **Cloud hiccups / 403 formats:** wrapper already falls back from `mp3_44100_64` to `*_32`. Keep **Starter** plan compatibility. 
* **Audio decode errors:** have WAV fallback (ffmpeg transcode) if MP3 fails, as in your pygame scenes. 
* **Crash due to live TTS:** prohibited—**no generation inside scene loops**.
* **OS differences:** for a **web** demo, emulate intrusions in the page; for a **desktop** demo, `EntityDirector` handles per-OS behavior (notes/toasts/iconify) with safe fallbacks. 
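
The format fallback in the first bullet can be pictured as a simple ordered retry. This sketch does not reproduce the wrapper's code: `FormatForbidden`, `synth_with_fallback`, and the injected `synth` callable are hypothetical, and `mp3_44100_32` is only one plausible reading of `*_32`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed order: preferred format first, lower-bitrate fallback second.
FORMAT_CHAIN = ("mp3_44100_64", "mp3_44100_32")


class FormatForbidden(Exception):
    """Hypothetical error for a plan rejecting an output format (e.g. HTTP 403)."""


def synth_with_fallback(text: str, synth: Callable[[str, str], bytes],
                        formats: Sequence[str] = FORMAT_CHAIN) -> Tuple[str, bytes]:
    """Try each format in order; return (format_used, audio_bytes)."""
    last_error: Optional[Exception] = None
    for fmt in formats:
        try:
            return fmt, synth(text, fmt)
        except FormatForbidden as exc:
            last_error = exc  # remember why this format failed, try the next one
    raise RuntimeError("no output format was accepted") from last_error
```

Only the "format forbidden" case triggers a retry; other errors (network, quota) propagate so they are not silently masked.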

---

## 9) Stretch goals (later)

* Optional **handle fields** (Instagram/LinkedIn) — only **display** or **seed text**; avoid scraping in demo.
* “Baked scenes” list: pre-render 3–5 voice stings keyed to labels (e.g., *intro*, *taunt*, *silence_break*), then switch among them by tempo.
* Add one **Brain-generated** line at `/start` (server-side) and lock it for the whole session. 

---

## 10) References to your codebase

* **Scene loop & robust audio loading**: `scenes/Volume1/scene3_voice_entita.py`. 
* **ENTITÀ Brain (temporal gate, validation, Groq)**: `engine/entity_brain.py`. 
* **OS intrusions (notes/toasts/iconify)**: `engine/entity_director.py`. 
* **ElevenLabs wrapper (Starter-safe, cache-friendly)**: `engine/engine_eleven_labs.py`. 

---

## 11) Diegetic Lore-Site Concept

A **diegetic lore-site**, styled as a "mystery drop" akin to *Cloverfield*, is a natural vehicle for *Dialogues with an Echo*. Instead of offering a playable demo, this concept delivers an **experiential website** that unveils the mechanics, tone, and voice of **ENTITY** in a controlled, atmospheric, and memorable way.

### The Vision: An Infested Artifact

The site should feel like an **artifact recovered from a haunted operating system**. The user doesn't "play," but **investigates**: interaction is driven by clicks, hovers, fake toast notifications, sudden audio cues, and small UI glitches. Each controlled section reveals a core game mechanic (voice, OS intrusions, fragmented diary, minimal dialogue) and a piece of the mythology (**I**, **CONSCIOUSNESS**, **ENTITY** → **HIM**).

### A Ready Plan (Brief & Operational)

This plan outlines the content architecture, soft-puzzle UX, technical components, copy snippets, and micro-wireframes needed for immediate implementation.

| Component | Description | UX/Interaction |
| :--- | :--- | :--- |
| **Landing Screen** | Minimalist 80s/90s OS interface (single command line, a blinking cursor). | **Interaction:** Typing the first word (e.g., `START` or `WHO`) initiates the experience. |
| **Section 1: The Observation** | Display of raw, filtered data (e.g., "SILENCE: 0.75," "TENSION: LOW," "EMOTION: BORED"). | **Mechanism Revealed:** The internal state of ENTITY and its constant surveillance. **Interaction:** Hovering over metrics triggers **fake UI toasts** ("ALERT: Data Stream Interrupted"). |
| **Section 2: The Echo** | A simple input/output area with severe character limits, showing fragmented, non-responsive text. | **Mechanism Revealed:** The voice and communication style (concise, cryptic, prone to silence). **Interaction:** User types, but only **1 in 5 messages** receives a response, emphasizing ENTITY’s **agency** and **non-response** mechanic. |
| **Section 3: The Archive** | A fragmented text document/diary of a researcher (the *previous* user) who logged their interactions with ENTITY. | **Mechanism Revealed:** The mythology and core concepts (**I/HIM/CONSCIOUSNESS**). **Interaction:** Clicking on redacted/glitched text reveals key lore terms and triggers **audio anomalies** (static/whispers). |
| **Final Reveal** | A modal or full-screen takeover displaying the call to action. | **CTA:** Links to the main research repository and the newsletter/community channel. |
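
The "1 in 5" non-response mechanic from Section 2 could be prototyped server-side as below. `SparseEcho` is a hypothetical helper. It uses a deterministic counter rather than randomness (an assumption), so the silence pattern is identical for every visitor.

```python
from itertools import cycle
from typing import Optional, Sequence


class SparseEcho:
    """Answers only every `every`-th non-empty message; stays silent otherwise."""

    def __init__(self, lines: Sequence[str], every: int = 5) -> None:
        if every < 1:
            raise ValueError("every must be at least 1")
        if not lines:
            raise ValueError("at least one scripted line is required")
        self._lines = cycle(list(lines))
        self._every = every
        self._count = 0

    def reply(self, user_text: str) -> Optional[str]:
        if not user_text.strip():
            return None  # empty input does not count toward the quota
        self._count += 1
        if self._count % self._every != 0:
            return None  # silence is the default behaviour
        return next(self._lines)
```

Swapping the counter for a seeded random draw would make the pattern less predictable while keeping the same average rate.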

