## 1. Install Required Libraries
This cell installs the three main SDKs we’ll need:

OpenAI → for story text generation.

Google Generative AI (Gemini 2.5 Flash) → for generating and editing fairy-tale images.

ElevenLabs → for converting generated text into speech.

We also print out their versions to confirm successful installation.

In [1]:
# 📦 Install core AI libraries
!pip install -q --upgrade openai google-generativeai elevenlabs

# ✅ Check installed versions
import openai, google.generativeai as genai, elevenlabs
print("OpenAI version:", openai.__version__)
print("Google Generative AI version:", genai.__version__)
print("ElevenLabs version:", elevenlabs.__version__)


OpenAI version: 1.109.0
Google Generative AI version: 0.8.5
ElevenLabs version: 2.16.0
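
The versions can also be read from installed package metadata without importing each SDK, which helps when a package does not expose `__version__`. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper that is not part of the notebook, and the distribution names are assumed to match the ones passed to `pip` above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as used in the pip install command (assumption).
for dist in ("openai", "google-generativeai", "elevenlabs"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

A `None` result points to a failed or skipped install before any API call is attempted.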


## 2. Enter Input Topic

This cell allows you to define the topic of the fairy tale.
The topic will later be passed to OpenAI to generate a story, to Gemini for image creation, and to ElevenLabs for narration.

In [2]:
# 🎭 Input Cell: Define the fairy tale topic
# You can change the text below to any theme or idea you want.
# Example: "A little dragon who wants to learn how to sing."

topic = input("✨ Enter the topic for your fairy tale: ")

print(f"✅ Topic set: {topic}")


✨ Enter the topic for your fairy tale: An epyc world of dragons and fairies
✅ Topic set: An epyc world of dragons and fairies
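
Because the topic is interpolated into every later prompt, it may be worth normalizing it first. A minimal sketch, assuming an empty entry should fall back to the example topic from the cell comment; `clean_topic` and `DEFAULT_TOPIC` are hypothetical names, not part of the notebook.

```python
# Example topic taken from the comment in the input cell (used here as a fallback).
DEFAULT_TOPIC = "A little dragon who wants to learn how to sing."


def clean_topic(raw: str, default: str = DEFAULT_TOPIC) -> str:
    """Collapse runs of whitespace and fall back to a default when the input is empty."""
    cleaned = " ".join(raw.split())
    return cleaned or default


print(clean_topic("  An epic   world of dragons and fairies "))
```

In the notebook this could wrap the `input(...)` call, for example `topic = clean_topic(input(...))`.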


## 3. Enhance the Topic with OpenAI (GPT-4o-mini)

This cell uses OpenAI’s GPT-4o-mini to expand and enrich the topic.
The goal is to transform a short idea into a detailed fairy tale concept (characters, setting, mood, possible conflicts) — without yet writing the story itself.
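
The request in this cell caps the reply at `max_tokens=900`, so a long concept can be cut off mid-sentence. The Chat Completions response reports this through `finish_reason`, which is `"length"` when the token limit was reached. A minimal sketch of that check; `was_truncated` is a hypothetical helper, and the stand-in object only mimics the attribute shape of a real response for demonstration.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """True if the first choice stopped because it ran into the token limit."""
    return response.choices[0].finish_reason == "length"


# Stand-in with the same attribute shape as a Chat Completions response (demo only).
fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
if was_truncated(fake):
    print("Reply hit max_tokens; consider raising the limit or requesting a shorter concept.")
```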

In [3]:
# 🧠 Enhance Topic with OpenAI GPT-4o-mini
# Expands the topic into a richer concept (characters, setting, mood)
# but does NOT create the full story yet.

from openai import OpenAI
import os
from google.colab import userdata # Import userdata

# Initialize OpenAI client (make sure OPENAI_API_KEY is set in Colab's secrets)
# Use userdata.get to access the secret
client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

prompt = f"""
You are an assistant helping to create a fairy tale.
Take this topic: "{topic}" and expand it into a detailed fairy tale concept.
Include:
- Main characters, with personality traits and physical traits for each
- Setting (world, time, magical elements)
- Central theme or conflict
- Tone/mood of the story

Do NOT write the story yet. Just develop the concept.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant for creating fairy tales."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=900,
    temperature=0.7
)

enhanced_topic = response.choices[0].message.content
print("✨ Enhanced Fairy Tale Concept:\n")
print(enhanced_topic)

✨ Enhanced Fairy Tale Concept:

**Fairy Tale Concept: "The Celestial Alliance"**

### Main Characters:

1. **Elysia the Fairy**
   - **Traits:** Elysia is spirited, courageous, and resourceful. She possesses an innate curiosity about the world around her and is fiercely protective of her friends. Though she sometimes acts impulsively, her heart is always in the right place.
   - **Physical Traits:** Elysia has delicate, shimmering wings that resemble the night sky, dotted with tiny, twinkling stars. She has long, flowing hair that changes color with her emotions, and her eyes are a piercing blue, reminiscent of the clearest summer skies.

2. **Drakon the Dragon**
   - **Traits:** Drakon is wise, noble, and somewhat aloof. He carries the weight of his dragon lineage, often feeling the pressure to uphold the traditions of his kind. Despite his intimidating appearance, he has a gentle soul and a deep love for the natural world.
   - **Physical Traits:** Drakon is large and majestic, with 

## 4. Extract Characters with Structured Outputs & Save JSON File

This cell takes the enhanced fairy tale concept and uses OpenAI GPT-4o-mini with structured outputs to reliably extract the main characters.
For each character, it generates:

char_name → the character’s name

prompt → a ready-to-use image generation prompt starting with “Generate me image of…”

The result is guaranteed to be valid JSON (thanks to a schema) and is saved as a file (characters.json).
This makes it easy to reuse the characters later when generating their illustrations with Gemini.

What it does step-by-step:

1. Defines a strict JSON Schema with the required fields (char_name, prompt).
2. Sends the enhanced concept to GPT-4o-mini and enforces the schema.
3. Prints the structured results in Colab.
4. Saves the characters and prompts into a file characters.json.
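
Independently of what the API enforces, the saved list can be re-checked locally before it is handed to image generation. A minimal sketch in plain Python, assuming the same two fields and the same prompt prefix as the schema; `validate_characters` is a hypothetical helper, not part of the notebook.

```python
PREFIX = "Generate me image of"


def validate_characters(characters):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    problems, seen = [], set()
    for i, ch in enumerate(characters, 1):
        name = str(ch.get("char_name", "")).strip()
        prompt = str(ch.get("prompt", ""))
        if not name:
            problems.append(f"entry {i}: missing char_name")
        elif name.lower() in seen:
            problems.append(f"entry {i}: duplicate character '{name}'")
        seen.add(name.lower())
        if not prompt.startswith(PREFIX):
            problems.append(f"entry {i}: prompt does not start with '{PREFIX}'")
    return problems


sample = [
    {"char_name": "Elysia the Fairy", "prompt": "Generate me image of a spirited fairy."},
    {"char_name": "Drakon the Dragon", "prompt": "A majestic dragon."},
]
print(validate_characters(sample))
```

In the notebook the function would be called on `characters_data` right before the file is written.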








In [4]:
# 🧩 Structured JSON Extraction of Character Prompts (Strict Schema, root object)

import json
import os
from openai import OpenAI

# Reuse existing client if defined; otherwise init here
client = globals().get("client") or OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "characters": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "properties": {
                    "char_name": {"type": "string", "minLength": 1},
                    "prompt": {
                        "type": "string",
                        "minLength": 10,
                        "pattern": r"^Generate me image of"
                    }
                },
                "required": ["char_name", "prompt"]
            }
        }
    },
    "required": ["characters"]
}

instruction = f"""
From the following fairy tale concept, extract the DISTINCT main characters.
For EACH character, produce:
- char_name: the character’s name
- prompt: start EXACTLY with "Generate me image of" and then give a concise, vivid,
  single-image description combining physical traits, attire, notable props, age,
  mood, and setting hints. Avoid backstory and plot; no camera jargon; 1–2 sentences.

Constraints:
- Output ONLY a JSON object with a single key "characters" whose value is the array.
- No other keys or text.
- No duplicate characters. No empty or generic entries.
- Keep each prompt specific enough to illustrate a single portrait/full-body image.
- Do NOT invent new characters unless the concept clearly implies them.

Fairy Tale Concept:
{enhanced_topic}
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You return only JSON that strictly matches the provided schema."},
        {"role": "user", "content": instruction}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "character_image_prompts",
            "strict": True,
            "schema": schema
        }
    },
    temperature=0.4,
    max_tokens=900,
)


raw_json = response.choices[0].message.content
data = json.loads(raw_json)

# Extract array for convenience
characters_data = data["characters"]

# Pretty print
print("✅ Extracted Character Prompts (JSON):\n")
print(json.dumps(characters_data, indent=2))

# Save to file
json_file = "characters.json"
with open(json_file, "w") as f:
    json.dump(characters_data, f, indent=2)

print(f"\n💾 Saved character prompts to {json_file}")


✅ Extracted Character Prompts (JSON):

[
  {
    "char_name": "Elysia the Fairy",
    "prompt": "Generate me image of a spirited fairy with delicate, shimmering wings resembling the night sky, long flowing hair that changes color with her emotions, and piercing blue eyes. She wears a vibrant dress made of flower petals, standing in a lush forest filled with sparkling lights."
  },
  {
    "char_name": "Drakon the Dragon",
    "prompt": "Generate me image of a majestic dragon with emerald green and gold shimmering scales, deep amber eyes radiating wisdom, and powerful wings spread wide. He stands atop a rocky cliff overlooking a vibrant landscape, exuding a noble yet gentle demeanor."
  },
  {
    "char_name": "Queen Seraphelle the Fairy Monarch",
    "prompt": "Generate me image of a regal fairy queen with radiant golden wings that catch the light, luxurious silver hair adorned with enchanted flowers, and a gown that flows like liquid light. She stands gracefully in a sunlit glade, emb

## 5. Generate the Fairy Tale & Save to TXT

Creates a complete fairy tale based solely on enhanced_topic, using GPT-4o-mini.
Saves the result as a .txt file so you can reuse it for narration and video later.

In [5]:
# 📖 Generate Full Fairy Tale from the Enhanced Concept and Save to TXT

import os
from openai import OpenAI
from datetime import datetime

# Reuse existing OpenAI client or initialize a new one
client = globals().get("client") or OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Safety checks
if "enhanced_topic" not in globals() or not enhanced_topic.strip():
    raise ValueError("enhanced_topic is empty or missing. Please run the topic enhancement cell first.")

system_msg = (
    "You are a master fairy-tale writer. Write a vivid, emotionally engaging fairy tale "
    "appropriate for a wide audience (children-friendly but enjoyable for adults). "
    "Use clear structure (beginning, middle, resolution), rich sensory details, "
    "and natural dialogue. Avoid violence and horror. Keep it timeless."
)

user_prompt = f"""
Write a complete fairy tale based ONLY on this concept (do not reprint the concept):

--- ENHANCED CONCEPT START ---
{enhanced_topic}
--- ENHANCED CONCEPT END ---

Requirements:
- 500–700 words.
- Clear 3-act structure (setup, challenge, resolution).
- Gentle moral/theme emerging naturally from events.
- Suitable for high-quality TTS narration (short to medium sentences, varied rhythm).
- No meta commentary. Do NOT include outlines or bullet points — prose only.

Return plain text only, with no special characters or formatting.
"""

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_prompt}
    ],
    temperature=0.8,
    max_tokens=2000
)

story = resp.choices[0].message.content.strip()

# Save to file with timestamped name for versioning
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
story_filename = f"fairy_tale_story_{timestamp}.txt"
with open(story_filename, "w", encoding="utf-8") as f:
    f.write(story)

print(f"✅ Story generated and saved to: {story_filename}\n")
print("📝 Preview (first 600 chars):\n")
print(story[:600] + ("..." if len(story) > 600 else ""))


✅ Story generated and saved to: fairy_tale_story_20250923_190424.txt

📝 Preview (first 600 chars):

In the radiant land of Aetheria, where vibrant forests danced under shimmering auroras, the air was thick with magic and possibilities. Among the fluttering fairies, one stood out—Elysia. With wings that sparkled like the night sky and hair that changed colors with her emotions, she was a beacon of curiosity and courage. Elysia lived in a glen adorned with flowers that whispered secrets of the past, always yearning for adventure beyond the familiar.

One serene morning, as the sun spilled golden light over the treetops, Elysia felt an unusual stir in the air. "What if today holds something ext...
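
The prompt asks for 500 to 700 words, but a language model may miss such a target, so a quick length check before narration can be useful. A minimal sketch; `word_count` and `within_target` are hypothetical helpers, and counting whitespace-separated tokens is only an approximation of how a reader counts words.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def within_target(text: str, low: int = 500, high: int = 700) -> bool:
    """True if the text length falls inside the requested word range."""
    return low <= word_count(text) <= high


# Demo text with exactly 600 words; in the notebook this would be `story`.
demo = "word " * 600
print(word_count(demo), within_target(demo))
```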


## 6. Generate Character Images with Gemini 2.5 Flash

In [None]:
# 🖼️ Robust character image generation with retries (Gemini 2.5 Flash preview)

import os, json, re, base64, time
from io import BytesIO
from PIL import Image
import google.generativeai as genai

# 1) Configure Gemini
try:
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY") or os.getenv("GOOGLE_API_KEY")
except Exception:
    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
assert GOOGLE_API_KEY, "Set GOOGLE_API_KEY"
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel("gemini-2.5-flash-image-preview")

# 2) Load characters
with open("characters.json", "r", encoding="utf-8") as f:
    characters = json.load(f)

# 3) Output folder
os.makedirs("characters_images", exist_ok=True)

# 4) Helper for filenames
def fname(s: str) -> str:
    # `re` is already imported at the top of this cell
    s = re.sub(r"\s+", "_", s.strip().lower())
    return re.sub(r"[^a-z0-9_\-]", "", s) or "character"

# 5) Small helper to call the API with retries
def generate_image_bytes(prompt: str, max_retries: int = 3, base_delay: float = 1.0):
    for attempt in range(1, max_retries + 1):
        try:
            res = model.generate_content(
                prompt,
                generation_config=genai.types.GenerationConfig(
                    temperature=0.6,
                    candidate_count=1,
                ),
            )
            parts = getattr(res, "parts", None) or res.candidates[0].content.parts
            for p in parts:
                inline = getattr(p, "inline_data", None)
                if inline and getattr(inline, "mime_type", "").startswith("image/"):
                    return inline.data if isinstance(inline.data, (bytes, bytearray)) else base64.b64decode(inline.data)
            # If no image part, raise to trigger retry
            raise RuntimeError("No image returned in response.")
        except Exception as e:
            if attempt == max_retries:
                raise
            sleep_for = base_delay * (2 ** (attempt - 1))
            print(f"   -> Attempt {attempt} failed: {e}. Retrying in {sleep_for:.1f}s...")
            time.sleep(sleep_for)

# 6) Iterate characters and generate images
manifest = []
for i, ch in enumerate(characters, 1):
    name = ch.get("char_name", f"Character_{i}")
    prompt = ch.get("prompt", "")
    if not prompt:
        print(f"[{i}/{len(characters)}] {name} -> skipped (empty prompt)")
        continue

    print(f"[{i}/{len(characters)}] {name}")
    try:
        img_bytes = generate_image_bytes(prompt)
        img = Image.open(BytesIO(img_bytes))
        path = os.path.join("characters_images", f"{i:02d}_{fname(name)}.png")
        img.save(path)
        print(f"  -> Saved {path}")
        manifest.append({"char_name": name, "prompt": prompt, "image_path": path})
    except Exception as e:
        print(f"  -> Failed for {name}: {e}")

# 7) Save manifest
with open(os.path.join("characters_images", "manifest.json"), "w", encoding="utf-8") as f:
    json.dump(manifest, f, ensure_ascii=False, indent=2)
print("📒 Manifest saved to characters_images/manifest.json")
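
The retry logic in this cell follows an exponential backoff pattern: the wait doubles after each failed attempt (1 s, 2 s, ...) and the last error is re-raised once the attempts are used up. The same pattern can be isolated into a reusable wrapper. A minimal sketch; `retry_with_backoff` and the injectable `sleep` argument are illustrative assumptions, not part of the notebook.

```python
import time


def retry_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds; wait base_delay * 2**(attempt-1) between attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** (attempt - 1)))


# Demo: a function that fails twice, then succeeds; waits are recorded, not performed.
calls, waits = [], []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.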


## 7. Extract 10 Key Scenes (Structured JSON) & Save

This cell reads the full story, uses GPT-4o-mini with a strict JSON schema to produce exactly 10 scenes in order, and saves them to scenes.json. Each scene includes:

characters_involved — list of character names (strings)

scene_prompt — concise image prompt starting with “Generate me image of …”
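
The names in `characters_involved` are only useful downstream if they match the names saved in characters.json. A minimal sketch of a check that reports names the character list does not contain; `unknown_names` is a hypothetical helper, and the comparison is assumed to be case-insensitive.

```python
def unknown_names(scenes, known_names):
    """Map 1-based scene index -> names not found (case-insensitively) in known_names."""
    known = {n.lower() for n in known_names}
    report = {}
    for i, scene in enumerate(scenes, 1):
        missing = [n for n in scene.get("characters_involved", []) if n.lower() not in known]
        if missing:
            report[i] = missing
    return report


scenes_demo = [
    {"characters_involved": ["Elysia the Fairy"], "scene_prompt": "Generate me image of ..."},
    {"characters_involved": ["Elysia", "Drakon the Dragon"], "scene_prompt": "Generate me image of ..."},
]
print(unknown_names(scenes_demo, ["Elysia the Fairy", "Drakon the Dragon"]))
```

An empty report means every scene can be linked to a character entry by name alone.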

In [7]:
# 🎬 Extract 10 Key Scenes from Story (Strict JSON) and Save

import os, json
from openai import OpenAI

# Reuse existing client or init
client = globals().get("client") or OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Require the in-memory 'story' variable from the story generation cell
if "story" not in globals() or not story.strip():
    raise ValueError("Missing 'story'. Please run the story generation cell first.")

# Optional: if you want to align names with characters.json for consistency
names_whitelist = set()
if os.path.exists("characters.json"):
    try:
        with open("characters.json", "r", encoding="utf-8") as f:
            for c in json.load(f):
                if isinstance(c, dict) and "char_name" in c and c["char_name"]:
                    names_whitelist.add(c["char_name"])
    except Exception:
        pass  # If parsing fails, we'll let the model infer names from the story

schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "scenes": {
            "type": "array",
            "minItems": 10,
            "maxItems": 10,
            "items": {
                "type": "object",
                "additionalProperties": False,
                "properties": {
                    "characters_involved": {
                        "type": "array",
                        "items": {"type": "string", "minLength": 1},
                        "minItems": 1
                    },
                    "scene_prompt": {
                        "type": "string",
                        "minLength": 20,
                        "pattern": r"^Generate me image of"
                    }
                },
                "required": ["characters_involved", "scene_prompt"]
            }
        }
    },
    "required": ["scenes"]
}

# Build instruction
whitelist_hint = ""
if names_whitelist:
    wl = ", ".join(sorted(names_whitelist))
    whitelist_hint = f"""
When listing characters_involved, prefer matching these known names exactly when applicable:
[{wl}]
"""

instruction = f"""
Read the fairy tale below and extract EXACTLY 10 key scenes in strict chronological order.
For EACH scene return:
- characters_involved: list of character names who appear materially in that moment
- scene_prompt:  MUST start EXACTLY with "Generate me image of" and then describe a single vivid scene:
  setting, time of day, mood, visible actions, and essential visual details. Avoid camera jargon.

Constraints:
- Keep prompts concise (1–2 sentences) but visually rich.
- No spoilers or meta commentary.
- Do NOT include dialogue lines.
- Focus on distinct moments (no duplicates across scenes).

{whitelist_hint}

FAIRY TALE:
{story}
"""

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You output only JSON that matches the provided schema; be concise and precise."},
        {"role": "user", "content": instruction}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "scene_list",
            "strict": True,
            "schema": schema
        }
    },
    temperature=0.5,
    max_tokens=1600
)

raw = resp.choices[0].message.content
data = json.loads(raw)
scenes = data["scenes"]

# Pretty print
print("✅ Extracted Scenes (10):\n")
print(json.dumps(scenes, ensure_ascii=False, indent=2))

# Save to file
with open("scenes.json", "w", encoding="utf-8") as f:
    json.dump(scenes, f, ensure_ascii=False, indent=2)
print("\n💾 Saved to scenes.json")


✅ Extracted Scenes (10):

[
  {
    "characters_involved": [
      "Elysia the Fairy"
    ],
    "scene_prompt": "Generate me image of Elysia soaring through a bright blue sky, her iridescent wings shimmering in the golden morning light, surrounded by vibrant flowers below, embodying a sense of adventure and curiosity."
  },
  {
    "characters_involved": [
      "Drakon the Dragon"
    ],
    "scene_prompt": "Generate me image of Drakon perched majestically atop a rocky mountain, his emerald scales glistening in the sunlight, looking out over the enchanted landscape with a thoughtful expression, as a soft breeze rustles the clouds."
  },
  {
    "characters_involved": [
      "Elysia the Fairy",
      "Drakon the Dragon"
    ],
    "scene_prompt": "Generate me image of Elysia standing courageously on the mountainside, her delicate frame illuminated by sunlight, as Drakon's massive shadow looms overhead, creating a contrast between fear and bravery."
  },
  {
    "characters_involved":

## 8a — Match Scenes to Character Images (Build refs list per scene)
This cell loads scenes.json and characters_images/manifest.json, matches the characters_involved in each scene to their reference image paths, and outputs an array like:
[{ "chars_images": [<paths>], "scene_prompt": "<prompt>" }, ... ].
It also saves the result to scenes_with_refs.json for the next step.
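
The matching is best-effort and tries three tiers in order: exact name, case-insensitive name, then substring in either direction. A compact sketch of that order on a toy mapping; `match_name` is a hypothetical standalone function, the file path is illustrative, and the empty-string guard is an added assumption (an empty query would otherwise substring-match every name).

```python
def match_name(query, name_to_path):
    """Resolve a name to a path: exact, then case-insensitive, then substring either way."""
    q = query.strip()
    if not q:
        return None  # an empty string is a substring of every name
    if q in name_to_path:
        return name_to_path[q]
    ql = q.lower()
    for name, path in name_to_path.items():
        if name.lower() == ql:
            return path
    for name, path in name_to_path.items():
        if ql in name.lower() or name.lower() in ql:
            return path
    return None


# Illustrative mapping; real paths come from characters_images/manifest.json.
refs = {"Elysia the Fairy": "characters_images/01_elysia_the_fairy.png"}
print(match_name("elysia", refs))
```

Substring matching is convenient but can pick the wrong image when one character name is contained in another, so the warnings printed for unmatched names are worth reading.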

In [None]:
# 🔗 Match scenes to character reference images
# Input:  scenes.json  +  characters_images/manifest.json
# Output: scenes_with_refs.json -> [{ "chars_images": [...], "scene_prompt": "..." }, ...]

import os, json

SCENES_PATH = "scenes.json"
MANIFEST_PATH = os.path.join("characters_images", "manifest.json")
OUT_PATH = "scenes_with_refs.json"

# --- Load inputs ---
assert os.path.exists(SCENES_PATH), "scenes.json not found — run the scene extraction cell."
assert os.path.exists(MANIFEST_PATH), "characters_images/manifest.json not found — run the character image generation cell."

with open(SCENES_PATH, "r", encoding="utf-8") as f:
    scenes = json.load(f)

with open(MANIFEST_PATH, "r", encoding="utf-8") as f:
    manifest = json.load(f)

# --- Build name -> image path map (exact keys; lookups fall back to case-insensitive matching) ---
name_to_path = {}
for entry in manifest:
    n = str(entry.get("char_name", "")).strip()
    p = entry.get("image_path")
    if n and p and os.path.exists(p):
        name_to_path[n] = p

def find_ref_paths(names):
    """Return unique image paths for provided character names, with best-effort matching."""
    paths = []
    for raw in names or []:
        query = str(raw).strip()
        if not query:
            continue  # an empty name would substring-match every character below
        # exact
        if query in name_to_path:
            paths.append(name_to_path[query]); continue
        # case-insensitive exact
        lower_hit = next((name_to_path[k] for k in name_to_path if k.lower() == query.lower()), None)
        if lower_hit:
            paths.append(lower_hit); continue
        # relaxed contains (Eliot the Brave -> Eliot)
        contains_hit = next(
            (name_to_path[k] for k in name_to_path
             if query.lower() in k.lower() or k.lower() in query.lower()),
            None
        )
        if contains_hit:
            paths.append(contains_hit)
        else:
            print(f"⚠️  No reference image found for character: '{raw}'")
    # dedupe preserve order
    seen, uniq = set(), []
    for p in paths:
        if p not in seen:
            uniq.append(p); seen.add(p)
    return uniq

# --- Build result structure for all scenes ---
scenes_with_refs = []
for sc in scenes:
    chars = sc.get("characters_involved", [])
    prompt = sc.get("scene_prompt", "")
    refs = find_ref_paths(chars)
    scenes_with_refs.append({
        "chars_images": refs,
        "scene_prompt": prompt
    })

# --- Save and preview ---
with open(OUT_PATH, "w", encoding="utf-8") as f:
    json.dump(scenes_with_refs, f, ensure_ascii=False, indent=2)

print(f"✅ Built {len(scenes_with_refs)} scene entries with references.")
print(f"💾 Saved to: {OUT_PATH}\n")
print(json.dumps(scenes_with_refs[:3], ensure_ascii=False, indent=2))  # preview first 3


✅ Built 10 scene entries with references.
💾 Saved to: scenes_with_refs.json

[
  {
    "chars_images": [
      "characters_images/01_eliot.png",
      "characters_images/02_rusty.png"
    ],
    "scene_prompt": "Generate me image of a sunny morning in the vibrant village of Starhaven, where Eliot, a curious boy with tousled hair and bright blue eyes, stands beside Rusty, a small humanoid robot made of scrap metal, as they prepare for an adventure amidst blooming flowers."
  },
  {
    "chars_images": [
      "characters_images/01_eliot.png",
      "characters_images/02_rusty.png"
    ],
    "scene_prompt": "Generate me image of Eliot and Rusty walking through the Enchanted Forest, surrounded by towering trees that whisper secrets, with Eliot's expression showing excitement mixed with doubt as the air thickens with magic."
  },
  {
    "chars_images": [
      "characters_images/01_eliot.png",
      "characters_images/02_rusty.png",
      "characters_images/04_the_cloudsmith.png"
    ],


## 8b — Generate Scene Images from scenes_with_refs.json (Gemini 2.5 Flash, with refs)
This cell loads the prepared list of scenes (each with scene_prompt and chars_images), uploads the character images as reference inputs, and asks Gemini 2.5 Flash (preview) to compose the final scene image. Results are saved to scenes_images/ with a simple manifest.
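
A generation call can still fail after all retries, in which case the scene is skipped and the manifest lists fewer entries than the input. A minimal sketch for finding the gaps so that only those scenes are re-run; `missing_scene_indexes` is a hypothetical helper that relies on the `index` field written to the manifest.

```python
def missing_scene_indexes(total_scenes, manifest):
    """Return 1-based indexes of scenes that have no entry in the manifest."""
    done = {entry["index"] for entry in manifest}
    return [i for i in range(1, total_scenes + 1) if i not in done]


# Demo manifest with scenes 2 and 5 missing.
manifest_demo = [{"index": 1}, {"index": 3}, {"index": 4}]
print(missing_scene_indexes(5, manifest_demo))
```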

In [None]:
# 🎬 Generate scene images using prompts + character reference images (Gemini 2.5 Flash Preview)

import os, json, base64, time, re
from io import BytesIO
from PIL import Image
import google.generativeai as genai

# --- 1) Model (assumes genai.configure(...) already ran in the character image cell) ---
model = genai.GenerativeModel("gemini-2.5-flash-image-preview")

# --- 2) Load scenes_with_refs.json ---
IN_PATH = "scenes_with_refs.json"
assert os.path.exists(IN_PATH), "scenes_with_refs.json not found — run Cell 8a first."

with open(IN_PATH, "r", encoding="utf-8") as f:
    scenes_with_refs = json.load(f)

assert isinstance(scenes_with_refs, list) and scenes_with_refs, "Input must be a non-empty list."

# --- 3) Output folder ---
OUT_DIR = "scenes_images"
os.makedirs(OUT_DIR, exist_ok=True)

# --- 4) Upload cache (avoid re-uploading same files repeatedly) ---
_upload_cache = {}
def upload_once(path: str):
    if path in _upload_cache:
        return _upload_cache[path]
    f = genai.upload_file(path)
    _upload_cache[path] = f
    return f

def safe_name(s: str) -> str:
    s = re.sub(r"\s+", "_", s.strip().lower())
    return re.sub(r"[^a-z0-9_\-]", "", s) or "scene"

# --- 5) Core: generate a scene image with retries & inline image extraction ---
def generate_scene_image(prompt: str, ref_paths, max_retries=3, base_delay=1.0):
    parts = [prompt]
    for p in (ref_paths or []):
        if not os.path.exists(p):
            print(f"   ⚠️ ref not found, skipping: {p}")
            continue
        try:
            parts.append(upload_once(p))
        except Exception as e:
            print(f"   ⚠️ upload failed for {p}: {e}")

    for attempt in range(1, max_retries + 1):
        try:
            res = model.generate_content(
                parts,
                generation_config=genai.types.GenerationConfig(
                    temperature=0.45,   # faithful to references
                    candidate_count=1
                ),
            )
            # Extract first inline image
            content = res.candidates[0].content
            for part in content.parts:
                inline = getattr(part, "inline_data", None)
                if inline and getattr(inline, "mime_type", "").startswith("image/"):
                    data = inline.data if isinstance(inline.data, (bytes, bytearray)) else base64.b64decode(inline.data)
                    return Image.open(BytesIO(data))
            raise RuntimeError("No image returned in response.")
        except Exception as e:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            print(f"   -> Attempt {attempt} failed: {e}. Retrying in {delay:.1f}s...")
            time.sleep(delay)

# --- 6) Iterate and generate all scenes ---
scene_manifest = []
for i, entry in enumerate(scenes_with_refs, 1):
    prompt = entry.get("scene_prompt", "")
    refs = entry.get("chars_images", []) or []

    if not prompt:
        print(f"[{i}/{len(scenes_with_refs)}] ❗ Missing scene_prompt — skipping.")
        continue

    print(f"[{i}/{len(scenes_with_refs)}] Generating scene (refs: {len(refs)})")
    try:
        img = generate_scene_image(prompt, refs)
        out_name = f"{i:02d}_{safe_name(prompt[:50])}.png"  # avoid shadowing the fname() helper from the character image cell
        out_path = os.path.join(OUT_DIR, out_name)
        img.save(out_path)
        print(f"   ✅ Saved: {out_path}")
        scene_manifest.append({
            "index": i,
            "scene_prompt": prompt,
            "image_path": out_path,
            "reference_images": [p for p in refs if os.path.exists(p)]
        })
    except Exception as e:
        print(f"   ❌ Failed to generate scene {i}: {e}")

# --- 7) Save manifest ---
MANIFEST_PATH = os.path.join(OUT_DIR, "manifest.json")
with open(MANIFEST_PATH, "w", encoding="utf-8") as f:
    json.dump(scene_manifest, f, ensure_ascii=False, indent=2)

print(f"\n📒 Scene manifest saved: {MANIFEST_PATH}")
print(f"🗂️  Output folder: {os.path.abspath(OUT_DIR)}")


[1/10] Generating scene (refs: 2)
   ✅ Saved: scenes_images/01_generate_me_image_of_a_sunny_morning_in_the_vibran.png
[2/10] Generating scene (refs: 2)
   -> Attempt 1 failed: No image returned in response.. Retrying in 1.0s...
   -> Attempt 2 failed: No image returned in response.. Retrying in 2.0s...
   ❌ Failed to generate scene 2: No image returned in response.
[3/10] Generating scene (refs: 3)
   ✅ Saved: scenes_images/03_generate_me_image_of_a_clearing_in_the_enchanted_f.png
[4/10] Generating scene (refs: 2)
   ✅ Saved: scenes_images/04_generate_me_image_of_eliot_standing_before_the_clo.png
[5/10] Generating scene (refs: 2)
   ✅ Saved: scenes_images/05_generate_me_image_of_the_cloudsmith_waving_his_sta.png
[6/10] Generating scene (refs: 2)
   ✅ Saved: scenes_images/06_generate_me_image_of_eliot_closing_his_eyes_with_d.png
[7/10] Generating scene (refs: 2)
   -> Attempt 1 failed: No image returned in response.. Retrying in 1.0s...
   -> Attempt 2 failed: No image returned in respo

In [None]:
# 🔊 Text-to-Speech: Convert the fairy tale into an MP3 with ElevenLabs
# - Loads `story` from memory or the newest saved story file.
# - Splits long text into chunks safe for TTS.
# - Synthesizes each chunk and merges into story_tts.mp3.
# - Plays a short preview inline.

import os, re, glob, io
from datetime import datetime
from IPython.display import Audio, display

# 1) Load story text (from variable or newest file)
def load_story_text():
    if "story" in globals() and isinstance(story, str) and story.strip():
        return story.strip()
    files = sorted(glob.glob("fairy_tale_story_*.txt"))
    if not files:
        raise FileNotFoundError("No story text found. Generate the story first (Cell 5).")
    with open(files[-1], "r", encoding="utf-8") as f:
        return f.read().strip()

text = load_story_text()

# 2) Chunking (sentence-aware, ~1800 chars per chunk)
def split_into_chunks(s, max_len=1800):
    sentences = re.split(r'(?<=[\.\!\?])\s+', s)
    chunks, buf = [], ""
    for sent in sentences:
        if len(buf) + len(sent) + 1 <= max_len:
            buf = (buf + " " + sent).strip()
        else:
            if buf: chunks.append(buf)
            buf = sent
    if buf: chunks.append(buf)
    return chunks

chunks = split_into_chunks(text, max_len=1800)
print(f"🧩 Story length: {len(text):,} chars -> {len(chunks)} chunk(s)")

# 3) ElevenLabs client (API key from Colab userdata or env)
try:
    from google.colab import userdata
    ELEVEN_API_KEY = userdata.get("ELEVENLABS_API_KEY") or os.getenv("ELEVENLABS_API_KEY")
except Exception:
    ELEVEN_API_KEY = os.getenv("ELEVENLABS_API_KEY")

if not ELEVEN_API_KEY:
    raise RuntimeError("Missing ELEVENLABS_API_KEY. Add it in Colab (userdata) or environment.")

from elevenlabs.client import ElevenLabs
# Use a distinct name so the OpenAI `client` from earlier cells is not overwritten
tts_client = ElevenLabs(api_key=ELEVEN_API_KEY)

# 4) Choose voice (use existing VOICE_ID if defined, else a default premade voice)
VOICE_ID = globals().get("VOICE_ID") or "JBFqnCBsd6RMkjVDRZzb"  # premade demo voice; change to your favorite
VOICE_SETTINGS = {
    "stability": 0.55,
    "similarity_boost": 0.9,
    "style": 0.35,
    "use_speaker_boost": True
}

# 5) Generate audio for each chunk and merge using pydub
!pip -q install pydub
from pydub import AudioSegment

out_dir = "tts_output"
os.makedirs(out_dir, exist_ok=True)
chunk_paths = []

for i, chunk in enumerate(chunks, 1):
    print(f"🎙️  Synthesizing chunk {i}/{len(chunks)}...")
    stream = tts_client.text_to_speech.convert(
        voice_id=VOICE_ID,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128",
        text=chunk,
        voice_settings=VOICE_SETTINGS
    )
    audio_bytes = b"".join(stream)
    path = os.path.join(out_dir, f"chunk_{i:02d}.mp3")
    with open(path, "wb") as f:
        f.write(audio_bytes)
    chunk_paths.append(path)

# Merge all chunks
final_name = f"story_tts_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp3"
final_path = os.path.join(out_dir, final_name)

merged = None
for p in chunk_paths:
    seg = AudioSegment.from_file(p, format="mp3")
    merged = seg if merged is None else merged + seg

if merged is None:
    raise RuntimeError("No audio produced.")
merged.export(final_path, format="mp3", bitrate="128k")
print(f"✅ Final TTS saved: {final_path}")

# Inline preview (first ~20 seconds)
preview = merged[:20_000]
preview_path = os.path.join(out_dir, "preview_20s.mp3")
preview.export(preview_path, format="mp3", bitrate="128k")
display(Audio(preview_path))
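
The chunker in the cell above keeps sentences whole, so a single sentence longer than `max_len` would still be emitted as one oversized chunk. The sketch below shows one possible fallback that splits such a sentence on word boundaries before packing. The names `split_long_sentence` and `split_into_chunks_safe` are hypothetical and not part of the notebook, and the 1800-character limit is simply carried over from the cell above.

```python
import re

def split_long_sentence(sentence, max_len):
    """Break one oversized sentence into pieces of at most max_len characters."""
    pieces, buf = [], ""
    for word in sentence.split():
        # A single word longer than max_len has to be cut hard
        while len(word) > max_len:
            if buf:
                pieces.append(buf)
                buf = ""
            pieces.append(word[:max_len])
            word = word[max_len:]
        if not buf:
            buf = word
        elif len(buf) + 1 + len(word) <= max_len:
            buf = buf + " " + word
        else:
            pieces.append(buf)
            buf = word
    if buf:
        pieces.append(buf)
    return pieces

def split_into_chunks_safe(s, max_len=1800):
    """Sentence-aware chunking that never returns a chunk longer than max_len."""
    units = []
    for sent in re.split(r'(?<=[.!?])\s+', s.strip()):
        if len(sent) > max_len:
            units.extend(split_long_sentence(sent, max_len))
        elif sent:
            units.append(sent)

    # Pack sentences (or sentence pieces) greedily into chunks
    chunks, buf = [], ""
    for unit in units:
        if not buf:
            buf = unit
        elif len(buf) + 1 + len(unit) <= max_len:
            buf = buf + " " + unit
        else:
            chunks.append(buf)
            buf = unit
    if buf:
        chunks.append(buf)
    return chunks

# Small demo: the second sentence is far longer than the 40-character limit
demo = "Once upon a time. " + " ".join(["dragon"] * 20) + "."
demo_chunks = split_into_chunks_safe(demo, max_len=40)
print([len(c) for c in demo_chunks])
```

Splitting inside a sentence can produce a less natural pause in the narration, so this is only meant as a safety net for unusually long sentences.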


In [None]:
# 🔊 Merge ElevenLabs TTS chunks into one narration MP3

import glob, os
from pydub import AudioSegment
from datetime import datetime

# Collect all chunk files
chunks = sorted(glob.glob("tts_output/chunk_*.mp3"))
if not chunks:
    raise FileNotFoundError("No TTS chunks found in tts_output/. Run the TTS cell first.")

print(f"🧩 Found {len(chunks)} audio chunks")

# Merge them in order
merged = None
for c in chunks:
    seg = AudioSegment.from_file(c, format="mp3")
    merged = seg if merged is None else merged + seg

# Save final file
final_name = f"story_narration_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp3"
final_path = os.path.join("tts_output", final_name)
merged.export(final_path, format="mp3", bitrate="128k")

print(f"✅ Final narration saved: {final_path}")


🧩 Found 2 audio chunks
✅ Final narration saved: tts_output/story_narration_20250903_175723.mp3
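
The merge cell picks up every file matching `tts_output/chunk_*.mp3`, so chunks left over from an earlier, longer story would be appended to the new narration. One way to avoid this is to clear old chunks before synthesizing. The sketch below assumes a hypothetical helper `clear_stale_chunks` (not part of the notebook) and runs against a temporary directory so that no real output is touched.

```python
import glob
import os
import tempfile

def clear_stale_chunks(out_dir, pattern="chunk_*.mp3"):
    """Delete chunk files left in out_dir by an earlier run; return the removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(out_dir, pattern))):
        os.remove(path)
        removed.append(path)
    return removed

# Demo in a throwaway directory: two old chunks and one finished narration
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chunk_01.mp3", "chunk_02.mp3", "story_tts_old.mp3"):
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(b"")
    removed = clear_stale_chunks(tmp)
    remaining = sorted(os.listdir(tmp))

print(len(removed), remaining)
```

In the notebook this would be called as `clear_stale_chunks("tts_output")` right before the synthesis loop. Finished narration files are left alone because they do not match the chunk pattern.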


In [None]:
# 🎬 Simple and Reliable Slideshow Creator using FFmpeg

import os
import subprocess
import wave
import contextlib
import glob
import json
from datetime import datetime

print("🎬 Creating video slideshow using FFmpeg...")
print("=" * 60)

# ---- A) Find or build narration audio ----
def find_or_merge_audio():
    # Look for existing audio files
    candidates = (
        sorted(glob.glob("tts_output/story_tts_*.mp3")) +
        sorted(glob.glob("tts_output/story_narration_*.mp3")) +
        sorted(glob.glob("story_tts_*.mp3")) +
        sorted(glob.glob("story_narration_*.mp3"))
    )
    if candidates:
        return candidates[-1]
    if os.path.exists("story_narration.mp3"):
        return "story_narration.mp3"

    # Try to merge chunks if available
    chunks = sorted(glob.glob("tts_output/chunk_*.mp3"))
    if chunks:
        print(f"🔊 Found {len(chunks)} chunk(s). Merging...")
        try:
            from pydub import AudioSegment
        except ImportError:
            subprocess.run(["pip", "install", "pydub"], check=True)
            from pydub import AudioSegment

        merged = None
        for chunk in chunks:
            segment = AudioSegment.from_file(chunk, format="mp3")
            merged = segment if merged is None else merged + segment

        output_path = f"story_narration_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp3"
        merged.export(output_path, format="mp3", bitrate="128k")
        print(f"✅ Merged audio: {output_path}")
        return output_path

    raise FileNotFoundError("No narration audio found")

# ---- B) Find story images ----
def find_story_images():
    image_paths = []

    # Try manifest first
    manifest_path = os.path.join("scenes_images", "manifest.json")
    if os.path.exists(manifest_path):
        with open(manifest_path, "r", encoding="utf-8") as f:
            manifest = json.load(f)
        for item in manifest:
            path = item.get("image_path")
            if path and os.path.exists(path):
                image_paths.append(path)

    # Fallback to directory scan
    if not image_paths:
        for ext in ["*.png", "*.jpg", "*.jpeg", "*.bmp", "*.gif"]:
            image_paths.extend(sorted(glob.glob(os.path.join("scenes_images", ext))))
            if image_paths:
                break

    if not image_paths:
        raise FileNotFoundError("No story images found in 'scenes_images/' directory")

    return image_paths

# Get audio and images
try:
    audio_file = find_or_merge_audio()
    print(f"🎵 Using audio: {audio_file}")

    story_images = find_story_images()
    print(f"🖼️ Found {len(story_images)} story images")

except FileNotFoundError as e:
    print(f"❌ {e}")
    raise  # stop this cell; exit(1) would ask the notebook kernel to exit

print("✅ Prerequisites check passed")
print("-" * 60)

# ---- C) Install FFmpeg if needed ----
try:
    result = subprocess.run(['ffmpeg', '-version'], capture_output=True, text=True)
    print("✅ FFmpeg is available")
except FileNotFoundError:
    print("📦 Installing FFmpeg...")
    subprocess.run(['apt', 'update', '-qq'], check=True)
    subprocess.run(['apt', 'install', '-y', 'ffmpeg'], check=True)
    print("✅ FFmpeg installed")

print("-" * 60)

try:
    # ---- D) Get audio duration ----
    # Convert mp3 to wav to read duration (more reliable)
    subprocess.run(['ffmpeg', '-i', audio_file, '-y', 'temp_audio.wav'],
                   capture_output=True, check=True)

    with contextlib.closing(wave.open('temp_audio.wav','r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        duration = frames / float(rate)

    print(f"🎵 Audio duration: {duration:.2f} seconds")
    print(f"🖼️ Number of images: {len(story_images)}")

    # Calculate duration per image
    duration_per_image = duration / len(story_images)
    print(f"⏱️ Duration per image: {duration_per_image:.2f} seconds")
    print("-" * 40)

    # ---- E) Save images with numbered names for FFmpeg ----
    from PIL import Image

    image_files = []
    for i, img_path in enumerate(story_images):
        filename = f"slide_{i:03d}.png"

        # Load and save image (ensure consistent format)
        with Image.open(img_path) as img:
            img = img.convert("RGB")
            img.save(filename, "PNG")

        image_files.append(filename)
        print(f"💾 Saved: {filename}")

    print("-" * 40)
    print("🎞️ Creating video with FFmpeg...")

    # ---- F) Create video using FFmpeg ----
    output_filename = f"ai_story_slideshow_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp4"

    ffmpeg_cmd = [
        'ffmpeg',
        '-y',  # Overwrite output file
        '-framerate', f'1/{duration_per_image}',  # Input rate: one image every duration_per_image seconds
        '-i', 'slide_%03d.png',  # Input image pattern
        '-i', audio_file,  # Input audio
        '-c:v', 'libx264',  # Video codec
        '-c:a', 'aac',  # Audio codec
        '-pix_fmt', 'yuv420p',  # Pixel format for compatibility
        '-shortest',  # Stop when shortest input ends
        output_filename  # Output file
    ]

    print(f"🔧 Running: ffmpeg with {len(image_files)} slides...")
    # Run FFmpeg command
    result = subprocess.run(ffmpeg_cmd, capture_output=True, text=True)

    if result.returncode == 0:
        print("✅ Video created successfully!")

        # Get file info
        file_size_mb = os.path.getsize(output_filename) / (1024*1024)

        print("=" * 60)
        print("🎉 VIDEO SLIDESHOW COMPLETE!")
        print(f"📁 Filename: {output_filename}")
        print(f"📊 File size: {file_size_mb:.1f} MB")
        print(f"⏱️ Duration: {duration:.1f} seconds")
        print(f"🖼️ Images: {len(story_images)} scenes")

        # Clean up temporary files
        for img_file in image_files:
            try:
                os.remove(img_file)
            except OSError:
                pass
        try:
            os.remove('temp_audio.wav')
        except OSError:
            pass

        # Show an inline preview when running in a notebook
        try:
            from IPython.display import Video, display
            print("\n🎬 Video Preview:")
            # embed=True inlines the file, so the preview also works in Colab,
            # where a relative <video src="..."> path is not served
            display(Video(output_filename, embed=True, width=600))
        except ImportError:
            print(f"\n🎥 Video saved as: {output_filename}")

        print("\n🎊 AI MULTIMEDIA STORYTELLING SUCCESS!")
        print("🚀 Your AI-generated slideshow is ready!")

    else:
        print("❌ FFmpeg failed. Error output:")
        print(result.stderr)
        print("\nCommand used:")
        print(" ".join(ffmpeg_cmd))
        raise RuntimeError("Video creation failed")

except Exception as e:
    print(f"❌ Error creating video: {str(e)}")
    print("\n💡 Check that:")
    print("  - Audio file exists and is valid MP3")
    print("  - Image files exist in scenes_images/ directory")
    print("  - FFmpeg is properly installed")

🎬 Creating video slideshow using FFmpeg...
🎵 Using audio: tts_output/story_narration_20250903_175723.mp3
🖼️ Found 8 story images
✅ Prerequisites check passed
------------------------------------------------------------
✅ FFmpeg is available
------------------------------------------------------------
🎵 Audio duration: 225.72 seconds
🖼️ Number of images: 8
⏱️ Duration per image: 28.22 seconds
----------------------------------------
💾 Saved: slide_000.png
💾 Saved: slide_001.png
💾 Saved: slide_002.png
💾 Saved: slide_003.png
💾 Saved: slide_004.png
💾 Saved: slide_005.png
💾 Saved: slide_006.png
💾 Saved: slide_007.png
----------------------------------------
🎞️ Creating video with FFmpeg...
🔧 Running: ffmpeg with 8 slides...
✅ Video created successfully!
🎉 VIDEO SLIDESHOW COMPLETE!
📁 Filename: ai_story_slideshow_20250903_182724.mp4
📊 File size: 3.6 MB
⏱️ Duration: 225.7 seconds
🖼️ Images: 8 scenes

🎬 Video Preview:



🎊 AI MULTIMEDIA STORYTELLING SUCCESS!
🚀 Your AI-generated slideshow is ready!
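
The timing logic in the slideshow cell comes down to two steps: read the narration length from the temporary WAV file, then divide it evenly across the images to get the `-framerate` value. The sketch below isolates that arithmetic. The helpers `wav_duration_seconds` and `slide_timing` are hypothetical names, and the silent WAV is generated in memory only so the example runs without any files.

```python
import io
import wave

def wav_duration_seconds(wav_file):
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as f:
        return f.getnframes() / float(f.getframerate())

def slide_timing(duration, n_images):
    """Split the narration evenly across slides and build the -framerate value."""
    if n_images <= 0:
        raise ValueError("need at least one image")
    per_image = duration / n_images
    return per_image, f"1/{per_image}"

# Build a 2-second silent mono WAV in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                   # 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 16000)  # 16000 frames at 8 kHz = 2 s
buf.seek(0)

duration = wav_duration_seconds(buf)
per_image, framerate = slide_timing(duration, 4)
print(duration, per_image, framerate)  # 2.0 0.5 1/0.5
```

With the run shown above (225.72 seconds of audio and 8 images), the same division gives about 28.22 seconds per image, matching the logged value.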
