# Animated Story Video Generator

This notebook turns a one-line theme into a short animated story video. Gemini writes the scene-by-scene script and narrates each scene through the Live API, Imagen 3 illustrates each scene, and MoviePy pairs every image with its narration and joins the scenes into a single MP4.

## 1. Install Required Packages
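Install all necessary Python packages. This cell only needs to be run once per environment. MoviePy is pinned below 2.0 because the video code uses the 1.x API (`moviepy.editor` and `set_duration`), which MoviePy 2.0 removed.

In [None]:
%pip install google-genai google-generativeai "moviepy<2.0" Pillow nest_asyncio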
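The import cell below relies on `moviepy.editor` and `ImageClip.set_duration`, which belong to the MoviePy 1.x API; MoviePy 2.0 removed the `moviepy.editor` module and renamed the `set_*` methods to `with_*`. The following is a minimal sketch, using only the standard library, that checks the installed major version before the rest of the notebook runs. The helper names `major_version` and `moviepy_major_version` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.0.3'."""
    return int(version_string.split(".")[0])


def moviepy_major_version() -> Optional[int]:
    """Return the installed MoviePy major version, or None if it is not installed."""
    try:
        return major_version(version("moviepy"))
    except PackageNotFoundError:
        return None


installed = moviepy_major_version()
if installed is None:
    print("MoviePy is not installed; run the install cell first.")
elif installed >= 2:
    print("MoviePy 2.x detected; this notebook expects the 1.x API (%pip install 'moviepy<2.0').")
else:
    print("MoviePy 1.x detected.")
```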

## 2. Import Libraries
Import all required libraries for data processing, image and audio handling, and video creation.

In [None]:
import asyncio
import contextlib
import json
import os
import time
import wave
from base64 import b64encode
from io import BytesIO

import nest_asyncio
import numpy as np
from google import genai  # google-genai SDK, which provides genai.Client
from IPython.display import display, HTML
from moviepy.editor import ImageClip, AudioFileClip, CompositeVideoClip, concatenate_videoclips  # MoviePy 1.x API
from PIL import Image

# Allow asyncio.run() inside Jupyter's already-running event loop (used for the Live API audio).
nest_asyncio.apply()

## 3. Set Up Google API Key
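Set your Google API key as an environment variable; the `genai.Client` created in the next step reads it from `GOOGLE_API_KEY`. Replace `'YOUR_GOOGLE_API_KEY'` with your actual key, or avoid storing the key in the notebook by loading it from your shell environment or a secrets manager.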

In [None]:
os.environ['GOOGLE_API_KEY'] = 'YOUR_GOOGLE_API_KEY'  # TODO: Replace with your actual API key or use a secure method
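One way to keep the key out of the saved notebook is to prompt for it without echo when it is not already set. This is a sketch, not the notebook's own method: `load_api_key` is a hypothetical helper built on the standard-library `getpass`, and it treats the placeholder above as "not set".

```python
import os
from getpass import getpass
from typing import Callable


def load_api_key(env_var: str = "GOOGLE_API_KEY",
                 prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting (without echo) only if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == "YOUR_GOOGLE_API_KEY":
        key = prompt(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key  # the client created in section 4 reads it from here
    return key


# Usage in the notebook (prompts only if the variable is unset or still the placeholder):
# load_api_key()
```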

## 4. Initialize Google Generative AI Client
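Create the client and set the model IDs: `MODEL` drives both the story script (section 5) and the Live API narration (section 7), and `IMAGE_MODEL_ID` selects the Imagen model for the scene illustrations.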

In [None]:
client = genai.Client(http_options={'api_version': 'v1alpha'})
MODEL = "models/gemini-2.0-flash-exp"
IMAGE_MODEL_ID = "imagen-3.0-generate-002"

## 5. Define Story Generation Function
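`generate_story_sequence` asks Gemini for a JSON list of scenes. Each scene is a dictionary with three keys: `image_prompt` (what to illustrate), `audio_text` (one sentence of narration), and `character_description` (consistent character and art-style details). The function returns that list together with a scene count.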

In [None]:
from typing import Dict, List, Tuple

from pydantic import BaseModel


class Scene(BaseModel):
    """Schema for one scene; the field names match the keys read in section 8."""
    image_prompt: str
    audio_text: str
    character_description: str


def generate_story_sequence(complete_story: str, pages: int) -> Tuple[List[Dict], int]:
    """
    Generate a story sequence using Google Generative AI.
    Returns a list of scene dictionaries and the number of scenes received.
    """
    prompt = (
        f"You are an animation video producer. Generate a story sequence about {complete_story} "
        f"in {pages} scenes (with interactions and characters), 1 second per scene. For each scene, write:\n\n"
        "image_prompt: define an art style for a kids' animation (consistent for all characters), "
        "then describe the scene, the characters in it, and the background in 20 words or less. "
        "Progressively shift the scene as the story advances.\n"
        "audio_text: a one-sentence dialogue or narration for the scene.\n"
        "character_description: no people ever, only animals and objects. Describe all characters "
        "(consistent names, features, clothing, etc.) with an art style reference (e.g., \"Pixar style\", "
        "\"photorealistic\", \"Ghibli\", \"Anime\", \"Digital Art\", \"Comic Book\", \"Disney style\") "
        "in 30 words or less."
    )
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config={
            'response_mime_type': 'application/json',
            'response_schema': list[Scene],  # a JSON array with one object per scene
        },
    )
    try:
        scenes = json.loads(response.text)
    except (TypeError, json.JSONDecodeError) as e:
        print(f"Error parsing the story data: {e}")
        return [], pages
    if not isinstance(scenes, list):
        return [], pages
    return scenes, len(scenes)
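The scene loop in section 8 indexes every scene by `image_prompt`, `audio_text`, and `character_description`, so a scene with a missing or empty field would only fail halfway through the run, after image and audio quota has been spent. A minimal pure-Python check, sketched here under the assumption that scenes arrive as JSON text or an already-parsed list, can filter them up front. `REQUIRED_KEYS` and `validate_scenes` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# Keys that the scene loop in section 8 reads from every scene dictionary.
REQUIRED_KEYS = ("image_prompt", "audio_text", "character_description")


def validate_scenes(raw: Any) -> List[Dict[str, str]]:
    """Keep only scenes that have every required key as a non-empty string.

    raw may be the model's JSON text or an already-parsed list.
    """
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, list):
        raise ValueError(f"Expected a list of scenes, got {type(data).__name__}")
    valid = []
    for index, scene in enumerate(data):
        if not isinstance(scene, dict):
            print(f"Skipping scene {index}: not a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS
                   if not (isinstance(scene.get(key), str) and scene[key].strip())]
        if missing:
            print(f"Skipping scene {index}: missing {missing}")
            continue
        valid.append({key: scene[key].strip() for key in REQUIRED_KEYS})
    return valid


example = [
    {"image_prompt": "Cartoon mouse rolls a giant wheel of cheese down a street",
     "audio_text": "Jerry grins as the cheese picks up speed.",
     "character_description": "Jerry, a small brown cartoon mouse, Disney style"},
    {"image_prompt": "Cartoon cat leaps over a fruit stand", "audio_text": ""},
]
checked = validate_scenes(example)
print(len(checked))  # → 1
```

In the notebook this could run right after section 6 as `story_segments = validate_scenes(story_segments)`.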

## 6. Generate Story Segments
Set your story theme and number of scenes, then generate the story sequence.

In [None]:
theme = "Jerry steals a giant cheese, and Tom goes on a wild chase across a bustling city."
num_scenes = 10

story_segments, _ = generate_story_sequence(theme, num_scenes)
print(json.dumps(story_segments, indent=2))
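Every run of section 6 makes a new API call and usually returns a different story. If you want to re-run the later cells on the same script, one option is to save the scenes to a JSON file and reuse them. The sketch below assumes a local file is acceptable; `load_or_generate` and `fake_generate` are hypothetical names, and the stand-in generator replaces the real API call so the example runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def load_or_generate(cache_path: str, generate: Callable[[], List[Dict]]) -> List[Dict]:
    """Reuse scenes saved at cache_path; otherwise call generate() and save the result as JSON."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    scenes = generate()
    if scenes:  # do not cache an empty (failed) generation
        path.write_text(json.dumps(scenes, indent=2), encoding="utf-8")
    return scenes


# Demo with a stand-in generator; in the notebook the generator would be
# lambda: generate_story_sequence(theme, num_scenes)[0]
calls = []


def fake_generate() -> List[Dict]:
    calls.append(1)
    return [{"image_prompt": "a", "audio_text": "b", "character_description": "c"}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "story.json"
    first = load_or_generate(str(cache), fake_generate)
    second = load_or_generate(str(cache), fake_generate)  # served from the file

print(len(calls))  # → 1
```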

## 7. Helper Functions for Audio Generation
Define a context manager for writing WAV files and a function to generate audio narration for each scene.

In [None]:
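@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    """Open a WAV file for writing raw PCM (defaults: mono, 24 kHz, 16-bit samples)."""
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf

def generate_audio_live(api_text, output_filename):
    """
    Generate spoken narration for api_text with the Gemini Live API
    and save it as a WAV file. Returns output_filename.
    """
    collected_audio = bytearray()

    async def _generate():
        # Ask the Live API to respond with audio only.
        config = {"response_modalities": ["AUDIO"]}
        async with client.aio.live.connect(model=MODEL, config=config) as session:
            # Send the text as one complete user turn.
            await session.send(input=api_text, end_of_turn=True)
            # receive() yields messages until the model finishes its turn;
            # response.data holds any raw PCM audio in the message.
            async for response in session.receive():
                if response.data:
                    collected_audio.extend(response.data)
        return bytes(collected_audio)

    # asyncio.run works inside Jupyter because nest_asyncio.apply() was called earlier.
    audio_bytes = asyncio.run(_generate())
    # Wrap the raw PCM in a WAV header using wave_file's defaults.
    with wave_file(output_filename) as wf:
        wf.writeframes(audio_bytes)
    return output_filename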
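Because each scene lasts exactly as long as its narration, it helps to see how the raw PCM bytes map to seconds. With the defaults used by `wave_file` (24 kHz, mono, 2 bytes per sample, which the notebook assumes match the Live API output), duration is `bytes / (rate × channels × sample_width)`. The sketch below, using only the standard `wave` module and a hypothetical `pcm_duration` helper, writes 1.5 s of silence into an in-memory WAV and reads the duration back.

```python
import io
import wave

RATE, CHANNELS, SAMPLE_WIDTH = 24000, 1, 2  # same defaults as wave_file above


def pcm_duration(num_bytes: int, rate: int = RATE, channels: int = CHANNELS,
                 sample_width: int = SAMPLE_WIDTH) -> float:
    """Seconds of audio represented by num_bytes of raw PCM."""
    return num_bytes / (rate * channels * sample_width)


# 1.5 s of silence: 36,000 frames of 2 bytes each = 72,000 bytes.
silence = bytes(int(1.5 * RATE) * CHANNELS * SAMPLE_WIDTH)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(silence)

buffer.seek(0)
with wave.open(buffer, "rb") as rf:
    wav_seconds = rf.getnframes() / rf.getframerate()

print(pcm_duration(len(silence)), wav_seconds)  # → 1.5 1.5
```

If the narration sounds sped up or slowed down, a mismatch between these header values and the actual audio format is the first thing to check.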

## 8. Generate Images, Audio, and Assemble Video
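For each scene, generate an image with Imagen and narration with the Live API, then combine them into a video clip whose length equals the narration's duration. If no image comes back for a scene, that scene is skipped. The clips are joined into the final video in the next step.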

In [None]:
temp_audio_files = []  # Track temporary audio files
temp_image_files = []  # Track temporary image files
video_clips = []       # Store video clips for each scene

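# Instruction prepended to each scene's narration so the model reads the text without extra chatter.
audio_negative_prompt = (
    "Read the following story segment aloud with expressive narration. "
    "Do not add an introduction, an acknowledgement such as \"OK, I will...\", or a closing remark; "
    "more segments will follow:\n"
)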

for i, segment in enumerate(story_segments):
    image_prompt = segment['image_prompt']
    audio_text = audio_negative_prompt + segment['audio_text']
    char_desc = segment['character_description']
    print(f"Processing scene {i+1}:")
    print("Image Prompt:", image_prompt)
    print("Audio Text:", segment['audio_text'])
    print("Character Description:", char_desc)
    print("--------------------------------")

    # Generate the scene illustration with Imagen
    combined_prompt = "detailed children's book animation style " + image_prompt + " " + char_desc
    try:
        result = client.models.generate_images(
            model=IMAGE_MODEL_ID,
            prompt=combined_prompt,
            config={
                "number_of_images": 1,
                "output_mime_type": "image/jpeg",
                "person_generation": "DONT_ALLOW",
                "aspect_ratio": "1:1"
            }
        )
        if not result.generated_images:
            raise ValueError("No images were generated. The prompt might have been flagged as harmful. Please modify your prompt and try again.")
        # Only one image was requested, so take the first
        image = Image.open(BytesIO(result.generated_images[0].image.image_bytes))
    except Exception as e:
        # Skip this scene (no image means no clip) and continue with the next one
        print(f"Image generation failed for scene {i+1}: {e}")
        continue
    image_path = f"image_{i}.png"
    image.save(image_path)
    temp_image_files.append(image_path)
    display(image)

    # Generate audio narration
    audio_path = f"audio_{i}.wav"
    audio_path = generate_audio_live(audio_text, audio_path)
    temp_audio_files.append(audio_path)

    # Create video clip (image + audio)
    audio_clip = AudioFileClip(audio_path)
    np_image = np.array(image)
    image_clip = ImageClip(np_image).set_duration(audio_clip.duration)
    composite_clip = CompositeVideoClip([image_clip]).set_audio(audio_clip)
    video_clips.append(composite_clip)
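Since each clip's duration is set to its narration's length, `concatenate_videoclips` plays the scenes back to back, so each scene starts where the previous one ended: start times are the running sum of durations. The sketch below computes that timeline, which can be handy for checking the total length or lining up subtitles. `scene_timeline` is a hypothetical helper and the durations are illustrative values, not API output.

```python
from itertools import accumulate
from typing import List, Tuple


def scene_timeline(durations: List[float]) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for clips played back to back."""
    ends = list(accumulate(durations))
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


# Example narration lengths in seconds (illustrative values).
durations = [2.5, 3.0, 1.75]
for i, (start, end) in enumerate(scene_timeline(durations), start=1):
    print(f"scene {i}: {start:.2f}s -> {end:.2f}s")

# In the notebook: scene_timeline([clip.duration for clip in video_clips])
```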

## 9. Concatenate and Display Final Video
Combine all video clips into a single video, display it in the notebook, and clean up temporary files.

In [None]:
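if not video_clips:
    raise RuntimeError("No scenes were rendered; check the image and audio errors above.")
final_video = concatenate_videoclips(video_clips)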
output_filename = f"{int(time.time())}_output_video.mp4"
print("Writing final video to", output_filename)
final_video.write_videofile(output_filename, fps=24)

def show_video(video_path):
    """Embed an MP4 file in the notebook output as a base64 data URI."""
    with open(video_path, "rb") as video_file:  # closes the file after reading
        video_b64 = b64encode(video_file.read()).decode()
    video_tag = f'<video width="640" height="480" controls><source src="data:video/mp4;base64,{video_b64}" type="video/mp4"></video>'
    return HTML(video_tag)

display(show_video(output_filename))

# Cleanup: Close video clips and remove temporary files
final_video.close()
for clip in video_clips:
    clip.close()
for file in temp_audio_files:
    os.remove(file)
for file in temp_image_files:
    os.remove(file)
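The cleanup above only runs if every earlier cell succeeded; if a scene fails partway through, stray `image_*.png` and `audio_*.wav` files are left behind. An alternative, sketched here and not the notebook's own approach, is to write intermediates into a `tempfile.TemporaryDirectory`, which deletes itself when its `with` block exits, even after an exception. `scene_paths` is a hypothetical helper, and the placeholder writes stand in for `image.save` and `generate_audio_live`.

```python
import os
import tempfile
from typing import Tuple


def scene_paths(workdir: str, index: int) -> Tuple[str, str]:
    """Hypothetical helper: image and audio paths for one scene inside workdir."""
    return (os.path.join(workdir, f"image_{index}.png"),
            os.path.join(workdir, f"audio_{index}.wav"))


with tempfile.TemporaryDirectory() as workdir:
    image_path, audio_path = scene_paths(workdir, 0)
    # Stand-ins for image.save(image_path) and generate_audio_live(..., audio_path)
    for path in (image_path, audio_path):
        with open(path, "wb") as f:
            f.write(b"placeholder")
    existed = os.path.exists(image_path) and os.path.exists(audio_path)

print(existed, os.path.exists(workdir))  # → True False
```

In the notebook, the `with` block would need to enclose sections 8 and 9 up to and including `write_videofile`, because the MoviePy clips keep reading from these files until the final video is written.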

---

**A video player will appear above after successful execution.**

---