# Video Generation Experiments

---

## Final System Prompt for `03_video_generation_experiments.ipynb`

### Notebook Title:
**TinyTutor Capstone Notebook 03: Video Generation Experiments**

### Objective:
Define and test the **visual media generation tools** for TinyTutor using ADK `FunctionTool` wrappers. These tools simulate the transformation of a child-friendly story script into engaging visual assets—images, videos, and coherent scenes—using structured inputs and mocked outputs. The notebook must demonstrate tool modularity, agent compatibility, and adherence to ADK and MCP best practices.

---

### System Prompt:
> Generate runnable Python code for `03_video_generation_experiments.ipynb` that defines and tests TinyTutor’s visual media tools. Implement the following:
>
> 1. **Tool 1: `generate_hyper_image_imagen(prompt: str, style: str) -> dict`**
>     - Simulates image generation using Imagen 3 or Nano Banana.
>     - Returns a structured dictionary with an image URI (e.g., `{'status': 'success', 'image_uri': 'data/images/scene1.png'}`).
>     - Include a clear docstring describing its purpose and required parameters.
>
> 2. **Tool 2: `generate_smooth_video_lumiere(scene_description: str, duration_sec: int) -> dict`**
>     - Simulates video generation using Lumiere or Veo 3.
>     - Returns a structured dictionary with a video URI.
>     - Include a docstring emphasizing child-friendly output and duration constraints.
>
> 3. **Tool 3: `combine_scenes_whisk(artifact_id_list: List[str]) -> dict`**
>     - Simulates scene composition using Whisk.
>     - Accepts a list of image/video URIs and returns a single cohesive scene URI.
>     - Include a docstring describing its role in merging visual assets.
>
> 4. **Tool Wrapping**:
>     - Wrap all three functions as ADK `FunctionTool` objects.
>     - Ensure type hints, structured outputs, and discoverability by agents.
>
> 5. **Test Agent (`VisualTestAgent`)**:
>     - Create an `LlmAgent` equipped with all three tools.
>     - Run a mock sequence:
>         - Generate an image from a scene prompt
>         - Generate a video from a description
>         - Merge them into a final scene
>     - Confirm successful tool invocation and output parsing.
>
> 6. **Best Practices**:
>     - Avoid raw media data in outputs
>     - Use concise artifact URIs
>     - Log tool parameters and agent decisions
>     - Include inline comments and Markdown to explain tool design and Capstone relevance

---

## Final Checklist for `03_video_generation_experiments.ipynb`

| **Category**         | **Requirement**                                                                                                                                       | **Source/Justification**                                                                 |
|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| **Core Concept**      | Extending **Multimodal Action** capabilities (“Hands”)                                                                                               | Capstone Level 2–3 architecture                                                           |
| **Goal**              | Define and wrap custom tools for image, video, and scene composition                                                                                 | Enables visual storytelling for TinyTutor                                                 |
| **Dependencies**      | Requires scene descriptions and style inputs from Notebook 01 (`final_script`)                                                                      | Demonstrates tool chaining and multimodal readiness                                       |
| **Required Tools**    | - `generate_hyper_image_imagen(prompt, style)` <br> - `generate_smooth_video_lumiere(scene_description, duration_sec)` <br> - `combine_scenes_whisk(artifact_id_list)` | Simulates Imagen, Lumiere, and Whisk workflows                                           |
| **Tool Design**       | - Clear docstrings <br> - Type hints <br> - Structured outputs (URIs only)                                                                           | Aligns with ADK and MCP best practices                                                    |
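| **Output Format**     | - Each tool returns a status and a URI under a tool-specific key, e.g. `{'status': 'success', 'image_uri': '...'}` (`video_uri` and `scene_uri` for the other two tools) | Prevents context bloat and ensures clarity                                                |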
| **Agent Execution**   | - `VisualTestAgent` must call all three tools <br> - Confirm tool invocation and output parsing                                                      | Validates tool integration and agent orchestration                                        |
| **Architecture**      | - FunctionTool wrappers <br> - LlmAgent with tool access                                                                                             | Mirrors production-ready ADK design                                                       |
| **Good Practices**    | - Avoid raw media data <br> - Use concise artifact references <br> - Log tool usage                                                                 | Ensures scalability and traceability                                                      |
| **Documentation**     | - Inline comments <br> - Markdown explanations                                                                                                       | Supports Capstone reviewers and future collaborators                                      |
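
The output rules in the table (status field, URI references only, no raw media) can be checked mechanically. The following is a minimal sketch, assuming each result carries a key ending in `_uri`; the helper `check_tool_result` and the `MAX_URI_LENGTH` limit are hypothetical and not part of ADK.

```python
from typing import Any, Dict

# Hypothetical limit: a longer value is probably inline media, not a reference.
MAX_URI_LENGTH = 256


def check_tool_result(result: Dict[str, Any]) -> bool:
    """Return True if `result` reports success and holds only short URI references."""
    if result.get("status") != "success":
        return False
    # Collect every value stored under a key that ends in "_uri".
    uris = [value for key, value in result.items() if key.endswith("_uri")]
    if not uris:
        return False
    return all(
        isinstance(uri, str)
        and 0 < len(uri) <= MAX_URI_LENGTH
        and not uri.startswith("data:")  # reject inline base64 data URIs
        for uri in uris
    )


good = {"status": "success", "image_uri": "data/images/scene1.png"}
bad = {"status": "success", "image_uri": "data:image/png;base64,iVBORw0KGgo="}
print(check_tool_result(good), check_tool_result(bad))
```

A check like this could run in the test cell after each tool call, so that a tool which starts returning inline media is caught before its output reaches the agent context.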

---

### What We’ll Have When This Code Is Done

- Three mocked visual media tools: image generation, video synthesis, and scene composition
- ADK-style `FunctionTool` wrappers with clear docstrings and structured outputs
- A test agent that demonstrates tool invocation and output parsing
- Simulated visual artifacts for downstream use
- Clear documentation and inline comments to support Capstone delivery and debugging

---

# TinyTutor Capstone Notebook 03: Video Generation Experiments

This notebook defines and tests the visual media tools for TinyTutor using mock classes that simulate ADK `FunctionTool` behavior. It introduces:
- An image generation tool (simulating Imagen 3 or Nano Banana)
- A video synthesis tool (simulating Lumiere or Veo 3)
- A scene composition tool (simulating Whisk)

These tools are modular, agent-compatible, and return structured artifact references for downstream use.

In [1]:
from typing import Any, Callable, Dict, List, Optional

class FunctionTool:
    """Minimal stand-in for ADK's FunctionTool: a named, described callable."""

    def __init__(self, name: str, function: Callable[..., Dict[str, Any]], description: str = ""):
        self.name = name
        self.function = function
        self.description = description

    def call(self, *args, **kwargs) -> Dict[str, Any]:
        # Forward all arguments to the wrapped function unchanged.
        return self.function(*args, **kwargs)

class LlmAgent:
    """Mock agent: no model is called; tools are dispatched by name."""

    def __init__(self, name: str, system_instruction: str, tools: Optional[List[FunctionTool]] = None):
        self.name = name
        self.system_instruction = system_instruction
        self.tools = tools or []

    def run(self, input_text: str) -> Dict[str, Any]:
        print(f"\n[{self.name}] Instruction: {self.system_instruction}")
        print(f"[{self.name}] Input: {input_text}")
        results = {}
        # Call each registered tool once with fixed mock arguments.
        for tool in self.tools:
            if tool.name == "generate_hyper_image_imagen":
                results["image"] = tool.call(prompt=input_text, style="cartoon")
            elif tool.name == "generate_smooth_video_lumiere":
                results["video"] = tool.call(scene_description=input_text, duration_sec=10)
            elif tool.name == "combine_scenes_whisk":
                # Placeholder IDs; the outputs of the two tools above are not passed in here.
                results["scene"] = tool.call(artifact_id_list=["img_001", "vid_001"])
        return results
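
The best practices listed earlier ask for tool parameters to be logged, which the mock classes do not do. One possible approach is sketched below using the standard `logging` module; `with_call_logging`, the logger name `tinytutor.tools`, and `fake_tool` are assumptions made for illustration, not ADK features.

```python
import functools
import logging
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("tinytutor.tools")  # hypothetical logger name


def with_call_logging(function: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool function so each call logs its parameters and result status."""

    @functools.wraps(function)  # keep the original name and docstring
    def wrapper(**kwargs: Any) -> Dict[str, Any]:
        logger.info("calling %s with %s", function.__name__, kwargs)
        result = function(**kwargs)
        logger.info("%s returned status=%s", function.__name__, result.get("status"))
        return result

    return wrapper


def fake_tool(prompt: str) -> Dict[str, Any]:
    """Stand-in for any of the three mocked tools."""
    return {"status": "success", "image_uri": "data/images/scene1.png"}


logged_tool = with_call_logging(fake_tool)
result = logged_tool(prompt="A friendly space pirate")
```

Because `functools.wraps` preserves the function name and docstring, the wrapped function could still be handed to the mock `FunctionTool` without losing its description.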

## Step 1: Define the Image Generation Tool

This tool simulates generating a child-friendly image from a scene prompt and style.

In [2]:
def generate_hyper_image_imagen(prompt: str, style: str) -> Dict[str, Any]:
    """
    Simulates generating a child-friendly image from a prompt and style.

    Args:
        prompt (str): Scene description.
        style (str): Visual style (e.g., 'cartoon').

    Returns:
        dict: Status, simulated image URI, and the requested style.
    """
    # Mock: the prompt is not used; the URI is a fixed placeholder.
    return {
        "status": "success",
        "image_uri": "data/images/scene1.png",
        "style": style
    }

image_tool = FunctionTool(
    name="generate_hyper_image_imagen",
    function=generate_hyper_image_imagen,
    description="Generates a child-friendly image from a scene prompt."
)
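
The mocked tool returns the same URI for every prompt, which makes it hard to tell artifacts apart once several scenes are generated. A minimal sketch of one way to derive a distinct, repeatable mock URI per input follows; `mock_image_uri` is a hypothetical helper, and the hashing scheme is an assumption rather than anything Imagen does.

```python
import hashlib


def mock_image_uri(prompt: str, style: str) -> str:
    """Build a deterministic mock URI from the prompt and style."""
    # Hash the inputs so the same prompt and style always map to the same file name.
    digest = hashlib.sha256(f"{style}:{prompt}".encode("utf-8")).hexdigest()[:8]
    return f"data/images/{style}_{digest}.png"


uri_a = mock_image_uri("A friendly space pirate", "cartoon")
uri_b = mock_image_uri("A sleepy moon", "cartoon")
print(uri_a)
print(uri_b)
```

The sketch assumes `style` is a simple word such as `cartoon`; a style string containing path separators would need sanitizing first.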

## Step 2: Define the Video Generation Tool

This tool simulates generating a short video from a scene description and duration.

In [3]:
def generate_smooth_video_lumiere(scene_description: str, duration_sec: int) -> Dict[str, Any]:
    """
    Simulates generating a short, child-friendly video from a scene description.

    Args:
        scene_description (str): Description of the scene.
        duration_sec (int): Duration of the video in seconds.

    Returns:
        dict: Status, simulated video URI, and the requested duration.
    """
    # Mock: the description is not used; the URI is a fixed placeholder.
    return {
        "status": "success",
        "video_uri": "data/videos/scene1.mp4",
        "duration": duration_sec
    }

video_tool = FunctionTool(
    name="generate_smooth_video_lumiere",
    function=generate_smooth_video_lumiere,
    description="Generates a short video from a scene description."
)
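
The system prompt asks for duration constraints, but the mock accepts any value. The sketch below shows one way a range check could look; the bounds `MIN_DURATION_SEC` and `MAX_DURATION_SEC`, the function name `generate_video_checked`, and the `error_message` key are all assumptions, since the source does not state actual limits or an error format.

```python
from typing import Any, Dict

# Hypothetical bounds; the source does not specify real limits.
MIN_DURATION_SEC = 1
MAX_DURATION_SEC = 30


def generate_video_checked(scene_description: str, duration_sec: int) -> Dict[str, Any]:
    """Mock video tool that reports out-of-range durations instead of raising."""
    if not MIN_DURATION_SEC <= duration_sec <= MAX_DURATION_SEC:
        return {
            "status": "error",
            "error_message": (
                f"duration_sec must be between {MIN_DURATION_SEC} "
                f"and {MAX_DURATION_SEC}, got {duration_sec}"
            ),
        }
    return {
        "status": "success",
        "video_uri": "data/videos/scene1.mp4",
        "duration": duration_sec,
    }


ok = generate_video_checked("A friendly space pirate", 10)
too_long = generate_video_checked("A friendly space pirate", 600)
print(ok["status"], too_long["status"])
```

Returning a structured error rather than raising keeps the result in the same dictionary shape, which an agent can read and react to.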

## Step 3: Define the Scene Composition Tool

This tool simulates merging multiple visual artifacts into a single cohesive scene.

In [4]:
def combine_scenes_whisk(artifact_id_list: List[str]) -> Dict[str, Any]:
    """
    Simulates combining multiple visual artifacts into a single scene.

    Args:
        artifact_id_list (List[str]): IDs or URIs of the images and videos to merge.

    Returns:
        dict: Status, simulated scene URI, and the list of combined IDs.
    """
    # Mock: the inputs are echoed back; the scene URI is a fixed placeholder.
    return {
        "status": "success",
        "scene_uri": "data/scenes/final_scene.mp4",
        "combined_ids": artifact_id_list
    }

scene_tool = FunctionTool(
    name="combine_scenes_whisk",
    function=combine_scenes_whisk,
    description="Combines visual artifacts into a single cohesive scene."
)
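
The mocked composer reports success even for an empty list. A minimal sketch of a stricter variant is shown below, assuming an empty input should yield an error result and that duplicate references should be merged only once; `combine_scenes_checked` and the `error_message` key are hypothetical.

```python
from typing import Any, Dict, List


def combine_scenes_checked(artifact_id_list: List[str]) -> Dict[str, Any]:
    """Mock composer that rejects empty input and drops duplicate references."""
    if not artifact_id_list:
        return {"status": "error", "error_message": "artifact_id_list is empty"}
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(artifact_id_list))
    return {
        "status": "success",
        "scene_uri": "data/scenes/final_scene.mp4",
        "combined_ids": unique_ids,
    }


merged = combine_scenes_checked(["img_001", "vid_001", "img_001"])
empty = combine_scenes_checked([])
print(merged["combined_ids"], empty["status"])
```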

## Step 4: Define the VisualTestAgent

This agent is equipped with all three visual tools and can generate images, videos, and composed scenes.

In [5]:
visual_agent = LlmAgent(
    name="VisualTestAgent",
    system_instruction="Use tools to generate visuals from a scene prompt.",
    tools=[image_tool, video_tool, scene_tool]
)

## Step 5: Run Tool Tests

We’ll now simulate:
- Generating an image from a scene prompt
- Generating a video from the same prompt
- Merging them into a final scene

In [6]:
scene_prompt = "A friendly space pirate waving hello on a colorful planet."

results = visual_agent.run(scene_prompt)

print("\n Image Output:\n", results.get("image"))
print("\n Video Output:\n", results.get("video"))
print("\n Scene Composition Output:\n", results.get("scene"))


[VisualTestAgent] Instruction: Use tools to generate visuals from a scene prompt.
[VisualTestAgent] Input: A friendly space pirate waving hello on a colorful planet.

 Image Output:
 {'status': 'success', 'image_uri': 'data/images/scene1.png', 'style': 'cartoon'}

 Video Output:
 {'status': 'success', 'video_uri': 'data/videos/scene1.mp4', 'duration': 10}

 Scene Composition Output:
 {'status': 'success', 'scene_uri': 'data/scenes/final_scene.mp4', 'combined_ids': ['img_001', 'vid_001']}
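
In the run above, the composition step receives the fixed placeholder IDs `img_001` and `vid_001` rather than the URIs produced by the first two tools. The sketch below shows how the three steps could be chained so the composer consumes the earlier outputs. The functions `mock_image`, `mock_video`, `mock_combine`, and `run_visual_pipeline` are stand-ins written for this example; they mirror the return shapes of the notebook's tools but are not the notebook's own code.

```python
from typing import Any, Dict, List


def mock_image(prompt: str, style: str) -> Dict[str, Any]:
    return {"status": "success", "image_uri": "data/images/scene1.png", "style": style}


def mock_video(scene_description: str, duration_sec: int) -> Dict[str, Any]:
    return {"status": "success", "video_uri": "data/videos/scene1.mp4", "duration": duration_sec}


def mock_combine(artifact_id_list: List[str]) -> Dict[str, Any]:
    return {
        "status": "success",
        "scene_uri": "data/scenes/final_scene.mp4",
        "combined_ids": artifact_id_list,
    }


def run_visual_pipeline(scene_prompt: str, style: str = "cartoon", duration_sec: int = 10) -> Dict[str, Any]:
    """Run image, video, and composition in order, feeding URIs forward."""
    image = mock_image(prompt=scene_prompt, style=style)
    video = mock_video(scene_description=scene_prompt, duration_sec=duration_sec)
    # Pass the generated URIs to the composer instead of hardcoded IDs.
    scene = mock_combine(artifact_id_list=[image["image_uri"], video["video_uri"]])
    return {"image": image, "video": video, "scene": scene}


results = run_visual_pipeline("A friendly space pirate waving hello on a colorful planet.")
print(results["scene"]["combined_ids"])
```

With this wiring, `combined_ids` records which generated artifacts went into the final scene, which helps when tracing a scene back to its inputs.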
