# Audio Generation Experiments

---

## Final System Prompt for `02_audio_generation_experiments.ipynb`

### Notebook Title:
**TinyTutor Capstone Notebook 02: Audio Generation Experiments**

### Objective:
Define and test the **Text-to-Speech (TTS)** and **Speech-to-Text (STT)** tools as production-ready ADK `FunctionTool` wrappers. Both tools are simulated: the TTS tool stands in for converting a child-friendly story script into audio, and the optional STT tool stands in for transcribing user speech into text. The notebook must show that the tools behave reliably, are clearly documented, and follow ADK best practices.

---

### System Prompt:
> Generate runnable Python code for `02_audio_generation_experiments.ipynb` that defines and tests TinyTutor’s audio tools. Implement the following:
>
> 1. **TTS Tool (`generate_child_voiceover`)**:
>     - Define a Python function `generate_child_voiceover(script: str, voice_profile: str) -> dict`
>     - Include a detailed docstring describing its purpose: converting a child-friendly story script into a lesson audio file using a specified voice profile.
>     - Simulate a successful output by returning a structured dictionary (e.g., `{'status': 'success', 'audio_uri': 'data/audio/lesson.wav'}`).
>     - Wrap the function as an ADK `FunctionTool`.
>
> 2. **STT Tool (`transcribe_user_audio`)**:
>     - Define a Python function `transcribe_user_audio(file_path: str) -> str`
>     - Include a docstring describing its purpose: transcribing user speech into text.
>     - Simulate output by returning a mock transcription string.
>     - Wrap the function as an ADK `FunctionTool`.
>
> 3. **Test Agent (`AudioTestAgent`)**:
>     - Create an `LlmAgent` equipped with both tools.
>     - Run a test prompt that requires the agent to:
>         - Transcribe a sample audio file
>         - Convert a sample story script into audio
>     - Confirm that the agent successfully calls both tools and parses their outputs.
>
> 4. **Best Practices**:
>     - Use type hints and structured return formats
>     - Avoid returning raw audio data
>     - Log tool usage and parameters
>     - Include inline comments and Markdown to explain tool design and Capstone relevance
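
The "log tool usage and parameters" practice could be sketched as a decorator applied to each tool function before it is wrapped. This is a minimal sketch, not part of the notebook: the `log_tool_call` helper and the `tinytutor.tools` logger name are assumptions made for illustration.

```python
import functools
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tinytutor.tools")  # hypothetical logger name


def log_tool_call(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log the tool name, its arguments, and its result on every call."""

    @functools.wraps(func)  # keep __name__ and the docstring intact
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        logger.info("tool=%s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("tool=%s result=%r", func.__name__, result)
        return result

    return wrapper


@log_tool_call
def generate_child_voiceover(script: str, voice_profile: str) -> dict:
    """Simulated TTS: returns a reference to the audio, never the audio bytes."""
    return {"status": "success", "audio_uri": "data/audio/lesson.wav"}


result = generate_child_voiceover(script="Hi!", voice_profile="child_friendly")
```

Because the tool returns a URI rather than raw audio, logging the full result stays cheap.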

---

## Final Checklist for `02_audio_generation_experiments.ipynb`

| **Category**         | **Requirement**                                                                                                                                       | **Source/Justification**                                                                 |
|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| **Core Concept**      | Implementing **Action Execution** capabilities (“Hands”)                                                                                             | Capstone Level 2 architecture                                                             |
| **Goal**              | Define and wrap custom **TTS** and **STT** tools for multimodal interaction                                                                          | Enables audio input/output for TinyTutor                                                  |
| **Dependencies**      | Requires `final_script` from Notebook 01 and simulated audio input                                                                                   | Demonstrates tool chaining and multimodal readiness                                       |
| **Required Tools**    | - `generate_child_voiceover(script, voice_profile)` <br> - `transcribe_user_audio(file_path)` <br> - ADK `FunctionTool` wrappers                     | Ensures modularity and agent discoverability                                              |
| **Tool Design**       | - Clear docstrings <br> - Type hints <br> - Structured outputs (URI or transcription string)                                                         | Aligns with ADK and MCP best practices                                                    |
| **Output Format**     | - TTS returns: `{'status': 'success', 'audio_uri': '...'}` <br> - STT returns: `'transcribed text'`                                                  | Prevents context bloat and ensures clarity                                                |
| **Agent Execution**   | - `AudioTestAgent` must call both tools <br> - Confirm tool invocation and output parsing                                                            | Validates tool integration and agent orchestration                                        |
| **Architecture**      | - FunctionTool wrappers <br> - LlmAgent with tool access                                                                                             | Mirrors production-ready ADK design                                                       |
| **Good Practices**    | - Avoid raw audio data <br> - Use concise artifact references <br> - Log tool usage                                                                 | Ensures scalability and traceability                                                      |
| **Documentation**     | - Inline comments <br> - Markdown explanations                                                                                                       | Supports Capstone reviewers and future collaborators                                      |
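
The "Output Format" and "Good Practices" rows can be checked mechanically. The sketch below assumes a hypothetical helper, `check_tts_result`, which does not appear in the notebook; it only illustrates what verifying the TTS contract might look like.

```python
from typing import Any, Dict, List


def check_tts_result(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the result is well-formed."""
    problems: List[str] = []
    if result.get("status") != "success":
        problems.append("status is not 'success'")
    uri = result.get("audio_uri")
    if not isinstance(uri, str) or not uri:
        problems.append("audio_uri must be a non-empty string")
    # Raw audio in the payload would bloat the agent's context.
    if any(isinstance(value, (bytes, bytearray)) for value in result.values()):
        problems.append("result contains raw bytes")
    return problems


ok = check_tts_result({"status": "success", "audio_uri": "data/audio/lesson.wav"})
bad = check_tts_result({"status": "success", "audio_uri": b"RIFF"})
```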

---

### What We’ll Have When This Code Is Done

- Two production-ready audio tools: one for TTS, one for STT
- ADK-compliant `FunctionTool` wrappers with clear docstrings and structured outputs
- A test agent that demonstrates tool invocation and output parsing
- Simulated audio and transcription artifacts for downstream use
- Clear documentation and inline logic to support Capstone delivery and debugging

---


# TinyTutor Capstone Notebook 02: Audio Generation Experiments

This notebook defines and tests the audio tools for TinyTutor using mock classes that simulate ADK `FunctionTool` and `LlmAgent` behavior. It introduces:
- A Text-to-Speech (TTS) tool that converts a story script into a child-friendly audio reference
- A Speech-to-Text (STT) tool that transcribes user audio input

These tools are modular, agent-compatible, and follow best practices for structured output and clarity.

In [1]:
from typing import Any, Callable, Dict, List, Optional

class FunctionTool:
    """Minimal stand-in for an ADK FunctionTool: a named, described callable."""

    def __init__(self, name: str, function: Callable, description: str = ""):
        self.name = name
        self.function = function
        self.description = description

    def call(self, *args, **kwargs):
        return self.function(*args, **kwargs)

class LlmAgent:
    """Minimal stand-in for an LLM agent; tool choice is a keyword check, not a model call."""

    def __init__(self, name: str, system_instruction: str, tools: Optional[List[FunctionTool]] = None):
        self.name = name
        self.system_instruction = system_instruction
        self.tools = tools or []

    def run(self, input_text: str) -> Dict[str, Any]:
        print(f"\n[{self.name}] Instruction: {self.system_instruction}")
        print(f"[{self.name}] Input: {input_text}")
        # Requests that mention "transcribe" go to the STT tool; any other
        # input is treated as a story script for the TTS tool.
        wants_stt = "transcribe" in input_text.lower()
        target = "transcribe_user_audio" if wants_stt else "generate_child_voiceover"
        for tool in self.tools:
            if tool.name != target:
                continue
            if wants_stt:
                result = tool.call(file_path="dummy_audio.wav")
                return {"transcription": result}
            result = tool.call(script=input_text, voice_profile="child_friendly")
            return {"audio_result": result}
        return {"output": "No tool invoked."}
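
The mock `FunctionTool` above takes its name and description by hand, which duplicates what the function's name and docstring already say. One way to avoid that duplication is to read the metadata from the function itself. The `describe_tool` helper below is a hypothetical illustration using the standard `inspect` module; it is not a claim about how ADK builds its tool schema.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Collect the metadata an agent could use to pick a tool."""
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        # The first docstring line serves as the short description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": list(inspect.signature(func).parameters),
    }


def transcribe_user_audio(file_path: str) -> str:
    """Transcribes user speech into text."""
    return "Simulated transcription: What is gravity?"


meta = describe_tool(transcribe_user_audio)
```

This is one reason the checklist insists on clear docstrings and type hints: they can double as the tool's description.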

## Step 1: Define the Text-to-Speech Tool

This tool simulates converting a story script into a child-friendly audio file. It returns a structured dictionary referencing the audio URI.

In [2]:
def generate_child_voiceover(script: str, voice_profile: str) -> Dict[str, str]:
    """
    Simulates converting a story script into a child-friendly audio file.
    No audio is synthesized, and the script text is not read.

    Args:
        script (str): The story script to convert.
        voice_profile (str): The voice style to use (e.g., 'child_friendly').

    Returns:
        dict: `status`, `audio_uri` (a path reference, never raw audio), and `voice_profile`.
    """
    return {
        "status": "success",
        "audio_uri": "data/audio/lesson_voiceover.wav",
        "voice_profile": voice_profile
    }

tts_tool = FunctionTool(
    name="generate_child_voiceover",
    function=generate_child_voiceover,
    description="Converts a story script into a child-friendly audio file."
)
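
The simulated tool always reports success. A variant that reports bad input through the same structured dictionary, rather than raising, might look like the sketch below. The `_checked` function name, the `error_message` key, and the `ALLOWED_PROFILES` set are assumptions for illustration; the notebook itself only uses the `child_friendly` profile.

```python
from typing import Dict

ALLOWED_PROFILES = {"child_friendly"}  # assumed set of supported voices


def generate_child_voiceover_checked(script: str, voice_profile: str) -> Dict[str, str]:
    """Simulated TTS that returns an error payload for invalid input."""
    if not script.strip():
        return {"status": "error", "error_message": "script is empty"}
    if voice_profile not in ALLOWED_PROFILES:
        return {
            "status": "error",
            "error_message": f"unknown voice_profile: {voice_profile}",
        }
    return {
        "status": "success",
        "audio_uri": "data/audio/lesson_voiceover.wav",
        "voice_profile": voice_profile,
    }


good = generate_child_voiceover_checked("Once upon a time...", "child_friendly")
empty = generate_child_voiceover_checked("   ", "child_friendly")
```

Returning an error dictionary keeps the failure visible to the agent, which can then read `status` and decide how to respond.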

## Step 2: Define the Speech-to-Text Tool

This tool simulates transcribing user audio input into text. It returns a mock transcription string.

In [3]:
def transcribe_user_audio(file_path: str) -> str:
    """
    Simulates transcribing an audio file into text.
    The file is never opened; the same mock sentence is returned for any path.

    Args:
        file_path (str): Path to the audio file.

    Returns:
        str: Transcribed text.
    """
    return "Simulated transcription: What is gravity?"

stt_tool = FunctionTool(
    name="transcribe_user_audio",
    function=transcribe_user_audio,
    description="Transcribes user audio input into text."
)

## Step 3: Define the AudioTestAgent

This agent is equipped with both TTS and STT tools. It can convert a script into audio or transcribe user speech.

In [4]:
audio_agent = LlmAgent(
    name="AudioTestAgent",
    system_instruction="Use tools to convert text to audio or transcribe audio to text.",
    tools=[tts_tool, stt_tool]
)
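
With more than one tool registered, the agent needs a way to pick one. A model-driven agent would typically name the tool it wants, so selection can be sketched as a lookup by name rather than by position in the list. The `TOOLS` registry and `dispatch` function below are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable, Dict


def generate_child_voiceover(script: str, voice_profile: str) -> Dict[str, str]:
    """Simulated TTS tool."""
    return {"status": "success", "audio_uri": "data/audio/lesson_voiceover.wav"}


def transcribe_user_audio(file_path: str) -> str:
    """Simulated STT tool."""
    return "Simulated transcription: What is gravity?"


# Registry keyed by function name, so lookup does not depend on list order.
TOOLS: Dict[str, Callable[..., Any]] = {
    func.__name__: func for func in (generate_child_voiceover, transcribe_user_audio)
}


def dispatch(tool_name: str, **kwargs: Any) -> Any:
    """Call a registered tool by name, or return an error payload."""
    if tool_name not in TOOLS:
        return {"status": "error", "error_message": f"unknown tool: {tool_name}"}
    return TOOLS[tool_name](**kwargs)


text = dispatch("transcribe_user_audio", file_path="dummy_audio.wav")
missing = dispatch("make_video")
```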

## Step 4: Run Tool Tests

We’ll now simulate:
- Converting a story script into audio
- Transcribing a user audio file

In [5]:
# Simulate TTS
script_text = "Once upon a time, a curious child met a friendly black hole..."
tts_result = audio_agent.run(script_text)
print("\n TTS Output:\n", tts_result.get("audio_result"))

# Simulate STT with an agent that holds only the STT tool,
# so this call can only reach transcribe_user_audio.
stt_agent = LlmAgent(
    name="SttTestAgent",
    system_instruction=audio_agent.system_instruction,
    tools=[stt_tool]
)
stt_result = stt_agent.run("transcribe audio")
print("\n STT Output:\n", stt_result.get("transcription"))


[AudioTestAgent] Instruction: Use tools to convert text to audio or transcribe audio to text.
[AudioTestAgent] Input: Once upon a time, a curious child met a friendly black hole...

 TTS Output:
 {'status': 'success', 'audio_uri': 'data/audio/lesson_voiceover.wav', 'voice_profile': 'child_friendly'}

[SttTestAgent] Instruction: Use tools to convert text to audio or transcribe audio to text.
[SttTestAgent] Input: transcribe audio

 STT Output:
 Simulated transcription: What is gravity?
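
Printed output shows what came back, but not that each tool was actually invoked. One way to confirm invocation is to wrap the tool functions in `unittest.mock.Mock(wraps=...)`, which forwards each call to the real function and records it. The sketch below is an assumption about how such a check could look; the chained "transcribe, then voice" flow stands in for an agent run and does not appear in the notebook.

```python
from typing import Dict
from unittest.mock import Mock


def generate_child_voiceover(script: str, voice_profile: str) -> Dict[str, str]:
    """Simulated TTS tool, same return shape as the notebook's version."""
    return {
        "status": "success",
        "audio_uri": "data/audio/lesson_voiceover.wav",
        "voice_profile": voice_profile,
    }


def transcribe_user_audio(file_path: str) -> str:
    """Simulated STT tool."""
    return "Simulated transcription: What is gravity?"


# Mock(wraps=...) passes calls through to the real function and records them.
stt = Mock(wraps=transcribe_user_audio)
tts = Mock(wraps=generate_child_voiceover)

# Stand-in for an agent run: transcribe the question, then voice a script.
question = stt(file_path="dummy_audio.wav")
audio = tts(script=f"You asked: {question}", voice_profile="child_friendly")

# Each tool was invoked exactly once.
stt.assert_called_once_with(file_path="dummy_audio.wav")
tts.assert_called_once()
```

A check like this would fail loudly if one of the two tools were never reached, instead of silently printing `None`.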
