<a href="https://colab.research.google.com/github/angelatyk/tinytutor/blob/dev/notebooks/00_master_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [19]:
!pip install -q google-adk google-generativeai python-dotenv
!pip install -q google-cloud-texttospeech pydub
!pip install -q gradio

print("✅ All libraries installed.")

✅ All libraries installed.


In [20]:
import os

import google.generativeai as genai
from google.colab import userdata

# ADK
from google.adk.agents import Agent
from google.adk.models.google_llm import Gemini
from google.adk.runners import InMemoryRunner
from google.adk.tools import google_search
from google.genai import types

# TTS
from google.cloud import texttospeech
from pydub import AudioSegment

import gradio as gr

In [21]:
# Gemini Key
GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
genai.configure(api_key=GOOGLE_API_KEY)
print("✅ Gemini API configured.")

# Google TTS Service Account JSON
SERVICE_ACCOUNT_JSON = userdata.get("GCP_VI_SERVICE_ACCOUNT_JSON")
if not SERVICE_ACCOUNT_JSON:
    raise RuntimeError("Upload GCP_VI_SERVICE_ACCOUNT_JSON to Colab Secrets!")

with open("gcp_tts_sa.json", "w") as f:
    f.write(SERVICE_ACCOUNT_JSON)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "gcp_tts_sa.json"

tts_client = texttospeech.TextToSpeechClient()
print("✅ Google TTS configured.")

✅ Gemini API configured.
✅ Google TTS configured.
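If the Colab secret holds the wrong file (for example an OAuth client JSON instead of a service-account key), the problem typically surfaces later, when the TTS client loads the credentials or on the first request, with a less specific error. A minimal sketch of an up-front check follows. `validate_service_account_json` is a hypothetical helper, and it only checks a few fields that service-account key files normally contain (`type`, `project_id`, `private_key`, `client_email`), not the full format.

```python
import json

# A minimal subset of the fields found in a service-account key file
REQUIRED_SA_KEYS = {"type", "project_id", "private_key", "client_email"}


def validate_service_account_json(raw: str) -> dict:
    """Parse the secret and fail fast if it does not look like a service-account key."""
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Secret is not valid JSON: {exc}") from exc
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise RuntimeError("Secret is not a service-account key (expected type 'service_account').")
    missing = REQUIRED_SA_KEYS - set(info)
    if missing:
        raise RuntimeError(f"Service-account JSON is missing fields: {sorted(missing)}")
    return info
```

Calling `validate_service_account_json(SERVICE_ACCOUNT_JSON)` before writing `gcp_tts_sa.json` would surface a bad secret immediately.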


In [29]:
retry_config = types.HttpRetryOptions(
    attempts=5,
    exp_base=7,
    initial_delay=1,
    http_status_codes=[429, 500, 503, 504]
)

pedagogy_agent = Agent(
    name="PedagogyAgent",
    model=Gemini(model="gemini-2.5-flash-lite", retry_options=retry_config),
    description="Explains topics in simple ELI5 style.",
    instruction="Explain the topic like I'm 5. Use google_search if needed.",
    tools=[google_search],
)

runner = InMemoryRunner(agent=pedagogy_agent)
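`exp_base=7` in `retry_config` above makes the waits grow quickly. The hypothetical helper below assumes the common exponential formula delay_n = initial_delay × exp_base^(n−1), with an optional cap and no jitter. The exact formula, cap, and jitter that `HttpRetryOptions` applies are not confirmed here, so treat the numbers as a rough guide.

```python
from typing import Optional


def backoff_schedule(initial_delay: float, exp_base: float, waits: int,
                     max_delay: Optional[float] = None) -> list[float]:
    """Return the wait (in seconds) before each retry, ignoring jitter.

    Assumes delay_n = initial_delay * exp_base ** (n - 1), optionally capped.
    """
    delays = []
    for n in range(1, waits + 1):
        delay = initial_delay * exp_base ** (n - 1)
        if max_delay is not None:
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays


print(backoff_schedule(1, 7, 4))                # [1, 7, 49, 343]
print(backoff_schedule(1, 7, 4, max_delay=60))  # [1, 7, 49, 60]
```

If `attempts` counts the original request, `attempts=5` allows at most 4 waits. Without a cap the fourth wait would be 343 s; a smaller base such as `exp_base=2` keeps the schedule at 1, 2, 4, 8 s.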

In [105]:
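async def run_pedagogy_async(topic: str) -> str:
    events = await runner.run_debug(topic)
    # Use the last event that carries text; events[0] may not hold the answer
    for event in reversed(events):
        parts = (event.content.parts if event.content else None) or []
        if any(p.text for p in parts):
            return "".join(p.text for p in parts if p.text)
    return "⚠️ PedagogyAgent returned no text."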

In [107]:
SCRIPTWRITER_SYSTEM_PROMPT = """
You are AudioNarratorAgent. Produce a single-voice children’s story (200–450 words).
No labels, no titles, no markdown. Add exactly 2 learning questions inside the story.
"""

def run_scriptwriter(explanation: str) -> str:
    model = genai.GenerativeModel(
        model_name="gemini-2.5-flash",
        system_instruction=SCRIPTWRITER_SYSTEM_PROMPT
    )

    response = model.generate_content(
        f"Write a children's story based on this:\n{explanation}",
        generation_config=genai.GenerationConfig(
            temperature=0.9,
            max_output_tokens=4096
        )
    )

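    # Preferred accessor; it raises if the response has no usable text part
    try:
        return response.text
    except Exception:
        pass

    # Fallback: read the first part of the first candidate directly
    try:
        return response.candidates[0].content.parts[0].text
    except Exception:
        pass

    return "⚠️ ScriptWriter failed."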
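The system prompt asks for 200–450 words, no labels or markdown, and exactly two questions, but nothing checks the model's output. A rough post-check is sketched below. `check_script` is a hypothetical helper and its rules are heuristics: every `?` counts as a question, and a line starting with `#`, `-`, `*`, or `Word:` (or any `**`) counts as markdown or a label.

```python
import re

# Heading, bullet, or "Label:" at the start of a line, or bold markers anywhere
MARKDOWN_OR_LABEL = re.compile(r"^\s*(#+\s|[-*]\s|\w+:\s)|\*\*", re.MULTILINE)


def check_script(text: str, min_words: int = 200, max_words: int = 450) -> list[str]:
    """Return a list of problems; an empty list means the script passed.

    Heuristics only: words are whitespace-separated tokens and every '?'
    counts as one question.
    """
    problems = []
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} is outside {min_words}-{max_words}")
    questions = text.count("?")
    if questions != 2:
        problems.append(f"expected 2 questions, found {questions}")
    if MARKDOWN_OR_LABEL.search(text):
        problems.append("looks like it contains markdown or labels")
    return problems
```

It could run right after `run_scriptwriter`, either to regenerate the story or to show a warning in the UI when the returned list is non-empty.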

In [108]:
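def chunk_text(text, max_chars=4500):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    text = text.strip()
    chunks = []
    while len(text) > max_chars:
        # Last ". " whose period still fits inside the chunk
        cut = text.rfind(". ", 0, max_chars)
        # Keep the period; hard-split at max_chars when there is no sentence break
        end = cut + 1 if cut != -1 else max_chars
        chunks.append(text[:end])
        text = text[end:].strip()
    if text:
        chunks.append(text)
    return chunks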
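`chunk_text` limits each chunk by characters, while Cloud Text-to-Speech limits request input by size in bytes (5,000 bytes per request at the time of writing; check the current quotas page). Emoji and accented letters take several UTF-8 bytes each, so a 4,500-character chunk can exceed that. A byte-aware variant is sketched below. `chunk_text_bytes` is hypothetical, and it assumes sentences end in `.`, `!`, or `?` followed by whitespace.

```python
import re


def chunk_text_bytes(text: str, max_bytes: int = 4500) -> list[str]:
    """Pack whole sentences into chunks whose UTF-8 size is at most max_bytes.

    Assumes sentences end with '.', '!' or '?' followed by whitespace.
    A sentence larger than max_bytes is hard-split between characters.
    """
    if max_bytes < 4:  # one UTF-8 character can take up to 4 bytes
        raise ValueError("max_bytes must be at least 4")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Cut an oversized sentence on a byte budget, dropping any partial
        # multi-byte character at the end of the slice
        while len(sentence.encode("utf-8")) > max_bytes:
            piece = sentence.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
            chunks.append(piece)
            sentence = sentence[len(piece):]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text_bytes("Hello there. Hello there. Hello there.", max_bytes=30))
# ['Hello there. Hello there.', 'Hello there.']
```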


def tts_segment(text):
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-C"
    )
    audio_cfg = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=1.02,
        pitch=-2.0
    )
    response = tts_client.synthesize_speech(
        input=synthesis_input,
        voice=voice,
        audio_config=audio_cfg
    )
    return response.audio_content


def audio_writer(script_text: str, out="story.mp3"):
    chunks = chunk_text(script_text)
    audio = AudioSegment.silent(200)
    for i, chunk in enumerate(chunks, 1):
        path = f"seg_{i}.mp3"
        with open(path, "wb") as f:
            f.write(tts_segment(chunk))
        audio += AudioSegment.from_mp3(path)
        audio += AudioSegment.silent(150)
    audio.export(out, format="mp3")
    return out
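`audio_writer` places 200 ms of silence before the first chunk and 150 ms after each chunk, so each chunk's start time in `story.mp3` follows from the chunk durations. That can help with captions or with highlighting text during playback. `segment_offsets` is a hypothetical helper; with pydub, `len(segment)` gives a segment's length in milliseconds.

```python
def segment_offsets(durations_ms: list[int], lead_in_ms: int = 200,
                    gap_ms: int = 150) -> tuple[list[int], int]:
    """Return (start of each chunk, total length), all in milliseconds.

    Mirrors the layout in audio_writer: lead-in silence, then every chunk
    followed by a short gap.
    """
    starts = []
    t = lead_in_ms
    for duration in durations_ms:
        starts.append(t)
        t += duration + gap_ms
    return starts, t


print(segment_offsets([1000, 2000]))  # ([200, 1350], 3500)
```

The computed times may differ by a few milliseconds from the exported file, since pydub works in whole audio frames.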


In [110]:
# 1️⃣ ELI5
app1 = gr.Interface(
    fn=run_pedagogy_async,
    inputs=gr.Textbox(label="Your Topic", placeholder="e.g. What is an AI agent?"),
    outputs=gr.Textbox(label="ELI5 Explanation"),
    title="🟦 Step 1: Generate ELI5 Explanation"
)

In [111]:
# 2️⃣ Script
app2 = gr.Interface(
    fn=run_scriptwriter,
    inputs=gr.Textbox(label="ELI5 Text"),
    outputs=gr.Textbox(label="Generated Story Script"),
    title="🟧 Step 2: Convert Explanation to Story Script"
)

In [112]:
# 3️⃣ Audio
def run_audio(script):
    return audio_writer(script, "story.mp3")

app3 = gr.Interface(
    fn=run_audio,
    inputs=gr.Textbox(label="Final Script"),
    outputs=gr.Audio(label="Generated Audio"),
    title="🟩 Step 3: Convert Script to Audio"
)
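The three tabs require copying text from one step to the next by hand. The same chain can run without the UI; the sketch below takes the step functions as arguments so it can be exercised with stubs. `run_pipeline` is a hypothetical helper, not part of ADK or Gradio.

```python
from typing import Awaitable, Callable


async def run_pipeline(
    topic: str,
    explain: Callable[[str], Awaitable[str]],
    write_script: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> dict:
    """Chain ELI5 -> story script -> audio and return every intermediate result."""
    explanation = await explain(topic)
    script = write_script(explanation)
    audio_path = synthesize(script)
    return {"explanation": explanation, "script": script, "audio_path": audio_path}
```

In this notebook it would be called as `result = await run_pipeline("What is an AI agent?", run_pedagogy_async, run_scriptwriter, run_audio)`, since Colab cells accept top-level `await`.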

In [None]:
# Combined UI
app = gr.TabbedInterface(
    [app1, app2, app3],
    ["1. ELI5", "2. Script", "3. Audio"]
)

app.launch(debug=True)

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://53ce9eb9dce9c179db.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



 ### Continue session: debug_session_id

User > what is an ai agent?
PedagogyAgent > Imagine you have a super smart toy robot! This robot can see things, think about them, and then do things all by itself to help you. That's like an AI agent!

An AI agent is a computer program that is designed to:

*   **Perceive:** It can take in information from its environment, like looking at a picture or listening to words.
*   **Think:** It can process that information and make decisions, like figuring out what is in the picture or what you are asking.
*   **Act:** Based on its thinking, it can do something, like showing you the right picture or giving you an answer.

So, it's like a smart helper in a computer that can understand and do things on its own!

 ### Continue session: debug_session_id

User > What is an ai agent?
PedagogyAgent > Think of an AI agent like a very smart robot that lives inside a computer. It's designed to do things for you all by itself!

Here's what it does:

1.  **It "