# Full Pipeline

This Jupyter notebook walks through every step of manual podcast creation, using Google TTS as the example text-to-speech service.

1. Generate script from Input
2. Check script quality (length, ARI, ...)
3. Generate Audio

In [None]:
%pip install openai
%pip install -U -q "google-genai>=1.16.0" 
%pip install py-readability-metrics

# Helpers

import contextlib
import wave
from IPython.display import Audio

file_index = 0

@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf

def play_audio_blob(blob):
  global file_index
  file_index += 1

  fname = f'audio_{file_index}.wav'
  with wave_file(fname) as wav:
    wav.writeframes(blob.data)

  return Audio(fname, autoplay=True)

def play_audio(response):
    return play_audio_blob(response.candidates[0].content.parts[0].inline_data)


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Set the OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

# TTS setup 
from google import genai
from google.genai import types
Google_client = genai.Client(api_key="...")
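
The cell above writes the API keys directly into the notebook. A minimal sketch of an alternative, assuming the keys are exported as environment variables before the kernel starts: the helper name `require_env` and the variable name `GOOGLE_API_KEY` are illustrative choices, not part of the original notebook.

```python
import os

def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Hypothetical usage (GOOGLE_API_KEY is an assumed variable name):
# Google_client = genai.Client(api_key=require_env("GOOGLE_API_KEY"))
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first API call.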



In [None]:
### Script generation with OpenAI gpt-4o as model and data/query.txt as input file


from openai import OpenAI


OPENAI_client = OpenAI()

# get input query from a file
with open("data/query.txt", "r") as file:
    input_query = file.read().strip()

response = OPENAI_client.responses.create(
    model="gpt-4o",
    input=input_query
)

print(response.output_text)

## Optional: Save the response to a file
with open("response.txt", "w") as file:
    file.write(response.output_text)

podcast_script = response.output_text


```
Speaker 1: Welcome to "Little Big Minds," where we explore the whys of the world! I'm your host, Lisa.

Speaker 2: And I'm Dr. Emily Sky, a scientist with a passion for explaining things.

Speaker 1: Thanks for joining us, Dr. Sky. So, we're diving into podcasts today. Why are they so important?

Speaker 2: Great question, Lisa! Think of podcasts like a magical library you carry in your pocket. They let you explore stories, learn new things, and even feel like you're on an adventure, all just by listening.

Speaker 1: Like when I'm reading comic books, but with voices?

Speaker 2: Exactly! And they’re perfect for curious minds, like Bart and Lisa Simpson. Podcasts can teach you about dinosaurs, space, or even how rainbows are made.

Speaker 1: What makes them extra special, though?

Speaker 2: Podcasts are like friendly chats. They make learning feel like a fun game, not like sitting in a classroom.

Speaker 1: So, they’re like having a fun storytelling session in your ear?

Speake

In [None]:
# Optional: Restore response from a file
with open("response.txt", "r") as file:
    podcast_script = file.read().strip()
print(f"Restored response: {podcast_script}")


Restored response: ```
Speaker 1: Welcome to "Little Big Minds," where we explore the whys of the world! I'm your host, Lisa.

Speaker 2: And I'm Dr. Emily Sky, a scientist with a passion for explaining things.

Speaker 1: Thanks for joining us, Dr. Sky. So, we're diving into podcasts today. Why are they so important?

Speaker 2: Great question, Lisa! Think of podcasts like a magical library you carry in your pocket. They let you explore stories, learn new things, and even feel like you're on an adventure, all just by listening.

Speaker 1: Like when I'm reading comic books, but with voices?

Speaker 2: Exactly! And they’re perfect for curious minds, like Bart and Lisa Simpson. Podcasts can teach you about dinosaurs, space, or even how rainbows are made.

Speaker 1: What makes them extra special, though?

Speaker 2: Podcasts are like friendly chats. They make learning feel like a fun game, not like sitting in a classroom.

Speaker 1: So, they’re like having a fun storytelling session i
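
The saved response starts with a line of three backticks, so the model evidently wrapped the script in a code fence. That fence would otherwise be counted by the length check and passed on to the TTS prompt. The following is a rough sketch of stripping it; `strip_code_fence` is a hypothetical helper, not something the notebook defines.

```python
FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable

def strip_code_fence(text: str) -> str:
    """Drop a leading fence line and a trailing fence line, if either is present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()

# Small made-up sample in the same shape as the generated script
sample = f"{FENCE}\nSpeaker 1: Hello!\n\nSpeaker 2: Hi there.\n{FENCE}"
cleaned = strip_code_fence(sample)
print(cleaned)
```

Under this assumption, `podcast_script = strip_code_fence(podcast_script)` could be applied once, right after generating or restoring the script.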

In [4]:
## Check that the script stays within the maximum length

import re

wpm = 120  # Words per minute

def extract_episode_length(query):
    match = re.search(r'"episode_length_minutes":\s*(\d+)', query)
    if match:
        return int(match.group(1))
    return None

# get input query from a file
with open("data/query.txt", "r") as file:
    input_query = file.read().strip()
episode_length_minutes = extract_episode_length(input_query)

print(f"Episode length in minutes: {episode_length_minutes}")

if episode_length_minutes is not None:
    max_length = episode_length_minutes * wpm
    # Count words in podcast_script so the check also works after restoring from response.txt
    script_length = len(podcast_script.split())
    if script_length > max_length:
        print(f"Warning: The script length ({script_length} words) exceeds the maximum allowed length ({max_length} words).")
    else:
        print(f"The script length ({script_length} words) is within the allowed limit.")




Episode length in minutes: 2
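
To make the arithmetic behind this check explicit: at 120 words per minute, a 2-minute episode allows 2 × 120 = 240 words. The sketch below repeats the calculation on a made-up query string; the JSON shape of `sample_query` and the helper `word_budget` are assumptions for illustration only.

```python
import re

def extract_episode_length(query: str):
    """Find the integer after "episode_length_minutes": in the query text, if any."""
    match = re.search(r'"episode_length_minutes":\s*(\d+)', query)
    return int(match.group(1)) if match else None

def word_budget(minutes: int, wpm: int = 120) -> int:
    """Maximum number of words that fit into `minutes` at a speaking rate of `wpm`."""
    return minutes * wpm

# Hypothetical query content; the real data/query.txt may look different
sample_query = '{"topic": "podcasts", "episode_length_minutes": 2}'
minutes = extract_episode_length(sample_query)
budget = word_budget(minutes)
print(f"{minutes} min at 120 wpm -> {budget} words")
```

Note that a plain `split()` word count also counts the `Speaker 1:` and `Speaker 2:` labels, so the measured length slightly overestimates the spoken words.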


In [None]:
from readability import Readability

# Clean up script text
script_text = podcast_script.replace("Speaker 1:", "").replace("Speaker 2:", "").strip()

# Analyze readability
readability = Readability(script_text)
ari_score = readability.ari()

# ari_score.ages holds the recommended age range as a pair, e.g. [7, 9]
min_age = ari_score.ages[0]
max_age = ari_score.ages[1]
print(f"Recommended ages: {min_age} - {max_age}")



Recommended ages: 9 - 10
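
For reference, the Automated Readability Index is defined as ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43, where characters are letters and digits. The sketch below computes it with deliberately naive tokenization, so its numbers will likely differ somewhat from those of `py-readability-metrics`, which uses its own tokenizer. The function name `ari` and the sample text are illustrative.

```python
import re

def ari(text: str) -> float:
    """Automated Readability Index using whitespace words and .!? sentence breaks."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(ch.isalnum() for ch in text)  # letters and digits only
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

sample = "Podcasts are like friendly chats. They make learning feel like a fun game."
print(round(ari(sample), 2))
```

Short words and short sentences both lower the score, which is why a script aimed at children should land at a low ARI.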


In [None]:
import openai

# Evaluation criteria
criteria = [
    "Does the output fully and clearly address the input topic?",
    "Does the conversation flow naturally from one speaker to the next?",
    "Are transitions smooth, ideas logically connected, and no abrupt topic shifts?",
    "Penalize unclear, off-topic, or awkward parts."
]

def build_prompt(input_text, output_text):
    prompt = f"""You are a conversation quality evaluator. Given an input and a generated podcast script, evaluate it against the following criteria:
        {chr(10).join(f"{i+1}. {c}" for i, c in enumerate(criteria))}

        Provide a score for each (0 to 1), followed by a one-line justification. Keep scores to one decimal point of precision.

        Input:
        {input_text}

        Output:
        {output_text}

        Respond in the following format:

        1. [score] - [short reason]
        2. [score] - [short reason]
        3. [score] - [short reason]
        4. [score] - [short reason]

        Then output the average score like this:
        Average: [score]
        """
    return prompt

def evaluate_podcast(input_text, output_text):
    prompt = build_prompt(input_text, output_text)
    
    response = OPENAI_client.responses.create(
        model="gpt-4o",
        input=prompt
    )

    # Print the evaluation once, under a header
    eval_text = response.output_text.strip()
    print("Evaluation Results:\n")
    print(eval_text)


# Example usage
input_text = input_query  # Use the query generated earlier
output_text = podcast_script  # Use the script generated earlier

evaluate_podcast(input_text, output_text)


Evaluation Results:

1. 1.0 - The output fully and clearly addresses the topic of why podcasts are important.

2. 1.0 - The conversation flows naturally with smooth exchanges between the host and guest.

3. 1.0 - Transitions are smooth, and ideas are logically connected without abrupt topic shifts.

4. 1.0 - The script is clear, on-topic, and free of awkward parts.

Average: 1.0
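
The evaluator replies in plain text, so using the scores programmatically (for example, to reject a script below some threshold) requires parsing them. Here is a small sketch that assumes the reply follows the requested `N. [score] - [reason]` and `Average: [score]` format exactly; `parse_scores` and the sample reply are hypothetical.

```python
import re

def parse_scores(eval_text: str):
    """Return (per-criterion scores, reported average or None) from the evaluator reply."""
    scores = [
        float(m.group(1))
        for m in re.finditer(r"^[ \t]*\d+\.[ \t]+([01](?:\.\d+)?)[ \t]+-", eval_text, re.MULTILINE)
    ]
    avg = re.search(r"^[ \t]*Average:[ \t]*([01](?:\.\d+)?)", eval_text, re.MULTILINE)
    return scores, (float(avg.group(1)) if avg else None)

# Made-up reply in the requested format
sample_reply = """1. 1.0 - On topic.
2. 0.8 - Mostly natural flow.
3. 0.9 - Smooth transitions.
4. 0.7 - One awkward passage.

Average: 0.9"""

scores, reported = parse_scores(sample_reply)
computed = sum(scores) / len(scores)
print(scores, reported, round(computed, 2))
```

Recomputing the mean is a cheap sanity check, since the model's reported average need not match its own per-criterion scores (here 0.85 computed versus 0.9 reported).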


In [None]:
from google import genai
from google.genai import types
import wave


# Set up the wave file to save the output:
def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2):
   with wave.open(filename, "wb") as wf:
      wf.setnchannels(channels)
      wf.setsampwidth(sample_width)
      wf.setframerate(rate)
      wf.writeframes(pcm)

# --- Construct the preamble and full prompt ---
# This preamble is more general, as we're not parsing speaker names dynamically
preamble = "TTS the following conversation in an engaging and fun way:\n"

full_prompt = preamble + podcast_script

response = Google_client.models.generate_content(
   model="gemini-2.5-flash-preview-tts",
   contents=full_prompt,
   config=types.GenerateContentConfig(
      response_modalities=["AUDIO"],
      speech_config=types.SpeechConfig(
         multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
            speaker_voice_configs=[
               types.SpeakerVoiceConfig(
                  speaker='Speaker 1',
                  voice_config=types.VoiceConfig(
                     prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name='Kore',
                     )
                  )
               ),
               types.SpeakerVoiceConfig(
                  speaker='Speaker 2',
                  voice_config=types.VoiceConfig(
                     prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name='Puck',
                     )
                  )
               ),
            ]
         )
      )
   )
)

data = response.candidates[0].content.parts[0].inline_data.data

file_name='output_conversation.wav' # Changed filename to avoid overwriting
wave_file(file_name, data) # Saves the file to current directory

print(f"Text-to-speech audio saved to '{file_name}'")


Text-to-speech audio saved to 'output_conversation.wav'
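
Since the script length was budgeted in minutes, it can be useful to check how long the generated audio actually is. For raw PCM, the duration is the number of bytes divided by (channels × sample width × sample rate). The sketch below uses the same parameters as `wave_file` (mono, 2 bytes per sample, 24000 Hz) and synthetic silence in place of the real TTS payload; `pcm_duration_seconds` is a hypothetical helper.

```python
import io
import wave

def pcm_duration_seconds(pcm: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2) -> float:
    """Duration of raw PCM audio: bytes / (channels * bytes per sample * samples per second)."""
    return len(pcm) / (channels * sample_width * rate)

# Synthetic stand-in for the TTS payload: 3 seconds of 16-bit mono silence at 24 kHz
pcm = b"\x00\x00" * (24000 * 3)
seconds = pcm_duration_seconds(pcm)

# Cross-check by writing a WAV into memory and reading its header back
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(pcm)
buf.seek(0)
with wave.open(buf, "rb") as wf:
    wav_seconds = wf.getnframes() / wf.getframerate()

print(seconds, wav_seconds)
```

In the notebook, `pcm_duration_seconds(data) / 60` could then be compared with `episode_length_minutes`.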


# TTS from File

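Simple text-to-speech generation from an existing script file (`samplescript.txt`), without the script generation and quality checks above.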

In [None]:
from google import genai
from google.genai import types
import wave


# Set up the wave file to save the output:
def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2):
   with wave.open(filename, "wb") as wf:
      wf.setnchannels(channels)
      wf.setsampwidth(sample_width)
      wf.setframerate(rate)
      wf.writeframes(pcm)


# --- Read the entire script from the file ---
script_file_path = "samplescript.txt"
conversation_script = ""

try:
    with open(script_file_path, "r") as f:
        conversation_script = f.read().strip() # Read the whole file content
except FileNotFoundError:
    print(f"Error: The file '{script_file_path}' was not found.")
    print("Please make sure 'samplescript.txt' exists in the correct directory.")
    raise  # Stop this cell; exit() is meant for scripts and can terminate the notebook kernel

# --- Construct the preamble and full prompt ---
# This preamble is more general, as we're not parsing speaker names dynamically
preamble = "TTS the following conversation in an engaging way:\n"

# Combine preamble and script for the full prompt
full_prompt = preamble + conversation_script
print("--- Full Prompt ---")
print(full_prompt)
print("-------------------")


response = Google_client.models.generate_content(
   model="gemini-2.5-flash-preview-tts",
   contents=full_prompt,
   config=types.GenerateContentConfig(
      response_modalities=["AUDIO"],
      speech_config=types.SpeechConfig(
         multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
            speaker_voice_configs=[
               types.SpeakerVoiceConfig(
                  speaker='Speaker 1',
                  voice_config=types.VoiceConfig(
                     prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name='Kore',
                     )
                  )
               ),
               types.SpeakerVoiceConfig(
                  speaker='Speaker 2',
                  voice_config=types.VoiceConfig(
                     prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name='Puck',
                     )
                  )
               ),
            ]
         )
      )
   )
)

data = response.candidates[0].content.parts[0].inline_data.data

file_name='output_conversation.wav' # Note: same filename as in the full pipeline above, so that file is overwritten
wave_file(file_name, data) # Saves the file to current directory

print(f"Text-to-speech audio saved to '{file_name}'")


--- Full Prompt ---
TTS the following conversation in an engaging way:
Speaker 1: Welcome to BrainSpark! I’m your host, Kent.  
Speaker 2: And I’m Dr. Max Sparks, your friendly neighborhood scientist.  
Speaker 1: Today, we’re diving into something super cool—neural networks!  
Speaker 2: Ooh, sounds like we’re getting into robot brain stuff, huh?  
Speaker 1: Exactly! But let’s break it down for everyone. Imagine your brain—lots of little parts working together to help you think and learn. A neural network is like that, but for computers.  
Speaker 2: It’s like teaching a computer how to recognize stuff. So, let’s say you show it a picture of a cat. It looks at the picture, and through practice, it learns what makes a cat a cat.  
Speaker 1: Kind of like when you first learned to tell the difference between a dog and a cat, right, Bart?  
Speaker 2: Yeah! You probably learned it by seeing lots of pictures and noticing patterns—like a dog has floppy ears and a cat has pointy ears. A ne
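
The multi-speaker configuration only assigns voices to the labels `Speaker 1` and `Speaker 2`, so a script file that uses other labels may not be voiced as intended. A minimal sketch of a pre-flight check follows, assuming each turn starts a line with `Label:`; the helper names and the sample script are hypothetical.

```python
import re
from collections import Counter

# A label is a short run of letters, digits and spaces at line start, followed by a colon
LABEL = re.compile(r"^([A-Za-z][A-Za-z0-9 ]{0,29}):", re.MULTILINE)

def speaker_turns(script: str) -> Counter:
    """Count how many turns each speaker label has in the script."""
    return Counter(m.group(1) for m in LABEL.finditer(script))

def unexpected_speakers(script: str, configured=("Speaker 1", "Speaker 2")) -> list:
    """Labels that appear in the script but have no configured voice."""
    return sorted(set(speaker_turns(script)) - set(configured))

# Made-up script with one label that has no configured voice
sample_script = (
    "Speaker 1: Welcome!\n"
    "Speaker 2: Hi.\n"
    "Speaker 1: Today, neural networks.\n"
    "Narrator: The end."
)
turns = speaker_turns(sample_script)
extra = unexpected_speakers(sample_script)
print(dict(turns), extra)
```

Running such a check on `conversation_script` before calling `generate_content` would surface mismatched labels before any audio is generated.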