# 🎙️ Generating a Podcast from a Blog Post with the Gemini APIs

This tutorial demonstrates how to create an audio podcast from a blog post using Google AI's Gemini models for summarization and text-to-speech (TTS). The code uses the `google-genai` client library for TTS and the older `google-generativeai` package for summarization.

**Workflow:**

1.  **Scrape Blog Content:** Extract the main text content from a given blog URL.
2.  **Summarize Content:** Use a Gemini model to generate a concise and engaging podcast script from the scraped text.
3.  **Generate Audio:** Use a Gemini TTS model to convert the podcast script into audio.
4.  **Save Audio:** Save the generated audio data as a WAV file.

## Setup and Configuration

First, let's install the necessary libraries and set up our configuration.
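
The notebook does not show an install cell, so the following is a minimal sketch for checking which dependencies are missing before running the imports. The PyPI package names are inferred from the import statements used in this tutorial, and `REQUIRED` and `missing_packages` are hypothetical names introduced only for this example. `google.colab` is left out because it is only available inside Colab.

```python
import importlib.util

# Maps an importable module name to the PyPI package assumed to provide it.
REQUIRED = {
    "google.genai": "google-genai",
    "google.generativeai": "google-generativeai",
    "requests": "requests",
    "bs4": "beautifulsoup4",
}

def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the PyPI names of packages whose modules cannot be found."""
    missing = []
    for module_name, package_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is itself absent.
            found = False
        if not found:
            missing.append(package_name)
    return missing

to_install = missing_packages(REQUIRED)
if to_install:
    print("Run in a notebook cell: %pip install " + " ".join(to_install))
else:
    print("All required packages are already installed.")
```

Printing the command instead of running it keeps the check free of side effects, so it is safe to re-run.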

In [20]:
#@title Import required libraries
import os
import google.generativeai as genai
from google import genai as google_genai # Import the client library with an alias
from google.genai import types # Import types
import requests
from bs4 import BeautifulSoup
from google.colab import userdata
from uuid import uuid4
import wave

In [21]:
#@title Configuration Settings
GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")
BLOG_URL = "https://research.google/blog/how-we-created-hov-specific-etas-in-google-maps/"
VOICE_CHOICE = "Kore" # Choose a prebuilt Gemini TTS voice (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir', 'Aoede')
LANGUAGE = "Japanese"

if GOOGLE_API_KEY == "YOUR_GOOGLE_API_KEY" or not GOOGLE_API_KEY:
    print("Please replace 'YOUR_GOOGLE_API_KEY' with your actual Google API key in Colab secrets.")
elif not BLOG_URL:
    print("Please provide a valid blog URL to process.")
else:
    print("Configuration loaded successfully.")

Configuration loaded successfully.
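
The same API-key and URL checks appear again in the execution cell. One possible way to avoid the duplication is to collect the checks in a single function, sketched below. `config_problems` is a hypothetical helper, and the `http(s)` prefix check is an extra assumption that the original cell does not make.

```python
from typing import Optional

def config_problems(api_key: Optional[str], blog_url: Optional[str]) -> list[str]:
    """Return human-readable configuration problems (empty list if none)."""
    problems = []
    if not api_key or api_key == "YOUR_GOOGLE_API_KEY":
        problems.append("GOOGLE_API_KEY is missing or still the placeholder value.")
    # Assumption: a usable blog URL starts with http:// or https://.
    if not blog_url or not blog_url.startswith(("http://", "https://")):
        problems.append("BLOG_URL must be a non-empty http(s) URL.")
    return problems

# Placeholder values for illustration; in the notebook, pass GOOGLE_API_KEY and BLOG_URL.
for problem in config_problems("YOUR_GOOGLE_API_KEY", "https://example.com/post"):
    print(problem)
```

An empty list means the configuration passed every check, so the caller can gate execution on `not config_problems(...)`.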


## Helper Functions

We'll define a couple of helper functions: one to save audio data to a WAV file and another to scrape the content from a blog URL.

In [22]:
# Save the audio data to a .wav file.
def wave_file(filename: str, pcm_data: bytes, channels=1, sample_width=2, rate=24000):
    """
    Saves PCM audio data to a .wav file.

    Args:
        filename: The path to save the file to.
        pcm_data: The audio data in bytes.
        channels: Number of audio channels.
        sample_width: Sample width in bytes.
        rate: The sampling rate (e.g., 24000 for Gemini's TTS).
    """
    directory = os.path.dirname(filename)
    if directory:  # os.makedirs("") raises, so skip it for bare filenames
        os.makedirs(directory, exist_ok=True)
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm_data)
    print(f"Successfully saved audio to: {filename}")

# Scrape blog content from a URL.
def scrape_blog_content(url: str) -> str | None:
    """
    Scrapes the main text content of a blog post from a given URL.

    Args:
        url: The URL of the blog post to scrape.

    Returns:
        The scraped text content as a single string, or None if scraping fails.
    """
    try:
        print(f"Scraping content from: {url}")
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
        # A timeout prevents the cell from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')

        if not paragraphs:
            print("Warning: No paragraph tags found. Attempting to get text from the body.")
            return soup.body.get_text(separator='\n', strip=True) if soup.body else ""

        content = "\n".join([p.get_text() for p in paragraphs])
        print("Scraping successful.")
        return content
    except requests.exceptions.RequestException as e:
        print(f"Error: Failed to scrape the URL. {e}")
        return None
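
To check the WAV settings without calling the API, a short sketch like the one below can write a synthetic tone using the same parameters as the `wave_file` defaults (mono, 2-byte samples, 24000 Hz). `sine_pcm` is a hypothetical helper introduced for this example, and the tone stands in for real TTS output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24000   # Hz, same as the wave_file default
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)

def sine_pcm(frequency_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Return mono 16-bit little-endian PCM bytes for a sine tone."""
    n_samples = int(rate * seconds)
    amplitude = 0.3 * 32767  # stay well below full scale to avoid clipping
    samples = (
        int(amplitude * math.sin(2 * math.pi * frequency_hz * i / rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)

pcm = sine_pcm(440.0, 1.0)
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(SAMPLE_RATE)
    wf.writeframes(pcm)
print(f"Wrote {len(pcm)} bytes of PCM to {path}")
```

If the tone plays at the right pitch and speed, the channel count, sample width, and rate are consistent with each other.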

## Podcast Generation Function

This function orchestrates the entire process: scraping the blog, summarizing the content using a Gemini model, and generating audio from the summary using the Gemini TTS model.

In [18]:
def generate_podcast_from_url(blog_url: str, api_key: str, voice: str, language: str = "English"):
    """
    Generates a podcast from a blog URL using Gemini for summarization and TTS.

    Args:
        blog_url: The URL of the blog to process.
        api_key: Your Google API key.
        voice: The pre-built voice to use for the podcast.
               Gemini TTS prebuilt voices include 'Kore', 'Puck', 'Charon', 'Fenrir', 'Aoede', etc.
        language: The desired language for the podcast script and audio (default is "English").
    """
    try:
        # Configure the generativeai library for summarization
        genai.configure(api_key=api_key) # Use genai for configure
        # Create a client for the genai library for TTS
        client = google_genai.Client(api_key=api_key) # Use the aliased google_genai for Client
    except Exception as e:
        print(f"Error configuring Google API or creating client: {e}")
        return

    # 1. Scrape the blog content
    blog_content = scrape_blog_content(blog_url)
    if not blog_content:
        return

    try:
        # 2. Generate a summary with a standard Gemini model
        print("Initializing summarization model...")
        summarizer_model = genai.GenerativeModel(model_name="gemini-2.5-flash")

        summary_prompt = (
            f"You are a podcast host. Create a concise, engaging script in {language} from the "
            "following blog content. Capture the main points conversationally. "
            "Do not mention style details like 'intro music fades in'. "
            "The content should be optimized for text-to-speech generation.\n\n"
            f"BLOG CONTENT:\n---\n{blog_content}"
        )

        print(f"Generating summary in {language}...")
        summary_response = summarizer_model.generate_content(summary_prompt)

        # Check if the response contains valid text content
        if summary_response.candidates and summary_response.candidates[0].content.parts:
            summary_text = summary_response.text
            print(f"Generated Summary:\n---\n{summary_text}\n---")
        else:
            print("Error: Summarization failed. The model did not return valid text content.")
            if summary_response.candidates and summary_response.candidates[0].finish_reason:
                print(f"Finish reason: {summary_response.candidates[0].finish_reason}")
            return # Exit the function if summarization failed


        # 3. Generate audio using the dedicated text-to-speech model via the client
        print(f"Generating audio from summary with voice: {voice} in {language}...")
        audio_response = client.models.generate_content(
            model="gemini-2.5-flash-preview-tts",
            contents=summary_text,
            config=types.GenerateContentConfig(
                response_modalities=["AUDIO"],
                speech_config=types.SpeechConfig(
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(
                            voice_name=voice,
                        )
                    )
                ),
            )
        )

        # 4. Save the generated audio to a file
        # Access the audio data from the response structure
        if audio_response.candidates and audio_response.candidates[0].content.parts:
            audio_data = audio_response.candidates[0].content.parts[0].inline_data.data
            if audio_data:
                output_dir = "audio_generations"
                output_filename = f"{output_dir}/podcast_{uuid4()}.wav"
                # Use the wave_file helper to save the PCM data
                wave_file(output_filename, audio_data)
            else:
                print("Error: Audio data is empty.")
        else:
            print("Error: Audio generation failed. No audio data was returned in the expected format.")


    except Exception as e:
        print(f"An error occurred during the generation process: {e}")

## Execute the Podcast Generation

Finally, we call the `generate_podcast_from_url` function with our configuration to start the process.

In [19]:
# --- Execution ---
if GOOGLE_API_KEY != "YOUR_GOOGLE_API_KEY" and GOOGLE_API_KEY and BLOG_URL:
    generate_podcast_from_url(
        blog_url=BLOG_URL,
        api_key=GOOGLE_API_KEY,
        voice=VOICE_CHOICE,
        language=LANGUAGE
    )
else:
    print("Please ensure your API key is set and a valid blog URL is provided in the Configuration section.")

Scraping content from: https://research.google/blog/how-we-created-hov-specific-etas-in-google-maps/
Scraping successful.
Initializing summarization model...
Generating summary in Japanese...
Generated Summary:
---
皆さん、こんにちは！日々の通勤、移動でGoogle マップを使っている方、朗報です！

最近、EVや相乗り、公共交通機関など、環境に優しい移動手段へのシフトが進んでいますよね。特に、複数の乗客が乗る車専用の「HOVレーン」、日本語でいう「高乗車車両レーン」は、交通量の多い時間帯に一般レーンよりも速い傾向があります。でも、このHOVレーンを使った場合の正確な到着予測時間（ETA）を出すのは難しかったんです。

そこでGoogle マップが、HOVレーンを含むルートを選択でき、そのETAも表示する新機能を導入しました！

どうやってこれを実現したかというと、これがGoogle Researchの技術の真骨頂なんです。HOVレーンを使っているかどうかを正確に特定するのは実は簡単ではありません。速度データだけでは判断できない場合も多いんです。

そこで彼らは、教師なし学習というAIの手法を使いました。つまり、あらかじめHOVかそうでないか、という正解データがなくても、AIが自らパターンを見つけて分類するんです。速度データだけでなく、車線の中央からの横方向の距離、時間の経過などを複合的に分析し、さらに「ソフトクラスタリング」や複数のモデルを使う「混合エキスパート」といった高度な手法を組み合わせて、精度を高めています。

この新機能のおかげで、HOVレーンを利用するドライバーのETA精度はなんと75%も向上したそうです！通勤時間の予測がより正確になり、渋滞の緩和や排出ガスの削減にも貢献します。

この技術は、将来的に二輪車など他の交通手段にも応用できる可能性を秘めています。よりスマートで環境に優しい移動体験を可能にするGoogleの取り組みに、これからも注目ですね。
---
Generating audio from summary with voice:
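
Once a run finishes, the saved files can be checked by reading their WAV headers. The sketch below assumes the `audio_generations` output directory used by the function above. `wav_duration_seconds` is a hypothetical helper, and the loop prints nothing if no files exist yet.

```python
import glob
import wave

def wav_duration_seconds(path: str) -> float:
    """Compute the playback duration from the WAV header (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# List every generated podcast with its duration.
for path in sorted(glob.glob("audio_generations/*.wav")):
    print(f"{path}: {wav_duration_seconds(path):.1f} s")
```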