<a href="https://colab.research.google.com/github/cburchett/podcastcreator/blob/main/PodcastCreator_Dia_1_6B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip -q install gradio

In [None]:
# Install directly from GitHub
!pip -q install git+https://github.com/nari-labs/dia.git

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
from dia.model import Dia

audio_model = Dia.from_pretrained("nari-labs/Dia-1.6B")

  WeightNorm.apply(module, name, dim)


In [None]:
import os
from google import genai

from google.colab import userdata
API_KEY = userdata.get('GOOGLE_API_KEY')

os.environ["GOOGLE_API_KEY"] = API_KEY

# Create a client
client = genai.Client(api_key=API_KEY)

In [None]:
#MODEL_ID = "gemini-2.5-pro-exp-03-25"
YOUTUBE_URL = "https://www.youtube.com/watch?v=rSCaiHFRx0k"

In [None]:
PROMPT = """Analyze the attached YouTube video.

Based on the key topics, information, and events presented in the video, generate a medium-length, conversational podcast script between two speakers, labeled S1 and S2.

The script should summarize or discuss the main points of the video in a natural, back-and-forth dialogue format.

**Crucially, format the output *exactly* as follows:**

*   Each line of dialogue must start with either `[S1]` or `[S2]`.
*   Follow the speaker tag with a space, then their dialogue.
*   Present the dialogue turns sequentially, mimicking a conversation.
*   Don't add any prefix or suffix to the conversation.

**Use this specific structure as your template:**

```
[S1] {Dialogue for speaker 1}
[S2] {Dialogue for speaker 2}
[S1] {Dialogue for speaker 1, potentially a reaction or follow-up}
[S2] {Dialogue for speaker 2}
[S1] {Dialogue for speaker 1}
```

**Example of the desired output format:**

```
[S1] Hey Sam, How are you? Let me tell you about Dia it's an open weights text to dialogue model.
[S2] You get full control over scripts and voices.
[S1] Wow. Amazing. (laughs)
[S2] Try it now on Git hub or Hugging Face.
[S1] You bet I will!
```

**Constraints:**

*   Keep the turns relatively short and conversational.
*   Focus on the core message or interesting aspects of the video.
*   Adhere strictly to the `[S1]` / `[S2]` formatting.
*   **Incorporate non-verbal cues where natural and appropriate.** These should be enclosed in parentheses within the dialogue line (e.g., `(laughs)` or `(sighs)`). You may use cues from this list: `(laughs)`, `(clears throat)`, `(sighs)`, `(gasps)`, `(coughs)`, `(singing)`, `(sings)`, `(mumbles)`, `(beep)`, `(groans)`, `(sniffs)`, `(claps)`, `(screams)`, `(inhales)`, `(exhales)`, `(applause)`, `(burps)`, `(humming)`, `(sneezes)`, `(whistles)`.
*   Do not add any introductory text, explanations, or summaries outside of the formatted script itself.

**Now, analyze the video and generate the script.**

---
"""

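The prompt above asks for every line to start with `[S1]` or `[S2]` followed by a space and the dialogue. A minimal sketch of how that rule could be checked before the script goes to the audio model is shown here. `find_malformed_lines` and `TURN_PATTERN` are hypothetical names introduced for illustration; they are not part of the original notebook.

```python
import re

# Hypothetical helper (not in the original notebook): a line is valid when
# it starts with "[S1] " or "[S2] " followed by at least one character.
TURN_PATTERN = re.compile(r"^\[(S1|S2)\] \S")

def find_malformed_lines(script):
    """Return (line_number, text) for each non-empty line that breaks the format."""
    problems = []
    for number, line in enumerate(script.strip().splitlines(), start=1):
        if line.strip() and not TURN_PATTERN.match(line):
            problems.append((number, line))
    return problems

sample = "[S1] Hello there.\nHere is your script:\n[S2] Hi! (laughs)"
print(find_malformed_lines(sample))  # [(2, 'Here is your script:')]
```

An empty result means every line carries a speaker tag; anything else points at text the model added outside the requested format.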
In [None]:
import os
from datetime import datetime

def ensure_folder_exists():
    now = datetime.now()
    timestamp = now.strftime('%Y-%m-%d %H:%M:%S')
    folder_path = '/content/' + timestamp
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Folder '{folder_path}' created.")
    else:
        print(f"Folder '{folder_path}' already exists.")
    return folder_path

In [None]:
from google.genai import types

def generate_podcast_script(youtube_url, model, prompt):
    response = client.models.generate_content(
        model=model,
        contents=types.Content(
            parts=[
                types.Part(text=prompt),
                types.Part(
                    file_data=types.FileData(file_uri=youtube_url)
                )
            ]
        )
    )
    return response.text

In [None]:
import re

def split_podcast_transcript(transcript, pairs):
    """Splits a podcast transcript into segments of speaker-tagged lines.

    Args:
        transcript: The podcast transcript as a string.
        pairs: Number of [S1]/[S2] lines to put in each segment.
            The last segment may be shorter.

    Returns:
        A list of strings, where each string represents a segment of the transcript.
        Returns an empty list if the input is invalid or no valid segments are found.
    """

    segments = []
    try:
        # Split the transcript into lines
        lines = transcript.strip().split('\n')

        # Use regular expressions to find S1 and S2 pairs
        pattern = r"\[(S[12])\](.*)"
        s1_s2_pairs = []
        for line in lines:
          match = re.match(pattern, line)
          if match:
            s1_s2_pairs.append(match.groups())

        # Group the matched lines into segments of `pairs` lines each.
        # gr.Number may deliver a float, so coerce to int for range().
        pairs = int(pairs)
        for i in range(0, len(s1_s2_pairs), pairs):
            segment = ""
            for j in range(i, min(i + pairs, len(s1_s2_pairs))):
                segment += f"[{s1_s2_pairs[j][0]}] {s1_s2_pairs[j][1]}\n"
            segments.append(segment.strip())
    except Exception as e:
        print(f"Error processing transcript: {e}")
        return []

    return segments
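The grouping idea used above can be illustrated with a standalone sketch. `chunk_turns` and `demo` are hypothetical names for this example only; the sketch is not the notebook's function, just the same fixed-size slicing over speaker-tagged lines.

```python
import re

def chunk_turns(transcript, lines_per_segment):
    """Group speaker-tagged lines into chunks of `lines_per_segment` lines."""
    # Keep only lines that start with an [S1] or [S2] tag.
    turns = [
        line.strip()
        for line in transcript.splitlines()
        if re.match(r"\[S[12]\]", line.strip())
    ]
    # Slice the list in fixed-size steps; the last chunk may be shorter.
    return [
        "\n".join(turns[i:i + lines_per_segment])
        for i in range(0, len(turns), lines_per_segment)
    ]

demo = "[S1] One.\n[S2] Two.\n[S1] Three.\n[S2] Four.\n[S1] Five."
chunks = chunk_turns(demo, 2)
print(len(chunks))   # 3
print(chunks[-1])    # [S1] Five.
```

With five tagged lines and two lines per segment, the final segment holds the single leftover line.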

In [None]:
import os
import soundfile as sf
from pydub import AudioSegment

def combine_mp3s(folder_path, output_file):
    """Combines all MP3 files in a folder into a single MP3 file.

    Args:
        folder_path: The path to the folder containing the MP3 files.
        output_file: The path to the output MP3 file.
    """
    combined = AudioSegment.empty()
    # Sort by file name; the names must sort in playback order.
    mp3_files = sorted(f for f in os.listdir(folder_path) if f.endswith(".mp3"))
    merged_count = 0
    for filename in mp3_files:
        filepath = os.path.join(folder_path, filename)
        try:
            combined += AudioSegment.from_mp3(filepath)
            merged_count += 1
        except Exception as e:
            print(f"Error processing {filename}: {e}")
    combined.export(output_file, format="mp3")
    # Report only the files that were actually merged, not every directory entry.
    print(f"Combined {merged_count} MP3 files into {output_file}")
    return output_file

In [None]:
import soundfile as sf

def generate_podcast(transcript, pairs):

    folder_path = ensure_folder_exists()

    segments = split_podcast_transcript(transcript, pairs)
    print(f"Number of segments: {len(segments)}")

    for idx, sec in enumerate(segments):
        print(f"Generating segment {idx+1}")
        print(sec)
        output = audio_model.generate(sec)
        # Zero-pad the index so a lexicographic sort keeps segment 10 after segment 9.
        sf.write(folder_path + f"/podcast_{idx+1:03d}.mp3", output, 44100)

    return combine_mp3s(folder_path, folder_path + "/final_podcast.mp3")


In [None]:
import gradio as gr

In [None]:
with gr.Blocks() as podcast_script_generator:
  gr.Markdown("## Podcast Generator")
  with gr.Tab("Script"):
    with gr.Row():
      with gr.Column():
        youtube_url = gr.Textbox(label="URL", value=YOUTUBE_URL)
        selected_model = gr.Dropdown(["gemini-2.5-pro-exp-03-25"], value="gemini-2.5-pro-exp-03-25", label="Model")
        system_prompt = gr.Textbox(label="System prompt", value=PROMPT)
        generate_script_button = gr.Button("Generate script")
      with gr.Column():
        podcast_script = gr.Textbox(label="Podcast script", lines=20)
  generate_script_button.click(fn=generate_podcast_script, inputs=[youtube_url, selected_model, system_prompt], outputs=[podcast_script])

  with gr.Tab("Podcast"):
     with gr.Row():
       with gr.Column():
          final_podcast_script = gr.Textbox(label="Final podcast script", lines=20)
          pairs = gr.Number(label="Number of segment pairs", value=2)
          generate_podcast_button = gr.Button("Generate podcast")
       with gr.Column():
          podcast_audio = gr.Audio(label="Final podcast audio", type='filepath')
  generate_podcast_button.click(fn=generate_podcast, inputs=[final_podcast_script, pairs], outputs=[podcast_audio])
  podcast_script.change(fn=lambda x: x, inputs=podcast_script, outputs=final_podcast_script)

podcast_script_generator.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://7da8ec8874d75971f0.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Folder '/content/2025-05-01 20:58:20' created.
Number of segments: 9
Generating segment 1
[S1]  Hey, did you see that NetApp video about AI and data infrastructure? It's clear AI is really shaking things up for storage.
[S2]  Totally. Tom Shields kicked it off talking about the pressure AI data pipelines put on traditional storage. It's not just about capacity anymore.
Generating segment 2
[S1]  Right, and Krish Vitaldevara dove into what 'AI-ready' really means. Especially for those massive training workloads.
[S2]  He said they need *super* high performance, like SuperPOD level, but here's the kicker – delivered over standard protocols IT already uses.
Generating segment 3
[S1]  Yeah, like PNFS and S3 over standard Ethernet. No need to rip and replace everything or learn totally new ways of doing things. (clears throat) That makes adoption way easier for IT Ops.
[S2]  And the need to scale storage, compute, and networking independently! That allows for better resource utilization.
Ge