## Notebook 4: TTS Workflow

The podcast transcript is now finalized, so we can generate the audio for the podcast.

In this notebook, we will first learn how to generate audio using the [lucasnewman/f5-tts-mlx](https://huggingface.co/lucasnewman/f5-tts-mlx) model.

After that, we will use the output from Notebook 3 to generate our complete podcast.

In [3]:
import IPython.display as ipd
from f5_tts_mlx.generate import generate, SAMPLE_RATE
from tqdm import tqdm


### Testing the Audio Generation

Let's try generating audio with the model to understand how it works.

#### F5 TTS MLX Model

Let's use the model to generate a short voice segment first.

In [2]:
TEST_AUDIO_FILE = "./resources/f5_tts_mlx_test_audio.wav"
MODEL = "lucasnewman/f5-tts-mlx"

In [3]:
# Define the text to synthesize
text_prompt = """
Exactly! And the distillation part is where you take a LARGE-model,and compress-it down into a smaller, more efficient model that can run on devices with limited resources.
"""

generate(
    generation_text=text_prompt,
    model_name=MODEL,
    output_path=TEST_AUDIO_FILE
)

In [4]:
# Play audio in notebook
ipd.Audio(TEST_AUDIO_FILE, rate=SAMPLE_RATE)

## Bringing it together: Making the Podcast

Now that we understand how the model works, we can use the complete pipeline to generate the entire podcast.

Let's load in our pickle file from earlier and proceed:

In [5]:
import pickle

with open('./resources/podcast_ready_data.pkl', 'rb') as file:
    PODCAST_TEXT = pickle.load(file)

We will concatenate the generated audio segments, and we also need their sampling rate to write the final audio.

In [6]:
outputs = []

Function to generate audio for Speaker 1:

In [7]:
def generate_speaker1_audio(text, output_path):
    generate(
        generation_text=text,
        model_name=MODEL,
        output_path=output_path
    )

Function to generate audio for Speaker 2, which clones the voice from a short reference clip:

In [8]:
def generate_speaker2_audio(text, output_path):
    generate(
        generation_text=text,
        model_name=MODEL,
        output_path=output_path,
        ref_audio_path="./resources/test_en_2_ref_short.wav",
        ref_audio_text="Some call me nature, others call me mother nature."
    )
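
The two functions above differ only in the reference audio they pass. As a rough sketch of how that could extend beyond two speakers, the voice settings can live in a lookup table. The `VOICES` table and `build_generate_kwargs` helper are hypothetical names introduced here for illustration; only the keyword arguments (`generation_text`, `model_name`, `output_path`, `ref_audio_path`, `ref_audio_text`) come from the cells above.

```python
# Hypothetical voice table: maps a speaker label to the extra keyword
# arguments passed to generate(). An empty dict means "use the model's
# default voice", which is what generate_speaker1_audio does.
MODEL = "lucasnewman/f5-tts-mlx"

VOICES = {
    "Speaker 1": {},
    "Speaker 2": {
        "ref_audio_path": "./resources/test_en_2_ref_short.wav",
        "ref_audio_text": "Some call me nature, others call me mother nature.",
    },
}


def build_generate_kwargs(speaker, text, output_path):
    """Return the keyword arguments for generate() for one transcript line."""
    if speaker not in VOICES:
        raise KeyError(f"No voice configured for {speaker!r}")
    return {
        "generation_text": text,
        "model_name": MODEL,
        "output_path": output_path,
        **VOICES[speaker],
    }


kwargs = build_generate_kwargs(
    "Speaker 2", "Hello!", "./resources/segments/_podcast_segment_1.wav"
)
print(sorted(kwargs))
# In the notebook this would be followed by: generate(**kwargs)
```

Adding a third speaker would then only require a new entry in the table with its own reference clip and transcript.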

`generate` writes each segment straight to a WAV file, so no helper is needed to convert the model output into audio. Let's inspect the transcript we loaded:

In [9]:
PODCAST_TEXT

'[\n    ("Speaker 1", "Alright folks, welcome to our podcast, where we dive deep into the cutting-edge world of Large Language Models (LLMs) and knowledge distillation. Today, we\'re exploring how we can enhance the capabilities of smaller models by transferring knowledge from larger, proprietary models. Imagine if we could make your humble assistant bot as smart as the mighty GPT-4! That\'s what this talk is all about. So, let\'s get started!"),\n    ("Speaker 2", "Wow, that sounds amazing! So, are we talking about making open-source models smarter just like the fancy proprietary ones?"),\n    ("Speaker 1", "Exactly! You got it. Open-source models are incredibly accessible, but they often don\'t have the depth and breadth of knowledge that proprietary models have. So, our goal is to bridge that gap and make them more powerful without needing to pay the hefty price tag."),\n    ("Speaker 2", "That\'s awesome! How do we do that? Is it like transferring skills or something?"),\n    ("Spe

We often argue that data structures aren't very useful in practice. This time, however, the knowledge comes in handy.

We will take the string from the pickle file and parse it into a list of `(speaker, text)` tuples with the help of `ast.literal_eval()`.
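
Here is a minimal, standalone sketch of this parsing step, using a shortened made-up transcript rather than the real one. The `parse_transcript` helper and its shape checks are assumptions added for illustration; the notebook itself calls `ast.literal_eval()` directly.

```python
import ast

# A shortened stand-in for the string stored in the pickle file
sample = '[\n    ("Speaker 1", "Welcome to the show!"),\n    ("Speaker 2", "Glad to be here."),\n]'


def parse_transcript(raw):
    """Parse the transcript string and check it is a list of (speaker, text) pairs."""
    parsed = ast.literal_eval(raw)  # evaluates Python literals only, never arbitrary code
    if not isinstance(parsed, list):
        raise ValueError("expected a list of (speaker, text) tuples")
    for item in parsed:
        is_pair = isinstance(item, tuple) and len(item) == 2
        if not (is_pair and all(isinstance(part, str) for part in item)):
            raise ValueError(f"malformed transcript entry: {item!r}")
    return parsed


turns = parse_transcript(sample)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Checking the shape up front is cheap, and it surfaces a malformed LLM output before any time is spent on audio generation.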

In [10]:
import ast

ast.literal_eval(PODCAST_TEXT)

[('Speaker 1',
  "Alright folks, welcome to our podcast, where we dive deep into the cutting-edge world of Large Language Models (LLMs) and knowledge distillation. Today, we're exploring how we can enhance the capabilities of smaller models by transferring knowledge from larger, proprietary models. Imagine if we could make your humble assistant bot as smart as the mighty GPT-4! That's what this talk is all about. So, let's get started!"),
 ('Speaker 2',
  'Wow, that sounds amazing! So, are we talking about making open-source models smarter just like the fancy proprietary ones?'),
 ('Speaker 1',
  "Exactly! You got it. Open-source models are incredibly accessible, but they often don't have the depth and breadth of knowledge that proprietary models have. So, our goal is to bridge that gap and make them more powerful without needing to pay the hefty price tag."),
 ('Speaker 2',
  "That's awesome! How do we do that? Is it like transferring skills or something?"),
 ('Speaker 1',
  "Yes, it'

#### Generating the Final Podcast

Finally, we can loop over the list of `(speaker, text)` tuples and use our helper functions to generate the audio for each segment.

In [13]:
# Parse the transcript once, then number the segments starting from 1
segments = ast.literal_eval(PODCAST_TEXT)

for i, (speaker, text) in enumerate(tqdm(segments, desc="Generating podcast segments", unit="segment"), start=1):
    output_path = f"./resources/segments/_podcast_segment_{i}.wav"
    if speaker == "Speaker 1":
        generate_speaker1_audio(text, output_path)
    else:  # Speaker 2
        generate_speaker2_audio(text, output_path)

Generating podcast segments:   0%|          | 0/38 [00:00<?, ?segment/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2897 frames (24.92368507385254 secs) for generated speech.


Generating podcast segments:   3%|▎         | 1/38 [00:59<36:31, 59.23s/segment]

Generated speech in 0:00:57.306867
Generated 25.60 seconds of audio in 0:00:57.307218.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 1180 frames (10.156512260437012 secs) for generated speech.


Generating podcast segments:   5%|▌         | 2/38 [01:17<21:09, 35.25s/segment]

Generated speech in 0:00:17.387298
Generated 8.68 seconds of audio in 0:00:17.387499.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2538 frames (21.836891174316406 secs) for generated speech.


Generating podcast segments:   8%|▊         | 3/38 [02:04<23:45, 40.71s/segment]

Generated speech in 0:00:46.255701
Generated 21.77 seconds of audio in 0:00:46.255852.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 928 frames (7.988785266876221 secs) for generated speech.


Generating podcast segments:  11%|█         | 4/38 [02:19<17:16, 30.48s/segment]

Generated speech in 0:00:13.360618
Generated 5.99 seconds of audio in 0:00:13.360783.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2518 frames (21.665771484375 secs) for generated speech.


Generating podcast segments:  13%|█▎        | 5/38 [03:06<20:01, 36.41s/segment]

Generated speech in 0:00:45.776760
Generated 21.56 seconds of audio in 0:00:45.776979.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 915 frames (7.8765177726745605 secs) for generated speech.


Generating podcast segments:  16%|█▌        | 6/38 [03:21<15:34, 29.19s/segment]

Generated speech in 0:00:13.927156
Generated 5.86 seconds of audio in 0:00:13.927327.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2650 frames (22.79835319519043 secs) for generated speech.


Generating podcast segments:  18%|█▊        | 7/38 [04:11<18:37, 36.04s/segment]

Generated speech in 0:00:48.828595
Generated 22.97 seconds of audio in 0:00:48.829007.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 942 frames (8.111275672912598 secs) for generated speech.


Generating podcast segments:  21%|██        | 8/38 [04:27<14:43, 29.45s/segment]

Generated speech in 0:00:13.717775
Generated 6.14 seconds of audio in 0:00:13.717936.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2525 frames (21.721830368041992 secs) for generated speech.


Generating podcast segments:  24%|██▎       | 9/38 [05:13<16:48, 34.76s/segment]

Generated speech in 0:00:45.573098
Generated 21.63 seconds of audio in 0:00:45.573348.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 826 frames (7.1081366539001465 secs) for generated speech.


Generating podcast segments:  26%|██▋       | 10/38 [05:26<13:04, 28.03s/segment]

Generated speech in 0:00:11.734671
Generated 4.91 seconds of audio in 0:00:11.734816.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2439 frames (20.98783302307129 secs) for generated speech.


Generating podcast segments:  29%|██▉       | 11/38 [06:12<15:03, 33.48s/segment]

Generated speech in 0:00:43.760662
Generated 20.72 seconds of audio in 0:00:43.760819.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 972 frames (8.361327171325684 secs) for generated speech.


Generating podcast segments:  32%|███▏      | 12/38 [06:28<12:08, 28.03s/segment]

Generated speech in 0:00:14.401979
Generated 6.46 seconds of audio in 0:00:14.402132.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2494 frames (21.461475372314453 secs) for generated speech.


Generating podcast segments:  34%|███▍      | 13/38 [07:13<13:55, 33.40s/segment]

Generated speech in 0:00:44.872918
Generated 21.30 seconds of audio in 0:00:44.873119.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 1002 frames (8.623024940490723 secs) for generated speech.


Generating podcast segments:  37%|███▋      | 14/38 [07:30<11:19, 28.33s/segment]

Generated speech in 0:00:15.227747
Generated 6.78 seconds of audio in 0:00:15.227922.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2702 frames (23.247413635253906 secs) for generated speech.


Generating podcast segments:  39%|███▉      | 15/38 [08:23<13:40, 35.67s/segment]

Generated speech in 0:00:50.210597
Generated 23.52 seconds of audio in 0:00:50.210768.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 956 frames (8.227956771850586 secs) for generated speech.


Generating podcast segments:  42%|████▏     | 16/38 [08:38<10:50, 29.57s/segment]

Generated speech in 0:00:14.145576
Generated 6.29 seconds of audio in 0:00:14.145800.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2595 frames (22.327247619628906 secs) for generated speech.


Generating podcast segments:  45%|████▍     | 17/38 [09:28<12:27, 35.58s/segment]

Generated speech in 0:00:48.181516
Generated 22.38 seconds of audio in 0:00:48.181675.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 952 frames (8.190279006958008 secs) for generated speech.


Generating podcast segments:  47%|████▋     | 18/38 [09:45<10:00, 30.01s/segment]

Generated speech in 0:00:14.006098
Generated 6.25 seconds of audio in 0:00:14.006267.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2575 frames (22.152057647705078 secs) for generated speech.


Generating podcast segments:  50%|█████     | 19/38 [10:33<11:13, 35.42s/segment]

Generated speech in 0:00:46.980670
Generated 22.17 seconds of audio in 0:00:46.980883.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 932 frames (8.024895668029785 secs) for generated speech.


Generating podcast segments:  53%|█████▎    | 20/38 [10:48<08:49, 29.40s/segment]

Generated speech in 0:00:13.937840
Generated 6.04 seconds of audio in 0:00:13.938039.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2679 frames (23.051124572753906 secs) for generated speech.


Generating podcast segments:  55%|█████▌    | 21/38 [11:39<10:11, 35.97s/segment]

Generated speech in 0:00:50.121065
Generated 23.28 seconds of audio in 0:00:50.121355.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 911 frames (7.837332725524902 secs) for generated speech.


Generating podcast segments:  58%|█████▊    | 22/38 [11:55<07:56, 29.78s/segment]

Generated speech in 0:00:13.434457
Generated 5.81 seconds of audio in 0:00:13.434620.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2022 frames (17.400117874145508 secs) for generated speech.


Generating podcast segments:  61%|██████    | 23/38 [12:30<07:53, 31.53s/segment]

Generated speech in 0:00:33.221368
Generated 16.27 seconds of audio in 0:00:33.221526.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 1048 frames (9.01633358001709 secs) for generated speech.


Generating podcast segments:  63%|██████▎   | 24/38 [12:47<06:18, 27.05s/segment]

Generated speech in 0:00:15.204716
Generated 7.27 seconds of audio in 0:00:15.204896.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2410 frames (20.731569290161133 secs) for generated speech.


Generating podcast segments:  66%|██████▌   | 25/38 [13:30<06:55, 31.96s/segment]

Generated speech in 0:00:42.331162
Generated 20.41 seconds of audio in 0:00:42.331384.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 906 frames (7.794442653656006 secs) for generated speech.


Generating podcast segments:  68%|██████▊   | 26/38 [13:45<05:22, 26.91s/segment]

Generated speech in 0:00:13.357208
Generated 5.76 seconds of audio in 0:00:13.357407.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 1300 frames (11.18363094329834 secs) for generated speech.


Generating podcast segments:  71%|███████   | 27/38 [14:06<04:34, 24.95s/segment]

Generated speech in 0:00:19.315982
Generated 8.57 seconds of audio in 0:00:19.316132.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 874 frames (7.526532173156738 secs) for generated speech.


Generating podcast segments:  74%|███████▎  | 28/38 [14:20<03:37, 21.74s/segment]

Generated speech in 0:00:12.931972
Generated 5.42 seconds of audio in 0:00:12.932225.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2488 frames (21.40501594543457 secs) for generated speech.


Generating podcast segments:  76%|███████▋  | 29/38 [15:06<04:20, 28.95s/segment]

Generated speech in 0:00:44.576981
Generated 21.24 seconds of audio in 0:00:44.577520.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 874 frames (7.522214889526367 secs) for generated speech.


Generating podcast segments:  79%|███████▉  | 30/38 [15:20<03:17, 24.65s/segment]

Generated speech in 0:00:12.918948
Generated 5.42 seconds of audio in 0:00:12.919128.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 1543 frames (13.278390884399414 secs) for generated speech.


Generating podcast segments:  82%|████████▏ | 31/38 [15:45<02:52, 24.68s/segment]

Generated speech in 0:00:23.688661
Generated 11.16 seconds of audio in 0:00:23.688817.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 790 frames (6.799903869628906 secs) for generated speech.


Generating podcast segments:  84%|████████▍ | 32/38 [15:58<02:06, 21.16s/segment]

Generated speech in 0:00:11.687772
Generated 4.52 seconds of audio in 0:00:11.687941.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 2499 frames (21.49710464477539 secs) for generated speech.


Generating podcast segments:  87%|████████▋ | 33/38 [16:44<02:22, 28.51s/segment]

Generated speech in 0:00:44.581119
Generated 21.36 seconds of audio in 0:00:44.581265.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 1070 frames (9.204694747924805 secs) for generated speech.


Generating podcast segments:  89%|████████▉ | 34/38 [17:01<01:40, 25.10s/segment]

Generated speech in 0:00:15.396875
Generated 7.51 seconds of audio in 0:00:15.397123.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 1551 frames (13.343647956848145 secs) for generated speech.


Generating podcast segments:  92%|█████████▏| 35/38 [17:26<01:15, 25.13s/segment]

Generated speech in 0:00:24.145989
Generated 11.24 seconds of audio in 0:00:24.146192.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 671 frames (5.7765889167785645 secs) for generated speech.


Generating podcast segments:  95%|█████████▍| 36/38 [17:37<00:41, 20.93s/segment]

Generated speech in 0:00:09.892879
Generated 3.25 seconds of audio in 0:00:09.893045.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 5.33 seconds
Got duration of 688 frames (5.92486047744751 secs) for generated speech.


Generating podcast segments:  97%|█████████▋| 37/38 [17:48<00:17, 17.98s/segment]

Generated speech in 0:00:10.065694
Generated 2.04 seconds of audio in 0:00:10.065862.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

Got reference audio with duration: 3.94 seconds
Got duration of 598 frames (5.146483898162842 secs) for generated speech.


Generating podcast segments: 100%|██████████| 38/38 [17:58<00:00, 28.39s/segment]

Generated speech in 0:00:09.115681
Generated 2.47 seconds of audio in 0:00:09.115826.
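
The timing lines in this log can be turned into a rough speed figure. The sketch below parses three lines copied from the log above and computes how many seconds of compute were spent per second of generated audio. The `real_time_factor` helper and its regular expression are assumptions written for this log format, not part of `f5_tts_mlx`.

```python
import re
from datetime import timedelta

# Three timing lines copied from the generation log above
log_lines = [
    "Generated 25.60 seconds of audio in 0:00:57.307218.",
    "Generated 8.68 seconds of audio in 0:00:17.387499.",
    "Generated 21.77 seconds of audio in 0:00:46.255852.",
]

PATTERN = re.compile(
    r"Generated (\d+(?:\.\d+)?) seconds of audio in (\d+):(\d+):(\d+(?:\.\d+)?)"
)


def real_time_factor(line):
    """Return wall-clock seconds spent per second of generated audio."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    audio_seconds = float(match.group(1))
    elapsed = timedelta(
        hours=int(match.group(2)),
        minutes=int(match.group(3)),
        seconds=float(match.group(4)),
    ).total_seconds()
    return elapsed / audio_seconds


for line in log_lines:
    print(f"{real_time_factor(line):.2f}x real time")
```

For these three segments the ratio comes out at roughly two seconds of compute per second of audio; the very short segments near the end of the log have a noticeably higher ratio.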





In [1]:
# Combine the segments in ./resources/segments into a single array
import re
import os
import numpy as np
import soundfile as sf

SEGMENT_DIR = "./resources/segments"
SEGMENT_PATTERN = re.compile(r'segment_(\d+)\.wav$')

# Keep only segment files (skipping strays such as .DS_Store) and sort them numerically
audio_files = sorted(
    (f"{SEGMENT_DIR}/{file}" for file in os.listdir(SEGMENT_DIR) if SEGMENT_PATTERN.search(file)),
    key=lambda x: int(SEGMENT_PATTERN.search(x).group(1)),
)

print("audio_files -> ", audio_files)
audio_data = []
for file in audio_files:
    data, rate = sf.read(file)
    audio_data.append(data)

audio_data = np.concatenate(audio_data)

audio_files ->  ['./resources/segments/_podcast_segment_1.wav', './resources/segments/_podcast_segment_2.wav', './resources/segments/_podcast_segment_3.wav', './resources/segments/_podcast_segment_4.wav', './resources/segments/_podcast_segment_5.wav', './resources/segments/_podcast_segment_6.wav', './resources/segments/_podcast_segment_7.wav', './resources/segments/_podcast_segment_8.wav', './resources/segments/_podcast_segment_9.wav', './resources/segments/_podcast_segment_10.wav', './resources/segments/_podcast_segment_11.wav', './resources/segments/_podcast_segment_12.wav', './resources/segments/_podcast_segment_13.wav', './resources/segments/_podcast_segment_14.wav', './resources/segments/_podcast_segment_15.wav', './resources/segments/_podcast_segment_16.wav', './resources/segments/_podcast_segment_17.wav', './resources/segments/_podcast_segment_18.wav', './resources/segments/_podcast_segment_19.wav', './resources/segments/_podcast_segment_20.wav', './resources/segments/_podcast_s
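
The numeric sort key in the cell above matters because a plain string sort would put segment 10 before segment 2 and scramble the conversation. A small standalone illustration, using made-up file names in the same format (`segment_number` is a hypothetical helper name):

```python
import re

names = [f"_podcast_segment_{i}.wav" for i in (1, 2, 10, 11, 3)]


def segment_number(name):
    """Extract the integer N from a name ending in segment_N.wav."""
    return int(re.search(r"segment_(\d+)\.wav", name).group(1))


print(sorted(names))                      # plain string sort: 1, 10, 11, 2, 3
print(sorted(names, key=segment_number))  # numeric sort: 1, 2, 3, 10, 11
```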

### Output the Podcast

We can now save the combined audio as a single WAV file:

In [4]:
sf.write("./resources/_podcast.wav", audio_data, SAMPLE_RATE)
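
A quick sanity check on the saved file is to read back its duration and sample rate. The sketch below uses only the standard library `wave` module and demonstrates the idea on a synthetic one-second file. It assumes the file is PCM-encoded, since `wave` rejects other encodings, in which case `sf.info()` from `soundfile` is an alternative. The `wav_duration_seconds` helper and the 24 kHz demo rate are illustrative assumptions.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return (duration in seconds, sample rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return wav.getnframes() / rate, rate


# Demo on one second of 16-bit mono silence at a hypothetical 24 kHz rate;
# in the notebook, pass "./resources/_podcast.wav" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(24000)
        wav.writeframes(b"\x00\x00" * 24000)
    duration, rate = wav_duration_seconds(demo_path)

print(f"{duration:.2f} s at {rate} Hz")
```

If the reported duration is far from the sum of the segment durations in the generation log, the segments were probably written at a different sample rate than the one passed to `sf.write`.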

### Suggested Next Steps:

- Experiment with the prompts: feel free to modify the SYSTEM_PROMPT in the notebooks.
- Extend the workflow beyond two speakers.
- Test other TTS models.
- Experiment with speech enhancer models as a step 5.

In [None]:
#fin