Welcome to Tortoise! 🐢🐢🐢🐢

Before you begin, I **strongly** recommend you turn on a GPU runtime.

There's a reason this is called "Tortoise" - this model takes up to a minute to perform inference for a single sentence on a GPU. Expect waits on the order of hours on a CPU.
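
Before loading any models, it can help to confirm which device the runtime actually exposes. The following is a minimal sketch: `describe_runtime` is a hypothetical helper name (not part of Tortoise), and it relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def describe_runtime():
    """Return a short description of the compute device PyTorch can see."""
    # Avoid a hard failure if torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "CPU only (expect very slow inference)"


print(describe_runtime())
```

If this reports CPU only, switch the runtime type before running the cells below.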

In [0]:
# First, follow the instructions under "Local Installation" in README.md.
!pip3 install -r requirements.txt
# !python3 setup.py install

In [0]:
# Imports used through the rest of the notebook.
import torch
import torchaudio
import torch.nn as nn
import torch.nn.functional as F

import IPython

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_audio, load_voice, load_voices

# This will download all the models used by Tortoise from the HF hub.
# tts = TextToSpeech()
# To use DeepSpeed, pass use_deepspeed=True; it is nearly 2x faster than the default.
tts = TextToSpeech(use_deepspeed=False, kv_cache=True)

In [0]:
# This is the text that will be spoken.

long_text = """
Good morning/evening to all our listeners. Welcome to Geopop, the place where we explore and discuss current topics and global issues. Today, we will delve into the complex and longstanding conflict between Israel and Palestine.

To fully understand the conflict, it is essential to examine the historical context. The issue dates back many decades, with deep roots in the struggle for territorial control and coexistence between two peoples.

The heart of the dispute is the land, with both states claiming rights to specific areas. Israel, created in 1948, is recognized by many countries, but Palestine seeks its own independence and international recognition.

The conflict has been characterized by episodes of violence and constant tensions. Both parties have experienced suffering and loss of human lives, creating a cycle of revenge that has made reaching a lasting solution challenging.

Over the years, various attempts have been made to resolve the conflict through peace negotiations, but so far, a definitive solution has not been achieved. The international community continues to work to facilitate dialogue and promote lasting peace.

In conclusion, the conflict between Israel and Palestine is a complex and delicate issue that requires a balanced and multilateral approach. The hope is that, through dialogue and mutual understanding, we can envision a future where both peoples can coexist peacefully.

Thank you for listening. Keep following us on Geopop for further insights into global issues. Until the next episode!
"""

# Here's something for the poetically inclined (assign this string to `text` to use it):
"""
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,"""

# Pick a "preset mode" to determine quality. Options: {"ultra_fast", "fast" (default), "standard", "high_quality"}. See docs in api.py
preset = "ultra_fast"

In [0]:
# Tortoise will attempt to mimic voices you provide. It comes pre-packaged
# with some voices you might recognize.

# Let's list all the voices available. These are just some random clips I've gathered
# from the internet as well as a few voices from the training dataset.
# Feel free to add your own clips to the voices/ folder.
%ls tortoise/voices

IPython.display.Audio('tortoise/voices/tom/1.wav')
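
To see how many reference clips each voice has, the folder can also be inspected from Python. This is a rough sketch that assumes the layout implied above (one sub-folder per voice containing `.wav` clips, as in `tortoise/voices/tom/1.wav`). `list_voice_clips` is a hypothetical helper, not a Tortoise function.

```python
from pathlib import Path


def list_voice_clips(voices_dir="tortoise/voices"):
    """Map each voice folder name to the sorted names of its .wav clips."""
    root = Path(voices_dir)
    if not root.is_dir():
        # Nothing to list if the folder is missing (e.g. wrong working directory).
        return {}
    return {
        folder.name: sorted(clip.name for clip in folder.glob("*.wav"))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }


for name, clips in list_voice_clips().items():
    print(f"{name}: {len(clips)} clip(s)")
```

A custom voice added to the voices/ folder shows up here as one more entry.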

## Single Text

In [0]:
# Pick one of the voices from the output above
voice = 'tom'
text = "Joining two modalities results in a surprising increase in generalization! What would happen if we combined them all?"
# Load it and send it through Tortoise.
voice_samples, conditioning_latents = load_voice(voice)
gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
                          preset=preset)
torchaudio.save('generated.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio('generated.wav')

In [0]:
# Tortoise can also generate speech using a random voice. The voice changes each time you execute this!
# (Note: random voices can be prone to strange utterances)
gen = tts.tts_with_preset(text, voice_samples=None, conditioning_latents=None, preset=preset)
torchaudio.save('generated.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio('generated.wav')
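
The clips above are saved at a sample rate of 24000 Hz, so a clip's duration is its sample count divided by that rate. A small sketch, where `clip_seconds` is a hypothetical helper name:

```python
SAMPLE_RATE = 24000  # The rate passed to torchaudio.save in the cells above.


def clip_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Convert a number of audio samples into a duration in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# A tensor whose last dimension holds 48000 samples is two seconds of audio.
print(clip_seconds(48000))  # 2.0
```

With a generated tensor, the sample count would be `gen.squeeze(0).shape[-1]`.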

In [0]:
# You can also combine conditioning voices. Combining voices produces a new voice
# with traits from all the parents.
#
# Let's see what it would sound like if Picard and Kirk had a kid with a penchant for philosophy:
voice_samples, conditioning_latents = load_voices(['pat', 'william'])

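# Pass the combined samples and latents; passing None here would fall back to a random voice.
gen = tts.tts_with_preset("They used to say that if man was meant to fly, he’d have wings. But he did fly. He discovered he had to.",
                          voice_samples=voice_samples, conditioning_latents=conditioning_latents, preset=preset)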
torchaudio.save('captain_kirkard.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio('captain_kirkard.wav')

In [0]:
del tts  # Will break other cells, but necessary to conserve RAM if you want to run this cell.

# Tortoise comes with some scripts that do a lot of the heavy lifting for you. For example,
# read.py will read a text file aloud. This will take a while.
!python3 tortoise/read.py --voice=train_atkins --textfile=tortoise/data/riding_hood.txt --preset=ultra_fast --output_path=.

IPython.display.Audio('train_atkins/combined.wav')

## Long text

In [0]:
import nltk
from pyspark.sql import SparkSession
from pyspark.sql.functions import monotonically_increasing_id

# Download NLTK data for sentence tokenization
nltk.download('punkt')

from nltk.tokenize import sent_tokenize

long_text = """
Good morning/evening to all our listeners. Welcome to Geopop, the place where we explore and discuss current topics and global issues. Today, we will delve into the complex and longstanding conflict between Israel and Palestine.

To fully understand the conflict, it is essential to examine the historical context. The issue dates back many decades, with deep roots in the struggle for territorial control and coexistence between two peoples.

The heart of the dispute is the land, with both states claiming rights to specific areas. Israel, created in 1948, is recognized by many countries, but Palestine seeks its own independence and international recognition.

The conflict has been characterized by episodes of violence and constant tensions. Both parties have experienced suffering and loss of human lives, creating a cycle of revenge that has made reaching a lasting solution challenging.

Over the years, various attempts have been made to resolve the conflict through peace negotiations, but so far, a definitive solution has not been achieved. The international community continues to work to facilitate dialogue and promote lasting peace.

In conclusion, the conflict between Israel and Palestine is a complex and delicate issue that requires a balanced and multilateral approach. The hope is that, through dialogue and mutual understanding, we can envision a future where both peoples can coexist peacefully.

Thank you for listening. Keep following us on Geopop for further insights into global issues. Until the next episode!
"""


# Use NLTK to tokenize the text into sentences
sentences = sent_tokenize(long_text)

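# Create a Spark session (reuses the existing one if the runtime already provides it)
spark = SparkSession.builder.appName("tortoise-longform").getOrCreate()
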
# Create a DataFrame with sentences and their indices
data = [(index, sentence) for index, sentence in enumerate(sentences)]
df = spark.createDataFrame(data, ["index", "sentence"])

# Add a unique ID column to the DataFrame
df = df.withColumn("id", monotonically_increasing_id())

# Show the resulting DataFrame
df.show(truncate=False)
sentences = df.orderBy("index").select("sentence").rdd.flatMap(lambda x: x).collect()


In [0]:
from tortoise.utils.text import split_and_recombine_text
from time import time
import os

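outpath = "results/longform/"
seed = 1  # Passed as use_deterministic_seed below.
voice_outpath = os.path.join(outpath, voice)
os.makedirs(voice_outpath, exist_ok=True)

# `tts` is deleted by the read.py cell above; re-create it if that cell was run.
if 'tts' not in globals():
    tts = TextToSpeech(use_deepspeed=False, kv_cache=True)

voice_samples, conditioning_latents = load_voice(voice)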

all_parts = []
for j, text in enumerate(sentences):
    gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
                              preset="fast", k=1, use_deterministic_seed=seed)
    gen = gen.squeeze(0).cpu()
    torchaudio.save(os.path.join(voice_outpath, f'{j}.wav'), gen, 24000)
    all_parts.append(gen)

full_audio = torch.cat(all_parts, dim=-1)
torchaudio.save(os.path.join(voice_outpath, 'combined.wav'), full_audio, 24000)
IPython.display.Audio(os.path.join(voice_outpath, 'combined.wav'))
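
The loop above synthesizes one sentence per call. Grouping several short sentences into one chunk reduces the number of calls. The sketch below is one way to do that in plain Python. `chunk_sentences` and its `max_chars` budget are hypothetical names chosen for illustration, and the default of 200 characters is an assumption rather than a Tortoise limit. The `split_and_recombine_text` helper imported above serves a similar purpose, but its exact behavior is not shown in this notebook.

```python
def chunk_sentences(sentences, max_chars=200):
    """Greedily pack consecutive sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # Skip blank entries left over from tokenization.
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # The budget would be exceeded: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


example = ["Thank you for listening.", "Until the next episode!"]
print(chunk_sentences(example, max_chars=60))
```

The resulting chunks could replace `sentences` in the loop above; the order of the text is preserved.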