<a href="https://colab.research.google.com/github/deedeeharris/AI/blob/main/bark/bark_text_to_speech_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text to Speech with [bark](https://github.com/suno-ai/bark)


**Yedidya Harris, April 2023**

[LinkedIn](https://www.linkedin.com/in/yedidya-harris/)

In this Jupyter notebook tutorial, we explore **Bark, a transformer-based text-to-audio model** created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects.

We will use Bark to **generate audio from text prompts, including nonverbal sounds** such as laughing, sighing, and crying.

## **Setup**

In [None]:
#@title 1. Mount drive and set cache folder in your Drive  (Run me, patience) { display-mode: "form" }

# mount your Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Choose a Drive folder for the downloaded models, so they are downloaded only once
import os
folder_path = '/content/drive/MyDrive/projects/bark-voice/cache'  #@param {type:"string"}

# Create the folder if it doesn't exist
os.makedirs(folder_path, exist_ok=True)

# Point XDG_CACHE_HOME at the folder so the models are cached there
os.environ['XDG_CACHE_HOME'] = folder_path
!pip install git+https://github.com/suno-ai/bark.git

# libs
from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio



In [None]:
#@title 2. Download models to cache folder, uncheck after first time (patience){ display-mode: "form" }
download = True #@param {type:"boolean"}

# download and load all models
# they'll be downloaded to your cache folder; uncheck `download` after the first run
if download:
    preload_models()
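
To decide whether the `download` box can be unchecked, it helps to see whether the cache folder already holds the model files. The following is a minimal sketch using only the standard library; `cache_size_mb` is a hypothetical helper (not part of bark), and it makes no assumption about how the files are laid out inside the folder.

```python
import os

def cache_size_mb(cache_root):
    """Return the total size, in MB, of all files under cache_root.

    Hypothetical helper for this notebook; returns 0.0 if the folder
    does not exist or is empty.
    """
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that vanish or cannot be read (e.g. broken links)
                pass
    return total_bytes / (1024 * 1024)

# Report on the folder that XDG_CACHE_HOME points to, if it has been set
cache_root = os.environ.get("XDG_CACHE_HOME")
if cache_root:
    print(f"{cache_size_mb(cache_root):.1f} MB cached under {cache_root}")
```

A result close to zero suggests the models have not been downloaded yet, so `download` should stay checked for that run.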

## **Generate Voice**

You can insert "metatags" in square brackets within the text to generate sounds such as laughter or sighs: [Sad], [laughter], [laughs], [sighs], [music], [gasps], [clears throat].

Adding "-" or "..." produces hesitations. Put "♪" before and after the words for song lyrics.

Capitalize a word to emphasize it.
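
The conventions above can be combined in a single prompt string. Here is a minimal sketch that only builds the text; `build_prompt` and its parameters are hypothetical names introduced for illustration and are not part of bark. The resulting string would then be passed to `generate_audio` as in the next cell.

```python
def build_prompt(text, speaker_prefix="MAN: ", sing=False, emphasize=()):
    """Compose a prompt string using the conventions described above.

    Hypothetical helper: capitalizes the words listed in `emphasize`,
    optionally wraps the text in music notes, and prepends the
    speaker prefix used in this notebook.
    """
    targets = {word.lower() for word in emphasize}
    words = []
    for word in text.split():
        # Ignore surrounding punctuation when matching a word to emphasize
        core = word.strip(".,!?")
        if core and core.lower() in targets:
            word = word.replace(core, core.upper())
        words.append(word)
    body = " ".join(words)
    if sing:
        # Music notes before and after the words mark them as song lyrics
        body = f"♪ {body} ♪"
    return speaker_prefix + body

prompt = build_prompt(
    "Well... [sighs] I really did not expect that! [laughs]",
    emphasize=["really"],
)
print(prompt)
```

Metatags and hesitation marks pass through unchanged, so they can be typed directly into the text.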


In [15]:
#@title 3. Try it out (fill in, run, and wait){ display-mode: "form" }

# vars setup
gender = "MAN: " #@param ["MAN: ", "WOMAN: "]
text = "In this Jupyter notebook tutorial, we will explore the capabilities of Bark, a transformer-based text-to-audio model created by Suno." #@param {type:"string"}
text_prompt = gender + text
language = "en" #@param ["en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh"]
speaker_number = "5" #@param ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
speaker = language+"_speaker_"+speaker_number

# generate
audio_array = generate_audio(text_prompt, history_prompt=speaker)
Audio(audio_array, rate=SAMPLE_RATE, autoplay=True)

100%|██████████| 100/100 [00:09<00:00, 10.79it/s]
100%|██████████| 36/36 [00:33<00:00,  1.06it/s]


In [16]:
#@title 4. Download Generated Audio (run and wait) { display-mode: "form" }
from scipy.io.wavfile import write as write_wav
from google.colab import files

# Save audio data to a WAV file
file_path = "/content/audio.wav"
write_wav(file_path, SAMPLE_RATE, audio_array)

# Download the audio file
files.download(file_path)
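
If the downloaded file does not play in some audio player, one possible cause is the sample format: assuming `audio_array` holds floating-point samples, `write_wav` stores them as a floating-point WAV, which not every player supports. The sketch below writes 16-bit PCM instead, using only the standard library. `write_wav_int16` is a hypothetical helper, and the 24 kHz rate and sine tone are stand-ins for `SAMPLE_RATE` and `audio_array`.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav_int16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper; `samples` can be any iterable of numbers.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to the signed 16-bit range
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

# Stand-in data: one second of a 440 Hz tone at an assumed 24 kHz rate
demo_rate = 24_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
demo_path = os.path.join(tempfile.gettempdir(), "tone_int16.wav")
write_wav_int16(demo_path, tone, demo_rate)
```

In the notebook, the call would be `write_wav_int16(file_path, audio_array, SAMPLE_RATE)` in place of `write_wav`.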
