<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0" width=150px> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LLMs for Data Science</h1>
    <h1>Text to Speech with OpenAI</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter, defaultdict
import random

import pandas as pd
import numpy as np

import matplotlib
import matplotlib.pyplot as plt 

import openai
from openai import OpenAI

import tqdm as tq
from tqdm.notebook import tqdm

import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference.

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 23.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 029419f525238e1b9a7c22f96809155546a701ad

pandas    : 1.5.3
numpy     : 1.26.4
tqdm      : 4.66.4
matplotlib: 3.8.0
watermark : 2.4.3
openai    : 1.30.5



Load default figure style

In [3]:
plt.style.use('d4sci.mplstyle')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# Audio to Text

In [4]:
client = OpenAI()

Let us transcribe a small local audio file with the `whisper-1` model

In [5]:
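# Open the audio in a context manager so the file handle is closed after the request
with open("data/gettysburg10.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="text",
        language="en",
    )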

And the transcript is simply:

In [6]:
print(transcript)

Four score and seven years ago, our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.



We can also ask for SRT-formatted output, which includes start and end timestamps for each segment

In [7]:
%%time
# Same request as before, but asking for SubRip (SRT) subtitles
with open("data/gettysburg10.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="srt",
        language="en",
    )

CPU times: user 4.18 ms, sys: 2.96 ms, total: 7.14 ms
Wall time: 843 ms


In [8]:
print(transcript)

1
00:00:00,000 --> 00:00:05,660
Four score and seven years ago, our fathers brought forth on this continent a new nation,

2
00:00:05,660 --> 00:00:09,880
conceived in liberty and dedicated to the proposition that all men are created equal.
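
The SRT text is easy to turn into structured data. The sketch below is one possible way to do it with the standard library only; the helper names `parse_timestamp` and `parse_srt` are ours, not part of the OpenAI client, and the sample string is copied from the output above.

```python
import re
from datetime import timedelta

# Matches an SRT timestamp such as 00:00:05,660
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def parse_timestamp(stamp):
    """Convert 'HH:MM:SS,mmm' into a timedelta."""
    hours, minutes, seconds, millis = map(
        int, TIMESTAMP.fullmatch(stamp.strip()).groups()
    )
    return timedelta(
        hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis
    )


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        # Line 0 is the cue index, line 1 is "start --> end", the rest is text
        start, end = (parse_timestamp(part) for part in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:])))
    return cues


# Sample copied from the SRT output above
sample = """1
00:00:00,000 --> 00:00:05,660
Four score and seven years ago, our fathers brought forth on this continent a new nation,

2
00:00:05,660 --> 00:00:09,880
conceived in liberty and dedicated to the proposition that all men are created equal.
"""

for start, end, text in parse_srt(sample):
    print(f"{start.total_seconds():6.2f}s to {end.total_seconds():6.2f}s: {text}")
```

In the notebook, `parse_srt(transcript)` should work the same way on the string returned by the API, and the resulting tuples can be passed to `pd.DataFrame` if a table is more convenient.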





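Passing `language="es"` for the same English recording returns the text in Spanish, although `language` is documented as a hint about the input audio rather than a translation target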

In [9]:
%%time
# Same English audio, but with the language hint set to Spanish
with open("data/gettysburg10.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="text",
        language="es",
    )

CPU times: user 5.96 ms, sys: 3.79 ms, total: 9.75 ms
Wall time: 993 ms


In [10]:
print(transcript)

Hace cuatro y siete años, nuestros padres trajeron a este continente una nueva nación concebida en libertad y dedicada a la proposición de que todos los hombres son creados iguales.
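
The Spanish output above is not a documented feature of the transcriptions endpoint, so it may not be reliable across files or model versions. The documented translation route is `client.audio.translations`, which always produces English text. A minimal sketch, assuming a non-English recording; the helper name `translate_to_english` and the path in the usage comment are hypothetical:

```python
def translate_to_english(client, path, model="whisper-1"):
    """Translate the speech stored in `path` into English text.

    `client` is expected to be an `openai.OpenAI` instance. The translations
    endpoint always targets English, so it takes no `language` argument.
    """
    # The context manager closes the file once the request has completed
    with open(path, "rb") as audio_file:
        return client.audio.translations.create(
            file=audio_file,
            model=model,
            response_format="text",
        )


# Usage, assuming a Spanish recording at this hypothetical path:
# english = translate_to_english(client, "data/discurso.wav")
```

For targets other than English, one option is to transcribe first and then translate the text in a separate step.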



# Text to Speech

Now the opposite direction: going from written text to high-quality audio

In [11]:
quote = """Scientists have calculated that the chances of something so patently absurd 
actually existing are millions to one.
But magicians have calculated that million-to-one chances crop up nine times out of ten."""

You can learn more about text to speech (and sample the various voices) in the [official documentation](https://platform.openai.com/docs/guides/text-to-speech/quickstart).
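
One practical constraint: the `input` argument of the speech endpoint is limited to 4096 characters per request. The short quote here is well under that, but longer texts have to be split first. The sketch below shows one simple way to do it, breaking after sentence-ending punctuation; `chunk_text` is a hypothetical helper, not part of the OpenAI client.

```python
import re

MAX_CHARS = 4096  # per-request limit on the `input` text of the speech endpoint


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into pieces of at most max_chars, breaking after sentences."""
    # Split after ., ! or ? when followed by whitespace (including newlines)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Tiny limit just to make the splitting visible
print(chunk_text("One. Two! Three? Four.", max_chars=10))
# → ['One. Two!', 'Three?', 'Four.']
```

Each chunk could then be sent in its own `client.audio.speech.create` call and written to a separate file.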

In [12]:
%%time
audio = client.audio.speech.create(
    input=quote, 
    model="tts-1", 
    voice='fable',
    response_format='mp3')

CPU times: user 14.7 ms, sys: 6.61 ms, total: 21.3 ms
Wall time: 2.49 s


The response can be written directly to a file

In [13]:
audio.write_to_file('data/pratchett.mp3')

In [14]:
!open data/pratchett.mp3
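
The `open` command used above is specific to macOS. As a rough sketch of a portable alternative, the hypothetical helper below picks the usual launcher for each platform; the choice of `xdg-open` on Linux is an assumption that holds on most desktop installs but not on headless machines.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default application."""
    if platform == "darwin":
        return ["open", path]  # macOS, same as the cell above
    if platform.startswith("win"):
        # The empty string is the window title expected by `start`
        return ["cmd", "/c", "start", "", path]
    return ["xdg-open", path]  # assumed launcher on Linux desktops


# Show the command that would be used on this machine
print(open_command("data/pratchett.mp3"))

# To actually launch the player:
# subprocess.run(open_command("data/pratchett.mp3"), check=True)
```

Inside a notebook, `IPython.display.Audio("data/pratchett.mp3")` is another option that embeds a player in the cell output without leaving the browser.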

<center>
     <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>