# Voice
The Audio API provides two speech-to-text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model. They can be used to:

- Transcribe audio into whatever language the audio is in.
- Translate and transcribe the audio into English.

File uploads are currently limited to 25 MB, and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
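Because both limits are checked by the API only after the upload, it can help to check a file locally first. The sketch below is a hypothetical pre-flight check (`upload_problems` and `check_whisper_upload` are illustrative names, not part of the OpenAI SDK), and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which may not match the API's exact cutoff.

```python
import os
from typing import List

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the API's exact cutoff may differ.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return reasons an upload would likely be rejected (empty list if none)."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    return problems


def check_whisper_upload(path: str) -> List[str]:
    """Hypothetical helper: run upload_problems on a file that exists on disk."""
    return upload_problems(path, os.path.getsize(path))
```

For example, `check_whisper_upload("urdu.mp3")` returns an empty list when the file can be sent as-is. Splitting the pure check from the disk access keeps the rule easy to test without real audio files.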

# Transcriptions
The transcriptions API takes as input the audio file you want to transcribe and the desired output format for the transcription. Several output formats are supported: json, text, srt, verbose_json, and vtt.

By default, the response is JSON, with the raw text included in the "text" field.

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. ...."
}
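If you call the endpoint over plain HTTP rather than through the Python SDK, the default body is a JSON object like the one above. A minimal sketch for pulling out the transcript follows; `text_from_body` is a hypothetical helper, and it only handles the json, verbose_json, and text formats (srt and vtt are deliberately left out).

```python
import json


def text_from_body(body: str, response_format: str = "json") -> str:
    """Pull the transcript out of a raw HTTP response body (not an SDK object)."""
    if response_format in ("json", "verbose_json"):
        # Both JSON formats carry the full transcript under the "text" key
        return json.loads(body)["text"]
    if response_format == "text":
        # The plain-text format is the transcript itself, often with a trailing newline
        return body.rstrip("\n")
    raise ValueError(f"no text extraction defined for {response_format!r}")
```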

In [9]:
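from openai import OpenAI
from dotenv import load_dotenv, find_dotenv

# Load OPENAI_API_KEY from the nearest .env file into the environment
_ : bool = load_dotenv(find_dotenv())

client: OpenAI = OpenAI()

# response_format="text" returns the transcript as a plain string
with open("urdu.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
    )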

In [10]:
transcript

'میرا نام عریب احمد ہے\n'

# Translations
The translations API takes as input an audio file in any of the supported languages and transcribes it into English, translating if necessary. This differs from the /transcriptions endpoint, whose output stays in the original input language.

In [11]:
from openai import OpenAI
client = OpenAI()

# The translations endpoint always returns English text
with open("urdu.mp3", "rb") as audio_file:
    transcript = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
    )

transcript

Translation(text='My name is Areeb Ahmed.')
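The two calls above return different types: with `response_format="text"` the SDK gave back a plain string (with a trailing newline), while the default format gave back a `Translation` object with a `.text` attribute. Code that reads `.text` directly, like the post-processing function later in this notebook, would fail on the plain string. A small hypothetical helper, `transcript_text`, can normalize both shapes:

```python
from typing import Any


def transcript_text(result: Any) -> str:
    """Return transcript text from either SDK return shape shown above."""
    # response_format="text" makes the SDK return a plain str (with a trailing newline)
    if isinstance(result, str):
        return result.strip()
    # The default format returns an object such as Translation(text=...)
    return result.text.strip()
```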

### We only support translation into English at this time.

In [5]:
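# These pip packages are Python wrappers only; pydub still needs the ffmpeg/ffprobe binaries on PATH
%pip install pydub ffprobe --upgrade --force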



In [2]:
# Homebrew is macOS-only; on Windows or Linux, install ffmpeg with that platform's package manager
!brew install ffmpeg
!ffprobe -version

'brew' is not recognized as an internal or external command,
operable program or batch file.
'ffprobe' is not recognized as an internal or external command,
operable program or batch file.
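The output above shows that neither `brew` nor `ffprobe` was found. pydub shells out to the ffmpeg and ffprobe executables to decode MP3, so a missing binary is the most likely cause of the `FileNotFoundError` in the chunking cell that follows. A minimal check using only the standard library's `shutil.which` (the `missing_tools` name is hypothetical):

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("ffmpeg", "ffprobe")) -> List[str]:
    """Return the command-line tools from `tools` that cannot be found on PATH."""
    # shutil.which returns None when an executable is not on PATH
    return [tool for tool in tools if shutil.which(tool) is None]
```

Running `missing_tools()` before loading audio with pydub gives a clearer message than the subprocess error.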


# Longer inputs
By default, the Whisper API only supports files that are less than 25 MB. If you have an audio file that is larger than that, you will need to break it up into chunks of 25 MB or less, or use a compressed audio format. To get the best performance, we suggest that you avoid breaking the audio up mid-sentence, as this may cause some context to be lost.
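Since pydub slices audio by time rather than by size, it helps to translate the 25 MB limit into a duration. For a constant-bitrate file, size in bytes is bitrate (bits/s) × seconds / 8. The sketch below assumes constant bitrate and reads 25 MB as 25 × 1024 × 1024 bytes; `max_chunk_ms` and its `safety` margin are illustrative, not part of any library.

```python
# Assumption: 25 MB is read as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_chunk_ms(bitrate_kbps: float,
                 limit_bytes: int = MAX_UPLOAD_BYTES,
                 safety: float = 0.95) -> int:
    """Longest chunk, in milliseconds, that should encode to under limit_bytes.

    Assumes a constant bitrate; `safety` leaves headroom for headers and
    metadata. 1 kbps is 1000 bits/s, i.e. 1 bit per ms, so the stream costs
    bitrate_kbps / 8 bytes per millisecond.
    """
    bytes_per_ms = bitrate_kbps / 8
    return int(limit_bytes * safety / bytes_per_ms)
```

With no safety margin, a 128 kbps MP3 fits 1,638,400 ms (about 27 minutes) into 25 MB, so the 10-minute slice below is comfortably under the limit at that bitrate.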

In [12]:
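from pydub import AudioSegment

# Decoding MP3 requires the ffmpeg/ffprobe binaries to be on PATH
song = AudioSegment.from_mp3("./urdu.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

# Slicing an AudioSegment by milliseconds returns a new, shorter segment
first_10_minutes = song[:ten_minutes]

first_10_minutes.export("chunk_10_minutes.mp3", format="mp3")
first_10_minutes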

FileNotFoundError: [WinError 2] The system cannot find the file specified
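The cell above cuts only the first ten minutes. To cover a whole file while following the advice to avoid cutting mid-sentence, one option is to plan all cut points up front and move each cut back to the latest pause inside the allowed window. This is a sketch under assumptions: `plan_chunks` is hypothetical, and the pause times could come from a silence detector such as `pydub.silence.detect_silence` (which returns `[start, end]` pairs in milliseconds, so you might pass each pause's midpoint).

```python
from typing import List, Optional, Tuple


def plan_chunks(total_ms: int, max_ms: int,
                pauses_ms: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Split [0, total_ms) into (start, end) ranges no longer than max_ms.

    When pause times are given, each cut moves back to the latest pause
    inside the window, so chunks tend to end between sentences.
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    pauses = sorted(pauses_ms or [])
    chunks = []
    start = 0
    while start < total_ms:
        limit = min(start + max_ms, total_ms)
        end = limit
        if limit < total_ms:
            # Only pauses strictly after `start` guarantee forward progress
            candidates = [p for p in pauses if start < p <= limit]
            if candidates:
                end = candidates[-1]
        chunks.append((start, end))
        start = end
    return chunks

# Usage with pydub (len() of an AudioSegment is its duration in ms):
# for i, (start, end) in enumerate(plan_chunks(len(song), ten_minutes)):
#     song[start:end].export(f"chunk_{i}.mp3", format="mp3")
```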

# Post-processing with GPT-4
Another way to improve transcription accuracy is a post-processing step using GPT-4 or GPT-3.5-Turbo.

We start by providing instructions for the model through the system_prompt variable. As with Whisper's prompt parameter, we can list our company and product names so that the model spells them correctly.
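The product list is the part of the system prompt most likely to change between projects. A hypothetical helper, `build_correction_prompt`, can assemble the same prompt text used in the next cell from a company name and a list of names, so the wording stays fixed while the list varies:

```python
from typing import List


def build_correction_prompt(company: str, product_names: List[str]) -> str:
    """Assemble a spelling-correction system prompt from a list of names."""
    # rstrip(".") avoids a doubled period when the last name ends in "." (e.g. "F.L.I.N.T.")
    names = ", ".join(product_names).rstrip(".")
    return (
        f"You are a helpful assistant for the company {company}. "
        "Your task is to correct any spelling discrepancies in the transcribed text. "
        f"Make sure that the names of the following products are spelled correctly: {names}. "
        "Only add necessary punctuation such as periods, commas, and capitalization, "
        "and use only the context provided."
    )
```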

In [13]:
from openai import OpenAI

# The system prompt lists the company and product names the model should spell correctly
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."

client = OpenAI()


def generate_corrected_transcript(temperature, system_prompt, transcript):
    """Send a Whisper result to a chat model and return the corrected text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            # transcript is the Translation/Transcription object returned by Whisper
            {"role": "user", "content": transcript.text},
        ],
    )
    display(response)  # show the full ChatCompletion object in the notebook
    return response.choices[0].message.content

corrected_text = generate_corrected_transcript(0, system_prompt, transcript)
corrected_text

ChatCompletion(id='chatcmpl-8Xouha6nPOOJoIrImGWwaAgIl8FQP', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='Nice to meet you, Areeb Ahmed. How can I assist you today?', role='assistant', function_call=None, tool_calls=None), logprobs=None)], created=1703071227, model='gpt-3.5-turbo-1106', object='chat.completion', system_fingerprint='fp_772e8125bb', usage=CompletionUsage(completion_tokens=17, prompt_tokens=145, total_tokens=162))

'Nice to meet you, Areeb Ahmed. How can I assist you today?'
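The reply above is a conversational answer ("Nice to meet you, Areeb Ahmed. How can I assist you today?") rather than a corrected version of "My name is Areeb Ahmed.". Since a spelling and punctuation fix should stay close to the original text, a cheap similarity check can flag outputs like this before they replace the transcript. This is a heuristic sketch using the standard library's `difflib.SequenceMatcher`; the `looks_like_correction` name and the 0.6 threshold are assumptions to tune on your own data.

```python
import difflib


def looks_like_correction(original: str, corrected: str, min_ratio: float = 0.6) -> bool:
    """Heuristic: True if `corrected` is close enough to `original` to be a light edit."""
    # Ignore case so capitalization fixes do not count against the similarity score
    ratio = difflib.SequenceMatcher(None, original.lower(), corrected.lower()).ratio()
    return ratio >= min_ratio
```

For the run above, `looks_like_correction("My name is Areeb Ahmed.", corrected_text)` returns False, which would signal that the model answered the transcript instead of correcting it.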