# Audio transcription and translation (preview) example

This example shows how to use the Azure OpenAI Whisper model to transcribe and translate audio files.

### Install dependencies

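First, we install the necessary dependencies. The code in this example uses the pre-1.0 interface of the OpenAI Python SDK (`openai.Audio`), so the install is pinned below version 1.

In [None]:
! pip install "openai>=0.28.1,<1"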
! pip install python-dotenv

### Setup

Next, we'll import our libraries and configure the Python OpenAI SDK to work with the Azure OpenAI service.

In [1]:
import os
import openai
import dotenv

dotenv.load_dotenv()

True

In [9]:
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.api_type = "azure"
openai.api_version = "2023-09-01-preview"

deployment_id = os.environ["WHISPER_DEPLOYMENT_ID"]
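
The configuration above reads three environment variables with `os.environ[...]`, so a missing value surfaces as a bare `KeyError`. As a minimal sketch, a check like the following could report every missing setting at once before the SDK is configured. `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names introduced for illustration; they are not part of the OpenAI SDK or `python-dotenv`.

```python
import os

# Names of the environment variables read in the setup cell.
REQUIRED_SETTINGS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "WHISPER_DEPLOYMENT_ID")


def missing_settings(env=os.environ):
    """Return the required setting names that are unset or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Hypothetical pre-flight check: report all missing settings in one message.
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching the real environment.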

### Audio transcription

Audio transcription, or speech-to-text, is the process of converting spoken words into text. Use the `openai.Audio.transcribe` method to transcribe an audio file stream to text. 

In [10]:
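# Open the file in a context manager so the handle is closed after the request.
with open("recordings/audio_en.wav", "rb") as audio_file:
    transcription = openai.Audio.transcribe(
        file=audio_file,
        model="whisper-1",
        deployment_id=deployment_id,
    )
print(transcription.text)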

The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters – 16 to 20 inches – at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs.
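
To transcribe more than one recording, the same call can be applied to each file in turn. The sketch below is one possible way to do that, not a feature of the SDK: `transcribe_files` is a hypothetical helper, and `transcribe_one` is an assumed callable that receives an open binary file and returns its text. Keeping the service call behind that callable lets the loop be tried without network access.

```python
def transcribe_files(paths, transcribe_one):
    """Return a dict mapping each path to the text produced by `transcribe_one`.

    `transcribe_one` is an assumed callable: it takes a file object opened
    in binary mode and returns the transcribed text.
    """
    results = {}
    for path in paths:
        # Each file is closed as soon as its request has finished.
        with open(path, "rb") as audio_file:
            results[str(path)] = transcribe_one(audio_file)
    return results
```

In this notebook, `transcribe_one` could be `lambda f: openai.Audio.transcribe(file=f, model="whisper-1", deployment_id=deployment_id).text`, which reuses the call shown above.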


### Audio translation

Audio translation converts the speech in an audio file into English text; English is currently the only supported target language. Use the `openai.Audio.translate` method for this. In this example, we provide an audio file about Custom Speech recorded in Spanish. The result contains the English translation of the audio file.

In [6]:
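# Open the file in a context manager so the handle is closed after the request.
with open("recordings/audio_es.wav", "rb") as audio_file:
    translation = openai.Audio.translate(
        file=audio_file,
        model="whisper-1",
        deployment_id=deployment_id,
    )
print(translation.text)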

The content, such as data, models, tests and connection points, is organized into projects in the Custom Speech Portal. Each project is specific to a domain and a country or language. For example, you can create a project for call centers that use English in the United States. To create your first project, select speech-to-text-custom-speech. Then, click on New Project. Follow the assistant's instructions to create the project. After creating the project, you will see four tabs, data, tests, training and implementation. Use the links included in the following steps to learn how to use each tab.
