# Overview

This tutorial guides you through converting spoken language from podcasts into written text using the OpenAI Whisper model. The Whisper model is useful for:

- Recognizing speech in multiple languages.
- Transcribing spoken words from podcasts into text.
- Translating speech from other languages into English.
- Identifying the language being spoken.

### Goals

- Understand how to transcribe an audio file using the Whisper model.
- Learn to integrate Python library-based models into the Copilot application.

### Steps

1. **Include Necessary Python Library:** Begin by importing the required Python library to utilize the Whisper model.

2. **Audio Transcription:** Employ Whisper to transcribe spoken words from audio into text.

3. **Retrieve Transcribed Text:** Obtain the transcribed text as the final output and store it in a file for subsequent processing.

### Requirements.txt
Before running the code, make sure that you have installed the necessary Python libraries by executing `pip install -r requirements.txt` in the Terminal.
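
As a quick sanity check after installation, a short script can confirm that the modules this notebook imports are actually available. This is a minimal sketch: `missing_modules` is a hypothetical helper name, and the module list is taken from the import statements used later in this notebook.

```python
from importlib.util import find_spec

# Import names used by the cells in this notebook
MODULES = ["openai", "dotenv", "pydub", "whisper"]

def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]

print("Missing modules:", missing_modules(MODULES) or "none")
```

If the list is not empty, rerun the `pip install` command before continuing.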


### Get input file
The following cell defines the podcast URL and the path to the local podcast audio file.

In [None]:
# Inputs about the podcast
podcast_url = "https://www.microsoft.com/behind-the-tech"
podcast_audio_file = "../data/PodcastSnippet.mp3"

## 1. Transcribe the Audio with Azure OpenAI Whisper model

In this task, we will transcribe the audio to text using the Whisper model!


### Connect to Azure OpenAI service

The code below imports the `load_dotenv` function from the `dotenv` module and uses it to load environment variables from a `.env` file.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
if load_dotenv():
    # Use an f-string so an unset variable prints as None instead of raising a TypeError
    print(f"Found Azure OpenAI API Base Endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
else: 
    print("Azure OpenAI API Base Endpoint not found. Have you configured the .env file?")
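
Because `load_dotenv` only reports whether a `.env` file was found, it can help to verify that each setting is actually present. The following is a minimal sketch, assuming the four variable names that this notebook reads; `missing_env_vars` is a hypothetical helper, not part of `dotenv`.

```python
import os

# Variable names read by the cells in this notebook
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_WHISPER_MODEL",
]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```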

This Python code snippet creates a client for the Azure OpenAI service. The client is used to send requests to the service, including the audio transcription request made later in this notebook.

The `AzureOpenAI` class from the `openai` module is imported at the beginning. This class is used to create a client that can interact with the Azure OpenAI service.

An instance of the `AzureOpenAI` class is created and assigned to the variable `client`. The constructor for this class takes three arguments: `azure_endpoint`, `api_key`, and `api_version`. These values are fetched from the environment variables using the `os.getenv` function.

- `azure_endpoint` is the URL of the Azure OpenAI service that the client will interact with.
- `api_key` is the secret key used for authentication with the Azure OpenAI service.
- `api_version` is the version of the Azure OpenAI API that the client will use.

Next, the Whisper model to be used for transcription is fetched from the environment variables and stored in the `model` variable.

In [None]:
# Create a client for the Azure OpenAI service
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key = os.getenv("AZURE_OPENAI_API_KEY"),
    api_version = os.getenv("AZURE_OPENAI_API_VERSION")
)

model = os.getenv("AZURE_OPENAI_WHISPER_MODEL")
print("Model: ", model)

### Send request to Whisper model

This Python code snippet is used to transcribe audio from a file using the Whisper model from the Azure OpenAI service.

The `client.audio.transcriptions.create` method is called to create a new transcription. This method takes two arguments: the audio file to transcribe and the model to use for transcription. The audio file is opened in binary mode for reading and passed to the method. The model is specified by the `model` variable, which holds the name of your Whisper model deployment (for example, `whisper`).

The `create` method returns a result object, which contains the transcription result. The transcribed text can be accessed through the `text` attribute of the result object. This transcribed text is stored in the `transcript` variable.

In [None]:
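# Call the Whisper model for transcription

# Open the audio file in binary mode; the context manager closes it after the request
with open(podcast_audio_file, "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model=model
    )

transcript = result.text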

print(transcript)
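
The request can also be wrapped in a small reusable function. The sketch below is illustrative only: `transcribe_file` is a hypothetical helper name, and the optional `language` hint (an ISO-639-1 code such as `"en"`) is passed to the service only when the caller provides it. It assumes the `client`, `model`, and `podcast_audio_file` variables defined in the cells above.

```python
def transcribe_file(client, audio_path, model, language=None):
    """Transcribe one audio file and return its text (hypothetical helper)."""
    # Only include the optional language hint when the caller supplies one
    extra_args = {"language": language} if language else {}
    # The context manager closes the file once the request completes
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file,
            model=model,
            **extra_args,
        )
    return result.text

# Usage, assuming `client`, `model`, and `podcast_audio_file` from the cells above:
# transcript = transcribe_file(client, podcast_audio_file, model, language="en")
```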

### Save transcription into file

This Python code snippet writes the transcribed text to the `transcript.txt` file.

In [None]:
# Specify the file path
file_path = "transcript.txt"

# Write the content to the file
with open(file_path, "w", encoding="utf-8") as file:
    file.write(transcript)

## 2. Transcribe chunked file with whisper library
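In the next section of this lab, we'll explore an alternative method for transcribing audio. If you're dealing with a lengthy audio file, it might be beneficial to first divide it into smaller segments and then process each segment individually.

Before proceeding, ensure that `ffmpeg` is installed on your system, because both pydub and the whisper library rely on it to decode audio. On Ubuntu, you can install it by executing the following commands in the terminal (the PPA added by the first command targets older Ubuntu releases; on recent releases `ffmpeg` is usually available from the default repositories, so that command can be skipped):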
```sh
sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get install ffmpeg
```
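
To confirm from within the notebook that the installation succeeded, you can check whether `ffmpeg` is on the `PATH`. This is a minimal sketch using the standard library; `has_executable` is a hypothetical helper name.

```python
import shutil

def has_executable(name):
    """Return True if an executable with this name is found on PATH (hypothetical helper)."""
    return shutil.which(name) is not None

if has_executable("ffmpeg"):
    print("ffmpeg found at", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found on PATH; install it before continuing.")
```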

### Import required Python libraries
We will start by importing the required libraries. These libraries support the manipulation and transcription of the media we pass in, in our case an MP3 podcast.

Let's learn about what our imports do:

`import whisper`:

Whisper is a speech-to-text model from OpenAI that you can use to transcribe audio files. The `whisper` library runs the model locally. The model is trained on a large, multilingual dataset of audio and text, much of it English, so it can transcribe speech in many languages.

`from pydub import AudioSegment` and `from pydub.silence import split_on_silence`:

pydub is a library for manipulating audio through a simple, high-level interface. `AudioSegment` represents a segment of audio, and `split_on_silence` is a function that splits audio based on detected silence.

In [None]:
import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

### Chunk up the audio file 
This Python code snippet is used to split an audio file into chunks based on silence.

The `split_on_silence` function splits the audio into chunks at points of silence that are at least 1000 milliseconds (or 1 second) long and quieter than -40 dBFS. The chunks are stored in the `audio_chunks` list.

In [None]:
# Chunk up the audio file 
sound_file = AudioSegment.from_mp3(podcast_audio_file)
audio_chunks = split_on_silence(sound_file, min_silence_len=1000, silence_thresh=-40 )
count = len(audio_chunks)
print("Audio split into " + str(count) + " audio chunks")
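
To make the two parameters more concrete, the sketch below applies the same idea to a plain list of loudness values, one per millisecond. It is a simplified illustration and not pydub's implementation: `split_on_quiet` is a hypothetical function, and it ignores details such as the padding pydub keeps around each chunk.

```python
def split_on_quiet(levels_db, min_silence_len, silence_thresh):
    """Return (start, end) index ranges of the non-silent chunks (illustrative only)."""
    # Step 1: find runs quieter than the threshold that last long enough
    silences = []
    run_start = None
    for i, level in enumerate(levels_db):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                silences.append((run_start, i))
            run_start = None
    # A qualifying quiet run may extend to the end of the audio
    if run_start is not None and len(levels_db) - run_start >= min_silence_len:
        silences.append((run_start, len(levels_db)))

    # Step 2: the chunks are whatever lies between the qualifying silences
    chunks = []
    position = 0
    for start, end in silences:
        if start > position:
            chunks.append((position, start))
        position = end
    if position < len(levels_db):
        chunks.append((position, len(levels_db)))
    return chunks

# Speech, a long pause, speech, a short pause (too short to split on), speech
levels = [-20] * 3 + [-60] * 4 + [-20] * 2 + [-60] * 1 + [-20] * 2
print(split_on_quiet(levels, min_silence_len=3, silence_thresh=-40))
# → [(0, 3), (7, 12)]
```

Raising `min_silence_len` produces fewer, longer chunks, while raising `silence_thresh` (for example from -40 to -30) treats more of the audio as silence.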

### Call Whisper to transcribe audio
This Python code snippet is used to transcribe each chunk of audio and concatenate the transcriptions.

- The `whisper.load_model` method is called with `"base"` as the argument to load the base model of Whisper. This model is stored in the `model` variable.

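- We iterate over `audio_chunks` in a loop. Each selected chunk is exported to a WAV file and transcribed, and the transcription is appended to `transcript`. To keep the run short, the `if` condition limits processing to chunks near the start and end of the file; remove it to transcribe every chunk.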

In [None]:
# Call Whisper to transcribe audio
model = whisper.load_model("base")
transcript = ""
for i, chunk in enumerate(audio_chunks):
    # For long audio files, this condition limits the run to chunks near the start and end; remove it to transcribe every chunk
    if i < 10 or i > count - 10:
        out_file = "chunk{0}.wav".format(i)
        print("Exporting", out_file)
        chunk.export(out_file, format="wav")
        result = model.transcribe(out_file)
        transcriptChunk = result["text"]
        print(transcriptChunk)
        
        # Append transcript in memory if you have sufficient memory
        transcript += " " + transcriptChunk

print("Transcript: \n")
print(transcript)
print("\n")
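
The per-chunk text may carry leading or trailing whitespace, so concatenating with `" "` can leave doubled spaces in the final transcript. One possible alternative, shown here as a sketch, is to collect the pieces in a list and join them at the end; `join_transcripts` is a hypothetical helper name.

```python
def join_transcripts(pieces):
    """Join per-chunk transcripts with single spaces, skipping empty pieces (hypothetical helper)."""
    cleaned = (piece.strip() for piece in pieces)
    return " ".join(piece for piece in cleaned if piece)

# Usage with example strings standing in for per-chunk results
print(join_transcripts([" Hello there. ", "", " Welcome back."]))
# → Hello there. Welcome back.
```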