# Introduction
In this lab, you'll learn how to transcribe spoken words from podcasts into written text using OpenAI's Whisper model. Whisper is useful for the following tasks:

- Understanding spoken words in various languages.
- Converting spoken words from podcasts into written text.
- Translating speech from one language to another.
- Determining the language being spoken.
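
As a rough illustration of the tasks above, the sketch below uses the open-source `whisper` package, assuming it is installed. `describe_audio` and `main` are hypothetical helper names introduced here; the `transcribe` method, its `task="translate"` option, and the `"language"` and `"text"` result keys come from that package.

```python
def describe_audio(model, audio_path):
    """Transcribe, detect the language of, and translate one audio file.

    `model` is expected to behave like the object returned by
    whisper.load_model(); describe_audio is a hypothetical helper.
    """
    # Default task is "transcribe": the text comes back in the spoken
    # language, and the result reports the language Whisper detected.
    transcription = model.transcribe(audio_path)
    # task="translate" asks Whisper to produce English text instead.
    translation = model.transcribe(audio_path, task="translate")
    return {
        "language": transcription["language"],
        "text": transcription["text"].strip(),
        "english_text": translation["text"].strip(),
    }


def main():
    # Imported here so the helper above can be tried without the package.
    import whisper

    model = whisper.load_model("base")
    print(describe_audio(model, "../data/PodcastSnippet.mp3"))
```

Calling `main()` runs Whisper twice on the same file, so for long recordings it may be preferable to request only the task you need.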

# Objective
- Learn to transcribe an audio file using the Whisper model.
- Learn how to bring in and integrate library-based Python models into the Copilot application.

### 1. Import required Python libraries
Start by including the necessary Python libraries to access the Whisper model.
### 2. Transcribe the audio
Use Whisper to convert spoken words from audio into written text.
### 3. Get the transcribed text
Retrieve the transcribed text as the final output.

## Import Python libraries

### Task 1: Import required Python libraries
We will start by importing the libraries required for the lab. These libraries handle the manipulation and transcription of the media we pass in, which in our case is an MP3 podcast.

#### Task 1.1 Requirements.txt
Let's start by adding our required libraries to `requirements.txt`.

#### Task 1.2 Import Summary
Let's look at what each of our imports does:

`import whisper`:

Whisper is a speech-to-text model from OpenAI that you can use to transcribe audio files. It was trained on a large multilingual dataset in which English makes up the largest share, so it works especially well on English speech while also supporting many other languages.

`from pydub import AudioSegment` and `from pydub.silence import split_on_silence`:

pydub is a library for manipulating audio through a simple, high-level interface. `AudioSegment` represents a piece of audio, and `split_on_silence` is a function that splits audio wherever it detects a sufficiently long silence.
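
To make the idea of splitting on silence concrete, here is a simplified, standard-library-only sketch. It is not pydub's implementation: `split_on_quiet_runs` is a hypothetical function, and it assumes the audio has already been reduced to one loudness value (in dBFS) per millisecond. The parameter names mirror the `min_silence_len` and `silence_thresh` arguments used later in this lab.

```python
def split_on_quiet_runs(levels_db, min_silence_len=1000, silence_thresh=-40):
    """Return (start, end) index ranges of the non-silent stretches.

    levels_db holds one loudness value (in dBFS) per millisecond. A run of
    at least min_silence_len values below silence_thresh counts as a pause.
    Simplified illustration only; pydub's own implementation differs.
    """
    chunks = []
    chunk_start = None  # index where the current non-silent stretch began
    quiet_run = 0       # length of the current run of quiet samples
    for i, level in enumerate(levels_db):
        if level < silence_thresh:
            quiet_run += 1
            # The pause is now long enough: close the chunk at the point
            # where the quiet run started.
            if quiet_run == min_silence_len and chunk_start is not None:
                chunks.append((chunk_start, i - quiet_run + 1))
                chunk_start = None
        else:
            if chunk_start is None:
                chunk_start = i
            quiet_run = 0
    if chunk_start is not None:
        # Trailing quiet samples shorter than a full pause are trimmed off.
        chunks.append((chunk_start, len(levels_db) - quiet_run))
    return chunks
```

A pause shorter than `min_silence_len` does not end a chunk, which is why raising that value produces fewer, longer chunks.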

`ort.set_default_logger_severity(3)`:

This line sets the default logging severity for ONNX Runtime to 3 (error), so only errors and fatal messages are logged. Here `ort` is the conventional alias from `import onnxruntime as ort`.

Imports from `langchain.prompts`:

LangChain is a framework for building applications on top of large language models. The imports here are prompt template classes (`AIMessagePromptTemplate`, `ChatPromptTemplate`, and so on) that the application uses to build the prompts for its AI interactions.

`from langchain.schema import AIMessage, HumanMessage, SystemMessage`:

These classes represent the different kinds of messages in a chat with an AI system: messages generated by the AI, messages written by a human, and system messages that set instructions or context.

In [None]:
%pip install -r requirements.txt

In [None]:
import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

from langchain.prompts import (AIMessagePromptTemplate, ChatPromptTemplate,
                               HumanMessagePromptTemplate, PromptTemplate,
                               SystemMessagePromptTemplate)
from langchain.schema import AIMessage, HumanMessage, SystemMessage

## Task 2: Get input file and chunk it into pieces

In [5]:
# Inputs about the podcast
podcast_url = "https://www.microsoft.com/behind-the-tech"
podcast_audio_file = "../data/PodcastSnippet.mp3"

In [None]:
# Chunk up the audio file
sound_file = AudioSegment.from_mp3(podcast_audio_file)
# min_silence_len is in milliseconds; silence_thresh is in dBFS
audio_chunks = split_on_silence(sound_file, min_silence_len=1000, silence_thresh=-40)
count = len(audio_chunks)
print(f"Audio split into {count} audio chunks")
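
If the split produces no chunks, or a single very long one, the `min_silence_len` and `silence_thresh` values probably need tuning for your recording. The sketch below is one way to check the result. `summarize_chunks` is a hypothetical helper; it relies only on `len(chunk)` being the duration in milliseconds, which is how pydub's `AudioSegment` behaves.

```python
def summarize_chunks(chunks):
    """Return (count, total_seconds, longest_seconds) for a list of chunks.

    Relies only on len(chunk) being the duration in milliseconds, which is
    how pydub's AudioSegment behaves.
    """
    durations_ms = [len(chunk) for chunk in chunks]
    if not durations_ms:
        return 0, 0.0, 0.0
    return len(durations_ms), sum(durations_ms) / 1000, max(durations_ms) / 1000
```

In the notebook this would be called as `summarize_chunks(audio_chunks)`.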

In [2]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
if endpoint:
    print("Found Azure OpenAI API Base Endpoint: " + endpoint)
else:
    print("Azure OpenAI API Base Endpoint not found. Have you configured the .env file?")

Found Azure OpenAI API Base Endpoint: https://swed-oai.openai.azure.com/
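
The cells in this lab read four environment variables in total, so it can help to check all of them at once before creating the client. This is a minimal sketch: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, while the variable names themselves are the ones used in this notebook.

```python
import os

# Environment variables read by the cells in this lab.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_WHISPER_MODEL",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

An empty list from `missing_env_vars()` means every variable was found; otherwise the returned names point to the entries to add to the `.env` file.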


In [9]:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

model = os.getenv("AZURE_OPENAI_WHISPER_MODEL")
print(model)

whisper


## Task 3: Transcribe the Audio

In this task, we will transcribe the audio to text using the Whisper model.

In [10]:
# `client` is the AzureOpenAI client created in the previous cell. It already holds the
# endpoint, API key, and API version, so the legacy module-level openai.* settings are not needed.

# With Azure OpenAI, the `model` argument takes the deployment name, that is, the custom
# name you chose when you deployed the Whisper model.
deployment_id = model
audio_language = "en"

audio_test_file = podcast_audio_file

# In the openai v1 SDK, the transcription call is client.audio.transcriptions.create
with open(audio_test_file, "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model=deployment_id,
        language=audio_language,
    )

print(result.text)

In [None]:
# Call Whisper to transcribe audio locally with the open-source model
# A separate name keeps the Azure deployment name stored in `model` intact
local_model = whisper.load_model("base")
transcript = ""
for i, chunk in enumerate(audio_chunks):
    # For a long audio file, this condition limits the run to the first 10 and last 9 chunks;
    # remove it to transcribe every chunk
    if i < 10 or i > count - 10:
        out_file = f"chunk{i}.wav"
        print("Exporting", out_file)
        chunk.export(out_file, format="wav")
        result = local_model.transcribe(out_file)
        transcriptChunk = result["text"]
        print(transcriptChunk)
        
        # Append transcript in memory if you have sufficient memory
        transcript += " " + transcriptChunk

print("Transcript: \n")
print(transcript)
print("\n")
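
The subset condition in the loop above, `i < 10 or i > count - 10`, is easy to misread, so here is a small sketch that makes the selection explicit. `selected_chunk_indices` is a hypothetical helper; with its defaults it selects the same indices as that condition.

```python
def selected_chunk_indices(count, head=10, tail=9):
    """Indices of the first `head` and last `tail` chunks, without duplicates.

    With the defaults this matches the condition `i < 10 or i > count - 10`.
    """
    return [i for i in range(count) if i < head or i >= count - tail]
```

When the file has fewer chunks than `head + tail`, every index is returned once, so short recordings are transcribed in full.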

## Task 4: Save transcription into file

In [None]:
# Specify the file path
file_path = "transcript.txt"

# Write the content to the file
with open(file_path, "w", encoding="utf-8") as file:
    file.write(transcript)
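
If the full transcript is too large to keep in memory, one alternative is to append each chunk's text to the file as soon as it is produced. The sketch below assumes that approach; `append_transcript_chunk` is a hypothetical helper and is not part of the lab code.

```python
from pathlib import Path


def append_transcript_chunk(file_path, text):
    """Append one chunk of transcript text to file_path as its own line."""
    cleaned = text.strip()
    if not cleaned:
        return  # skip chunks where nothing was recognized
    with Path(file_path).open("a", encoding="utf-8") as handle:
        handle.write(cleaned + "\n")
```

Inside the transcription loop, this would be called with each chunk's `result["text"]` in place of the in-memory concatenation.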