# Introduction
In this lab, you'll learn how to transcribe spoken words from podcasts into written text. To do this, we'll use the OpenAI Whisper model. Whisper is useful for the following tasks:

- Understanding spoken words in various languages.
- Converting spoken words from podcasts into written text.
- Translating speech from one language to another.
- Determining the language being spoken.

# Objective
- Learn to transcribe an audio file using the Whisper model
- Learn how to integrate library-based Python models into the Copilot application

### 1. Import required Python libraries
Start by importing the Python libraries needed to access the Whisper model.
### 2. Transcribe the audio
Use Whisper to convert spoken words from audio into written text.
### 3. Get the transcribed text
Retrieve the transcribed text as the final output.

## Import python libraries

### Task 1: Import required Python libraries
We will start by importing the libraries required for the lab. These libraries support the manipulation and transcription of the media we pass in, in our case an MP3 podcast.

#### Task 1.1 Requirements.txt
Let's start by adding our required libraries to `requirements.txt`.
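
Once the requirements are installed, it can help to confirm that the modules are actually importable before running the rest of the notebook. The sketch below is one way to do that. The module list is an assumption based on the imports used in this lab (for example, `whisper` is assumed to be provided by the `openai-whisper` package and `dotenv` by `python-dotenv`), and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed top-level module names for this lab; adjust to match your requirements.txt.
LAB_MODULES = ["whisper", "pydub", "langchain", "openai", "dotenv"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(LAB_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All lab modules are importable.")
```

If anything is reported as not installed, rerun `%pip install -r requirements.txt` and restart the kernel.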

#### Task 1.2 Import Summary
Let's look at what each import does:

`import whisper`:

Whisper is a speech-to-text model from OpenAI that you can use to transcribe audio files. It is trained on a large multilingual dataset of audio and text, which is why it can also translate speech and identify the language being spoken. It generally performs best on English speech.

`from pydub import AudioSegment` and `from pydub.silence import split_on_silence`:

pydub is a library for manipulating audio through a simple, high-level interface. `AudioSegment` represents a piece of audio, and `split_on_silence` is a function that splits audio wherever it detects silence.

`ort.set_default_logger_severity(3)`:

This line sets the default logging severity for ONNX Runtime to 3 (error), so that only errors and fatal messages are logged.

Imports from `langchain.prompts`:

LangChain is a framework for building applications around large language models. The imports here are prompt template classes (`AIMessagePromptTemplate`, `ChatPromptTemplate`, etc.) that the application uses to build prompts for AI interactions.

`from langchain.schema import AIMessage, HumanMessage, SystemMessage`:

These classes define how the different kinds of messages in a chatbot or AI system (AI-generated, human-generated, and system messages) are structured.

In [None]:
%pip install -r requirements.txt

In [12]:
#import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

#from langchain.prompts import (AIMessagePromptTemplate, ChatPromptTemplate,
#                               HumanMessagePromptTemplate, PromptTemplate,
#                               SystemMessagePromptTemplate)
#from langchain.schema import AIMessage, HumanMessage, SystemMessage

## Get input file

In [8]:
# Inputs about the podcast
podcast_url = "https://www.microsoft.com/behind-the-tech"
podcast_audio_file = "../data/PodcastSnippet.mp3"

## Transcribe the Audio with Azure OpenAI Whisper model

In this task we will transcribe the audio to text using the Whisper model!

In [3]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
if load_dotenv():
    # An f-string avoids a TypeError if the variable is missing (os.getenv returns None)
    print(f"Found Azure OpenAI API Base Endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
else:
    print("Azure OpenAI API Base Endpoint not found. Have you configured the .env file?")

Found Azure OpenAI API Base Endpoint: https://swed-oai.openai.azure.com/
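
The cells that follow read four settings from the environment. A quick check that all of them are present can save a confusing error later. This is a minimal sketch: the variable names are the ones used in this notebook, while `missing_settings` is a hypothetical helper name.

```python
import os

# Environment variables read by the cells in this notebook.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_WHISPER_MODEL",
]


def missing_settings(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_settings(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```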


In [17]:
# Create a client for the Azure OpenAI service
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key = os.getenv("AZURE_OPENAI_API_KEY"),
    api_version = os.getenv("AZURE_OPENAI_API_VERSION")
)

model =  os.getenv("AZURE_OPENAI_WHISPER_MODEL")
print("Model: ", model)

Model:  whisper


In [24]:
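# Call the Whisper model for transcription

# Open the audio file in a with-block so it is closed after the upload
with open(podcast_audio_file, "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model=model
    )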

transcript = result.text

print(transcript)

Neil deGrasse Tyson is one of America's best-known astrophysicists and a beloved educator and advocate for the sciences. He has a great talent for presenting complex concepts in a clear and accessible manner. He's the head of the Hayden Planetarium and has been the director there since 1996. He's hosted numerous space-related TV and radio programs, published several books, and hosts the podcast StarTalk Radio. I am thrilled to have you on the podcast today, Neil. Well, thanks for having me. Why did it take you this long to invite me? I don't know. Shame, shame, shame on me. I'm not hidden, right? No, you aren't. And I will say, when I started this podcast and when I wrote my book and I started doing this very uncomfortable thing for me, which is trying to talk more about technology in the public, you were literally my role model. I said, Neil deGrasse Tyson does such a wonderful job communicating about the importance and the value of science to the public. We don't have people doing th
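
The hosted transcription endpoint limits the size of an uploaded audio file, which is one reason to split longer recordings into chunks. The sketch below checks a file against such a limit before uploading. The 25 MB default is an assumption based on the service documentation at the time of writing, so confirm the current limit for your deployment. `fits_upload_limit` is a hypothetical helper name.

```python
import os

# Assumed upload limit in bytes (25 MB); check the limit for your own deployment.
ASSUMED_LIMIT_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes


# Example usage in this notebook:
# if not fits_upload_limit(podcast_audio_file):
#     print("File is too large to upload in one request; split it into chunks first.")
```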

In [26]:
# Specify the file path
file_path = "transcript.txt"

# Write the content to the file
with open(file_path, "w", encoding="utf-8") as file:
    file.write(transcript)

## Transcribe audio with Azure AI services, Speech-to-Text model

## Transcribe chunked file with whisper library

In [13]:
# Chunk up the audio file 
sound_file = AudioSegment.from_mp3(podcast_audio_file)
audio_chunks = split_on_silence(sound_file, min_silence_len=1000, silence_thresh=-40 )
count = len(audio_chunks)
print("Audio split into " + str(count) + " audio chunks")

Audio split into 2 audio chunks
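
To build intuition for the `min_silence_len` and `silence_thresh` parameters, here is a simplified, pure-Python sketch of the idea behind splitting on silence. It is not pydub's implementation: it assumes the audio has already been reduced to one loudness value (in dBFS) per millisecond, and `split_on_quiet_runs` is a hypothetical name.

```python
def split_on_quiet_runs(levels_db, min_silence_len, silence_thresh):
    """Split per-millisecond loudness values into non-silent chunks.

    A run of at least `min_silence_len` consecutive values below
    `silence_thresh` is treated as a break between chunks. Shorter quiet
    runs (brief pauses) stay inside the current chunk.
    """
    chunks, current, quiet = [], [], []
    for level in levels_db:
        if level < silence_thresh:
            quiet.append(level)  # collect the quiet run until sound resumes
            continue
        if len(quiet) >= min_silence_len:
            if current:
                chunks.append(current)  # long silence: close the chunk
            current = []
        else:
            current.extend(quiet)  # short pause: keep it inside the chunk
        quiet = []
        current.append(level)
    if current:
        chunks.append(current)  # trailing silence is dropped
    return chunks


levels = [-10, -10, -50, -50, -50, -12, -50, -11]
print(split_on_quiet_runs(levels, min_silence_len=3, silence_thresh=-40))
# → [[-10, -10], [-12, -50, -11]]
```

With `min_silence_len=1000` and `silence_thresh=-40`, as in the cell above, a pause must last at least one second and stay below -40 dBFS to produce a split, which is why this snippet yields only two chunks.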


In [16]:
import whisper
# Call Whisper to transcribe audio
model = whisper.load_model("base")
transcript = ""
for i, chunk in enumerate(audio_chunks):
    # For a long audio file, this condition limits transcription to chunks near the start and end
    if i < 10 or i > count - 10:
        out_file = "chunk{0}.wav".format(i)
        print("Exporting", out_file)
        chunk.export(out_file, format="wav")
        result = model.transcribe(out_file)
        transcriptChunk = result["text"]
        print(transcriptChunk)
        
        # Append transcript in memory if you have sufficient memory
        transcript += " " + transcriptChunk

print("Transcript: \n")
print(transcript)
print("\n")

Exporting chunk0.wav
 Neil deGrasse Tyson is one of America's best known astrophysicist and a beloved educator and advocate for the sciences. He has a great talent for presenting complex concepts in a clear and accessible manner. He's the head of the Hayden Planetarium and has been the director there since 1996. He's hosted numerous space-related TV and radio programs, published several books, and hosts the podcast StarTalk Radio. I am thrilled to have you on the podcast today.
Exporting chunk1.wav
 Well, thanks for having me. So why do you take you this long to invite me? I just want to know. Shame, shame on me. I'm not hidden. Right. All right. And I will say, when I started this podcast, and when I wrote my book, and I started doing this very uncomfortable thing for me, which is trying to talk more about technology in the public, you were literally my role model. I said, Neil deGrasse Tyson does such a wonderful job communicating about the importance and the value of science to the 
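
The condition `i < 10 or i > count - 10` in the loop above decides which chunks are transcribed. The sketch below reproduces that condition on its own so you can see which indices it selects for a given number of chunks. `selected_chunk_indices` and its `edge` parameter are hypothetical names introduced for illustration.

```python
def selected_chunk_indices(count, edge=10):
    """Return the chunk indices that pass `i < edge or i > count - edge`."""
    return [i for i in range(count) if i < edge or i > count - edge]


print(selected_chunk_indices(2))   # → [0, 1]
print(selected_chunk_indices(25))  # first 10 chunks, then chunks 16 to 24
```

For short files such as this snippet every chunk is processed. For longer files the condition keeps the first ten chunks and the last nine, so the middle of the recording is skipped.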



Save transcription into file
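
A minimal sketch of this step, assuming the chunked transcript is held in the `transcript` variable from the previous cell. `save_transcript` and the file name `transcript_chunked.txt` are hypothetical names chosen for illustration.

```python
from pathlib import Path


def save_transcript(text, path):
    """Write the transcript to `path` as UTF-8 and return the resulting Path."""
    target = Path(path)
    # Strip the leading space added while concatenating chunks
    target.write_text(text.strip() + "\n", encoding="utf-8")
    return target


# Example usage in this notebook:
# save_transcript(transcript, "transcript_chunked.txt")
```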