# Introduction
In this lab, you'll learn how to transcribe spoken words from podcasts into written text. To do this, we'll use OpenAI's open-source Whisper model. Whisper can help with the following tasks:

- Recognizing speech in many languages.
- Transcribing podcast audio into written text.
- Translating speech from other languages into English.
- Identifying the language being spoken.

# Objective
- Learn to transcribe an audio file using the Whisper model.
- Learn how to import library-based Python models and integrate them into the Copilot application.

### 1. Import the required Python libraries
Start by importing the Python libraries needed to access the Whisper model.
### 2. Transcribe the audio
Use Whisper to convert the speech in the audio into written text.
### 3. Get the transcribed text
Retrieve the transcribed text as the final output.

## Import Python libraries

### Task 1: Import required Python libraries
We'll start by importing the libraries required for this lab. They handle loading, splitting, and transcribing the media we pass in, in our case an MP3 podcast.

### Task 1.1 requirements.txt
Let's start by adding the required libraries to `requirements.txt`.
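If you prefer to generate the file from Python, here is a minimal sketch. The pinned versions are copied from the `pip install` output later in this lab; `write_requirements` and `PINNED_PACKAGES` are hypothetical names used only for illustration, and you may need additional packages depending on your environment.

```python
from pathlib import Path

# Pinned versions taken from this lab's pip install log (assumed complete for this lab).
PINNED_PACKAGES = [
    "python-dotenv==1.0.1",
    "pydub==0.25.1",
    "openai-whisper==20231117",
    "openai==1.14.0",
    "langchain==0.1.12",
]


def write_requirements(path: str = "requirements.txt") -> Path:
    """Write one requirement specifier per line and return the file path."""
    target = Path(path)
    target.write_text("\n".join(PINNED_PACKAGES) + "\n", encoding="utf-8")
    return target
```

Calling `write_requirements()` writes `requirements.txt` to the current directory, ready for `%pip install -r requirements.txt`.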

### Task 1.2 Import Summary
Let's look at what each import does:

`import whisper`

Whisper is a speech-to-text model from OpenAI that you can use to transcribe audio files. It was trained on a large multilingual dataset of audio paired with transcripts, so it can transcribe many languages; most of that data is English, so it tends to perform best on English speech.

`from pydub import AudioSegment` and `from pydub.silence import split_on_silence`

pydub is a library for manipulating audio through a simple, high-level interface. `AudioSegment` represents a piece of audio (here, the podcast MP3), and `split_on_silence` splits an `AudioSegment` into chunks wherever it detects a sufficiently long silence. Loading MP3 files with pydub requires ffmpeg (or libav) to be installed.
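To make the `min_silence_len` and `silence_thresh` parameters concrete, here is a simplified, pure-Python model of silence-based splitting. It is not pydub's implementation: it assumes loudness has already been measured once per millisecond (in dBFS) and ignores pydub's extra options such as `keep_silence`. The function name `nonsilent_ranges` is hypothetical.

```python
from typing import List, Tuple


def nonsilent_ranges(levels_dbfs: List[float], min_silence_len: int,
                     silence_thresh: float) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges of audio between long silences.

    levels_dbfs holds one loudness value per millisecond. A run of at least
    min_silence_len consecutive values below silence_thresh counts as silence;
    shorter quiet runs stay inside the surrounding chunk.
    """
    # First pass: collect silent runs that are long enough to split on.
    silences = []
    run_start = None
    for i, level in enumerate(levels_dbfs):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                silences.append((run_start, i))
            run_start = None
    if run_start is not None and len(levels_dbfs) - run_start >= min_silence_len:
        silences.append((run_start, len(levels_dbfs)))

    # Second pass: the audio between consecutive silences becomes a chunk.
    chunks = []
    cursor = 0
    for start, end in silences:
        if start > cursor:
            chunks.append((cursor, start))
        cursor = end
    if cursor < len(levels_dbfs):
        chunks.append((cursor, len(levels_dbfs)))
    return chunks


levels = [-20] * 5 + [-60] * 3 + [-20] * 4 + [-60] + [-20] * 2
print(nonsilent_ranges(levels, min_silence_len=3, silence_thresh=-40))  # [(0, 5), (8, 15)]
```

Here the 3 ms pause is long enough to split the audio, while the 1 ms dip stays inside the second chunk. With the lab's `min_silence_len=1000`, only pauses of at least one second cause a split.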

`ort.set_default_logger_severity(3)`

If your environment also uses ONNX Runtime (imported as `ort`), this line sets its default logging severity to 3 (ERROR), so only error and fatal messages are printed and lower-severity warnings are suppressed.
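ONNX Runtime's severity levels run from 0 (verbose) to 4 (fatal), and messages below the configured level are suppressed. The table below is written out by hand for illustration; `ORT_SEVERITY_NAMES` and `visible_levels` are hypothetical names and not part of the `onnxruntime` package.

```python
# ONNX Runtime logging severities, from most to least verbose.
ORT_SEVERITY_NAMES = {0: "VERBOSE", 1: "INFO", 2: "WARNING", 3: "ERROR", 4: "FATAL"}


def visible_levels(min_severity: int) -> list:
    """Names of the message levels still logged at a given minimum severity."""
    return [name for level, name in sorted(ORT_SEVERITY_NAMES.items())
            if level >= min_severity]


print(visible_levels(3))  # ['ERROR', 'FATAL']
```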

`from langchain.prompts import ...`

LangChain is a framework for building applications on top of large language models. The imported classes (`AIMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `PromptTemplate`, `SystemMessagePromptTemplate`) are prompt templates: reusable prompts with placeholders that are filled in before the prompt is sent to the model.

`from langchain.schema import AIMessage, HumanMessage, SystemMessage`

These classes represent the individual messages in a chat with a model: `SystemMessage` carries instructions that set the model's behavior, `HumanMessage` carries user input, and `AIMessage` carries the model's replies.
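The idea behind prompt templates and message types can be sketched without LangChain. This is a plain-Python illustration, not LangChain's API: the `Message` dataclass is a hypothetical stand-in for `SystemMessage`/`HumanMessage`/`AIMessage`, and the template text and `build_messages` helper are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's SystemMessage/HumanMessage/AIMessage."""
    role: str  # "system", "human", or "ai"
    content: str


# A template with named placeholders, filled in at call time (the idea behind PromptTemplate).
HUMAN_TEMPLATE = "Summarize this podcast transcript from {podcast_url}:\n\n{transcript}"


def build_messages(podcast_url: str, transcript: str) -> List[Message]:
    """Fill the template and put a system instruction in front of it."""
    return [
        Message("system", "You are a helpful assistant that summarizes podcasts."),
        Message("human", HUMAN_TEMPLATE.format(podcast_url=podcast_url,
                                               transcript=transcript)),
    ]
```

The LangChain classes imported above (`SystemMessagePromptTemplate`, `HumanMessagePromptTemplate`, `ChatPromptTemplate`) build the same kind of structure through their own API.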

In [6]:
%pip install -r requirements.txt

Collecting python-dotenv==1.0.1
  Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Collecting pydub==0.25.1
  Using cached pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting openai-whisper==20231117
  Using cached openai-whisper-20231117.tar.gz (798 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting openai==1.14.0
  Using cached openai-1.14.0-py3-none-any.whl (257 kB)
Collecting langchain==0.1.12
  Using cached langchain-0.1.12-py3-none-any.whl (809 kB)
Collecting triton<3,>=2.0.0
  Downloading triton-2.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)
Collecting tiktoken
  Downloading tiktoken-0.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)

In [7]:
import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

from langchain.prompts import (AIMessagePromptTemplate, ChatPromptTemplate,
                               HumanMessagePromptTemplate, PromptTemplate,
                               SystemMessagePromptTemplate)
from langchain.schema import AIMessage, HumanMessage, SystemMessage

## Task 2: Get input file and chunk it into pieces

In [4]:
# Inputs about the podcast
podcast_url = "https://www.microsoft.com/behind-the-tech"
podcast_audio_file = "../data/PodcastSnippet.mp3"

In [5]:
# Load the podcast and split it wherever there is at least 1 s (min_silence_len, in ms)
# of audio quieter than -40 dBFS (silence_thresh)
sound_file = AudioSegment.from_mp3(podcast_audio_file)
audio_chunks = split_on_silence(sound_file, min_silence_len=1000, silence_thresh=-40)
count = len(audio_chunks)
print(f"Audio split into {count} audio chunks")

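If this cell raises `NameError: name 'AudioSegment' is not defined`, run the import cell above first; the notebook cells must be executed in order.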
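`silence_thresh=-40` is measured in dBFS (decibels relative to full scale), where 0 dBFS is the loudest a digital signal can be. The sketch below computes the dBFS of signed 16-bit PCM samples from their RMS amplitude, which is the same idea pydub uses for `AudioSegment.dBFS`; the `dbfs` function itself is a hypothetical helper written for illustration.

```python
import math
from typing import Sequence


def dbfs(samples: Sequence[int], sample_width: int = 2) -> float:
    """Loudness of signed PCM samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    max_amplitude = 2 ** (8 * sample_width - 1)  # 32768 for 16-bit audio
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / max_amplitude)


print(round(dbfs([16384, -16384]), 2))  # -6.02 (half of full scale)
```

A signal whose RMS is 1% of full scale sits at -40 dBFS, so with the lab's settings any stretch at least that quiet for 1000 ms is treated as a pause.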

## Task 3: Transcribe the Audio

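In this task, you'll transcribe the audio to text with Whisper. Two options are shown: calling a Whisper deployment in Azure OpenAI, or running the open-source model locally.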

In [6]:
import os

from openai import AzureOpenAI  # openai>=1.0 client; this lab installs openai==1.14.0

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),  # your endpoint should look like the following https://YOUR_RESOURCE_NAME.openai.azure.com/
    api_version="2024-02-01",
)

# On Azure, `model` is the custom name you chose for your deployment when you deployed the Whisper model
deployment_id = "YOUR-DEPLOYMENT-NAME-HERE"
audio_language = "en"

audio_test_file = podcast_audio_file

# Open the file in a context manager so it is closed after the upload
with open(audio_test_file, "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model=deployment_id,
        file=audio_file,
        language=audio_language,
    )

print(result.text)

Note: the legacy `openai.Audio.transcribe` interface was removed in openai>=1.0.0 and raises `APIRemovedInV1`, which is why this cell uses the `AzureOpenAI` client. A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


In [6]:
# Call Whisper locally: load the open-source "base" model
model = whisper.load_model("base")
transcript = ""

# For long audio files, only the first and last `max_chunks` chunks are transcribed;
# set max_chunks = count to transcribe every chunk
max_chunks = 10
for i, chunk in enumerate(audio_chunks):
    if i < max_chunks or i >= count - max_chunks:
        out_file = f"chunk{i}.wav"
        print("Exporting", out_file)
        chunk.export(out_file, format="wav")  # Whisper transcribes from a file path
        result = model.transcribe(out_file)
        transcript_chunk = result["text"]
        print(transcript_chunk)

        # Append to the in-memory transcript (requires enough memory for the whole text)
        transcript += " " + transcript_chunk

print("Transcript: \n")
print(transcript)
print("\n")

Exporting chunk0.wav
 Neil deGrasse Tyson is one of America's best known astrophysicist and a beloved educator and advocate for the sciences. He has a great talent for presenting complex concepts in a clear and accessible manner. He's the head of the Hayden Planetarium and has been the director there since 1996. He's hosted numerous space-related TV and radio programs, published several books, and hosts the podcast StarTalk Radio. I am thrilled to have you on the podcast today.
Exporting chunk1.wav
 Well, thanks for having me. So why do you take you this long to invite me? I just want to know. Shame, shame on me. I'm not hidden. Right. All right. And I will say, when I started this podcast, and when I wrote my book, and I started doing this very uncomfortable thing for me, which is trying to talk more about technology in the public, you were literally my role model. I said, Neil deGrasse Tyson does such a wonderful job communicating about the importance and the value of science to the 
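The chunk-selection condition and the string concatenation in the transcription loop can be pulled into small, testable helpers. This is a sketch; both function names are hypothetical, and the default cutoff of 10 mirrors the loop's limit on the first and last chunks.

```python
from typing import Iterable, List


def chunk_indices_to_transcribe(count: int, max_chunks: int = 10) -> List[int]:
    """Indices of the first and last `max_chunks` chunks (every index for short files)."""
    return [i for i in range(count) if i < max_chunks or i >= count - max_chunks]


def join_transcripts(parts: Iterable[str]) -> str:
    """Join per-chunk transcripts with single spaces, dropping empty pieces."""
    return " ".join(part.strip() for part in parts if part.strip())


print(chunk_indices_to_transcribe(25, max_chunks=3))  # [0, 1, 2, 22, 23, 24]
print(join_transcripts([" Hello.", "", " World. "]))  # Hello. World.
```

Stripping each piece also removes the leading space Whisper typically puts at the start of each chunk's text, as seen in the output above.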



## Task 4: Save the transcription to a file

In [None]:
# Specify the output file path
file_path = "transcript.txt"

# Write the transcript to the file as UTF-8 text
with open(file_path, "w", encoding="utf-8") as file:
    file.write(transcript.strip())
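The transcription loop keeps the whole transcript in memory, and its comment notes that this needs sufficient memory. For long podcasts, a minimal alternative, assuming one line per chunk is acceptable, is to append each chunk to the file as soon as it is transcribed. `append_chunk` is a hypothetical helper.

```python
from pathlib import Path


def append_chunk(path: Path, text: str) -> None:
    """Append one chunk's transcript to the file as a single line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(text.strip() + "\n")
```

Inside the loop, you would call something like `append_chunk(Path("transcript.txt"), transcript_chunk)` instead of growing the in-memory string, after deleting any old `transcript.txt` so earlier runs are not appended to.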