### Transcribing Longer Videos

<font size='4'>Whisper AI has a limit on how large a file it can process, so to handle longer videos we first convert the MP4 files to MP3. Video files contain both an audio and a video stream, which makes them much larger. Extracting only the audio to MP3 keeps speech quality high while significantly reducing file size.</font>

<font size='4'>Import the necessary libraries.</font>

In [38]:
import os
import whisper

In [39]:
try:
    from moviepy.editor import VideoFileClip  # moviepy 1.x
except ImportError:
    from moviepy import VideoFileClip  # moviepy 2.x removed the moviepy.editor module

<font size='4'>
The video file can be downloaded from [this item in the Industry Documents Collection](https://www.industrydocuments.ucsf.edu/tobacco/docs/#id=flkx0287).
We use moviepy to load the video and extract its audio track. Because Whisper AI has a limit on how large a file it can process at once, an oversized audio file might make the model fail or take an extremely long time.

Setting `bitrate="40k"` (40 kbit/s) reduces the file size while keeping speech clear enough for transcription.</font>
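With a fixed bitrate, the size of the audio file is roughly bitrate × duration ÷ 8 bytes, which is why lowering the bitrate shrinks the file so much. The helper below (`estimate_mp3_size_mb` is a hypothetical name, not part of moviepy) sketches that arithmetic under the assumption of constant-bitrate encoding, and it ignores header and metadata overhead, so real files will be slightly larger.

```python
def estimate_mp3_size_mb(duration_seconds, bitrate_kbps=40):
    """Approximate size of a constant-bitrate audio file in megabytes.

    Ignores header/metadata overhead, so actual files are slightly larger.
    """
    bits = bitrate_kbps * 1000 * duration_seconds  # kilobits per second -> total bits
    size_bytes = bits / 8                          # 8 bits per byte
    return size_bytes / 1_000_000                  # bytes -> MB (decimal)

# A one-hour recording at 40 kbps versus 128 kbps
print(estimate_mp3_size_mb(3600, 40))   # 18.0
print(estimate_mp3_size_mb(3600, 128))  # 57.6
```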

In [40]:
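vid_path = "./all_hands.mp4"              # video downloaded from the link above
audio_output = "audio_transcription.mp3"  # destination for the extracted audio

# Load the video and take its audio track (None if the video has no sound)
vid_clip = VideoFileClip(vid_path)
audio_clip = vid_clip.audio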

In [41]:
audio_clip.write_audiofile(audio_output, bitrate="40k")
vid_clip.close()  # release the file handles moviepy keeps open


MoviePy - Writing audio in audio_transcription.mp3
MoviePy - Done.
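To check how much smaller the extracted audio actually is, you can compare the two files on disk. This is a minimal sketch using `os.path.getsize`; `file_size_mb` and `size_reduction` are hypothetical helper names, and the file paths are the ones used in the cells above. The comparison only runs if both files exist.

```python
import os

def file_size_mb(path):
    """Return the size of a file on disk in megabytes (1 MB = 1,000,000 bytes)."""
    return os.path.getsize(path) / 1_000_000

def size_reduction(video_path, audio_path):
    """Return (video size MB, audio size MB, percent saved by the audio file)."""
    video_mb = file_size_mb(video_path)
    audio_mb = file_size_mb(audio_path)
    saved_percent = 100 * (1 - audio_mb / video_mb)
    return video_mb, audio_mb, saved_percent

# Only run the comparison if both files from the steps above are present
if os.path.exists("./all_hands.mp4") and os.path.exists("audio_transcription.mp3"):
    video_mb, audio_mb, saved = size_reduction("./all_hands.mp4", "audio_transcription.mp3")
    print(f"Video: {video_mb:.1f} MB, audio: {audio_mb:.1f} MB ({saved:.0f}% smaller)")
```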


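<font size='4'>Load and Run Whisper Model.

We load the `base` Whisper model and transcribe the extracted MP3 file. `fp16=False` runs inference in 32-bit floating point, which avoids Whisper's FP16 warning on CPU-only machines, and `word_timestamps=True` adds word-level start and end times to every segment.</font>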

In [43]:
# Load the model
model = whisper.load_model("base")
result = model.transcribe("./audio_transcription.mp3", fp16=False, word_timestamps=True)

# Print the transcription
print(result["text"])

 you One of the few companies that sort of did both they built up a what looked like for a while a pretty sizable business now of course in contrast to us not quite so. But they did pretty well with direct. So effectively not a ton of differentiation there but they were one of the earliest players in the e-commerce marketplace and they got a lot of the early traction there. They built up a lot of direct sales a lot of work with a lot of third parties. So this is interesting stuff and they were creative in building out all these things they're also pretty creative in building out different kinds of products they went so broad that they. They explored a whole bunch of different things. So as one of the remaining the few remaining independence in the market we saw an opportunity to. So to tap into some of the areas that they've explored over a number of years some of those things are potentially beneficial some are definitely beneficial to in the right hands here with us at jewel. So we'r
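Because the model was run with `word_timestamps=True`, each entry in `result["segments"]` also carries a `words` list whose items have `word`, `start` and `end` keys (times in seconds). The hypothetical helper below is a small sketch of how to pull those out; segments without a `words` list are simply skipped.

```python
def word_timings(result, max_words=10):
    """Collect (word, start, end) tuples from a Whisper transcription result.

    Assumes the result came from model.transcribe(..., word_timestamps=True),
    which adds a "words" list to each segment; segments without one are skipped.
    """
    timings = []
    for segment in result["segments"]:
        for word in segment.get("words", []):
            # Whisper words usually carry a leading space, so strip it
            timings.append((word["word"].strip(), word["start"], word["end"]))
            if len(timings) >= max_words:
                return timings
    return timings
```

For example, `word_timings(result)` returns the first ten words of the transcript together with their start and end times.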

<font size='4'>Format Timestamps

Whisper returns timestamps in seconds, so we convert them to MM:SS format for readability. We also collect the formatted transcription in a list of strings, one per interval.

Whisper normally splits the transcription into segments at natural pauses in the audio. To keep the output consistent and easy to read, we instead group the segments into 30-second intervals. Each segment goes into the interval in which it starts, so a segment that runs past a boundary stays whole in the earlier interval. The interval length can be adjusted by changing `interval`.</font>
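The interval a segment lands in is its start time rounded down to a multiple of the interval length. The hypothetical helper `interval_label` below isolates that arithmetic: a segment starting at 75.4 s falls in the interval from 60 s to 90 s.

```python
def interval_label(start_seconds, interval=30):
    """Return the "[MM:SS - MM:SS]" label of the fixed interval containing a time."""
    bucket_start = int(start_seconds // interval) * interval  # round down to a multiple of interval
    bucket_end = bucket_start + interval

    def mmss(seconds):
        minutes, secs = divmod(int(seconds), 60)
        return f"{minutes:02}:{secs:02}"

    return f"[{mmss(bucket_start)} - {mmss(bucket_end)}]"

print(interval_label(75.4))      # [01:00 - 01:30]
print(interval_label(75.4, 60))  # [01:00 - 02:00]
print(interval_label(12.0))      # [00:00 - 00:30]
```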

In [44]:
# Convert a time in seconds to an MM:SS string
def format_time(seconds):
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes:02}:{seconds:02}"

# Group Whisper's segments into fixed 30-second intervals
interval = 30
current_start = 0
current_text = ""

transcription_lines = []

for segment in result["segments"]:
    start = segment["start"]
    text = segment["text"]

    # The segment starts inside the current interval: append its text
    if start < current_start + interval:
        current_text += " " + text
    else:
        # Save the previous interval's text, skipping intervals with no speech
        if current_text.strip():
            end_time = current_start + interval
            transcription_lines.append(f"[{format_time(current_start)} - {format_time(end_time)}] {current_text.strip()}")

        # Jump straight to the interval containing this segment, so a long
        # silent gap does not push later text into the wrong interval
        current_start = int(start // interval) * interval
        current_text = text

# Add the text of the final interval
if current_text.strip():
    end_time = current_start + interval
    transcription_lines.append(f"[{format_time(current_start)} - {format_time(end_time)}] {current_text.strip()}")
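`format_time` keeps counting minutes past 59, so a point 4530 seconds in is shown as `75:30`. For recordings longer than an hour, a variant that also splits out hours may read better. This is a hypothetical alternative (`format_time_hms` is not defined elsewhere in this notebook) that could be swapped in for `format_time` in the grouping cell.

```python
def format_time_hms(seconds):
    """Format a time in seconds as HH:MM:SS (fractional seconds are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{secs:02}"

print(format_time_hms(4530))  # 01:15:30
print(format_time_hms(59.9))  # 00:00:59
```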

<font size='4'>Let's look at the first five lines. Whisper sometimes transcribes background noise as speech, as in the first line, whose only text is "you".</font>

In [47]:
for line in transcription_lines[:5]:
    print(line)

[00:00 - 00:30] you
[00:30 - 01:00] One of the few companies that sort of did both they built up a what looked like for a while a pretty sizable business now of course in contrast to us not quite so.  But they did pretty well with direct.
[01:00 - 01:30] So effectively not a ton of differentiation there but they were one of the earliest players in the e-commerce marketplace and they got a lot of the early traction there.  They built up a lot of direct sales a lot of work with a lot of third parties.
[01:30 - 02:00] So this is interesting stuff and they were creative in building out all these things they're also pretty creative in building out different kinds of products they went so broad that they.  They explored a whole bunch of different things. So as one of the remaining the few remaining independence in the market we saw an opportunity to.  So to tap into some of the areas that they've explored over a number of years some of those things are potentially beneficial some are definit
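Whisper's `transcribe()` already skips audio windows whose `no_speech_prob` is above `no_speech_threshold` (0.6 by default) *and* whose `avg_logprob` is below `logprob_threshold` (-1.0 by default), so the opening "you" passed that combined check. One option, sketched here as an assumption rather than a tuned fix, is a stricter post-filter on each segment's `no_speech_prob` alone. The helper name `drop_probable_non_speech` and the 0.5 cutoff are hypothetical, and a cutoff set too low may also drop quiet but genuine speech.

```python
def drop_probable_non_speech(segments, no_speech_threshold=0.5):
    """Return only the segments Whisper does not rate as probable non-speech.

    Each segment dict from model.transcribe() carries a "no_speech_prob"
    value; the 0.5 cutoff is an assumed starting point, not a tuned value.
    """
    return [s for s in segments if s.get("no_speech_prob", 0.0) <= no_speech_threshold]
```

To apply it, loop over `drop_probable_non_speech(result["segments"])` instead of `result["segments"]` in the grouping cell, then check whether the noise lines disappear without losing real speech.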

## Write Transcription to a Text File

<font size='4'>Now, we write the transcription into a .txt file for easy access.</font>

In [46]:
# Save the transcription to a text file
with open("all_hands_transcription.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(transcription_lines))

print("Transcription saved to text file.")

Transcription saved to text file.
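To reuse the timestamps later, for example to jump to a passage in the video, the saved file can be parsed back into numbers. This is a minimal sketch that assumes the exact `[MM:SS - MM:SS] text` layout written above; `read_transcription` and `to_seconds` are hypothetical helpers.

```python
import re

# Matches lines such as "[01:00 - 01:30] So effectively not a ton ..."
LINE_PATTERN = re.compile(r"^\[(\d+:\d{2}) - (\d+:\d{2})\] ?(.*)$")

def to_seconds(mmss):
    """Convert an "MM:SS" string back to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def read_transcription(path):
    """Parse a saved transcription file into (start_s, end_s, text) tuples."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = LINE_PATTERN.match(line.rstrip("\n"))
            if match:  # lines that do not follow the layout are ignored
                start, end, text = match.groups()
                entries.append((to_seconds(start), to_seconds(end), text))
    return entries
```

For example, `read_transcription("all_hands_transcription.txt")` returns one tuple per interval, with start and end times in seconds.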
