### Transcribing Longer Videos

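Whisper only needs the audio track, so before transcribing a longer video we convert the MP4 file to MP3. A video file contains both video and audio streams, which makes it much larger than the audio alone. Large inputs are slower to load and process, and hosted versions of Whisper enforce an upload limit (the OpenAI API, for example, caps files at 25 MB). Encoding the audio as MP3 significantly reduces file size while keeping speech clear enough for transcription.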

Import the necessary libraries.

In [2]:
import os
import whisper

In [3]:
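# Only VideoFileClip is needed for audio extraction.
# Note: moviepy.editor was removed in MoviePy 2.0; there, use `from moviepy import VideoFileClip`.
from moviepy.editor import VideoFileClip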

We use moviepy to load the video and extract its audio track. Keeping the audio file small matters because a very large input can cause the model to fail or take an extremely long time.

The bitrate="40k" argument reduces file size while keeping speech clear for transcription.

In [4]:
vid_path = "./all_hands.mp4"
audio_output = "audio_transcription.mp3"

vid_clip = VideoFileClip(vid_path)

audio_clip = vid_clip.audio

In [5]:
audio_clip.write_audiofile(audio_output, bitrate="40k")

MoviePy - Writing audio in audio_transcription.mp3


MoviePy - Done.
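
As a rough sanity check on the bitrate choice, the size of a constant-bitrate MP3 is approximately the bitrate multiplied by the duration. The sketch below estimates that size and reads the actual size from disk. The helper names `estimate_mp3_size_mb` and `file_size_mb` are illustrative and not part of moviepy or Whisper, and the estimate ignores metadata and framing overhead.

```python
import os


def estimate_mp3_size_mb(bitrate_kbps, duration_seconds):
    """Approximate size in MB (10^6 bytes) of a constant-bitrate MP3."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits/s -> bytes/s
    return bytes_per_second * duration_seconds / 1_000_000


def file_size_mb(path):
    """Actual size of a file on disk, in MB (10^6 bytes)."""
    return os.path.getsize(path) / 1_000_000


# One hour of audio at 40 kbps comes out to roughly 18 MB.
print(f"Estimated size: {estimate_mp3_size_mb(40, 60 * 60):.1f} MB")

# Compare against the files from this notebook, if they exist locally.
for path in ("./all_hands.mp4", "audio_transcription.mp3"):
    if os.path.exists(path):
        print(f"{path}: {file_size_mb(path):.1f} MB")
```

If the actual MP3 turns out much larger than the estimate, the bitrate argument was probably not applied.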




Load and Run the Whisper Model

We load the Whisper model and transcribe the extracted MP3 file.

In [6]:
# Load the "base" Whisper model
model = whisper.load_model("base")

# Transcribe the extracted audio; fp16=False runs inference in FP32, which is what CPUs support
result = model.transcribe(audio_output, fp16=False)

# Print the transcription
print(result["text"])

 you One of the few companies that sort of did both they built up a what looked like for a while a pretty sizable business now of course in contrast to us not quite so. But they did pretty well with direct. Effectively not a ton of differentiation there, but they were one of the earliest players in the e-commerce marketplace and they got a lot of the early traction there. Built up a lot of direct sales a lot of work with a lot of third parties. This is interesting stuff and they were creative in building out all these things they're also pretty creative in building out different kinds of products they went so broad that they. You know pretty they explored a whole bunch of different things so as one of the remaining the few remaining independence in the market we saw an opportunity to. And then we went back into some of the areas that they've explored over a number of years some of those things are potentially beneficial. Some are definitely beneficial to in the right hands here with us

Format Timestamps

Whisper returns each segment's start and end timestamps in seconds, so we convert them to MM:SS format for better readability. We then store every segment as one formatted line in a list.

In [16]:
# Convert a time in seconds to MM:SS
def format_time(seconds):
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes:02}:{seconds:02}"

# Build a list of transcript lines with formatted timestamps
transcription_lines = []
for segment in result["segments"]:
    start_time = format_time(segment["start"])
    end_time = format_time(segment["end"])
    # Whisper segment text starts with a space; strip it to avoid a double space
    text = segment["text"].strip()
    transcription_lines.append(f"[{start_time} - {end_time}] {text}")
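
For recordings longer than an hour, MM:SS keeps counting minutes past 59 (for example 75:30). One possible variant, sketched below, switches to HH:MM:SS once a timestamp passes the one-hour mark. The function name `format_time_long` and the `sample_segments` list are made up for illustration; real segments come from result["segments"] and carry additional keys.

```python
def format_time_long(seconds):
    """Format seconds as MM:SS, or HH:MM:SS once past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:02}:{minutes:02}:{secs:02}"
    return f"{minutes:02}:{secs:02}"


# Hypothetical segments with the same start/end/text keys used above.
sample_segments = [
    {"start": 0.0, "end": 7.5, "text": " Welcome everyone."},
    {"start": 3725.2, "end": 3731.9, "text": " Any questions?"},
]

sample_lines = [
    f"[{format_time_long(s['start'])} - {format_time_long(s['end'])}] {s['text'].strip()}"
    for s in sample_segments
]
print("\n".join(sample_lines))
```

Short recordings keep the compact MM:SS form, so existing transcripts under an hour look the same as before.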

Write Transcription to a Text File

Now, we write the transcription into a .txt file for easy access.

In [19]:
# Save the transcription to a text file
with open("all_hands_transcription.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(transcription_lines))

print("Transcription saved to text file.")

Transcription saved to text file.
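
To confirm the file was written as expected, it can be read back and compared with the in-memory list. This is an optional check rather than part of the original workflow; `save_lines` is an illustrative helper, and the sample lines and the demo_transcription.txt filename are placeholders.

```python
from pathlib import Path


def save_lines(lines, path):
    """Write one transcript line per row and return the number of lines written."""
    Path(path).write_text("\n".join(lines), encoding="utf-8")
    return len(lines)


# Placeholder lines standing in for transcription_lines.
demo_lines = [
    "[00:00 - 00:07] Welcome everyone.",
    "[00:07 - 00:12] Let's get started.",
]
count = save_lines(demo_lines, "demo_transcription.txt")

# Read the file back and check that it matches what was written.
read_back = Path("demo_transcription.txt").read_text(encoding="utf-8").splitlines()
print(f"Wrote {count} lines; read back {len(read_back)}.")
```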
