# Radio Transcrtiption using Whisper

This notebook will implement the necessary code for making the transcription of all our team message radios.


First, I need to **import the modules** and **verifying cuda is detecting my GPU**:

---

In [36]:
import os
import glob
import pandas as pd
import torch
import whisper
from tqdm.notebook import tqdm
import librosa
import numpy as np
import matplotlib.pyplot as plt

In [37]:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

CUDA available: True
CUDA device: NVIDIA GeForce RTX 4060 Laptop GPU


## Loading audio files

---

In [38]:
def load_audio_files(audio_dir = "../../f1-strategy/data/audio"):
    """
    Load all audio files from the directory structure.
    
    Parameters:
    -----------
    audio_dir : str
        Path to the main audio directory that contains driver subdirectories
        
    Returns:
    --------
    list
        List of dictionaries containing audio file paths and metadata
    """
    audio_files = []


    # Find all driver directories. For this, we use driver_* for matching every directory

    driver_dirs = glob.glob(os.path.join(audio_dir, 'driver_*'))

    for driver_dir in driver_dirs:
        # Extract the driver number out of the directory name

        driver_num = os.path.basename(driver_dir).replace('driver_(', '').replace(',)', '')
        # Find all audio files in this driver directory.
        # For now, only mp3 is in our directory. 
        # However, we add all those extensions if the directory changes in the future
        files = glob.glob(os.path.join(driver_dir, '*.mp3')) + \
                glob.glob(os.path.join(driver_dir, '*.wav')) + \
                glob.glob(os.path.join(driver_dir, '*.m4a')) + \
                glob.glob(os.path.join(driver_dir, '*.ogg'))
        
        for file_path in files:
            # Get the filename without the extension
            filename = os.path.basename(file_path)

            # Add to the list of audio files
            audio_files.append(
                {
                    "driver": driver_num,
                    "file_path": file_path,
                    "filename": filename
                }
            )
    print(f"Found {len(audio_files)} audio files across {len(driver_dir)} driver directories")

    return audio_files

Now, I´ll store all this audio files in a variable for transcribing them with Whisper

In [39]:
# EXECUTION CELL

audio_files = load_audio_files()

# Display the first few files with a simple verifying
if audio_files:
    print("\nFirst 5 audio files")
    for file in audio_files[:5]:
        print(f"Driver: {file['driver']}, File: {file['filename']}")
else:
    print("No audio files found. Check the path or add more debugging messages")

Found 210 audio files across 41 driver directories

First 5 audio files
Driver: 1, File: driver_(1,)_belgium_radio_39.mp3
Driver: 1, File: driver_(1,)_belgium_radio_40.mp3
Driver: 1, File: driver_(1,)_belgium_radio_60.mp3
Driver: 1, File: driver_(1,)_belgium_radio_62.mp3
Driver: 1, File: driver_(1,)_belgium_radio_63.mp3


## Transcribring the audios with Whisper

In [40]:
def transcribe_audio_files(audio_files, model_name="medium", output_csv="../outputs/week4/radios_raw.csv"):
    """
    Transcribe audio files using Whisper and save one entry per file.
    """
    # Load Whisper model
    print(f"Loading Whisper {model_name} model...")
    model = whisper.load_model(model_name)
    
    # Initialize results list
    results = []
    
    # Process each audio file
    print(f"Transcribing {len(audio_files)} audio files...")
    for i, audio_file in enumerate(audio_files):
        try:
            print(f"Processing file {i+1}/{len(audio_files)}: {audio_file['filename']}")
            
            # Normalize file path
            file_path = os.path.normpath(audio_file['file_path'])
            
            # Load audio file
            audio, sr = librosa.load(file_path, sr=16000, mono=True)
            
            # Get audio duration
            duration = librosa.get_duration(y=audio, sr=sr)
            
            # Perform transcription 
            result = model.transcribe(audio, task="transcribe", language="en", fp16=torch.cuda.is_available())
            
            # Combine all segments into a single text
            full_text = " ".join([segment["text"].strip() for segment in result["segments"]])
            
            # Add a single entry for the entire file
            results.append({
                'driver': audio_file['driver'],
                'filename': audio_file['filename'],
                'file_path': file_path,
                'text': full_text,
                'duration': duration
            })
                
            # Print the transcription
            print(f"Full transcription: {full_text}")
                
        except Exception as e:
            print(f"Error processing {audio_file['filename']}: {str(e)}")
    
    # Convert to DataFrame
    df = pd.DataFrame(results)
    
    # Save to CSV
    os.makedirs(os.path.dirname(output_csv), exist_ok=True)
    df.to_csv(output_csv, index=False)
    
    print(f"Saved {len(df)} transcriptions to {output_csv}")
    return df

#### Quick test with the tiny model to see everything is working fine


In [41]:
# Create output directory if it doesn't exist
os.makedirs("../../outputs/week4", exist_ok=True)

# Test with just a few files first to ensure everything works
if audio_files:
    test_files = audio_files[:2]  # Just take the first 2 files for a quick test
    print(f"Testing transcription with {len(test_files)} files...")
    test_df = transcribe_audio_files(
        test_files, 
        model_name="tiny",  # Use the tiny model for quick testing
        output_csv="../../outputs/week4/radios_test.csv"
    )
    
    # Show the results
    print("\nTranscription test results:")
    print(test_df.head())
else:
    print("No audio files available for transcription")

Testing transcription with 2 files...
Loading Whisper tiny model...


  checkpoint = torch.load(fp, map_location=device)


Transcribing 2 audio files...
Processing file 1/2: driver_(1,)_belgium_radio_39.mp3
Full transcription: So don't forget about Sisihe please. Are we both doing a lot? You just follow my instruction. Cute. Don't want to know what's going on to it. Max, please follow my instruction and trust it. Thank you.
Processing file 2/2: driver_(1,)_belgium_radio_40.mp3
Full transcription: Okay, back to our 20th grade, about 9 or 10 minutes. What do you thought? Can you get there or should we box? We need to box this last. To cover the land. I can't see the weather, we don't.
Saved 2 transcriptions to ../../outputs/week4/radios_test.csv

Transcription test results:
  driver                          filename  \
0      1  driver_(1,)_belgium_radio_39.mp3   
1      1  driver_(1,)_belgium_radio_40.mp3   

                                           file_path  \
0  ..\..\f1-strategy\data\audio\driver_(1,)\drive...   
1  ..\..\f1-strategy\data\audio\driver_(1,)\drive...   

                                

In [42]:
# Only run this once you've verified the test works correctly

# Uncomment the following lines to process all files
# print("Transcribing all audio files with medium model...")
# full_df = transcribe_audio_files(
#     audio_files,
#     model_name="medium",  # Using medium model for better quality
#     output_csv="../outputs/week4/radios_raw.csv"
# )
# 
# print("\nFull transcription complete!")
# print(f"Total segments: {len(full_df)}")