# Transcribe_audio notebook

## Purpose
---
- Uses OpenAI's whisper-large-v3 model to transcribe sample audio files automatically.
- Coqui AI's XTTS fine-tuning process requires a text transcription for each audio file. When a sample has none, writing one by hand for every clip is impractical.
- Will also be used to annotate speech from personal audio samples.
---

## How to use
---
- Requires PyTorch and Hugging Face's transformers library to run the whisper-large-v3 model, plus FFmpeg, which the pipeline uses to decode the audio files.
- Define an input directory path that contains all of your .wav audio files.
- Define an output path for a CSV file. Once every audio file has been transcribed, each file name and its transcription are written to this CSV in the pipe-delimited LJSpeech format. The CSV can then serve as the metadata file for the fine-tuning process.
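For reference, here is a minimal sketch of what one metadata row looks like. It assumes the three-column LJSpeech layout used later in this notebook: file ID without extension, transcript, and normalized transcript, separated by `|`. The file ID and sentence are made-up placeholders.

```python
import csv
import io

# Hypothetical row: (file ID without ".wav", transcript, normalized transcript)
rows = [("chunk_0000", "Hello there. This is a test.", "Hello there. This is a test.")]

# Write pipe-delimited rows to an in-memory buffer instead of a real file
buf = io.StringIO()
csv.writer(buf, delimiter="|", lineterminator="\n").writerows(rows)
metadata_text = buf.getvalue()
print(metadata_text, end="")  # chunk_0000|Hello there. This is a test.|Hello there. This is a test.

# A simple loader can recover the three columns by splitting each line on "|"
for line in metadata_text.splitlines():
    file_id, transcript, normalized = line.split("|")
    print(file_id, "->", normalized)
```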

---

In [1]:
'''Requires FFmpeg to be installed for the Whisper pipeline'''
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import csv
import os
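The ASR pipeline decodes audio passed as a file path by calling the `ffmpeg` executable, so a quick PATH check before loading the model can save you from a failed run. Below is a minimal sketch that uses only the standard library. The helper name `ffmpeg_available` is hypothetical.

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

# Warn early instead of failing on the first pipeline call
if not ffmpeg_available():
    print("ffmpeg not found on PATH: install it before transcribing .wav files by path.")
```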

In [2]:
'''Load the Whisper model using the transformers API'''
# Set device and torch data type
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
print(device)

# Model identifier
model_id = "openai/whisper-large-v3"  # roughly a 3 GB download

# Load the model and move it to the selected device
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, 
    torch_dtype=torch_dtype, 
    low_cpu_mem_usage=False, 
    use_safetensors=True
)
model.to(device)

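# Load the processor (tokenizer + feature extractor).
# Note: language='en' here does not necessarily force English decoding in the pipeline;
# passing generate_kwargs={"language": "english"} when calling pipe() does that explicitly.
processor = AutoProcessor.from_pretrained(model_id, language='en')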

# Create the speech recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
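    batch_size=32,           # only has an effect when several inputs (or chunks) are passed at once
    return_timestamps=True,  # needed for Whisper's long-form (>30 s) decoding; only "text" is used below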
    torch_dtype=torch_dtype,
    device=device,
)

cuda:0


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
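With `return_timestamps=True`, each pipeline call returns a dict that holds the full `text` plus a `chunks` list of timestamped segments. This notebook keeps only `text`. The sketch below uses a hand-written dict shaped like that output to show how the segment timestamps could be read if you need them. The sentence and times in the dict are invented, and you should check the exact structure against your installed transformers version. `summarize_result` is a hypothetical helper.

```python
from typing import Optional, Tuple

def summarize_result(result: dict) -> Tuple[str, Optional[float]]:
    """Return the stripped transcript and the end time of the last segment, if any."""
    text = result["text"].strip()
    chunks = result.get("chunks", [])
    # The final segment's end time may be None, so this simply passes it through
    end_time = chunks[-1]["timestamp"][1] if chunks else None
    return text, end_time

# Hypothetical pipeline output, invented for illustration
mock_result = {
    "text": " Hello there. This is a test.",
    "chunks": [
        {"timestamp": (0.0, 1.4), "text": " Hello there."},
        {"timestamp": (1.4, 3.1), "text": " This is a test."},
    ],
}

print(summarize_result(mock_result))  # ('Hello there. This is a test.', 3.1)
```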


In [6]:
'''Set up paths for input and output files'''
# Define a path to an output CSV to save transcriptions
outputPath = "datasets/normalExample/metadata.csv"

# Define where sample audio files are coming from
audioDir = "datasets/normalExample/wavs/"

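# Collect only the .wav files in the chosen dir, sorted so rows follow file order
# (os.listdir returns names in arbitrary order)
fileList = sorted(f for f in os.listdir(audioDir) if f.lower().endswith(".wav"))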
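Two things can go wrong here before any transcription happens. The audio directory may not exist, and the CSV's parent folder may not exist yet when the writer opens it. Below is a minimal guard sketch. `check_paths` is a hypothetical helper and is not part of the original notebook.

```python
import os

def check_paths(audio_dir: str, output_path: str) -> None:
    """Fail fast on a missing audio dir and create the CSV's parent folder if needed."""
    if not os.path.isdir(audio_dir):
        raise FileNotFoundError(f"Audio directory not found: {audio_dir}")
    out_dir = os.path.dirname(output_path)
    if out_dir:
        # exist_ok avoids an error when the folder is already there
        os.makedirs(out_dir, exist_ok=True)

# Usage with the notebook's variables:
# check_paths(audioDir, outputPath)
```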


In [7]:
'''Transcribe sample files'''
# Init list to hold all samples
samples = []

# Loop through every .wav file in the input dir
for i, fileName in enumerate(fileList):
    # Build the full path to the local .wav file
    audioPath = os.path.join(audioDir, fileName)
    # Print a progress message every 25 files
    if i % 25 == 0:
        print(f"Transcribing {fileName}...")
    # Run the pipeline on the .wav file, forcing English; Whisper text often starts with a space
    result = pipe(audioPath, generate_kwargs={"language": "english"})["text"].strip()
    # LJSpeech format: (file ID, transcript, normalized transcript)
    # No normalization is needed for fine-tuning, so the transcript is duplicated into the 3rd column
    samples.append((os.path.splitext(fileName)[0], result, result))


Transcribing chunk_0000.wav...
Transcribing chunk_0001.wav...
Transcribing chunk_0002.wav...
...
Transcribing chunk_1367.wav...
...
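The loop calls the pipeline once per short file, so the `batch_size=32` set when building `pipe` has little to batch. The transformers pipeline also accepts a list of inputs and batches them itself. The sketch below assumes that behaviour and feeds the list in slices, so progress can still be printed between slices. `transcribe_in_batches` is a hypothetical helper. `pipe` is passed in as a parameter so the logic can be checked with a stand-in function.

```python
import os
from typing import Callable, List

def transcribe_in_batches(pipe: Callable, audio_dir: str, file_names: List[str],
                          batch_size: int = 32) -> List[tuple]:
    """Transcribe files slice by slice and return LJSpeech-style rows."""
    rows = []
    for start in range(0, len(file_names), batch_size):
        batch = file_names[start:start + batch_size]
        print(f"Transcribing {batch[0]} .. {batch[-1]}")
        paths = [os.path.join(audio_dir, name) for name in batch]
        # Assumed: a list input yields one result dict per path, in the same order
        outputs = pipe(paths)
        for name, out in zip(batch, outputs):
            text = out["text"].strip()
            rows.append((os.path.splitext(name)[0], text, text))
    return rows

# Usage with the notebook's objects:
# samples = transcribe_in_batches(pipe, audioDir, fileList)
```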

In [8]:
# Write the samples list to the output CSV.
# Plain utf-8 (not utf-8-sig): a BOM would end up inside the first file ID
# when the metadata is read back as UTF-8.
with open(outputPath, 'w', newline='', encoding='utf-8') as f:
    # Create a pipe-delimited CSV writer
    csvWriter = csv.writer(f, delimiter='|')

    # Note: LJSpeech-format metadata has no header row

    # Write each sample as one row
    csvWriter.writerows(samples)

print("Transcriptions written to:", outputPath)

Transcriptions written to: datasets/normalExample/metadata.csv
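Before you hand the CSV to the fine-tuning step, it can be worth reading it back once. Below is a minimal validation sketch. It assumes that each line should have exactly three `|`-separated fields and that every file ID maps to `<ID>.wav` in the audio folder. It also flags a UTF-8 byte-order mark, which a plain `utf-8` reader would otherwise leave attached to the first file ID. `validate_metadata` is a hypothetical helper.

```python
import os
from typing import List

def validate_metadata(metadata_path: str, wav_dir: str) -> List[str]:
    """Return a list of human-readable problems found in an LJSpeech-style metadata file."""
    problems = []
    with open(metadata_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            # The plain utf-8 codec keeps a BOM as the character U+FEFF
            if line_no == 1 and line.startswith("\ufeff"):
                problems.append("line 1: starts with a UTF-8 BOM")
                line = line.lstrip("\ufeff")
            cols = line.split("|")
            if len(cols) != 3:
                problems.append(f"line {line_no}: expected 3 fields, got {len(cols)}")
                continue
            wav_path = os.path.join(wav_dir, cols[0] + ".wav")
            if not os.path.isfile(wav_path):
                problems.append(f"line {line_no}: missing audio file {wav_path}")
    return problems

# Usage with the notebook's variables:
# print(validate_metadata(outputPath, audioDir) or "metadata looks OK")
```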
