# WhisperX template

## Step 1: Change kernel
Make sure the kernel selected in the top right of this window is `whisperX (ipykernel)`. If you see `Python 3 (ipykernel)`, click on it and change it to the `whisperX` kernel.

## Step 2: Create input and output folders
It is recommended to store audio files in a (new) folder on a storage volume. The storage volume can be found here: `~/data/volume_2`. 
- Navigate to the storage volume in the panel on the left.
- Create a new folder called `my_transcription_project` by clicking on the `New Folder` button (folder with a plus sign) in the panel on the left.
    - _if you choose a different folder name, remember to change the folder names in the code cells below_
- Go into this folder by clicking on it in the panel on the left.
- Create a folder called `data`
- Create another folder called `output` 
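
The same folder layout can also be created from a code cell instead of the file browser. The following is a minimal sketch, not part of the original template: the helper name `create_project_folders` is hypothetical, and it assumes the storage volume is reachable at the path you pass in.

```python
from pathlib import Path


def create_project_folders(volume, project_name="my_transcription_project"):
    """Create <volume>/<project_name>/data and <volume>/<project_name>/output.

    Returns the two folder paths as a (data_dir, output_dir) tuple.
    """
    project_dir = Path(volume).expanduser() / project_name
    data_dir = project_dir / "data"
    output_dir = project_dir / "output"
    for folder in (data_dir, output_dir):
        # parents=True also creates the project folder itself;
        # exist_ok=True makes it safe to rerun the cell
        folder.mkdir(parents=True, exist_ok=True)
    return data_dir, output_dir
```

For the layout described above, you would call `create_project_folders("/data/volume_2")`.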

## Step 3: Upload audio
- To upload an audio file, navigate to the `data` folder created in the previous step.
- Press the 'Upload Files' button (upward arrow) in the panel on the left to upload the file.

## Step 4: Run the code cell below to import whisperX and define required variables

In [None]:
import whisper
import whisperx
import gc

device = "cuda" 
batch_size = 16                 # reduce if low on GPU mem
compute_type = "float16"        # change to "int8" if low on GPU mem (may reduce accuracy)
whisperx_model = "large-v2"     # options: "base", "small", "medium"

writer_options = {"max_line_width":None,
                  "max_line_count":None,
                  "highlight_words":None}

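If you are unsure which values to use, the choice can be written down as a small function. This is only a sketch under assumptions: the helper `choose_settings` is hypothetical, and the reduced batch size of 4 is an illustrative value, not a recommendation from the template.

```python
def choose_settings(cuda_available, low_memory=False):
    """Return a (device, compute_type, batch_size) tuple.

    cuda_available: whether a CUDA GPU can be used.
    low_memory:     set to True after running into GPU memory errors.
    """
    if not cuda_available:
        # without a GPU, fall back to the CPU with int8 (slow, may reduce accuracy)
        return "cpu", "int8", 4
    if low_memory:
        # smaller batch and int8 weights reduce GPU memory use
        return "cuda", "int8", 4
    # defaults used in the cell above
    return "cuda", "float16", 16
```

In the notebook this could be used as `device, compute_type, batch_size = choose_settings(torch.cuda.is_available())`, after `import torch`.
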
## Step 5: Specify audio file

In the cell below, change `audio.mp3` to the name of your audio file

In [None]:
filename = "audio.mp3" # change audio.mp3 to the name of your file
audio_file = "/data/volume_2/my_transcription_project/data/" + filename # change the path if needed (e.g. if you chose a different folder name for this project)
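
A typo in the file name only shows up once the audio is loaded, so it can help to check the path first. The sketch below is an optional addition; `check_audio_file` is a hypothetical helper, not part of whisperX.

```python
from pathlib import Path


def check_audio_file(path):
    """Return the path as a Path object, or raise a clear error if it is not a file."""
    audio_path = Path(path)
    if not audio_path.is_file():
        # list the folder contents (if the folder exists) to help spot typos
        folder = audio_path.parent
        found = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
        raise FileNotFoundError(f"{audio_path} not found. Files in {folder}: {found}")
    return audio_path
```

Calling `check_audio_file(audio_file)` before the next step either returns the path or tells you which files are actually in the folder.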

## Step 6: Transcribe audio file

Run the code cell below to transcribe the audio file with the model selected above. 

In [None]:
audio_whisperx = whisperx.load_audio(audio_file)
model = whisperx.load_model(whisperx_model, device, compute_type=compute_type)
result = model.transcribe(audio_whisperx, batch_size=batch_size)
print(result["segments"])
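
Printing `result["segments"]` shows a raw list of dictionaries. As a sketch, the helpers below (hypothetical names, assuming each segment has `start`, `end`, and `text` keys) turn that list into one readable line per segment.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(segments):
    """Return one '[start - end] text' line per segment, joined by newlines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```

Use it as `print(format_segments(result["segments"]))`.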

## Step 7: Save output to files
Run the cell below to save the transcript in all file formats that are supported by Whisper.

In [None]:
output_directory = "/data/volume_2/my_transcription_project/output/"
writer = whisper.utils.get_writer("all", output_directory)
writer(result, audio_file, writer_options)
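
To check that the files were written, you can compute the file names you expect to find. This sketch rests on two assumptions that may not hold for your installed version: that the `"all"` writer produces `txt`, `vtt`, `srt`, `tsv`, and `json` files, and that each file is named after the audio file without its extension. `expected_output_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: formats written by the "all" writer; adjust if your version differs.
ALL_FORMATS = ("txt", "vtt", "srt", "tsv", "json")


def expected_output_files(audio_file, output_directory, formats=ALL_FORMATS):
    """Return the output paths expected for one audio file, one per format."""
    stem = Path(audio_file).stem  # file name without folder and extension
    return [Path(output_directory) / f"{stem}.{ext}" for ext in formats]
```

For example, `[p for p in expected_output_files(audio_file, output_directory) if not p.exists()]` lists the expected files that are missing.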

## Additional Tasks: Translate
Run the code cell below to get a translated transcript in English from audio in a different language using the model selected above. Other target languages are currently not supported. Use step 7 to save the translated transcript in all file formats.

In [None]:
model = whisperx.load_model(whisperx_model, device, compute_type=compute_type)
result = model.transcribe(audio_whisperx, batch_size=batch_size, task="translate")
print(result["segments"])

## Additional Tasks: Align whisper output

Improved alignment of the transcript and word-level timings can be obtained by running the code cell below. Use step 7 to save the aligned transcript in all file formats.

In [None]:
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio_whisperx, device, return_char_alignments=False)

print(result["segments"]) # after alignment
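
After alignment, the word-level timings can be pulled out of the segments. The sketch below assumes each aligned segment carries a `words` list whose entries have a `word` key and, where alignment succeeded, `start` and `end` keys. `collect_word_timings` is a hypothetical helper.

```python
def collect_word_timings(segments):
    """Return a list of (word, start, end) tuples.

    start and end are None for words that could not be aligned.
    """
    timings = []
    for seg in segments:
        # segments without a "words" list are skipped
        for w in seg.get("words", []):
            timings.append((w["word"], w.get("start"), w.get("end")))
    return timings
```

Use it as `collect_word_timings(result["segments"])`.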

## Additional Tasks: Speaker diarization

To use this pipeline, you need a Hugging Face token and must accept the terms for the relevant models.

Step 1: create a token [here](https://huggingface.co/settings/tokens)

Step 2: Enter your token in the code cell below and run it. If you have not yet accepted the terms, you will get an error message with a link to the terms for one of the relevant models (Segmentation, Voice Activity Detection (VAD), and Speaker Diarization). Follow the link, accept the terms, and rerun the code cell. You may then get another error with a new link. Repeat the process until you have accepted the terms for all models and get output instead of an error message.

Use step 7 to save the diarized transcript in all file formats.

In [None]:
YOUR_HF_TOKEN = '<insert your huggingface token here>'

diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)

diarize_segments = diarize_model(audio_whisperx)
# if the minimum/maximum number of speakers is known, pass it as well, e.g.:
# diarize_segments = diarize_model(audio_whisperx, min_speakers=2, max_speakers=4)

result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs
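
Once speaker IDs are assigned, the transcript can be printed as a dialogue. The sketch below assumes each segment has a `text` key and usually a `speaker` key. Segments without a speaker get a placeholder label. `format_speaker_turns` is a hypothetical helper.

```python
def format_speaker_turns(segments, unknown="UNKNOWN"):
    """Merge consecutive segments by the same speaker into 'SPEAKER: text' lines."""
    turns = []  # list of [speaker, text] pairs
    for seg in segments:
        speaker = seg.get("speaker", unknown)
        text = seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            # same speaker continues: extend the current turn
            turns[-1][1] += " " + text
        else:
            turns.append([speaker, text])
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
```

Use it as `print(format_speaker_turns(result["segments"]))`.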

## GPU Memory issues

Depending on the length of the audio file, the model used, and the GPU device used, you may run into memory issues. If this happens, you can delete the model and reload it with a different `compute_type`. You can also try a smaller batch size, see step 4.

In [None]:
# delete a model if low on GPU resources, then release the memory it held
import gc; import torch
del model_a  # alignment model; use `del model` for the transcription model
gc.collect(); torch.cuda.empty_cache()
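
Lowering the batch size by hand can also be automated. The following is a sketch under assumptions, not part of whisperX: `transcribe_with_backoff` is a hypothetical helper, and it assumes that a GPU memory error surfaces as a `RuntimeError` whose message contains "out of memory".

```python
def transcribe_with_backoff(transcribe, audio, batch_size=16, min_batch_size=1):
    """Call transcribe(audio, batch_size=...), halving batch_size after each
    out-of-memory error. Returns (result, batch_size_that_worked)."""
    while True:
        try:
            return transcribe(audio, batch_size=batch_size), batch_size
        except RuntimeError as err:
            # re-raise errors that are not about memory, or when we cannot go lower
            if "out of memory" not in str(err).lower() or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

In the notebook this could be called as `result, batch_size = transcribe_with_backoff(model.transcribe, audio_whisperx, batch_size)`. Freeing cached GPU memory between attempts, as in the cell above, may also be needed.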