# Audio Transcription Tool - Google Colab (GPU Accelerated)

This notebook demonstrates how to use the GPU-accelerated audio transcription tool in Google Colab.

**Important:** Make sure you're using a GPU runtime!
- Go to Runtime → Change runtime type → Hardware accelerator → GPU (T4 or A100)
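
Before running the setup, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch, assuming the `nvidia-smi` utility is on the PATH of GPU runtimes (as is usual on Colab). The helper name `gpu_available` is hypothetical and not part of the repository.

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")  # None when the tool is not installed
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ..."
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU detected" if gpu_available() else "No GPU detected: change the runtime type")
```

If the check reports no GPU, switch the runtime type as described above and re-run the notebook from the top.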


In [None]:
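# Clone the repository (absolute paths keep this cell safe to re-run
# even after a later cell has changed the working directory)
%cd /content
!rm -rf /content/repo
!git clone https://github.com/chr-wn/transcribe-colab-2.git /content/repo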

In [None]:
# Run the GPU-optimized Colab setup script
%cd /content/repo
!chmod +x colab_setup.sh
!./colab_setup.sh
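
After the setup script finishes, a quick check that the `whisper-cli` binary exists can catch a failed build early. This is a sketch only: it assumes the script builds the binary at `whisper.cpp/build/bin/whisper-cli` relative to `/content/repo` (the path used later in this notebook), and `is_executable` is a hypothetical helper.

```python
import os
from pathlib import Path

def is_executable(path) -> bool:
    """Return True if `path` is a regular file with execute permission."""
    p = Path(path)
    return p.is_file() and os.access(p, os.X_OK)

# Assumed build location, relative to the current working directory
cli = Path("whisper.cpp/build/bin/whisper-cli")
status = "ready" if is_executable(cli) else "not found; re-run the setup cell"
print(f"{cli}: {status}")
```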

In [None]:
# Optional: upload audio files from your computer instead of using Google Drive.
# Uncomment the lines below to use this path.
# from google.colab import files
#
# uploaded = files.upload()
#
# # List uploaded files
# for filename, data in uploaded.items():
#     print(f"Uploaded: {filename} ({len(data)} bytes)")

In [None]:
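# Mount Google Drive (prompts for authorization on first run)
from google.colab import drive
drive.mount('/content/drive')

# Confirm that the audio folder and a sample file are visible
!ls /content/drive/MyDrive/pows/
!ls "/content/drive/MyDrive/pows/01-1-pows.mp3"
filename = "/content/drive/MyDrive/pows/01-1-pows.mp3"
print(filename)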
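
The `ls` checks above can also be done in Python so that a wrong path fails early with a clear message. A minimal sketch: `require_audio` is a hypothetical helper, and the accepted suffixes are the formats `whisper-cli` reports as supported (flac, mp3, ogg, wav).

```python
from pathlib import Path

# Formats listed as supported in the whisper-cli help text
SUPPORTED = {".flac", ".mp3", ".ogg", ".wav"}

def require_audio(path: str) -> Path:
    """Return `path` as a Path, or raise if it is not a usable audio file."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported audio format: {p.suffix or '(none)'}")
    if not p.is_file():
        raise FileNotFoundError(f"Audio file not found: {p}")
    return p
```

In this notebook it could be called as `require_audio("/content/drive/MyDrive/pows/01-1-pows.mp3")` once Drive is mounted.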

In [11]:
# Show the whisper-cli options (relative path; run from /content/repo)
!whisper.cpp/build/bin/whisper-cli -h


usage: whisper.cpp/build/bin/whisper-cli [options] file0 file1 ...
supported audio formats: flac, mp3, ogg, wav

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [2      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -ac N,

In [None]:
# usage: transcribe.py [-h] [-m {tiny,base,small,medium,large}] [-o OUTPUT] [-t]
#                      [-b] [-v]
#                      files [files ...]

# Convert audio files to text transcripts using whisper.cpp

# positional arguments:
#   files                 Audio file(s) to transcribe

# options:
#   -h, --help            show this help message and exit
#   -m {tiny,base,small,medium,large}, --model {tiny,base,small,medium,large}
#                         Whisper model to use (default: base)
#   -o OUTPUT, --output OUTPUT
#                         Output filename (default: replace extension with .txt)
#   -t, --timestamps      Include timestamps in output
#   -b, --batch           Batch mode: concatenate multiple files into single
#                         output
#   -v, --verbose         Show detailed information

# Examples:
#   transcribe.py audio.mp3                    # Basic transcription
#   transcribe.py -m large audio.mp3           # Use large model
#   transcribe.py -t audio.mp3                 # Include timestamps
#   transcribe.py -o transcript.txt audio.mp3  # Custom output filename
#   transcribe.py *.mp3                        # Batch process multiple files
#   transcribe.py -b -o all.txt *.mp3          # Concatenate all into single file
#   transcribe.py -v audio.mp3                 # Verbose output

# Supported Models:
#   tiny   - Fastest, least accurate (~39MB)
#   base   - Good balance (default) (~74MB)
#   small  - Better accuracy (~244MB)
#   medium - High accuracy (~769MB)
#   large  - Best accuracy (~1550MB)

mp3_files = [
  '01-1-pows',
  '02-1-pows-p1-7',
  '03-1-pows-p8-17',
  '04-1-pows-p18-41',
  '05-1-pows-p42-56',
  '06-2-pows-p57-74',
  '07-2-pows-p75-90',
  '08-2-pows-p91-134',
  '09-3-pows-p135-146',
  '10-3-pows-p147-158',
  '11-3-pows-p159-163',
  '12-3-pows-p164-184',
  '13-4-pows-p185-190',
  '14-4-pows-p191-197',
  '15-4-pows-p197-200',
  '16-4-pows-p201-206',
  '17-5-pows-p207-213',
  '18-5-pows-p214-222',
  '19-5-pows-p223-232'
]

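# Transcribe each file with the medium model. `name` is used as the loop
# variable so the `filename` path set in the Drive cell is not overwritten.
for name in mp3_files:
  !./transcribe.py -v -m medium "/content/drive/MyDrive/pows/{name}.mp3"
  print("\n"*5)  # blank lines to separate the logs of consecutive files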
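
Colab sessions can disconnect partway through a long batch, so it may be useful to skip files that already have a transcript. The sketch below relies on the default described in the usage text (output name is the input with its extension replaced by `.txt`) and additionally assumes the `.txt` file is written next to the audio file, which is not confirmed here. `pending_files` is a hypothetical helper.

```python
from pathlib import Path

def pending_files(folder: str, names: list[str]) -> list[Path]:
    """Return .mp3 paths that exist and have no sibling .txt transcript yet."""
    todo = []
    for name in names:
        audio = Path(folder) / f"{name}.mp3"
        if not audio.is_file():
            print(f"Missing, skipping: {audio}")
            continue
        # Assumption: the transcript is saved beside the audio file
        if audio.with_suffix(".txt").exists():
            print(f"Already transcribed, skipping: {audio.name}")
            continue
        todo.append(audio)
    return todo
```

The loop could then iterate over `pending_files("/content/drive/MyDrive/pows", mp3_files)` and pass each returned path to `./transcribe.py`.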