# 🎧 Whisper High-Accuracy Transcriber (Free)

**What this does:**
- Upload a long audio file (MP3/WAV/M4A/etc.)
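- Transcribe with **OpenAI Whisper** (the Settings cell selects `medium`; switch to `large` for the highest accuracy)
- Auto-detects the language when `FORCE_LANGUAGE` is left blank (the Settings cell currently forces `"en"`)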
- Exports **TXT**, **SRT** (subtitles), and **VTT**

**How to use:**
1. Run each cell top-to-bottom.
2. Upload your audio when prompted.
3. Wait for transcription to finish, then download the files.

> Tip: If a GPU is available (Runtime → Change runtime type → T4/L4/A100), transcription will be much faster. CPU also works; it just takes longer.


In [1]:
%%bash
echo "Checking GPU (ok if none is shown)…" >&2
nvidia-smi || true
# printf interprets \n; plain echo would print it literally
printf '\nInstalling dependencies…\n' >&2
pip -q install --upgrade pip >/dev/null 2>&1
pip -q install openai-whisper==20231117 >/dev/null 2>&1
apt -y -qq update >/dev/null 2>&1 || true
apt -y -qq install ffmpeg >/dev/null 2>&1
echo "Done."


Sat Jan 10 03:11:27 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   41C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

Checking GPU (ok if none is shown)…
\nInstalling dependencies…


In [2]:
# 🔧 Settings (change if needed)
MODEL_NAME = "medium"          # Use "large" for the highest accuracy (slower, needs more memory)
FORCE_LANGUAGE = "en"           # e.g., "en" or "Portuguese". Set to "" to auto-detect
TASK = "transcribe"            # or "translate" to translate non-English → English
TEMPERATURE = 0.0              # 0.0 = most deterministic
FP16 = True                    # Set False for CPU-only / errors with half precision
PRINT_SEGMENTS = True          # Print timecoded segments to the notebook output
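
A typo in the Settings cell only surfaces once the model starts loading, so a quick check can catch it earlier. The sketch below is illustrative only: `check_settings` is a hypothetical helper, and the hard-coded model list is an assumption based on the names shipped with `openai-whisper==20231117`. When the package is installed, `whisper.available_models()` is the authoritative source.

```python
# Hypothetical helper, not part of Whisper: sanity-check the Settings cell.
# Assumption: these are the model names shipped with openai-whisper 20231117;
# prefer whisper.available_models() when the package is installed.
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3", "large",
}
KNOWN_TASKS = {"transcribe", "translate"}


def check_settings(model_name: str, task: str, temperature: float) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if model_name not in KNOWN_MODELS:
        problems.append(f"Unknown model name: {model_name!r}")
    if task not in KNOWN_TASKS:
        problems.append(f"TASK must be 'transcribe' or 'translate', got {task!r}")
    if temperature < 0:
        problems.append(f"TEMPERATURE should not be negative, got {temperature}")
    return problems


print(check_settings("medium", "transcribe", 0.0))  # []
print(check_settings("medum", "translat", -1.0))    # three problems reported
```

In the notebook, calling `check_settings(MODEL_NAME, TASK, TEMPERATURE)` right after the Settings cell would report problems before any model download begins.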


In [3]:
from google.colab import files
import os

print("📤 Choose your audio file to upload (mp3/wav/m4a/etc.)…")
uploaded = files.upload()
assert uploaded, "No file was uploaded."
AUDIO_PATH = next(iter(uploaded))
print(f"Uploaded: {AUDIO_PATH} • Size: {os.path.getsize(AUDIO_PATH)/1e6:.2f} MB")


📤 Choose your audio file to upload (mp3/wav/m4a/etc.)…


Saving machine_translation.mp3 to machine_translation.mp3
Uploaded: machine_translation.mp3 • Size: 114.05 MB


In [None]:
import os

import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cpu":
    # CPU can't use fp16
    FP16 = False
print("🖥️ Using device:", device)

print("📥 Loading model:", MODEL_NAME)
model = whisper.load_model(MODEL_NAME, device=device)

kwargs = {
    "task": TASK,
    "temperature": TEMPERATURE,
    "fp16": FP16,
}
if FORCE_LANGUAGE.strip():
    kwargs["language"] = FORCE_LANGUAGE.strip()

print("📝 Transcribing… this can take a while for long files.")
result = model.transcribe(AUDIO_PATH, **kwargs)
text = result.get("text", "").strip()

base = os.path.splitext(os.path.basename(AUDIO_PATH))[0]
txt_path = f"{base}.txt"
with open(txt_path, "w", encoding="utf-8") as f:
    f.write(text)
print("✅ Transcript saved:", txt_path)

def format_timestamp(seconds: float, always_include_hours: bool = True, decimal_marker: str = ','):
    if seconds < 0:
        seconds = 0
    milliseconds = round(seconds * 1000.0)
    hours = milliseconds // 3_600_000
    milliseconds -= hours * 3_600_000
    minutes = milliseconds // 60_000
    milliseconds -= minutes * 60_000
    secs = milliseconds // 1000
    milliseconds -= secs * 1000
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"

srt_path = f"{base}.srt"
with open(srt_path, "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg['start'], always_include_hours=True, decimal_marker=',')
        end = format_timestamp(seg['end'], always_include_hours=True, decimal_marker=',')
        srt.write(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n\n")
print("🎬 Subtitles saved:", srt_path)

vtt_path = f"{base}.vtt"
with open(vtt_path, "w", encoding="utf-8") as vtt:
    vtt.write("WEBVTT\n\n")
    for seg in result.get("segments", []):
        start = format_timestamp(seg['start'], always_include_hours=True, decimal_marker='.')
        end = format_timestamp(seg['end'], always_include_hours=True, decimal_marker='.')
        vtt.write(f"{start} --> {end}\n{seg['text'].strip()}\n\n")
print("📝 WebVTT saved:", vtt_path)

if PRINT_SEGMENTS:
    for seg in result.get("segments", []):
        print(f"[{format_timestamp(seg['start'])} → {format_timestamp(seg['end'])}] {seg['text'].strip()}")


🖥️ Using device: cuda
📥 Loading model: medium


100%|█████████████████████████████████████| 1.42G/1.42G [00:31<00:00, 48.1MiB/s]


📝 Transcribing… this can take a while for long files.
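
To make the two timestamp styles concrete, the snippet below repeats the `format_timestamp` helper from the cell above in compact form (using `divmod`, which is a restructuring on my part, not the notebook's code) and prints one value in each style. SRT puts a comma before the milliseconds, WebVTT a period. The sample values are arbitrary.

```python
def format_timestamp(seconds: float, always_include_hours: bool = True,
                     decimal_marker: str = ",") -> str:
    """Same arithmetic as the notebook helper, written with divmod."""
    milliseconds = round(max(seconds, 0) * 1000.0)  # clamp negatives to zero
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


print(format_timestamp(3661.5))                             # 01:01:01,500 (SRT style)
print(format_timestamp(3661.5, decimal_marker="."))         # 01:01:01.500 (WebVTT style)
print(format_timestamp(75.25, always_include_hours=False))  # 01:15,250
```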


In [None]:
from google.colab import files
print("⬇️ Preparing downloads…")
for ext in (".txt", ".srt", ".vtt"):
    path = f"{os.path.splitext(os.path.basename(AUDIO_PATH))[0]}{ext}"
    if os.path.exists(path):
        files.download(path)
print("All set. If your downloads didn't auto-start, open the Files tab (left) and right-click to download.")


⬇️ Preparing downloads…


All set. If your downloads didn't auto-start, open the Files tab (left) and right-click to download.
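
Browsers sometimes block several automatic downloads in a row. One possible workaround, sketched here under the assumption that the output files sit next to each other under the same base name, is to bundle them into a single ZIP and download that instead. `bundle_outputs` and the `_transcripts.zip` suffix are hypothetical names; only the standard-library `zipfile` module is used.

```python
import os
import zipfile


def bundle_outputs(base: str, exts=(".txt", ".srt", ".vtt")) -> str:
    """Zip every existing `<base><ext>` file and return the archive path."""
    zip_path = f"{base}_transcripts.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ext in exts:
            path = f"{base}{ext}"
            if os.path.exists(path):  # skip outputs that were never written
                zf.write(path, arcname=os.path.basename(path))
    return zip_path


# In Colab, after the transcription cell has run, this could be used as:
# base = os.path.splitext(os.path.basename(AUDIO_PATH))[0]
# files.download(bundle_outputs(base))
```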


### 🧩 Troubleshooting & Tips
- **Runtime crashed / out of memory?** Use a GPU runtime (Runtime → Change runtime type → GPU) or switch to a smaller model (e.g., `small`).
- **Wrong language detected?** Set `FORCE_LANGUAGE = "en"` (or your language) in the Settings cell.
- **Punctuation/Names not perfect?** That's normal for automated ASR. You can lightly edit the TXT afterward.
- **Very long files?** Whisper can handle multi-hour files. If you still hit limits, you can split with FFmpeg:
  ```bash
  ffmpeg -i long_audio.mp3 -f segment -segment_time 1800 -c copy part_%03d.mp3
  ```
- **Translate to English?** Set `TASK = "translate"`.
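
If you do split a file, each part's timestamps restart at zero, so segments from later parts need to be shifted before the subtitles are combined. The sketch below assumes every part is exactly `part_seconds` long (with `-c copy`, FFmpeg can only cut at packet boundaries, so real parts may drift slightly) and that each part's segments are dicts with `start`, `end`, and `text` keys, as in the `result["segments"]` list used above. `merge_part_segments` is a hypothetical helper.

```python
def merge_part_segments(parts, part_seconds: float = 1800.0):
    """Shift each part's segment times by its position and concatenate them.

    `parts` is a list (in playback order) of segment lists; each segment is a
    dict with "start", "end" and "text" keys.
    """
    merged = []
    for index, segments in enumerate(parts):
        offset = index * part_seconds  # assumed start time of this part
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
    return merged


# Toy data standing in for two transcribed 30-minute parts.
part_0 = [{"start": 0.0, "end": 4.2, "text": " Hello."}]
part_1 = [{"start": 1.5, "end": 3.0, "text": " Welcome back."}]
for seg in merge_part_segments([part_0, part_1]):
    print(seg)
```

The merged list can be fed to the same SRT/VTT writing loops as a single transcription result. For tighter alignment, the fixed offset could be replaced by each part's measured duration.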
