# 🎙️ Audio → SRT Subtitles (with Translation)

This notebook:
1. **Transcribes** uploaded audio using [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) with word-level timestamps via the ForcedAligner
2. **Generates** an SRT subtitle file from the transcription
3. **Translates** the SRT to a target language using [Helsinki-NLP/opus-mt](https://huggingface.co/Helsinki-NLP), Gemini or Google Translate

Configure `SOURCE_LANGUAGE` and `TARGET_LANGUAGE` in the config cell below (default: Japanese → English).

**Requirements:** A Colab runtime with a **T4 GPU** (free tier works).

> ⚠️ Make sure you've selected **Runtime → Change runtime type → T4 GPU** before running.

## 0 · CONFIG

General Config

In [None]:
# ⚠️ IMPORTANT ⚠️
# Path to the file to transcribe. Supported formats:
# .wav, .mp3, .flac, .ogg, .m4a, etc.
AUDIO_PATH = "ja_audio.mp3"  # @param {type:"string"}

# Source and target languages for transcription and translation
SOURCE_LANGUAGE = "Japanese"  # @param {type:"string"}
TARGET_LANGUAGE = "English"  # @param {type:"string"}

# Which translation methods to run
TRANSLATE_USING_GEMINI = True  # Gemini (Using your API key)
TRANSLATE_USING_GT = True  # Google Translate

# Legacy: Translate using a small open source model
TRANSLATE_USING_OPUS = False

# ISO 639-1 language codes, used for file naming and translation APIs.
# Add more as needed.
LANG_CODES = {
    "Arabic": "ar",
    "Chinese": "zh",
    "Czech": "cs",
    "Danish": "da",
    "Dutch": "nl",
    "English": "en",
    "Finnish": "fi",
    "French": "fr",
    "German": "de",
    "Greek": "el",
    "Hebrew": "he",
    "Hindi": "hi",
    "Hungarian": "hu",
    "Indonesian": "id",
    "Italian": "it",
    "Japanese": "ja",
    "Korean": "ko",
    "Malay": "ms",
    "Norwegian": "no",
    "Polish": "pl",
    "Portuguese": "pt",
    "Romanian": "ro",
    "Russian": "ru",
    "Spanish": "es",
    "Swedish": "sv",
    "Thai": "th",
    "Turkish": "tr",
    "Ukrainian": "uk",
    "Vietnamese": "vi",
}
SRC_CODE = LANG_CODES[SOURCE_LANGUAGE]
TGT_CODE = LANG_CODES[TARGET_LANGUAGE]
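
A mistyped language name in the config above surfaces as a bare `KeyError`. The sketch below shows one way to give a friendlier message. The helper name `lang_code` and the three-entry table are illustrative assumptions, not part of the notebook.

```python
from difflib import get_close_matches

# Illustrative subset of the notebook's LANG_CODES table.
LANG_CODES = {"English": "en", "Japanese": "ja", "Korean": "ko"}


def lang_code(name: str, codes: dict[str, str] = LANG_CODES) -> str:
    """Return the ISO 639-1 code for `name`, or raise a descriptive ValueError."""
    try:
        return codes[name]
    except KeyError:
        # Suggest the closest known language name, if any is similar enough.
        hint = get_close_matches(name, list(codes), n=1)
        suggestion = f" Did you mean {hint[0]!r}?" if hint else ""
        raise ValueError(f"Unknown language {name!r}.{suggestion}") from None


print(lang_code("Japanese"))  # ja
```

With this in place, `SRC_CODE = lang_code(SOURCE_LANGUAGE)` would fail early with a suggestion instead of a raw lookup error.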

Technical Parameters

In [None]:
# Chunk length (seconds). Each audio chunk is processed separately
# to fit in GPU memory. Shorter = less VRAM but more chunks.
# 20 s works on a free-tier T4 (15 GB). The default below is higher;
# reduce it if you run out of memory.
CHUNK_SEC = 200

# Maximum batch size for the ASR Model
MAX_INFERENCE_BATCH_SIZE = 32  # Set back to 1 if 32 causes out-of-memory errors

# Gemini translation batch size. Number of subtitle lines sent
# per API call. Larger = fewer calls but longer prompts.
GEMINI_BATCH_SIZE = 100
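
To get a feel for how `CHUNK_SEC` trades chunk count against chunk length, here is a small sketch that computes chunk boundaries in seconds. It mirrors the splitting rule used later in the notebook (fixed-size chunks, a tail shorter than 0.5 s is dropped) but works on durations rather than samples. The function name `chunk_bounds` is hypothetical.

```python
def chunk_bounds(total_sec: float, chunk_sec: int, min_tail_sec: float = 0.5):
    """Return (start, end) times in seconds for each chunk, skipping a tiny tail."""
    bounds = []
    start = 0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        if end - start >= min_tail_sec:
            bounds.append((float(start), float(end)))
        start += chunk_sec
    return bounds


print(chunk_bounds(450, 200))
# [(0.0, 200.0), (200.0, 400.0), (400.0, 450.0)]
print(len(chunk_bounds(450, 20)))  # 23
```

A 7.5-minute file therefore needs 3 model calls at 200 s per chunk, or 23 calls at 20 s per chunk.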

## 1 · Install Dependencies

In [None]:
%pip install -q qwen-asr transformers sentencepiece sacremoses deep-translator google-genai

In [None]:
%pip install -U -q https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3%2Bcu128torch2.10-cp312-cp312-linux_x86_64.whl

## 2 · Upload Audio File

Supported formats: `.wav`, `.mp3`, `.flac`, `.ogg`, `.m4a`, etc.
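
If the file is not yet on the runtime, one option is Colab's upload dialog. The sketch below is an assumption about how this could be wired in, not a cell from the original notebook. `first_audio_file` and `upload_audio` are hypothetical helpers, and the extension set only covers the formats named above.

```python
import os

# Extensions named in the notebook; the "etc." there means this set is not exhaustive.
KNOWN_AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def first_audio_file(names):
    """Return the first name whose extension is a known audio type, else None."""
    for name in names:
        if os.path.splitext(name)[1].lower() in KNOWN_AUDIO_EXTS:
            return name
    return None


def upload_audio():
    """Open Colab's upload dialog; return the uploaded audio filename or None."""
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        return None
    uploaded = files.upload()  # dict mapping filename -> file bytes
    return first_audio_file(uploaded)
```

A possible use is `AUDIO_PATH = upload_audio() or AUDIO_PATH`, which keeps the configured path when nothing is uploaded.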

In [None]:
from IPython.display import display, Audio
import os

# Preview the uploaded audio
if AUDIO_PATH and os.path.exists(AUDIO_PATH):
    display(Audio(AUDIO_PATH))
else:
    print("⚠️  Please upload an audio file in the cell above first.")

## 3 · Transcribe with Qwen3-ASR-1.7B

To fit on a T4 (15 GB VRAM), we run ASR and alignment as **two separate steps** so both models are never loaded at the same time.
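
The load, use, free pattern described above can be sketched as a small helper. This is an illustration only: `run_then_free` is a hypothetical name, and the notebook itself performs the same steps inline with `del`, `gc.collect()`, and `torch.cuda.empty_cache()`.

```python
import gc


def run_then_free(load_fn, work_fn):
    """Load a model, run `work_fn(model)`, then drop the model and free VRAM."""
    model = load_fn()
    try:
        return work_fn(model)
    finally:
        del model  # drop the last reference held by this function
        gc.collect()
        try:
            import torch

            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached blocks to the driver
        except ImportError:
            pass  # torch not installed: nothing to release


# Demo with a stand-in "model" so the sketch runs without a GPU.
result = run_then_free(lambda: {"name": "dummy-model"}, lambda m: m["name"].upper())
print(result)  # DUMMY-MODEL
```

Because the model only lives inside the helper, no stray variable in the notebook namespace keeps it alive, as long as `work_fn` does not store a reference to it.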

In [None]:
import gc
import os
import numpy as np
import torch
import librosa
from qwen_asr import Qwen3ASRModel

# Help PyTorch reuse freed VRAM fragments
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

assert AUDIO_PATH and os.path.exists(AUDIO_PATH), (
    "No audio file found. Run the upload cell above first."
)

# --- Load and split audio manually ---
# The audio tower's attention is O(n²) in sequence length, so we split
# into short segments and feed each one individually.
SR = 16_000  # qwen-asr expects 16 kHz

print(f"Loading audio: {os.path.basename(AUDIO_PATH)} …")
full_wav, _ = librosa.load(AUDIO_PATH, sr=SR, mono=True)
total_dur = len(full_wav) / SR
print(f"Duration: {total_dur:.1f} s  ({total_dur / 60:.1f} min)")

chunk_samples = CHUNK_SEC * SR
audio_chunks = []
for start in range(0, len(full_wav), chunk_samples):
    chunk = full_wav[start : start + chunk_samples]
    if len(chunk) < SR // 2:  # skip tiny tail < 0.5 s
        continue
    audio_chunks.append((float(start) / SR, chunk))

print(f"Split into {len(audio_chunks)} chunks of ≤{CHUNK_SEC} s each.")

# --- Load ASR model ---
print("\nLoading Qwen3-ASR-1.7B …")
asr_model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    attn_implementation="flash_attention_2",
    max_inference_batch_size=MAX_INFERENCE_BATCH_SIZE,
    max_new_tokens=4096,
)
print("✅ ASR model loaded.")

In [None]:
# Transcribe each chunk individually to stay within T4 VRAM
all_texts = []
for i, (offset, chunk_wav) in enumerate(audio_chunks):
    print(
        f"  Chunk {i + 1}/{len(audio_chunks)}  "
        f"[{offset:.1f}s – {offset + len(chunk_wav) / SR:.1f}s] …",
        end=" ",
    )
    r = asr_model.transcribe(
        audio=(chunk_wav, SR),
        language=SOURCE_LANGUAGE,
        return_time_stamps=False,
    )
    text = r[0].text.strip()
    all_texts.append(text)
    print(text[:80])

transcribed_text = "".join(all_texts)
print(f"\n{'─' * 60}")
print(f"Full transcription ({len(transcribed_text)} chars):\n{transcribed_text}")

# Free ASR model before loading the aligner
del asr_model
gc.collect()
torch.cuda.empty_cache()
print("\n✅ ASR model unloaded; GPU memory freed.")

In [None]:
# --- Step 2: Forced Aligner for word-level timestamps ---
from dataclasses import replace
from qwen_asr import Qwen3ForcedAligner

print("Loading Qwen3-ForcedAligner-0.6B …")
aligner = Qwen3ForcedAligner.from_pretrained(
    "Qwen/Qwen3-ForcedAligner-0.6B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    attn_implementation="flash_attention_2",
)
print("✅ Aligner loaded.")

# Align each chunk separately (same chunking as ASR) and shift timestamps
print("Aligning timestamps …")
time_stamps = []
for i, (offset, chunk_wav) in enumerate(audio_chunks):
    chunk_text = all_texts[i]
    if not chunk_text.strip():
        continue
    print(f"  Aligning chunk {i + 1}/{len(audio_chunks)} …")
    alignment = aligner.align(
        audio=(chunk_wav, SR),
        text=chunk_text,
        language=SOURCE_LANGUAGE,
    )
    # Shift timestamps by the chunk's offset (stamps are frozen dataclasses)
    for stamp in alignment[0]:
        shifted = replace(
            stamp,
            start_time=stamp.start_time + offset,
            end_time=stamp.end_time + offset,
        )
        time_stamps.append(shifted)

print(f"\nTimestamp segments: {len(time_stamps)}")
if time_stamps:
    print(
        f"First: {time_stamps[0].text} [{time_stamps[0].start_time:.2f}s – {time_stamps[0].end_time:.2f}s]"
    )

# Free aligner
del aligner
gc.collect()
torch.cuda.empty_cache()
print("\n✅ Aligner unloaded; GPU memory freed.")
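
The timestamp shift in the cell above relies on `dataclasses.replace`, which builds a modified copy instead of mutating a frozen instance. A minimal sketch, assuming a stand-in `Stamp` class with the same three fields the notebook reads (`text`, `start_time`, `end_time`); the real aligner objects may carry more fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stamp:  # hypothetical stand-in for the aligner's timestamp items
    text: str
    start_time: float
    end_time: float


def shift(stamps, offset):
    """Return copies of `stamps` moved `offset` seconds later."""
    return [
        replace(s, start_time=s.start_time + offset, end_time=s.end_time + offset)
        for s in stamps
    ]


# Stamps from a chunk that starts 200 s into the recording.
chunk2 = [Stamp("こん", 0.25, 0.5), Stamp("にちは", 0.5, 1.0)]
print(shift(chunk2, 200.0)[0])
# Stamp(text='こん', start_time=200.25, end_time=200.5)
```

The originals in `chunk2` are left untouched, so a chunk can be re-shifted without accumulating offsets.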

## 4 · Generate SRT Subtitles

In [None]:
from datetime import timedelta


def format_srt_time(seconds: float) -> str:
    """Convert seconds to SRT timestamp format: HH:MM:SS,mmm"""
    td = timedelta(seconds=seconds)
    total_seconds = int(td.total_seconds())
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    secs = total_seconds % 60
    millis = int(td.microseconds / 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def group_timestamps_to_subtitles(
    stamps, max_chars: int = 40, max_duration: float = 7.0, gap_threshold: float = 0.6
):
    """
    Group word-level timestamps into subtitle segments.

    Args:
        stamps: list of timestamp objects with .text, .start_time, .end_time
        max_chars: max characters per subtitle line
        max_duration: max duration (seconds) per subtitle
        gap_threshold: silence gap (seconds) that forces a new subtitle
    """
    if not stamps:
        return []

    subtitles = []
    current_text = ""
    current_start = stamps[0].start_time
    current_end = stamps[0].end_time

    for i, stamp in enumerate(stamps):
        # Decide whether to start a new subtitle
        start_new = False
        if i == 0:
            current_text = stamp.text
            current_start = stamp.start_time
            current_end = stamp.end_time
            continue

        # Check gap between previous and current word
        gap = stamp.start_time - current_end
        new_duration = stamp.end_time - current_start
        new_len = len(current_text) + len(stamp.text)

        if gap > gap_threshold or new_duration > max_duration or new_len > max_chars:
            start_new = True

        if start_new:
            subtitles.append((current_start, current_end, current_text.strip()))
            current_text = stamp.text
            current_start = stamp.start_time
            current_end = stamp.end_time
        else:
            current_text += stamp.text
            current_end = stamp.end_time

    # Don't forget the last segment
    if current_text.strip():
        subtitles.append((current_start, current_end, current_text.strip()))

    return subtitles


def build_srt(subtitles) -> str:
    """Build SRT string from list of (start, end, text) tuples."""
    lines = []
    for idx, (start, end, text) in enumerate(subtitles, 1):
        lines.append(str(idx))
        lines.append(f"{format_srt_time(start)} --> {format_srt_time(end)}")
        lines.append(text)
        lines.append("")  # blank line separator
    return "\n".join(lines)


# Build source-language SRT
subtitles_src = group_timestamps_to_subtitles(time_stamps)

srt_src = build_srt(subtitles_src)

# Save
base_name = os.path.splitext(os.path.basename(AUDIO_PATH))[0]
srt_src_path = f"/content/{base_name}_{SRC_CODE}.srt"
with open(srt_src_path, "w", encoding="utf-8") as f:
    f.write(srt_src)

print(f"✅ {SOURCE_LANGUAGE} SRT saved to: {srt_src_path}")
print(f"   {len(subtitles_src)} subtitle segments\n")
print("--- Preview (first 10 segments) ---")
print("\n".join(srt_src.split("\n")[:40]))
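
For reference, an SRT entry is an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line. The sketch below is an alternative formulation using integer millisecond arithmetic, not the notebook's `format_srt_time` (which goes through `timedelta`). The names `srt_time` and `srt_block` are illustrative.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm using integer millisecond arithmetic."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_block(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT entry: index, time range, text, trailing newline."""
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"


print(srt_block(1, 3661.5, 3663.25, "こんにちは"))
# 1
# 01:01:01,500 --> 01:01:03,250
# こんにちは
```

Note that this variant rounds to the nearest millisecond, whereas the notebook's version truncates.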

## 5 · Translate Subtitles (opus-mt, local)

Uses a lightweight MarianMT model from [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for translation. Fast and runs entirely on-device, but quality is limited for nuanced text. Honestly, this kinda sucks.

> Model is auto-selected as `Helsinki-NLP/opus-mt-{SRC}-{TGT}`. Not all language pairs are available; check [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) if you get errors.

In [None]:
if TRANSLATE_USING_OPUS:
    from transformers import MarianMTModel, MarianTokenizer

    TRANSLATION_MODEL = f"Helsinki-NLP/opus-mt-{SRC_CODE}-{TGT_CODE}"

    print(f"Loading translation model: {TRANSLATION_MODEL} …")
    trans_tokenizer = MarianTokenizer.from_pretrained(TRANSLATION_MODEL)
    trans_model = MarianMTModel.from_pretrained(TRANSLATION_MODEL).to("cuda")
    print("✅ Translation model loaded.")
else:
    print("⏭️  opus-mt translation skipped (TRANSLATE_USING_OPUS = False)")

In [None]:
subtitles_tgt = []
srt_tgt_opus_path = None

if TRANSLATE_USING_OPUS:

    def translate_texts(texts: list[str], batch_size: int = 32) -> list[str]:
        """Translate a list of texts in batches using opus-mt."""
        translations = []
        for i in range(0, len(texts), batch_size):
            batch = texts[i : i + batch_size]
            inputs = trans_tokenizer(
                batch,
                return_tensors="pt",
                padding=True,
                truncation=True,
                max_length=512,
            ).to("cuda")
            with torch.no_grad():
                output_ids = trans_model.generate(**inputs, max_length=512)
            decoded = trans_tokenizer.batch_decode(output_ids, skip_special_tokens=True)
            translations.extend(decoded)
        return translations

    # Extract source-language texts from subtitles
    src_texts = [text for _, _, text in subtitles_src]
    print(f"Translating {len(src_texts)} subtitle segments …")
    tgt_texts = translate_texts(src_texts)
    print("✅ Translation complete.")

    # Build target-language subtitles with original timings
    subtitles_tgt = [
        (start, end, tgt_text)
        for (start, end, _), tgt_text in zip(subtitles_src, tgt_texts)
    ]

    srt_tgt = build_srt(subtitles_tgt)

    # Save
    srt_tgt_opus_path = f"/content/{base_name}_{TGT_CODE}_opus.srt"
    with open(srt_tgt_opus_path, "w", encoding="utf-8") as f:
        f.write(srt_tgt)

    print(f"\n✅ {TARGET_LANGUAGE} SRT (opus-mt) saved to: {srt_tgt_opus_path}")
    print(f"   {len(subtitles_tgt)} subtitle segments\n")
    print("--- Preview (first 10 segments) ---")
    print("\n".join(srt_tgt.split("\n")[:40]))

    # Free opus-mt model
    del trans_model, trans_tokenizer
    gc.collect()
    torch.cuda.empty_cache()
else:
    print("⏭️  opus-mt translation skipped.")

## 6 · Translate Subtitles with Gemini (free via Google AI Studio)

Get an API key at https://aistudio.google.com/app/api-keys. By default, the key
is on the free tier with no billing set up, so you should not be charged.
You can check the rate limits at https://aistudio.google.com/rate-limit?timeRange=last-28-days and change the model accordingly by setting `GEMINI_MODEL` in the cell below.

As of now, Gemini 3 Flash produces the highest-quality translations, and they are much better than those of the small opus-mt model, especially for nuanced or conversational text.
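
Because free-tier rate limits can cause individual calls to fail, the cell below retries each batch with exponentially growing pauses. A minimal, library-independent sketch of that idea follows. `retry_with_backoff` is a hypothetical helper, and unlike the notebook's loop it re-raises after the last attempt instead of falling back to the source text.

```python
import time


def retry_with_backoff(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call `fn()` up to `attempts` times, sleeping base_delay * 2**n between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2**attempt)


# Demo: a function that fails twice, then succeeds. A fake sleep records the delays.
calls, delays = [], []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("rate limited")
    return "ok"


print(retry_with_backoff(flaky, sleep=delays.append), delays)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the helper testable without actually waiting.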

In [None]:
subtitles_tgt_gemini = []
srt_tgt_gemini_path = None

if TRANSLATE_USING_GEMINI:
    from google import genai
    import json
    import time

    from google.colab import userdata

    try:
        api_key = userdata.get("GOOGLE_API_KEY")
    except userdata.SecretNotFoundError:
        import os

        api_key = os.environ.get("GOOGLE_API_KEY", "")

    assert api_key, (
        "No Gemini API key found. In Colab, go to 🔑 Secrets (left sidebar) "
        "and add GOOGLE_API_KEY."
    )

    client = genai.Client(api_key=api_key)
    GEMINI_MODEL = "gemini-3-flash-preview"

    SYSTEM_PROMPT = (
        f"You are a professional {SOURCE_LANGUAGE}-to-{TARGET_LANGUAGE} subtitle translator. "
        f"You will receive numbered {SOURCE_LANGUAGE} subtitle lines. "
        f"Return ONLY a JSON array of strings, one {TARGET_LANGUAGE} translation per line, "
        "in the same order. Follow professional subtitle standards:\n"
        "- Write natural, concise translations optimized for on-screen reading.\n"
        "- Preserve the original meaning, tone, and intent.\n"
        "- DO NOT use quotation marks for spoken dialogue.\n"
        "- If a line contains multiple speakers, separate them with a leading dash.\n"
        "- Keep punctuation simple and subtitle-appropriate.\n"
        "- Avoid unnecessary words, filler, or formal phrasing.\n"
        "- Maintain line readability and timing (short and clear).\n"
        "- Keep sound effects or non-speech text concise and natural.\n"
        "- Preserve speaker intent (emotion, politeness level, formality) when relevant.\n\n"
        "Formatting rules:\n"
        "- Output ONLY a valid JSON array of strings.\n"
        "- No numbering.\n"
        "- No extra explanations or comments.\n"
        "- No additional quotation marks around dialogue.\n"
        "- Do not merge or split lines unless absolutely necessary for clarity.\n\n"
        "Your translations should look like professional streaming subtitles, not written prose."
    )

    def translate_with_gemini(
        texts: list[str], batch_size: int = GEMINI_BATCH_SIZE
    ) -> list[str]:
        """Translate texts using Gemini in batches."""
        all_translations = []

        for batch_start in range(0, len(texts), batch_size):
            batch = texts[batch_start : batch_start + batch_size]
            batch_num = batch_start // batch_size + 1
            total_batches = (len(texts) + batch_size - 1) // batch_size
            print(
                f"  Batch {batch_num}/{total_batches} ({len(batch)} lines) …", end=" "
            )

            # Build numbered input
            numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(batch))
            prompt = f"{SYSTEM_PROMPT}\n\nSubtitle lines:\n{numbered}"

            for attempt in range(3):
                try:
                    response = client.models.generate_content(
                        model=GEMINI_MODEL,
                        contents=prompt,
                    )
                    raw = response.text.strip()
                    # Strip markdown code fences if present
                    if raw.startswith("```"):
                        raw = raw.split("\n", 1)[1]
                        raw = raw.rsplit("```", 1)[0]
                    translations = json.loads(raw)
                    assert isinstance(translations, list) and len(translations) == len(
                        batch
                    )
                    all_translations.extend(translations)
                    print("✅")
                    break
                except Exception as e:  # JSON errors, length mismatches, API errors
                    if attempt < 2:
                        print(f"⚠️ retry ({e.__class__.__name__}) …", end=" ")
                        time.sleep(2**attempt)
                    else:
                        # Fallback: return originals for this batch
                        print(f"❌ fallback (kept {SOURCE_LANGUAGE})")
                        all_translations.extend(batch)

            # Respect free-tier rate limits
            if batch_start + batch_size < len(texts):
                time.sleep(1)

        return all_translations

    # --- Translate ---
    src_texts_gemini = [text for _, _, text in subtitles_src]
    print(
        f"Translating {len(src_texts_gemini)} subtitles with Gemini ({GEMINI_MODEL}) …\n"
    )
    tgt_texts_gemini = translate_with_gemini(src_texts_gemini)
    print("\n✅ Gemini translation complete.")

    # Build target-language subtitles with original timings
    subtitles_tgt_gemini = [
        (start, end, tgt_text)
        for (start, end, _), tgt_text in zip(subtitles_src, tgt_texts_gemini)
    ]

    srt_tgt_gemini = build_srt(subtitles_tgt_gemini)

    # Save
    srt_tgt_gemini_path = f"/content/{base_name}_{TGT_CODE}_gemini.srt"
    with open(srt_tgt_gemini_path, "w", encoding="utf-8") as f:
        f.write(srt_tgt_gemini)

    print(f"✅ Gemini {TARGET_LANGUAGE} SRT saved to: {srt_tgt_gemini_path}")
    print(f"   {len(subtitles_tgt_gemini)} subtitle segments\n")
    print("--- Preview (first 10 segments) ---")
    print("\n".join(srt_tgt_gemini.split("\n")[:40]))
else:
    print("⏭️  Gemini translation skipped (TRANSLATE_USING_GEMINI = False)")
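
The reply-parsing step inside `translate_with_gemini` (strip an optional Markdown code fence, parse JSON, check the length) can be isolated into a helper, which makes it easy to try against sample replies. This is a sketch under that assumption; `parse_json_array` is a hypothetical name, and it raises `ValueError` where the notebook uses `assert`.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this block readable


def parse_json_array(raw: str, expected_len: int) -> list[str]:
    """Parse a model reply into a list of strings of the expected length."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        raw = raw.split("\n", 1)[1] if "\n" in raw else ""
        raw = raw.rsplit(FENCE, 1)[0]
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(data, list) or len(data) != expected_len:
        raise ValueError(f"expected a list of {expected_len} items")
    return [str(item) for item in data]


reply = FENCE + 'json\n["Hello", "Thanks"]\n' + FENCE
print(parse_json_array(reply, 2))  # ['Hello', 'Thanks']
```

Checking the length matters because a merged or split line would silently shift every later translation onto the wrong timestamp.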

## 7 · Translate Subtitles with Google Translate (free)

Uses the free Google Translate API via `deep-translator`. No API key needed. Quality sits between opus-mt and Gemini.

In [None]:
subtitles_tgt_gt = []
srt_tgt_gt_path = None

if TRANSLATE_USING_GT:
    from deep_translator import GoogleTranslator
    import time

    gtranslator = GoogleTranslator(source=SRC_CODE, target=TGT_CODE)

    def translate_with_google(texts: list[str], batch_size: int = 50) -> list[str]:
        """Translate texts using Google Translate (free).

        deep-translator's GoogleTranslator.translate_batch() has a ~5000 char
        limit per request, so we send in small batches.
        """
        all_translations = []

        for batch_start in range(0, len(texts), batch_size):
            batch = texts[batch_start : batch_start + batch_size]
            batch_num = batch_start // batch_size + 1
            total_batches = (len(texts) + batch_size - 1) // batch_size
            print(
                f"  Batch {batch_num}/{total_batches} ({len(batch)} lines) …", end=" "
            )

            try:
                translated = gtranslator.translate_batch(batch)
                all_translations.extend(translated)
                print("✅")
            except Exception as e:
                # Fallback: translate one-by-one for this batch
                print(f"⚠️ batch failed ({e.__class__.__name__}), trying one-by-one …")
                for t in batch:
                    try:
                        all_translations.append(gtranslator.translate(t))
                    except Exception:
                        all_translations.append(t)
                    time.sleep(0.3)

            # Small delay to avoid rate-limiting
            if batch_start + batch_size < len(texts):
                time.sleep(0.5)

        return all_translations

    # --- Translate ---
    src_texts_gt = [text for _, _, text in subtitles_src]
    print(f"Translating {len(src_texts_gt)} subtitles with Google Translate …\n")
    tgt_texts_gt = translate_with_google(src_texts_gt)
    print("\n✅ Google Translate complete.")

    # Build target-language subtitles with original timings
    subtitles_tgt_gt = [
        (start, end, tgt_text)
        for (start, end, _), tgt_text in zip(subtitles_src, tgt_texts_gt)
    ]

    srt_tgt_gt = build_srt(subtitles_tgt_gt)

    # Save
    srt_tgt_gt_path = f"/content/{base_name}_{TGT_CODE}_gtranslate.srt"
    with open(srt_tgt_gt_path, "w", encoding="utf-8") as f:
        f.write(srt_tgt_gt)

    print(f"✅ Google Translate SRT saved to: {srt_tgt_gt_path}")
    print(f"   {len(subtitles_tgt_gt)} subtitle segments\n")
    print("--- Preview (first 10 segments) ---")
    print("\n".join(srt_tgt_gt.split("\n")[:40]))
else:
    print("⏭️  Google Translate skipped (TRANSLATE_USING_GT = False)")
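
The cell above batches by line count only. If the ~5000-character figure from the docstring is treated as a hard budget (an assumption here, not something this notebook verifies), batches could also be capped by total characters. A minimal sketch, with `batch_by_chars` as a hypothetical helper:

```python
def batch_by_chars(texts, max_chars: int = 5000, max_items: int = 50):
    """Group texts into batches limited by item count and total characters.

    A single text longer than `max_chars` still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for text in texts:
        too_big = size + len(text) > max_chars or len(current) >= max_items
        if current and too_big:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


lines = ["a" * 3000, "b" * 1500, "c" * 1000, "d" * 10]
print([len(b) for b in batch_by_chars(lines)])  # [2, 2]
```

Order is preserved, so flattening the translated batches yields translations aligned with `subtitles_src`.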

## 8 · Side-by-Side Comparison

In [None]:
from IPython.display import display, HTML as IPyHTML

# Build dynamic columns based on which translations ran
cols = [(SOURCE_LANGUAGE, subtitles_src)]
if subtitles_tgt:
    cols.append(("opus-mt", subtitles_tgt))
if subtitles_tgt_gemini:
    cols.append(("Gemini 3 Flash", subtitles_tgt_gemini))
if subtitles_tgt_gt:
    cols.append(("Google Translate", subtitles_tgt_gt))

ncols = 2 + len(cols)  # #, Time, + translation cols
header_cells = "".join(f"<th>{name}</th>" for name, _ in cols)
header = f"<tr><th>#</th><th>Time</th>{header_cells}</tr>"

from html import escape  # imported by name; `html` is reused as a variable below

rows = []
for i, (s, e, src) in enumerate(subtitles_src):
    time_str = f"{format_srt_time(s)} → {format_srt_time(e)}"
    data_cells = ""
    for _, subs in cols:
        text = subs[i][2] if i < len(subs) else ""
        # Escape so characters like < or & in subtitles don't break the table
        data_cells += f"<td>{escape(str(text))}</td>"
    rows.append(
        f"<tr><td>{i + 1}</td><td style='white-space:nowrap'>{time_str}</td>{data_cells}</tr>"
    )
    if i >= 29:
        remaining = len(subtitles_src) - 30
        if remaining > 0:
            rows.append(
                f"<tr><td colspan='{ncols}'><i>… and {remaining} more segments</i></td></tr>"
            )
        break

html = (
    "<table border='1' cellpadding='4' style='border-collapse:collapse;font-size:13px'>"
    + header
    + "\n".join(rows)
    + "</table>"
)
display(IPyHTML(html))

## 9 · Download SRT Files

In [None]:
try:
    from google.colab import files

    print(f"Downloading {SOURCE_LANGUAGE} SRT …")
    files.download(srt_src_path)
    if srt_tgt_opus_path:
        print(f"Downloading {TARGET_LANGUAGE} SRT (opus-mt) …")
        files.download(srt_tgt_opus_path)
    if srt_tgt_gemini_path:
        print(f"Downloading {TARGET_LANGUAGE} SRT (Gemini) …")
        files.download(srt_tgt_gemini_path)
    if srt_tgt_gt_path:
        print(f"Downloading {TARGET_LANGUAGE} SRT (Google Translate) …")
        files.download(srt_tgt_gt_path)
except ImportError:
    print("Not running in Colab; files saved at:")
    print(f"  {SOURCE_LANGUAGE}: {srt_src_path}")
    if srt_tgt_opus_path:
        print(f"  {TARGET_LANGUAGE} (opus-mt): {srt_tgt_opus_path}")
    if srt_tgt_gemini_path:
        print(f"  {TARGET_LANGUAGE} (Gemini): {srt_tgt_gemini_path}")
    if srt_tgt_gt_path:
        print(f"  {TARGET_LANGUAGE} (Google Trans.): {srt_tgt_gt_path}")

## 10 · Cleanup (Optional)

Free GPU memory if you want to run other things in this session.

In [None]:
import gc

gc.collect()
torch.cuda.empty_cache()
print("✅ GPU memory freed.")