# VTT AI Translator
## About
This notebook translates VTT subtitle / caption files fluently using the OpenAI API and the webvtt-py library. It can easily be switched to DeepSeek in the future for more cost-efficient processing. Made by Connor Wright for Georgia Tech's Buzz Studios Filmmaking Club.

## How to Use 
* Clone the repo
* Set `folder_path` and `vtt_name` to the folder and file name of your VTT file
* Set `trans_lang` to the target language's ISO code and `language` to its name
* Put an OpenAI API key in `api_key` (or ask for mine)
* Run all the cells
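
As an alternative to pasting the key into the notebook, the key can be read from an environment variable so it never ends up in the repo. The sketch below is not part of the original notebook: `load_api_key` is a hypothetical helper, and it assumes the key has been exported as `OPENAI_API_KEY`, the variable the OpenAI Python client also looks for when no key is passed explicitly.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing or empty.

    Hypothetical helper, not part of the original notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key
```

With this in place, the setup cell could use `api_key = load_api_key()` instead of a hard-coded string.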

In [1]:
### pip installations
%pip install openai
%pip install webvtt-py

Collecting openai
  Downloading openai-1.79.0-py3-none-any.whl.metadata (25 kB)
Collecting anyio<5,>=3.5.0 (from openai)
  Downloading anyio-4.9.0-py3-none-any.whl.metadata (4.7 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.10.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.11.4-py3-none-any.whl.metadata (66 kB)
Collecting sniffio (from openai)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting tqdm>4 (from openai)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting idna>=2.8 (from anyio<5,>=3.5.0->openai)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting certifi (from httpx<1,>=0.23.0->openai)
  Downloading certifi-2025.4.26

In [11]:
### Package imports
from openai import OpenAI
import os
import webvtt
import copy

In [None]:
### Setup variables
## Set file paths / language output 
folder_path = "/Users/connorwright/Downloads/GT.CS.CodeFiles/BuzzStudios/Assets/Subtitles/"
vtt_name = "frisbee-fables-cc.vtt"
trans_lang = "nl"
language = "Dutch"

## Set API Key 
api_key = ""

vtt_path = os.path.join(folder_path, vtt_name)

In [None]:
### Turn original captions into single string for GPT input 
captions_list = []
captions = []

curr_chars = 0
max_tokens = 8000 # 4o-mini limit is 16000 tokens, halved for safety. Tokens are estimated below at 4 chars per token

for caption in webvtt.read(vtt_path):
    #print(caption.start)  # start timestamp in text format
    #print(caption.end)  # end timestamp in text format
    #print(caption.text)  # caption text
    #print(caption.voice)  # voice span if present
    curr_chars += len(caption.text)
    curr_tokens = curr_chars / 4
    
    captions.append(caption.text)
    
    if curr_tokens > max_tokens:
        captions_list.append(copy.deepcopy(captions))
        captions.clear()
        curr_chars = 0  # reset the running count so the next chunk starts from zero

if captions:
    captions_list.append(copy.deepcopy(captions))

captions_list = [
    "\n".join(c) if isinstance(c, list) else str(c)
    for c in captions_list
]

print(captions_list[0])



[Beat heavy, tense music playing]
ALFRED: The dragon draws back,
releasing a terrible roar
as it prepares to let out its fire breath.
You’re battered but you’re still standing.
You can do this.
The dragon’s horde glimmers in the darkness of the room.
What do you do?
EMMA: I draw my sword and aim for the dragon’s tail.
ALFRED: [muttering] Tail, okay.
MICHELLE: I use my wizard staff to...repel the dragon’s fire!
ALFRED: Okay, okay.
[clatter of dice being rolled]
Okay!
The dragon is almost defeated.
As you prepare to attack-
[sound of record scratch]
LEO: Hey,
what are you freaks doing?
ALFRED: Oh, uh.
Hey...Leo.
EMMA: We were about to beat the dragon before you got here.
LEO: [scoffing] No you weren’t.
[sound of DND board being flipped]
ALFRED: HEY!
LEO: [mockingly] Are you mad?
You big baby!
This is why you can never make the ultimate frisbee team!
ALFRED: [stammering] Well, uh-
you’re...not gonna make it to practice if you keep messing around!
LEO: [scoffing] Yeah.
Nice comeback nerd.
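
The chunking step above can also be written as a small standalone function, which makes it easier to test without a VTT file. This is a minimal sketch, not the notebook's own code: `chunk_captions` and its parameters are hypothetical names, and the 4-characters-per-token estimate is the same rough assumption the cell uses.

```python
from typing import Iterable, List


def chunk_captions(texts: Iterable[str], max_tokens: int = 8000,
                   chars_per_token: int = 4) -> List[str]:
    """Group caption texts into newline-joined chunks of roughly max_tokens each."""
    chunks: List[str] = []
    current: List[str] = []
    curr_chars = 0
    for text in texts:
        current.append(text)
        curr_chars += len(text)
        # Flush once the estimated token count passes the budget,
        # then start a fresh chunk with a fresh counter.
        if curr_chars / chars_per_token > max_tokens:
            chunks.append("\n".join(current))
            current = []
            curr_chars = 0
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because a chunk is only flushed after a whole caption has been appended, a multi-line caption is never split across two requests.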


In [23]:
### Setup OpenAI client and context
#client = OpenAI(api_key="", base_url="https://api.deepseek.com")
client = OpenAI(api_key=api_key)

system_message = f"""You are a professional subtitle translator. \
            You will only receive a string transcription of a vtt file containing subtitles in English. \
            You will only output a {language} translation of the subtitles and bracketed actions. \
            Do not add anything else to your reply.\
            Do not merge sentences, translate each line individually. \
            Return the translated subtitles in the same order and length as the input. \
            Your steps are as follows: \
            1. Parse the input subtitles \
            2. Translate the input subtitles into {trans_lang} \
            3. Alter the translated subtitles into more fluent sentences \
            4. Output the translated subtitles as plain text, one translated line per input line.
"""

responses = []
for captions in captions_list:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": captions}
        ]
    )
    responses.append(response)

print(response)

ChatCompletion(id='chatcmpl-BZ42lbIVyhV0rnHtWhrEC3HSBiZgD', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="[Zware, gespannen muziek speelt]  \nALFRED: De draak trekt zich terug,  \nen laat een verschrikkelijke brul horen  \nterwijl hij zich voorbereidt om zijn vuuradem uit te laten.  \nJe bent verslagen maar staat nog steeds.  \nJe kunt dit.  \nDe schat van de draak glinstert in de duisternis van de kamer.  \nWat doe je?  \nEMMA: Ik trek mijn zwaard en richt op de staart van de draak.  \nALFRED: [mompelend] Staart, oké.  \nMICHELLE: Ik gebruik mijn toverstaf om... de vuuradem van de draak af te weren!  \nALFRED: Oké, oké.  \n[geluid van dobbelstenen die worden gegooid]  \nOké!  \nDe draak is bijna verslagen.  \nTerwijl je je voorbereidt om aan te vallen-  \n[geluid van een schrapend record]  \nLEO: Hé,  \nwat doen jullie, freaks?  \nALFRED: Oh, eh.  \nHé... Leo.  \nEMMA: We waren van plan de draak te verslaan voordat je hier kwam.  
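
The prompt asks for the same number of lines as the input, but the model is not guaranteed to comply, and a missing or merged line shifts every caption after it. A minimal sketch of a check that could run on each chunk before saving is shown here; `clean_lines` and `line_counts_match` are hypothetical helpers, not part of the original notebook.

```python
from typing import List


def clean_lines(text: str) -> List[str]:
    """Split text into lines, strip surrounding whitespace, and drop empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def line_counts_match(source_chunk: str, translated_chunk: str) -> bool:
    """Return True when both chunks contain the same number of non-empty lines."""
    return len(clean_lines(source_chunk)) == len(clean_lines(translated_chunk))
```

If the check fails for a chunk, one option is to resend that chunk rather than write misaligned captions.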

In [24]:
### Save translated captions as new vtt file 

## Get GPT response as string, split into list
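# Join chunks with a newline so the last line of one chunk does not fuse with the first line of the next
trans_chunks = [str(response.choices[0].message.content).strip("\n") for response in responses]
full_str = "\n".join(trans_chunks)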

trans_list = full_str.split("\n")
print(trans_list)

## Edit caption files to match translations, accounting for multi-line texts 
trans_vtt = webvtt.read(vtt_path)
line_index = 0
for caption in trans_vtt:
    num_lines = len(caption.text.split("\n"))
    trans_lines = trans_list[line_index:line_index+num_lines]
    # Strip the trailing spaces the model appends to each translated line
    caption.text = "\n".join(line.strip() for line in trans_lines)
    line_index += num_lines

## Save as new file w/ specified language name 
trans_filename = str(os.path.splitext(vtt_name)[0]) + '-' + str(trans_lang) + '.vtt'
trans_path = os.path.join(folder_path, trans_filename)
trans_vtt.save(trans_path)

['[Zware, gespannen muziek speelt]  ', 'ALFRED: De draak trekt zich terug,  ', 'en laat een verschrikkelijke brul horen  ', 'terwijl hij zich voorbereidt om zijn vuuradem uit te laten.  ', 'Je bent verslagen maar staat nog steeds.  ', 'Je kunt dit.  ', 'De schat van de draak glinstert in de duisternis van de kamer.  ', 'Wat doe je?  ', 'EMMA: Ik trek mijn zwaard en richt op de staart van de draak.  ', 'ALFRED: [mompelend] Staart, oké.  ', 'MICHELLE: Ik gebruik mijn toverstaf om... de vuuradem van de draak af te weren!  ', 'ALFRED: Oké, oké.  ', '[geluid van dobbelstenen die worden gegooid]  ', 'Oké!  ', 'De draak is bijna verslagen.  ', 'Terwijl je je voorbereidt om aan te vallen-  ', '[geluid van een schrapend record]  ', 'LEO: Hé,  ', 'wat doen jullie, freaks?  ', 'ALFRED: Oh, eh.  ', 'Hé... Leo.  ', 'EMMA: We waren van plan de draak te verslaan voordat je hier kwam.  ', 'LEO: [spotten] Nee, dat waren jullie niet.  ', '[geluid van DND-bord dat wordt omgedraaid]  ', 'ALFRED: HEY!  ', 
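
The step that maps translated lines back onto multi-line captions can be isolated and guarded against a line-count mismatch. The following is a minimal sketch that works on plain strings instead of webvtt caption objects; `regroup_lines` is a hypothetical helper, and the sample pairing of lines into captions in the usage note is illustrative only.

```python
from typing import List


def regroup_lines(caption_texts: List[str], translated_lines: List[str]) -> List[str]:
    """Rebuild one translated text per caption, keeping each caption's line count."""
    expected = sum(len(text.split("\n")) for text in caption_texts)
    if expected != len(translated_lines):
        raise ValueError(
            f"Expected {expected} translated lines, got {len(translated_lines)}"
        )
    regrouped: List[str] = []
    index = 0
    for text in caption_texts:
        num_lines = len(text.split("\n"))
        # Take exactly as many translated lines as the original caption had.
        regrouped.append("\n".join(translated_lines[index:index + num_lines]))
        index += num_lines
    return regrouped
```

For example, a two-line caption followed by a one-line caption consumes three translated lines and yields two caption texts; any other number of translated lines raises a `ValueError` instead of silently leaving captions empty.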