<a href="https://colab.research.google.com/github/alexfazio/srt-GPT-translator/blob/main/srt_gpt_translator_claude.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# srt-GPT-Translator (Claude 3)

By Alex Fazio (https://www.linkedin.com/in/alxfazio)

GitHub repo: https://github.com/alexfazio/srt-GPT-Translator

## Description

Translate `.srt` files from any language to any language using Anthropic (Claude) or OpenAI APIs.

This notebook translates `.srt` subtitle files with Anthropic's Claude 3 models. Each subtitle segment is sent to the model together with its neighboring segments as context, and the translated text is written to a new `.srt` file that keeps the original timings.

Feel free to contribute to the project by submitting pull requests or reporting issues on the GitHub repository. Your feedback and contributions are greatly appreciated!

## Instructions

To translate an `.srt` file:

1. Upload the `.srt` file you want to translate to the Colab session storage (the Files panel).
2. Enter your Anthropic API key in the `ANTHROPIC_API_KEY` field of the settings cell.
3. Select the desired Claude 3 model from the `GENERATION_MODEL` dropdown.
4. Set `SRT_PATHNAME` to the path of the uploaded `.srt` file.
5. Choose the target language by selecting or typing it in the `OUTPUT_LANGUAGE` field.
6. Run all cells by selecting `Runtime > Run all` from the menu, or execute each cell individually in order.
7. The translated file is written to `translated_subtitles.srt` in the working directory, where it can be downloaded from the Files panel.
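
If you want to try the notebook before uploading your own subtitles, the sketch below writes a small two-entry `.srt` file. The file name `sample.srt`, the helper `write_sample_srt`, and the dialogue lines are made up for illustration; the layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, text, blank line) is the SubRip structure that the notebook's parser expects.

```python
from pathlib import Path

# Hypothetical sample content: two SubRip entries separated by a blank line.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,500
Hello, and welcome.

2
00:00:02,500 --> 00:00:05,000
Today we talk about subtitles.
"""

def write_sample_srt(path="sample.srt"):
    """Write the sample subtitles to `path` and return the path as a string."""
    Path(path).write_text(SAMPLE_SRT, encoding="utf-8")
    return str(path)

sample_path = write_sample_srt()
print(sample_path)
```

Setting `SRT_PATHNAME` to `sample.srt` in the settings cell should then let you run the remaining cells against this file.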

## Upcoming Enhancements

- Add an API selector to choose between the OpenAI and Anthropic APIs.
- (Optional, low priority) Add a string cleaner for improved output (reference: https://x.com/mattshumer_/status/1777541303288340613).
- Read the API key from an environment variable, or provide a dedicated form field for the user to enter it.
- Add a context-length selector (a toggle or slider) that sets how many neighboring segments flank the target segment and are passed to the LLM as context during translation. More context segments make the translation more precise, but also slower and more expensive in API usage.
- Support bilingual subtitle output.
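
The context-length idea from the list above could look roughly like the following sketch. It is not part of the notebook: `context_window` and its `n_context` parameter are hypothetical names, and the function only assumes segments shaped like the notebook's (dicts with `id` and `text` keys).

```python
def context_window(segments, n_context=1):
    """Yield (context_text, target_text, target_id) for each segment.

    n_context is the number of neighboring segments taken on each side
    of the target; the window is clipped at the start and end of the list.
    """
    for i, segment in enumerate(segments):
        lo = max(0, i - n_context)
        hi = min(len(segments), i + n_context + 1)
        context_text = " ".join(s["text"] for s in segments[lo:hi])
        yield context_text, segment["text"], segment["id"]

# Illustrative data only
demo = [{"id": i, "text": t} for i, t in enumerate(["A", "B", "C", "D"])]
windows = list(context_window(demo, n_context=1))
for ctx, target, seg_id in windows:
    print(seg_id, repr(target), "<-", repr(ctx))
```

Note one difference from the notebook's `context` generator: there the first segment receives only itself as context, whereas this sketch also includes the segments that follow it. With `n_context=0` each segment is passed without any surrounding context.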

## API Documentation

For more information on using the Anthropic API, refer to the official documentation:
https://docs.anthropic.com/claude/page/polyglot-superpowers

In [None]:
# @title Adjust settings here

ANTHROPIC_API_KEY = '' #@param {type: 'string'}
GENERATION_MODEL = "claude-3-haiku-20240307" # @param ["claude-3-haiku-20240307", "claude-3-sonnet-20240229", "claude-3-opus-20240229"] {type:"string"}
SRT_PATHNAME = 'altman-sub.srt' #@param {type: 'string'}
OUTPUT_LANGUAGE = "Italiano" # @param ["Italiano", "English", "中文 (Zhōngwén)", "हिन्दी (Hindī)", "Español", "Français"] {type:"string"}



In [None]:
# @title Install Required Libraries and Import Necessary Modules
!pip install anthropic
from tqdm import tqdm

In [None]:
# @title Convert `.srt` File Into Segments

def srt_to_segments(SRT_PATHNAME):
    all_text = []  # Store all text lines for the 'text' key
    segments = []
    segment_id = 0  # Initialize segment ID
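    # Read as UTF-8 explicitly so non-ASCII subtitles do not depend on the platform default
    with open(SRT_PATHNAME, 'r', encoding='utf-8') as file: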
        srt_content = file.read().strip()

    entries = srt_content.split('\n\n')

    for entry in entries:
        lines = entry.split('\n')
        if len(lines) >= 3:
            time_range, text_lines = lines[1], lines[2:]
            start_str, end_str = time_range.split(' --> ')
            start = srt_time_to_seconds(start_str)
            end = srt_time_to_seconds(end_str)
            text = ' '.join(text_lines).replace('\\N', '\n')
            all_text.append(text)

            # Only 'id', 'start', 'end', and 'text' are used by this notebook;
            # the remaining keys are unused placeholders kept as None
            segments.append({
                'id': segment_id,
                'seek': None,
                'start': start,
                'end': end,
                'text': text,
                'tokens': None,
                'temperature': None,
                'avg_logprob': None,
                'compression_ratio': None,
                'no_speech_prob': None,
            })
            segment_id += 1

    # Compile the final dictionary with all text and segments
    final_output = {
        'text': ' '.join(all_text),
        'segments': segments
    }

    return final_output

def srt_time_to_seconds(time_str):
    parts = time_str.split(':')
    seconds, milliseconds = parts[2].split(',')
    total_seconds = 3600 * int(parts[0]) + 60 * int(parts[1]) + int(seconds) + float(milliseconds) / 1000
    return total_seconds

In [None]:
# @title Extract segments of subtitles and the surrounding context for the purpose of translation

def context(subtitles_dict):
    """
    Generator function that yields the context and target text for each subtitle segment.

    The first segment gets only itself as context, the last segment gets the previous
    segment plus itself, and every other segment gets the previous, current, and next segments.

    Args:
        subtitles_dict (dict): A dictionary with a 'segments' list; each segment holds its id, text, start, and end times.

    Yields:
        tuple: A tuple containing the context text, the target segment text, and the target segment id.
    """

    segments = subtitles_dict['segments']  # Shortcut for easier access
    num_segments = len(segments)

    for i in range(num_segments):
        # First segment translation case
        if i == 0:
            context_text = segments[i]['text']
            target_text = segments[i]['text']
            target_id = segments[i]['id']

        # Last segment translation case
        elif i == num_segments - 1:
            context_texts = [segments[i-1]['text'], segments[i]['text']]
            context_text = " ".join(context_texts)
            target_text = segments[i]['text']
            target_id = segments[i]['id']

        # Middle segments translation case
        else:
            context_texts = [segments[i-1]['text'], segments[i]['text'], segments[i+1]['text']]
            context_text = " ".join(context_texts)
            target_text = segments[i]['text']
            target_id = segments[i]['id']

        yield context_text, target_text, target_id

In [None]:
# @title API Call for translation

import anthropic

def translate_segment(context_texts, target_segment_text):
    """
    Translates a subtitle segment into OUTPUT_LANGUAGE using the Claude 3 model selected in GENERATION_MODEL.

    Args:
        context_texts (str): The surrounding text for context.
        target_segment_text (str): The subtitle segment to be translated.

    Returns:
        str | None: The translated subtitle segment, or None if the API call fails.
    """

    client = anthropic.Anthropic(
        api_key=ANTHROPIC_API_KEY,
    )
    system_prompt = f"Translate the provided subtitle segment, '{target_segment_text}', into '{OUTPUT_LANGUAGE}'. Use the surrounding context, '{context_texts}', to inform your translation and ensure it fits cohesively within the larger conversation. Craft your translation to capture the meaning, tone, and any idiomatic expressions present in the original text. Do not translate the entire context, only the specified segment.\n\nWhen translating, follow these guidelines:\n- Only output the translated text, without any labels or quotes.\n- If the subtitle segment does not contain a complete sentence, do not end the translation with a period or other punctuation.\n- However, if the segment contains one or more complete sentences, end each sentence with a period, even if the final sentence is a fragment."

    try:
        message = client.messages.create(
            model=GENERATION_MODEL,
            max_tokens=2000,
            temperature=0.2,
            system=system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": f"'{target_segment_text}' --> '{OUTPUT_LANGUAGE}'"
                        }
                    ]
                }
            ]
        )
        # Extract the translated text from the API response
        translated_text = message.content[0].text.strip()
        return translated_text
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

In [None]:
# @title Main

import copy
from tqdm import tqdm

# Parse the source file; it stays untouched so context is always built from the original text
source_dict = srt_to_segments(SRT_PATHNAME)
# Deep copy: a plain assignment would alias the same object, so translations would overwrite the source
trans_dict = copy.deepcopy(source_dict)

# Build the context generator from the source dictionary
context_generator = context(source_dict)

# Passing the total number of segments lets tqdm show an accurate progress estimate
total_segments = len(source_dict['segments'])
for context_texts, t_seg_text, t_target_id in tqdm(context_generator, total=total_segments):
    translated_segment = translate_segment(context_texts, t_seg_text)
    segment = next((seg for seg in trans_dict['segments'] if seg['id'] == t_target_id), None)
    if segment is None:
        print(f"Segment with ID {t_target_id} not found in trans_dict.")
    elif translated_segment is None:
        # translate_segment returns None on API errors; keep the original text instead of writing "None"
        print(f"Segment {t_target_id} could not be translated; keeping the original text.")
    else:
        segment['text'] = translated_segment

In [None]:
# @title Parse result to output srt file

def write_translated_srt(translated_dict, output_file_path):
    """
    Writes the translated subtitles back into an SRT file format.

    Args:
        translated_dict (dict): Dictionary containing the translated subtitles and their timings.
        output_file_path (str): The path where the translated SRT file should be saved.
    """
    # Write as UTF-8 explicitly so translated non-ASCII text is saved consistently
    with open(output_file_path, 'w', encoding='utf-8') as file:
        for segment in translated_dict['segments']:
            # SRT segment number
            file.write(f"{segment['id'] + 1}\n")
            # SRT timing format
            start_srt = seconds_to_srt_time(segment['start'])
            end_srt = seconds_to_srt_time(segment['end'])
            file.write(f"{start_srt} --> {end_srt}\n")
            # Translated text
            file.write(f"{segment['text']}\n\n")

def seconds_to_srt_time(seconds):
    """
    Converts seconds to SRT time format (HH:MM:SS,MMM).

    Args:
        seconds (float): Time in seconds.

    Returns:
        str: Time in SRT format.
    """
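    # Work in whole milliseconds so the fractional part survives the divisions
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{milliseconds:03}"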

# Write the translated subtitles to disk
output_srt_path = 'translated_subtitles.srt'  # Specify the output file name
write_translated_srt(trans_dict, output_srt_path)