In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# File structure
'''mt5_HuggingFace/
├── clean/
│   ├── es-CL/
│   ├── es-CL/
│   └── ...
└── mt5/
    ├── mt5_finetune.ipynb
    └── my_saved_mt5_model/'''

'mt5_HuggingFace/\n├── clean/\n│   ├── es-CL/\n│   ├── es-CL/\n│   └── ...\n└── mt5/\n    ├── mt5_finetune.ipynb\n    └── my_saved_mt5_model/'

# Instructions to Run This Notebook (Using Pre-trained Model)

These instructions will guide you through running the notebook to use the *already saved* pre-trained MT5 model for translation, skipping the training steps to save 15+ minutes.

### 1. Data and Notebook Access
*   **Share the Saved Model:** Ensure the `mt5` folder containing the saved model and notebook, this is also the default folder to save the model.
*   **Original Data (Optional):** The original data folder (`clean`) are only needed if you intend to run trainning model. __And you also need to modify the data path__.

### 2. Everything Runs in Google Colab

### 3. Mount Google Drive

### 4. Verify Model Path
*   Ensure that the `model_save_path` variable points to the intended location of saved model in Google Drive. Based on __my__ steps, this is `/content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model`.

### 5. Install Required Libraries

### 6. Set Up GPU Runtime

### 7. Run Necessary Cells in Order
*   Since you're using a pre-trained model, you will skip the entire training process.
*   **Minimum cells to run:**
    *   **Mount Drive**
    *   **Load Model & Tokenizer:** This cell should look something like this in my path:
        ```python
        model_path = "/content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model"
        tokenizer = T5TokenizerFast.from_pretrained(model_path)
        model = MT5ForConditionalGeneration.from_pretrained(model_path)
        print(f"Model and tokenizer loaded from: {model_path}")
        ```
    *   **Define `translate_mt5` function**
    *   **Define decoding configs**
    *   **Run translation examples**


### 8. View Output
*   The translation outputs will be printed directly below the relevant cells.

## Overall Notebook Logic and Process Flow

Fine-tuning a pre-trained mT5 model for English to Spanish machine translation, specifically for dialectal variations found in Gnome project data

1.  **Data Loading and Preparation**:
    *   **Source Data**: It loads into Hugging Face `Dataset` objects.
    *   **Dataset Addition**: Multiple dialectal datasets can be loaded and concatenated into a single `all_pairs`.
    *   **Train/Validation Split**

2.  **Model and Tokenizer Initialization**:
    *   **Base Model**: A pre-trained `google/mt5-small` model and its corresponding `T5TokenizerFast` are loaded from Hugging Face Hub. mT5 (Massive Text-to-Text Transfer Transformer) is a multilingual encoder-decoder model suitable for translation tasks.
    *   **Task Prefix**: A `task_prefix` ("translate English to Spanish: ") is defined.
    *   **Tokenization**: A `preprocess_batch` function is defined to tokenize both the English source and Spanish target sentences. __It also adds the task prefix to the English input.__

3.  **Model Training**:
    *   **Data Collator**: `DataCollatorForSeq2Seq`
    *   **Training Arguments**: `Seq2SeqTrainingArguments`
    *   **Trainer Setup**: A `Seq2SeqTrainer`
    *   **Training Execution**: `trainer.train()`

4.  **Model saved to a specified directory on Google Drive**

5.  **Inference and Decoding Strategies**:
    *   **`translate_mt5` Function**: performs translations. It takes an English text, the model, and tokenizer, along with various decoding parameters.
        *   **Greedy Decoding**: Selects the most probable token at each step.
        *   **Beam Search**: Keeps track of multiple probable sequences to find a globally better translation.
        *   **Length Penalty**: Adjusts the likelihood of longer or shorter sequences.


云端硬盘挂载成功后，请提供您要加载的数据文件的完整路径（例如，`/content/drive/My Drive/your_folder/your_file.csv`），我将帮助您将其加载到 pandas DataFrame 中。

# Data Retrieval

In [14]:
import os
from datasets import Dataset
from transformers import MT5ForConditionalGeneration, T5TokenizerFast
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer
import os
import shutil # Import shutil for directory deletion


In [15]:
data_path = "/content/drive/MyDrive/CS4120/clean"
print("Folders:", os.listdir(data_path))

Folders: ['es-CO', 'es-VE', 'es-SV', 'es-UY', 'es-PA', 'std_es', 'es-PE', 'es-HN', 'es-NI', 'es-CR', 'es-DO', 'es-EC', 'es-PR', 'es-CL', 'es-AR']


In [16]:
# Load datasets for all dialects
region_data = {}

if os.path.exists(data_path):
    sub_folders = sorted(os.listdir(data_path))

    for folder_name in sub_folders:
        folder_full_path = os.path.join(data_path, folder_name)

        # Only process folders like es-AR, es-CO, ...
        if os.path.isdir(folder_full_path) and folder_name.startswith("es-"):

            path_en = os.path.join(folder_full_path, "all.en")
            path_es = os.path.join(folder_full_path, "all.es")

            if os.path.exists(path_en) and os.path.exists(path_es):
                with open(path_en, "r", encoding="utf-8") as f:
                    lines_en = f.read().strip().split("\n")

                with open(path_es, "r", encoding="utf-8") as f:
                    lines_es = f.read().strip().split("\n")

                current_pairs = []
                if len(lines_en) == len(lines_es):
                    for en, es in zip(lines_en, lines_es):
                        if en.strip() and es.strip():
                            current_pairs.append({"en": en.strip(), "es": es.strip()})

                region_data[folder_name] = current_pairs

print("Loaded regions that meet the requirements:", list(region_data.keys()))

Loaded regions: ['es-AR', 'es-CL', 'es-CR', 'es-HN', 'es-PA', 'es-PR']


In [None]:
'''all_pairs = []

for region, pairs in region_data.items():
    for p in pairs:
        all_pairs.append({
            "input_text": p["en"],     # English sentence (SOURCE)
            "target_text": p["es"],    # Spanish / dialect sentence (TARGET)
            "region": region,          # Show the region
        })

print("Total training pairs:", len(all_pairs))'''

'all_pairs = []\n\nfor region, pairs in region_data.items():\n    for p in pairs:\n        all_pairs.append({\n            "input_text": p["en"],     # English sentence (SOURCE)\n            "target_text": p["es"],    # Spanish / dialect sentence (TARGET)\n            "region": region,          # Show the region\n        })\n\nprint("Total training pairs:", len(all_pairs))'

In [None]:
'''dataset = Dataset.from_list(all_pairs)

dataset = dataset.train_test_split(test_size=0.1, shuffle=True)

train_ds = dataset["train"]
val_ds = dataset["test"]

train_ds, val_ds'''

'dataset = Dataset.from_list(all_pairs)\n\ndataset = dataset.train_test_split(test_size=0.1, shuffle=True)\n\ntrain_ds = dataset["train"]\nval_ds = dataset["test"]\n\ntrain_ds, val_ds'

In [17]:
# translation is like a text-to-text problem
# input: en_sentence
# output: es_sentence
# import multilingual translation model and the tool needed to prepare text
# the trainning process is to maximize the log-likelihood of the target sequence tokens (cross-entropy).
model_name = "google/mt5-small"

tokenizer = T5TokenizerFast.from_pretrained(model_name)
# model = MT5ForConditionalGeneration.from_pretrained(model_name)



In [18]:
# raw text -> token IDs for subword tokenization and SentencePiece
# Rather than BoW or fixed-length vectors, the model sees a sequence of (subword) indices;
# the transformer turns them into contextual embeddings via self-attention.

max_source_length = 128
max_target_length = 128
task_prefix = "translate English to Spanish: "

def preprocess_batch(batch):
    # 1. Build input (source) text with the translation prefix
    inputs = [task_prefix + s for s in batch["input_text"]]
    targets = batch["target_text"]

    # 2. Tokenize inputs (English to source IDs)
    model_inputs = tokenizer(
        inputs,
        max_length=max_source_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )

    # 3. Tokenize targets (Spanish/dialect to labels)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            targets,
            max_length=max_target_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )

    # 4. Attach labels to model inputs
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# use batch and remove the original text col
'''train_tokenized = train_ds.map(
    preprocess_batch,
    batched=True,
    remove_columns=train_ds.column_names,
)

val_tokenized = val_ds.map(
    preprocess_batch,
    batched=True,
    remove_columns=val_ds.column_names,
)'''

'train_tokenized = train_ds.map(\n    preprocess_batch,\n    batched=True,\n    remove_columns=train_ds.column_names,\n)\n\nval_tokenized = val_ds.map(\n    preprocess_batch,\n    batched=True,\n    remove_columns=val_ds.column_names,\n)'

In [19]:
'''
Fore more efficient trainning,
MAX_EXAMPLES_PER_REGION = Large as possible
QUICK_DEBUG = False # epoch will be 3
'''

# Train a separate MT5 model for each dialect/region in `region_data`

from datasets import Dataset
from transformers import (
    MT5ForConditionalGeneration,
    T5TokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
import os

# Base directory where per-region models will be saved
base_save_dir = "/content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model"
os.makedirs(base_save_dir, exist_ok=True)
MAX_EXAMPLES_PER_REGION = 500

def build_region_dataset(region_name):
    """
    Build a HuggingFace Dataset for a single region only.
    Direction: English (input) -> Spanish/dialect (target).
    """
    pairs = region_data[region_name]   # list of {"en": ..., "es": ...}
    region_examples = []

    for p in pairs:
        region_examples.append(
            {
                "input_text": p["en"],   # English source
                "target_text": p["es"],  # Spanish/dialect target
                "region": region_name,
            }
        )

    dataset_region = Dataset.from_list(region_examples)
    # Subsample to at most MAX_EXAMPLES_PER_REGION to speed up training
    if len(dataset_region) > MAX_EXAMPLES_PER_REGION:
        dataset_region = dataset_region.shuffle(seed=42).select(range(MAX_EXAMPLES_PER_REGION))

    # train/val split
    dataset_region = dataset_region.train_test_split(test_size=0.1, shuffle=True)
    return dataset_region["train"], dataset_region["test"]

# Toggle for very quick debug runs vs. more serious training
QUICK_DEBUG = True

# for region_name in ["es-CL","es-AR","es-MX","es-ES","std_es"]:
for region_name in region_data.keys():
    print(f"\n========== Training MT5 for region: {region_name} ==========")

    # 1. Build dataset only for this dialect (with subsampling inside)
    train_ds, val_ds = build_region_dataset(region_name)

    print(f"{region_name}: {len(train_ds)} train examples, {len(val_ds)} val examples")

    # 2. Tokenize for this region
    train_tokenized = train_ds.map(
        preprocess_batch,
        batched=True,
        remove_columns=train_ds.column_names,
    )
    val_tokenized = val_ds.map(
        preprocess_batch,
        batched=True,
        remove_columns=val_ds.column_names,
    )

    # 3. Fresh MT5 model for this region
    model = MT5ForConditionalGeneration.from_pretrained(model_name)

    # 4. Data collator + training args
    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

    # 🔹 Lighter training settings
    if QUICK_DEBUG:
        num_epochs = 1
    else:
        num_epochs = 3

    training_args = Seq2SeqTrainingArguments(
        output_dir=f"mt5-gnome-en-es-{region_name}",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        learning_rate=3e-4,
        num_train_epochs=num_epochs,
        logging_steps=50,
        predict_with_generate=True,
        fp16=False,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=train_tokenized,
        eval_dataset=val_tokenized,  # this is fine; no eval during training unless you call evaluate()
        tokenizer=tokenizer,
        data_collator=data_collator,
    )

    # 5. Train only on this dialect
    trainer.train()

    # 6. Save this region’s model into its own directory (final save)
    region_save_dir = os.path.join(base_save_dir, region_name)
    os.makedirs(region_save_dir, exist_ok=True)
    trainer.save_model(region_save_dir)
    tokenizer.save_pretrained(region_save_dir)

    print(f"Saved MT5 model for {region_name} to {region_save_dir}")


es-AR: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,31.8279
100,14.524


Saved MT5 model for es-AR to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-AR

es-CL: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,31.8279
100,14.524


Saved MT5 model for es-CL to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-CL

es-CR: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,31.8279
100,14.524


Saved MT5 model for es-CR to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-CR

es-HN: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,31.8279
100,14.524


Saved MT5 model for es-HN to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-HN

es-PA: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,31.8279
100,14.524


Saved MT5 model for es-PA to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-PA

es-PR: 450 train examples, 50 val examples


Map:   0%|          | 0/450 [00:00<?, ? examples/s]



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Step,Training Loss
50,35.6618
100,22.4919


Saved MT5 model for es-PR to /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-PR


In [None]:
'''# use Seq2SeqTrainer for encoder-decoder models
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# Seq2SeqTrainingArguments defines all the hyperparameters and strategies for training.
# These arguments control various aspects of the training loop.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-gnome-en-es",   # Directory where model checkpoints and logs will be saved.
    per_device_train_batch_size=4,  # Batch size for training on each device (GPU/CPU).
    per_device_eval_batch_size=4,   # Batch size for evaluation on each device.
    learning_rate=3e-4,             # The initial learning rate for the optimizer.
    num_train_epochs=3,             # Total number of training epochs to perform.
    logging_steps=100,              # Number of update steps between two logs.
    eval_strategy="epoch",          # Evaluate the model at the end of each epoch.
    save_strategy="epoch",          # Save the model checkpoint at the end of each epoch.
    predict_with_generate=True,     # Whether to use generate to calculate metrics (useful for sequence generation tasks).
    fp16=False,                     # Whether to use mixed precision training (float16). Set to True for performance on compatible GPUs.
)'''

In [None]:
# apply LR and neural LMs, and minimize cross-entropy
# coder–decoder transformer that learns a conditional distribution P(Spanish token | English tokens)
'''trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=val_tokenized,
    tokenizer=tokenizer,
    data_collator=data_collator,
)'''

In [None]:
# API key will be required here
'''import wandb
wandb.init(project="your_project_name")
trainer.train()'''



In [None]:
'''
# Define the Google Drive path where to save the model
model_save_path = "/content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model"

# Ensure the parent directory exists
os.makedirs(os.path.dirname(model_save_path), exist_ok=True)


# If a previous run exists, wipe it so we start clean
if os.path.exists(model_save_path):
    print(f"Deleting existing directory: '{model_save_path}' to ensure a clean save.")
    shutil.rmtree(model_save_path)


# Recreate the base directory
os.makedirs(model_save_path, exist_ok=True)

# Save the GLOBAL multi-dialect model at the root
trainer.save_model(model_save_path)
tokenizer.save_pretrained(model_save_path)
print(f"Global model and tokenizer saved to: {model_save_path}")

# Also save the model for each region in its own subfolder
for region_name in region_data.keys():
    region_dir = os.path.join(model_save_path, region_name)

    # Create region-specific directory
    os.makedirs(region_dir, exist_ok=True)

    # Save the same trained model and tokenizer into this region-specific directory
    trainer.save_model(region_dir)
    tokenizer.save_pretrained(region_dir)

    print(f"Region model copy saved to: {region_dir}")'''

# **Start to run from the below if you DON'T want to retrain the model(otherwise, it may take at least 15 minutes).**

In [None]:
# If there's error to load the saved model, you may need to downgrade the colab.
# This is for compatibility issue
'''import transformers
print(transformers.__version__)

!pip install -q "transformers==4.57.1"'''

In [22]:
from transformers import MT5ForConditionalGeneration, T5TokenizerFast
import os

# Load the tokenizer and model from the saved directory
base_save_dir = "/content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model"

# change to 'es-AR', 'es-PR', etc. when you want another dialect
region_name = "es-CL"
model_path = os.path.join(base_save_dir, region_name)

print(f"Attempting to load model from: {model_path}")
if not os.path.exists(model_path):
    print(f"Error: Model path '{model_path}' does not exist.")
elif not os.listdir(model_path):
    print(f"Error: Model path '{model_path}' is empty.")
else:
    print("Directory contents:")
    for item in os.listdir(model_path):
        print("  -", item)

# Load tokenizer from the base MT5 model
base_model_name = "google/mt5-small"
tokenizer = T5TokenizerFast.from_pretrained(base_model_name)

# Load the fine-tuned weights for THIS dialect from the local folder
model = MT5ForConditionalGeneration.from_pretrained(model_path)

print(f"Model loaded successfully from: {model_path}")
print(f"Tokenizer loaded from base model: {base_model_name}")

Attempting to load model from: /content/drive/MyDrive/CS4120/mt5/my_saved_mt5_model/es-CL
Directory contents:
  - config.json
  - generation_config.json
  - model.safetensors
  - tokenizer_config.json
  - special_tokens_map.json
  - spiece.model
  - tokenizer.json
  - training_args.bin




In [23]:
# decoding process to find y_hat = argmaxP(y|x)
#	So use heuristics:
#	Greedy: at each step take the most probable next token.
#	Beam search: keep the top k partial sequences (beam size), expand each, keep top k again.
#	Add length penalties to avoid over-favoring short sequences.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def translate_mt5(
    text_en, # The English text to be translated
    model,   # The MT5 model used for translation
    tokenizer, # The tokenizer corresponding to the MT5 model
    num_beams=1, # Number of beams for beam search. 1 means greedy decoding.
    do_sample=False, # Whether to use sampling; False for deterministic decoding (beam search/greedy)
    max_length=128, # Maximum length of the generated target sequence
    length_penalty=1, # Penalty for generating longer sequences
    temperature=1, # Controls randomness in sampling. Lower values make output more deterministic.
    top_p=None, # Top-p (nucleus) sampling parameter
):
    # Prepare the input text with the task prefix
    input_text = task_prefix + text_en
    # Tokenize the input text and move it to the appropriate device (CPU/GPU)
    inputs = tokenizer(
        input_text,
        return_tensors="pt", # Return PyTorch tensors
        truncation=True,     # Truncate sequences longer than max_source_length
        max_length=max_source_length,
    ).to(device)

    # Define generation arguments
    gen_kwargs = {
        "max_length": max_length,
        "num_beams": num_beams,
        "length_penalty": length_penalty,
        "do_sample": do_sample,
        "temperature": temperature,
    }

    # Add top_p to generation arguments if specified
    if top_p is not None:
        gen_kwargs["top_p"] = top_p

    # Generate the output sequence (translated text token IDs)
    output_ids = model.generate(**inputs, **gen_kwargs)
    # Decode the generated token IDs back into human-readable text, skipping special tokens
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

In [24]:
decoding_configs = [
    {"name": "greedy",        "num_beams": 1, "do_sample": False, "length_penalty": 1.0},
    {"name": "beam_4",        "num_beams": 4, "do_sample": False, "length_penalty": 1.0},
    {"name": "beam_8",        "num_beams": 8, "do_sample": False, "length_penalty": 1.0},
    {"name": "beam_4_lp_0.6", "num_beams": 4, "do_sample": False, "length_penalty": 0.6},
    {"name": "beam_4_lp_1.4", "num_beams": 4, "do_sample": False, "length_penalty": 1.4},
    # Optional:
    # {"name": "top_p_0.9", "num_beams": 1, "do_sample": True,  "top_p": 0.9, "temperature": 0.7},
]

In [25]:
# Defind those variable again in here(if you want to use the existing saved model rather than retrainning the model)
max_source_length = 128
max_target_length = 128
task_prefix = "translate English to Spanish: "

In [26]:
# Some test sentences to be translated
test_examples = [
    "Keyboard Accessibility Preferences",
    "Shows the status of keyboard accessibility features",
    "There was an error launching the help viewer.",
]

# Apply the model and get the translation with customized parameters(beam_# ...)
for text in test_examples:
    print(f"\nSOURCE: {text}")
    for cfg in decoding_configs:
        out = translate_mt5(
            text_en=text,
            model=model,
            tokenizer=tokenizer,
            num_beams=cfg.get("num_beams", 1),
            do_sample=cfg.get("do_sample", False),
            length_penalty=cfg.get("length_penalty", 1.0),
            temperature=cfg.get("temperature", 1.0),
            top_p=cfg.get("top_p", None),
        )
        print(f"[{cfg['name']}] {out}")


SOURCE: Keyboard Accessibility Preferences
[greedy] <extra_id_0> . .
[beam_4] <extra_id_0> to Spanish . . . . . . . . . .  . .   .    .    .                                                                       .      .     
[beam_8] <extra_id_0> to Spanish . . . . . . . . . .  . .   .    .     .                                                                      .      .     
[beam_4_lp_0.6] <extra_id_0> . .
[beam_4_lp_1.4] <extra_id_0> to Spanish . . . . . . . . . .  . .   .    .    .                                                                       .      .     

SOURCE: Shows the status of keyboard accessibility features
[greedy] <extra_id_0> Spanish
[beam_4] <extra_id_0> . 
[beam_8] <extra_id_0>. .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <extra_id_14>. <extra_id_46>
[beam_4_lp_0.6] <extra_id_0> Spanish
[beam_4_lp_1.4] <extra_id_0>. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 

Output explanation:

1. Greedy: the model simply picks the word with the highest probability as the next word in the sequence. It can be suboptimal because a locally optimal choice at one step might lead to a globally bad translation later on.
2. Beam: Instead of just picking the single best word at each step, beam search keeps track of the num_beams (e.g., 4 or 8) most probable partial translations. Therefore, __it's less likely to get stuck in local optima. Increasing num_beams usually leads to better quality, up to a point.__
3. lp_#: This parameter is used with beam search to influence the length of the generated translation. Models sometimes have a bias towards generating shorter sequences. A higher number will encourage to generate longer sequences. However, __if the outputs are the same, it means the length penalties doesn't alter the most probable sequence for this model__.

# End of code