# Fine-tune a Marian model pretrained to translate from English to French on the KDE4 dataset

## Setup

In [1]:
import torch
torch.cuda.is_available()

True

In [None]:
!pip install datasets evaluate transformers[sentencepiece]
# !pip install accelerate

# !apt install git-lfs

In [None]:
# # Configure the git identity
# !git config --global user.email ""
# !git config --global user.name ""

In [3]:
# Log in to the Hugging Face Hub
from huggingface_hub import notebook_login

notebook_login()


## Preparing the data

### Load the KDE4 dataset

In [4]:
from datasets import load_dataset

raw_datasets = load_dataset("kde4", lang1="en", lang2="fr")
raw_datasets

The repository for kde4 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/kde4.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


DatasetDict({
    train: Dataset({
        features: ['id', 'translation'],
        num_rows: 210173
    })
})

In [5]:
# Take a sample of 50,000 examples from the 'train' split
raw_datasets = raw_datasets["train"].shuffle(seed=42).select(range(50000))
raw_datasets

Dataset({
    features: ['id', 'translation'],
    num_rows: 50000
})

In [6]:
# Display some examples from the sampled dataset
raw_datasets[:5]

{'id': ['98963', '169889', '12433', '16176', '104066'],
 'translation': [{'en': 'Netscape Communicator reg; plugins (for viewing Flash reg;, Real reg; Audio, Real reg; Video, etc.)',
   'fr': 'Netscape Communicator reg; modules externes (pour afficher Flash reg;, Real reg; Audio, Real reg; Video, etc.)'},
  {'en': 'Bring to Front', 'fr': 'Mettre au premier plan'},
  {'en': 'Another reason & konqueror; may not show the file or folder you are looking for is that you may have the View Filter plugin set to display only certain types of file.',
   'fr': "Une autre raison explique que & konqueror; peut ne pas afficher le fichier ou le dossier que vous cherchez lorsque vous êtes censé avoir défini le module graphique Afficher un filtre pour n'afficher que certains types de fichiers."},
  {'en': 'Info is a type of documentation. The documents are in a file format called texinfo, and can be read on the command line with the info program.',
   'fr': 'Info est un type de documentation. Les docume

In [7]:
# Create our own validation set
split_datasets = raw_datasets.train_test_split(train_size=0.8, seed=42)
split_datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'translation'],
        num_rows: 40000
    })
    test: Dataset({
        features: ['id', 'translation'],
        num_rows: 10000
    })
})

In [8]:
# Rename the "test" key to "validation"
split_datasets["validation"] = split_datasets.pop("test")

In [9]:
# Now let’s take a look at one element of the dataset
split_datasets["train"][1]["translation"]

{'en': 'leap year; leap years',
 'fr': 'année bissextile; années bissextilesamount in units (real)'}

In [10]:
from transformers import pipeline

model_checkpoint = "Helsinki-NLP/opus-mt-en-fr"
translator = pipeline("translation", model=model_checkpoint)
translator("Publisher")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'translation_text': 'Éditeur'}]

In [11]:
split_datasets["train"][111]["translation"]

{'en': 'Mail service', 'fr': 'Service de messagerie'}

In [12]:
translator("The VARA() function calculates the variance based on a sample.")

[{'translation_text': "La fonction VARA() calcule la variance à partir d'un échantillon."}]

### Processing the dataset for Translation

- Texts must be converted into token IDs for model processing.
- Both inputs and targets need tokenization.
- Create a tokenizer object for this task.
- Use the Marian English-to-French pretrained model.
- Adapt the model checkpoint if using a different language pair.

In [13]:
from transformers import AutoTokenizer

model_checkpoint = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [14]:
# Let's process one sample of each language in the training set
en_sentence = split_datasets["train"][1]["translation"]["en"]
fr_sentence = split_datasets["train"][1]["translation"]["fr"]

print(en_sentence)
print(fr_sentence)

inputs = tokenizer(en_sentence, text_target=fr_sentence)
inputs

leap year; leap years
année bissextile; années bissextilesamount in units (real)


{'input_ids': [34782, 347, 50, 34782, 302, 0], 'attention_mask': [1, 1, 1, 1, 1, 1], 'labels': [927, 6058, 9, 1312, 12892, 50, 655, 6058, 9, 1312, 12892, 5645, 16035, 313, 18, 34, 7722, 24, 158, 253, 28, 0]}

In [15]:
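# Tokenizing the French text without `text_target` treats it as source-language
# (English) input, so the split differs from the target-side tokenization in `labels`
wrong_targets = tokenizer(fr_sentence)
print(tokenizer.convert_ids_to_tokens(wrong_targets["input_ids"]))
print(tokenizer.convert_ids_to_tokens(inputs["labels"]))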

['▁an', 'née', '▁bis', 'sex', 'tile', ';', '▁an', 'née', 's', '▁bis', 'sex', 'tile', 's', 'a', 'mount', '▁in', '▁units', '▁(', 'real', ')', '</s>']
['▁année', '▁bis', 's', 'ex', 'tile', ';', '▁années', '▁bis', 's', 'ex', 'tile', 'sa', 'mou', 'nt', '▁in', '▁un', 'its', '▁(', 're', 'al', ')', '</s>']


In [16]:
# Define the preprocessing function for our dataset
max_length = 128

def preprocess_function(examples):
    inputs = [ex["en"] for ex in examples["translation"]]
    targets = [ex["fr"] for ex in examples["translation"]]
    model_inputs = tokenizer(
        inputs,
        text_target=targets,
        max_length=max_length,
        truncation=True
    )
    return model_inputs

In [17]:
# Apply preprocessing on all the splits of our dataset
tokenized_datasets = split_datasets.map(
    preprocess_function,
    batched=True,
    remove_columns=split_datasets["train"].column_names
)


In [18]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 40000
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 10000
    })
})

## Fine-tuning the model with the Trainer API

In [19]:
# Load the pretrained sequence-to-sequence model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

### Data collation

In [20]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

In [21]:
# Test this on a few samples
batch = data_collator([tokenized_datasets["train"][i] for i in range(1, 3)])
batch.keys()

dict_keys(['input_ids', 'attention_mask', 'labels', 'decoder_input_ids'])

In [22]:
batch["labels"]

tensor([[  927,  6058,     9,  1312, 12892,    50,   655,  6058,     9,  1312,
         12892,  5645, 16035,   313,    18,    34,  7722,    24,   158,   253,
            28,     0],
        [   49, 30145,     9,  8754,     0,  -100,  -100,  -100,  -100,  -100,
          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
          -100,  -100]])

In [23]:
batch["decoder_input_ids"]

tensor([[59513,   927,  6058,     9,  1312, 12892,    50,   655,  6058,     9,
          1312, 12892,  5645, 16035,   313,    18,    34,  7722,    24,   158,
           253,    28],
        [59513,    49, 30145,     9,  8754,     0, 59513, 59513, 59513, 59513,
         59513, 59513, 59513, 59513, 59513, 59513, 59513, 59513, 59513, 59513,
         59513, 59513]])
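
Comparing the two tensors above, `decoder_input_ids` looks like `labels` shifted one position to the right: a start token (ID 59513 here) is placed in front, the last position is dropped, and every `-100` becomes the padding ID. The sketch below reproduces the second row with plain Python lists. It assumes, as the output suggests, that 59513 serves as both the decoder start ID and the padding ID; `shift_right` is a hypothetical helper written for illustration, not the library's own function.

```python
def shift_right(labels, start_id, pad_id, ignore_id=-100):
    """Build decoder inputs from one row of padded labels."""
    # Prepend the start token and drop the last position
    shifted = [start_id] + labels[:-1]
    # Positions ignored by the loss (-100) become ordinary padding
    return [pad_id if tok == ignore_id else tok for tok in shifted]


# Second row of batch["labels"]: 5 real tokens followed by 17 ignored positions
labels_row = [49, 30145, 9, 8754, 0] + [-100] * 17
decoder_row = shift_right(labels_row, start_id=59513, pad_id=59513)
print(decoder_row[:8])
```

The `-100` value stays in `labels` only, where it tells the loss function to skip padded positions.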

In [24]:
# Labels for the second and third elements (indices 1 and 2) in our dataset
for i in range(1, 3):
    print(tokenized_datasets["train"][i]["labels"])

[927, 6058, 9, 1312, 12892, 50, 655, 6058, 9, 1312, 12892, 5645, 16035, 313, 18, 34, 7722, 24, 158, 253, 28, 0]
[49, 30145, 9, 8754, 0]


### Metrics

- The traditional metric for translation is the BLEU score, computed here with SacreBLEU.
- The score can go from 0 to 100, and higher is better.

In [25]:
!pip install sacrebleu


In [26]:
import evaluate

metric = evaluate.load("sacrebleu")


In [27]:
# Let’s try an example
predictions = [
    "This plugin lets you translate web pages between several languages automatically."
]
references = [
    [
        "This plugin allows you to automatically translate web pages between several languages."
    ]
]
metric.compute(predictions=predictions, references=references)

{'score': 46.750469682990165,
 'counts': [11, 6, 4, 3],
 'totals': [12, 11, 10, 9],
 'precisions': [91.66666666666667,
  54.54545454545455,
  40.0,
  33.333333333333336],
 'bp': 0.9200444146293233,
 'sys_len': 12,
 'ref_len': 13}
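
The fields in this output are enough to recompute the score by hand: BLEU is the geometric mean of the four n-gram precisions (`counts` divided by `totals`), multiplied by a brevity penalty `bp` that shrinks the score when the prediction is shorter than the reference. The sketch below is a simplified illustration under that standard definition; `bleu_from_stats` is a hypothetical helper, and it deliberately leaves out the smoothing that SacreBLEU applies when an n-gram count is zero, so it only matches the metric when every count is non-zero.

```python
import math


def bleu_from_stats(counts, totals, sys_len, ref_len):
    """Recompute BLEU (0-100) and the brevity penalty from n-gram statistics."""
    precisions = [c / t if t else 0.0 for c, t in zip(counts, totals)]
    # Penalize predictions that are shorter than the reference
    bp = 1.0 if sys_len >= ref_len else math.exp(1 - ref_len / sys_len)
    if min(precisions) == 0.0:
        # No smoothing in this sketch: a zero precision gives a zero score
        return 0.0, bp
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return 100 * bp * geo_mean, bp


score, bp = bleu_from_stats(
    counts=[11, 6, 4, 3], totals=[12, 11, 10, 9], sys_len=12, ref_len=13
)
print(f"BLEU: {score:.2f}, brevity penalty: {bp:.4f}")
```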

In [28]:
# Try two kinds of bad prediction: first one with lots of repetitions, then one that is too short
predictions = ["This this this This This this"]
references = [
    [
        "This plugin allows you to automatically translate web pages between several languages."
    ]
]
metric.compute(predictions=predictions, references=references)

{'score': 2.5275658895144484,
 'counts': [1, 0, 0, 0],
 'totals': [6, 5, 4, 3],
 'precisions': [16.666666666666668, 10.0, 6.25, 4.166666666666667],
 'bp': 0.31140322391459774,
 'sys_len': 6,
 'ref_len': 13}

In [29]:
predictions = ["This plugin"]
references = [
    [
        "This plugin allows you to automatically translate web pages between several languages."
    ]
]
metric.compute(predictions=predictions, references=references)

{'score': 0.0,
 'counts': [2, 1, 0, 0],
 'totals': [2, 1, 0, 0],
 'precisions': [100.0, 100.0, 0.0, 0.0],
 'bp': 0.004086771438464067,
 'sys_len': 2,
 'ref_len': 13}

In [30]:
# Function to compute metrics
import numpy as np

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # If the model returns more than the prediction logits
    if isinstance(preds, tuple):
        preds = preds[0]

    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Replace -100 in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [[label.strip()] for label in decoded_labels]

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    return {"bleu": result["score"]}

### Fine-tuning the model

In [31]:
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    "marian-finetuned-kde4-en-to-fr",
    evaluation_strategy="no",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=True,    # Speeds up training on modern GPUs.
    push_to_hub=True,
)



In [32]:
# Pass everything to the Seq2SeqTrainer
from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

In [33]:
# Look at the score our model gets before training
trainer.evaluate(max_length=max_length)

{'eval_loss': 1.7118111848831177,
 'eval_bleu': 39.89541150785927,
 'eval_runtime': 726.3351,
 'eval_samples_per_second': 13.768,
 'eval_steps_per_second': 0.216}

In [34]:
# Train the model
trainer.train()

Step,Training Loss
500,1.3929
1000,1.2298
1500,1.1355
2000,1.0441
2500,1.0364
3000,0.949
3500,0.9617


Non-default generation parameters: {'max_length': 512, 'num_beams': 4, 'bad_words_ids': [[59513]], 'forced_eos_token_id': 0}


TrainOutput(global_step=3750, training_loss=1.09770986328125, metrics={'train_runtime': 717.7664, 'train_samples_per_second': 167.185, 'train_steps_per_second': 5.225, 'total_flos': 2347521093402624.0, 'train_loss': 1.09770986328125, 'epoch': 3.0})
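
The `global_step=3750` in this output follows from the dataset size and the batch size set in the training arguments. The short calculation below checks it, assuming a single device and no gradient accumulation (neither is stated explicitly in the notebook); the variable names are illustrative only.

```python
import math

train_examples = 40_000   # size of the training split
eval_examples = 10_000    # size of the validation split
train_batch_size = 32     # per_device_train_batch_size
eval_batch_size = 64      # per_device_eval_batch_size
num_train_epochs = 3

# Assumption: one device, no gradient accumulation
steps_per_epoch = math.ceil(train_examples / train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
eval_steps = math.ceil(eval_examples / eval_batch_size)

print(steps_per_epoch, total_steps, eval_steps)
```

With loss logged every 500 steps, the last logged step before the end of training is 3500, which matches the table above.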

In [35]:
# Evaluate our model again and check the BLEU score
trainer.evaluate(max_length=max_length)

{'eval_loss': 1.037054181098938,
 'eval_bleu': 49.64800786424299,
 'eval_runtime': 820.9422,
 'eval_samples_per_second': 12.181,
 'eval_steps_per_second': 0.191,
 'epoch': 3.0}

In [36]:
# Push the model to the Hugging Face Hub
trainer.push_to_hub(
    tags=["translation", "supervised", "kde4"],  # one list entry per tag
    commit_message="Training Completed"
)

CommitInfo(commit_url='https://huggingface.co/ashaduzzaman/marian-finetuned-kde4-en-to-fr/commit/017fd6cda2abe22a7db1e83d60db54610d9eebc2', commit_message='Training Completed', commit_description='', oid='017fd6cda2abe22a7db1e83d60db54610d9eebc2', pr_url=None, pr_revision=None, pr_num=None)

## A custom training loop using 🤗 Accelerate

### Preparing everything for training

In [37]:
# # Building the DataLoaders from our datasets
# from torch.utils.data import DataLoader

# tokenized_datasets.set_format("torch")

# train_dataloader = DataLoader(
#     tokenized_datasets["train"],
#     shuffle=True,
#     collate_fn=data_collator,
#     batch_size=8
# )

# eval_dataloader = DataLoader(
#     tokenized_datasets["validation"],
#     collate_fn=data_collator,
#     batch_size=8
# )

In [38]:
# # Fine-tuning from the pretrained model again
# model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

In [39]:
# # Set an optimizer (AdamW from PyTorch; `transformers.AdamW` is deprecated)
# from torch.optim import AdamW

# optimizer = AdamW(model.parameters(), lr=2e-5)

In [40]:
# # Instantiate an Accelerator and let it prepare the model, optimizer and dataloaders
# from accelerate import Accelerator

# accelerator = Accelerator()
# model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
#     model, optimizer, train_dataloader, eval_dataloader
# )

In [41]:
# # Compute the number of training steps
# from transformers import get_scheduler

# num_train_epochs = 3
# num_update_steps_per_epoch = len(train_dataloader)
# num_training_steps = num_train_epochs * num_update_steps_per_epoch

# lr_scheduler = get_scheduler(
#     "linear",
#     optimizer=optimizer,
#     num_warmup_steps=0,
#     num_training_steps=num_training_steps
# )
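
For intuition about what a `"linear"` schedule with `num_warmup_steps=0` does, the sketch below computes the learning rate at a given step in plain Python: it ramps up during warmup (none here) and then decays linearly to zero at the final step. `linear_lr` is a hypothetical stand-in written for illustration, not the scheduler object returned by `get_scheduler`, and the step count assumes the 40,000 training examples and batch size of 8 used above on a single process.

```python
def linear_lr(step, base_lr, num_training_steps, num_warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    decay_span = max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining / decay_span)


# Assumption: 40,000 examples, batch size 8, 3 epochs, one process
num_training_steps = 3 * (40_000 // 8)

for step in (0, num_training_steps // 2, num_training_steps):
    print(step, linear_lr(step, 2e-5, num_training_steps))
```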

In [42]:
# # Create a repository on the Hub to push our model to
# from huggingface_hub import create_repo, Repository, get_full_repo_name

# repo_id = "marian-finetuned-kde4-en-to-fr-accelerate"
# create_repo(repo_id)

In [43]:
# model_name = "marian-finetuned-kde4-en-to-fr-accelerate"
# repo_name = get_full_repo_name(model_name)
# repo_name

In [44]:
# # Clone that repository in a local folder
# output_dir = "marian-finetuned-kde4-en-to-fr-accelerate"
# repo = Repository(output_dir, clone_from=repo_name)

### Training loop

In [45]:
# # Define a postprocessing function
# def postprocess(predictions, labels):
#     predictions = predictions.cpu().numpy()
#     labels = labels.cpu().numpy()

#     decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)

#     # Replace -100 in the labels as we can't decode them
#     labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
#     decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

#     # Some simple post-processing
#     decoded_preds = [pred.strip() for pred in decoded_preds]
#     decoded_labels = [[label.strip()] for label in decoded_labels]
#     return decoded_preds, decoded_labels

In [46]:
# # Define a full training loop
# from tqdm.auto import tqdm
# import torch

# progress_bar = tqdm(range(num_training_steps))

# for epoch in range(num_train_epochs):
#     # Training
#     model.train()
#     for batch in train_dataloader:
#         outputs = model(**batch)
#         loss = outputs.loss
#         accelerator.backward(loss)

#         optimizer.step()
#         lr_scheduler.step()
#         optimizer.zero_grad()
#         progress_bar.update(1)

#     # Evaluation
#     model.eval()
#     for batch in tqdm(eval_dataloader):
#         with torch.no_grad():
#             generated_tokens = accelerator.unwrap_model(model).generate(
#                 batch["input_ids"],
#                 attention_mask=batch["attention_mask"],
#                 max_length=max_length
#             )
#         labels = batch["labels"]

#         # Pad predictions and labels to a common length so they can be gathered
#         generated_tokens = accelerator.pad_across_processes(
#             generated_tokens,
#             dim=1,
#             pad_index=tokenizer.pad_token_id
#         )
#         labels = accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

#         predictions_gathered = accelerator.gather(generated_tokens)
#         labels_gathered = accelerator.gather(labels)

#         decoded_preds, decoded_labels = postprocess(predictions_gathered, labels_gathered)
#         metric.add_batch(predictions=decoded_preds, references=decoded_labels)

#     results = metric.compute()
#     print(f"epoch {epoch}, BLEU score: {results['score']:.2f}")

#     # Save and upload
#     accelerator.wait_for_everyone()
#     unwrapped_model = accelerator.unwrap_model(model)
#     unwrapped_model.save_pretrained(output_dir, save_function=accelerator.save)
#     if accelerator.is_main_process:
#         tokenizer.save_pretrained(output_dir)
#         repo.push_to_hub(
#             commit_message=f"Training in progress epoch {epoch}",
#             blocking=False
#         )
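
The padding step before `gather` in the loop above exists because each process may end up with sequences of a different length, and tensors can only be gathered when their shapes agree. The sketch below illustrates the idea with plain Python lists standing in for per-process tensors. `pad_to_common_length` is a hypothetical helper, not Accelerate's implementation, and the two "processes" are made up for the example.

```python
def pad_to_common_length(per_process_batches, pad_index):
    """Right-pad every sequence to the longest length found on any process."""
    max_len = max(len(seq) for batch in per_process_batches for seq in batch)
    return [
        [seq + [pad_index] * (max_len - len(seq)) for seq in batch]
        for batch in per_process_batches
    ]


# Two hypothetical processes whose label batches have different widths
labels_by_process = [
    [[49, 30145, 9, 8754, 0]],
    [[927, 6058, 0]],
]
padded = pad_to_common_length(labels_by_process, pad_index=-100)
print(padded)
```

Labels are padded with `-100` so the padded positions can be swapped for the padding ID before decoding, while generated tokens are padded with the tokenizer's padding ID directly.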

## Using the fine-tuned model with pipeline

In [51]:
from transformers import pipeline

model_checkpoint = "ashaduzzaman/marian-finetuned-kde4-en-to-fr"
translator = pipeline("translation", model=model_checkpoint)
translator("Publisher")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'translation_text': 'Éditeur'}]

In [52]:
translator(
    "Unable to import %1 using the OFX importer plugin. This file is not the correct format."
)

[{'translation_text': "Impossible d'importer %1 en utilisant le module externe d'importation OFX. Ce fichier n'est pas le bon format."}]

## Create a simple interface for our translation with Gradio


In [None]:
!pip install gradio

In [50]:
import gradio as gr

# Load the translation pipeline with the specified model checkpoint
model_checkpoint = "ashaduzzaman/marian-finetuned-kde4-en-to-fr"
translator = pipeline("translation", model=model_checkpoint)

# Define a function that translates input text
def translate_text(input_text):
    # Translate the input text and return the translated output
    return translator(input_text)[0]['translation_text']

# Create the Gradio interface
iface = gr.Interface(
    fn=translate_text,   # The function to call
    inputs="text",       # Input type: text
    outputs="text",      # Output type: text
    title="English to French Translator",  # Interface title
    description="Translate English text to French using a Marian model."
)

# Launch the interface
iface.launch()

# English input samples to test the translation interface:
#
# - Hello, how are you?
# - Can you help me with this task?
# - What time is the meeting tomorrow?
# - I would like to order a coffee, please.
# - The weather is beautiful today.

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://e4bf1fd6bd9f84a524.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)