# Translating languages using MarianMT

Reference: https://www.kdnuggets.com/how-to-translate-languages-with-marianmt-and-hugging-face-transformers

In [1]:
from transformers import MarianMTModel, MarianTokenizer

# Specify the model name
model_name = "Helsinki-NLP/opus-mt-en-fr"

# Load the tokenizer and model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

In [6]:
src_text = [
    "The enigmatic fox darted through the dense forest, leaving only a whisper of its presence.",
    "She meticulously cataloged every shell she found along the windswept shore.",
    "A symphony of crickets filled the tranquil night, their song ebbing and flowing like waves.",
    "The artist's brushstrokes danced across the canvas, capturing the vibrant hues of the sunset.",
    "With a sudden jolt, the old train lurched forward, its wheels squealing against the rusty tracks.",
    "The mathematician pondered the complexities of the theorem, her mind racing with infinite possibilities.",
    "As the storm raged on, the lighthouse stood resolute, its beacon slicing through the darkness.",
    "The aroma of freshly baked bread wafted through the quaint bakery, inviting customers inside.",
    "His adventurous spirit compelled him to scale the treacherous peak despite the howling winds.",
    "The politician's speech was filled with rhetoric but lacked substantial content.",
    "She gazed at the ornate chandelier, its crystals casting a kaleidoscope of colors on the walls.",
    "A subtle melancholy settled over the city as the rain began to fall, tapping gently on windows.",
    "The detective inspected the crime scene with meticulous attention to detail, searching for any overlooked clues.",
    "The ancient manuscript, bound in worn leather, held secrets that had been forgotten for centuries.",
    "He navigated the bustling marketplace, weaving between vendors shouting their prices.",
    "The gardener's hands were covered in soil, a testament to her hours spent nurturing the rose bushes.",
    "A raven perched on the gnarled branch, its beady eyes watching every movement below.",
    "The scientist marveled at the newly discovered species, its iridescent scales shimmering under the microscope.",
    "An air of anticipation hung in the theater as the audience awaited the curtain's rise.",
    "The pianist's fingers flew across the keys, producing a melody that resonated with emotion and grace.",
]

In [7]:
# Alternative source text: a single sentence (left commented out so the list defined earlier is used)
#src_text = ["this is a sentence in English that we want to translate to French"]

In [8]:
# Tokenize the source text
inputs = tokenizer(src_text, return_tensors="pt", padding=True)

In [10]:
# Generate the translation
translated = model.generate(**inputs)

In [11]:
# Decode the translated text
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
print(tgt_text)

["Le renard énigmatique traversa la forêt dense, ne laissant qu'un murmure de sa présence.", "Elle a méticuleusement catalogué toutes les coquilles qu'elle a trouvées le long de la rive balayée par le vent.", "Une symphonie de crickets remplissait la nuit tranquille, leur chant s'affaissait et flottait comme des vagues.", "Les coups de pinceau de l'artiste dansaient sur la toile, captant les teintes vibrantes du coucher du soleil.", "Avec une secousse soudaine, l'ancien train s'élança vers l'avant, ses roues s'élancent contre les rails rouillés.", 'Le mathématicien a réfléchi à la complexité du théorème, son esprit courant avec des possibilités infinies.', "Au fur et à mesure que la tempête faisait rage, le phare était résolu, sa balise sillonnant l'obscurité.", "L'arôme de pain fraîchement cuit a balancé à travers la boulangerie pittoresque, invitant les clients à l'intérieur.", "Son esprit aventureux l'obligea à gravir le sommet perfide malgré les vents hurlants.", 'Le discours du po
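
The cells above tokenize, generate, and decode all 20 sentences as one padded batch. For longer lists it may be more practical to work through the input in smaller chunks. The helper below is a minimal sketch of that idea, not part of the original notebook: the function name `translate_in_batches` and the default `batch_size=8` are illustrative assumptions, and it expects the `tokenizer` and `model` objects loaded earlier. It also passes `truncation=True` and uses `batch_decode` in place of the per-item `decode` loop.

```python
def translate_in_batches(texts, tokenizer, model, batch_size=8):
    """Translate a list of strings in fixed-size batches.

    `tokenizer` and `model` are assumed to be the MarianTokenizer and
    MarianMTModel loaded earlier. `batch_size` is an illustrative default,
    not a tuned value.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad only to the longest sentence within this batch
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        # Generate the translated token IDs for the batch
        generated = model.generate(**inputs)
        # Decode the whole batch in one call, dropping padding and end-of-sequence tokens
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

With the objects from the earlier cells in scope, `tgt_text = translate_in_batches(src_text, tokenizer, model)` should give the same kind of list as the decode cell above, one translated string per input sentence.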

Next, let's try translating from English to Japanese.

In [13]:
# Specify the model name
model_name = "Helsinki-NLP/opus-mt-en-jap"

# Load the tokenizer and model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)


In [17]:
src_text = ["My name is Jack Sparrow and I am going to be the king of the pirates!"]

In [18]:
# Tokenize the source text
inputs = tokenizer(src_text, return_tensors="pt", padding=True)
# Generate the translation
translated = model.generate(**inputs)
# Decode the translated text
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

In [19]:
tgt_text

['わが 名 は レイム と い い, " わたし は 彼 ら の 王 と な り, その は し ため の 王 と な る " と い う.']
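
Both checkpoints used in this notebook follow the same naming pattern, `Helsinki-NLP/opus-mt-{src}-{tgt}`. The small helper below is a sketch that builds such a name from two language codes; the function name `opus_mt_model_name` is a hypothetical choice for illustration. It only formats a string, so it does not check that a checkpoint for a given pair actually exists on the Hugging Face Hub. Note that the codes are not always two-letter codes: the Japanese model above uses `jap`.

```python
def opus_mt_model_name(src_lang, tgt_lang):
    """Build an OPUS-MT checkpoint name from source and target language codes.

    This only formats the name; whether a checkpoint exists for the pair
    still has to be checked on the Hugging Face Hub.
    """
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"


# The two checkpoints loaded in this notebook
print(opus_mt_model_name("en", "fr"))   # Helsinki-NLP/opus-mt-en-fr
print(opus_mt_model_name("en", "jap"))  # Helsinki-NLP/opus-mt-en-jap
```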