# **Machine Translation with MarianMT**

The `Transformers` library from Hugging Face includes pretrained translation models for many language pairs. The following links give more detail:

* [Full list of pretrained models](https://huggingface.co/models?search=Helsinki-NLP)

* [MarianMT info](https://huggingface.co/transformers/model_doc/marian.html)

* [GitHub Transformers](https://github.com/huggingface/transformers)

<br>
<br>
<br>


In [1]:
# Install the required libraries
!pip install transformers==3.5.1 -q
!pip install torch==1.4.0 -q

In [2]:
# Import the required libraries
from transformers import MarianMTModel, MarianTokenizer
import torch

<br>
<br>

### **English to Spanish Translation**

In [3]:
# Select the model and download its pretrained weights
model_name = 'Helsinki-NLP/opus-mt-en-es'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

In [4]:
# Sentences to translate
en_text = [
           'This is a sentence in english that we want to translate to Spanish',
           'This should also go to Spanish',
           'And this to Spanish'
           ]

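# Tokenize the batch, generate the translations, and decode them back to text.
# prepare_seq2seq_batch works with the pinned transformers 3.5.1; newer releases
# deprecate it in favor of calling the tokenizer directly.
translated = model.generate(**tokenizer.prepare_seq2seq_batch(en_text))
es_text_translated = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]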

for text in es_text_translated:
  print(text)

Esta es una frase en inglés que queremos traducir al español
Esto también debería ir al español
Y esto a español
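
The load-then-translate pattern used here repeats for every language pair in this notebook. A minimal sketch of wrapping it in a helper might look like the following. `opus_mt_name` and `translate` are hypothetical names, not part of Transformers, and the sketch assumes the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming seen in the model names used in this notebook. It calls the tokenizer directly with `return_tensors="pt"` and `padding=True` instead of using `prepare_seq2seq_batch`.

```python
def opus_mt_name(src: str, tgt: str) -> str:
    """Build a model id following the Helsinki-NLP/opus-mt-{src}-{tgt} pattern."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(texts, src, tgt):
    """Translate a list of sentences from `src` to `tgt` (hypothetical helper)."""
    # Imported lazily so the naming helper works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = opus_mt_name(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the whole batch, padding to the longest sentence.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]


print(opus_mt_name("en", "es"))  # Helsinki-NLP/opus-mt-en-es
```

A call such as `translate(en_text, "en", "es")` would load the tokenizer and model on every invocation; for repeated use it is cheaper to load them once and reuse them.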


<br>
<br>

### **Spanish to English Translation**


In [5]:
# Select the model and download its pretrained weights
model_name = 'Helsinki-NLP/opus-mt-es-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

In [6]:
# Sentences to translate
es_text = [
           'Esta es una frase en español que queremos traducir al ingles',
           'Esto también debería ir al ingles',
           'Y esto a ingles'
           ]

# Tokenize the batch, generate the translations, and decode them back to text
translated = model.generate(**tokenizer.prepare_seq2seq_batch(es_text))
en_text_translated = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

for text in en_text_translated:
  print(text)

This is a sentence in Spanish that we want to translate into English
This should also go to English.
And this is in English.


<br>
<br>

### **Spanish to French Translation**

In [7]:
# Select the model and download its pretrained weights
model_name = 'Helsinki-NLP/opus-mt-es-fr'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

In [8]:
# Sentences to translate (the same Spanish inputs as before, so they still mention English)
es_text = [
           'Esta es una frase en español que queremos traducir al ingles',
           'Esto también debería ir al ingles',
           'Y esto a ingles'
           ]

# Tokenize the batch, generate the translations, and decode them back to text
translated = model.generate(**tokenizer.prepare_seq2seq_batch(es_text))
fr_text_translated = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

for text in fr_text_translated:
  print(text)

C'est une phrase en espagnol que nous voulons traduire en anglais
Ça devrait aller aussi en anglais.
Et ça en anglais.


<br>
<br>

### **Spanish to Russian Translation**

In [9]:
# Select the model and download its pretrained weights
model_name = 'Helsinki-NLP/opus-mt-es-ru'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

In [10]:
# Sentences to translate (the same Spanish inputs as before, so they still mention English)
es_text = [
           'Esta es una frase en español que queremos traducir al ingles',
           'Esto también debería ir al ingles',
           'Y esto a ingles'
           ]

# Tokenize the batch, generate the translations, and decode them back to text
translated = model.generate(**tokenizer.prepare_seq2seq_batch(es_text))
ru_text_translated = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

for text in ru_text_translated:
  print(text)

Это предложение на испанском языке, которое мы хотим перевести на английский язык.
Это тоже должно быть на английском.
И это на английском.


<br>
<br>

### **Multilingual Translation**

In [11]:
from transformers import MarianMTModel, MarianTokenizer

# Each sentence starts with a >>code<< token that selects the target language.
# NOTE: '>>esp<<' is not among the supported codes printed below; the listed
# code for Spanish is '>>spa<<'. The recorded output comes from '>>esp<<'.
src_text = [
    '>>fra<< This is a sentence in english that we want to translate to french',
    '>>por<< This should go to portuguese',
    '>>esp<< And this to Spanish'
]

model_name = 'Helsinki-NLP/opus-mt-en-roa'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(f'\nLanguage codes: {tokenizer.supported_language_codes}\n\n')

model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
data = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

for text in data:
  print(text)


Language codes: ['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']


C'est une phrase en anglais que nous voulons traduire en français
Isto deve ir para o português
esto al español
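
Multilingual models such as `opus-mt-en-roa` take the target language from a `>>code<<` token at the start of each sentence, so a mistyped code is easy to miss. A small sketch of validating the code before prepending it follows. `with_target_token` is a hypothetical helper, and `SUPPORTED` is copied by hand from a few entries of the list printed above rather than read from the tokenizer.

```python
# A few of the codes printed by tokenizer.supported_language_codes above
# (copied by hand for illustration).
SUPPORTED = [">>spa<<", ">>fra<<", ">>ron<<", ">>por<<", ">>ita<<", ">>cat<<"]


def with_target_token(text, lang, supported=SUPPORTED):
    """Prefix `text` with the >>lang<< token, rejecting unknown codes."""
    token = f">>{lang}<<"
    if token not in supported:
        raise ValueError(f"{token} is not a supported language code")
    return f"{token} {text}"


print(with_target_token("And this to Spanish", "spa"))
# >>spa<< And this to Spanish

try:
    with_target_token("And this to Spanish", "esp")
except ValueError as err:
    print(err)  # >>esp<< is not a supported language code
```

In a real session, passing `tokenizer.supported_language_codes` as `supported` would check against the full list for the loaded model.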


<br>
<br>

### **Full List of Models**

In [12]:
# Install the required libraries
!pip install huggingface-hub==0.1.1 -q

In [13]:
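# List the models published on the Hub
from huggingface_hub.hf_api import HfApi
model_list = HfApi().list_models()
org = "Helsinki-NLP"
# Keep only the ids that belong to the Helsinki-NLP organization
model_ids = [x.modelId for x in model_list if x.modelId.startswith(org)]
# Model name without the organization prefix, e.g. 'opus-mt-en-es'
suffix = [x.split('/')[1] for x in model_ids]
# Ids whose name contains uppercase letters, e.g. 'opus-mt-ROMANCE-en'
old_style_multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]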

In [14]:
# Available models
model_ids

['Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU',
 'Helsinki-NLP/opus-mt-ROMANCE-en',
 'Helsinki-NLP/opus-mt-SCANDINAVIA-SCANDINAVIA',
 'Helsinki-NLP/opus-mt-aav-en',
 'Helsinki-NLP/opus-mt-aed-es',
 'Helsinki-NLP/opus-mt-af-de',
 'Helsinki-NLP/opus-mt-af-en',
 'Helsinki-NLP/opus-mt-af-eo',
 'Helsinki-NLP/opus-mt-af-es',
 'Helsinki-NLP/opus-mt-af-fi',
 'Helsinki-NLP/opus-mt-af-fr',
 'Helsinki-NLP/opus-mt-af-nl',
 'Helsinki-NLP/opus-mt-af-ru',
 'Helsinki-NLP/opus-mt-af-sv',
 'Helsinki-NLP/opus-mt-afa-afa',
 'Helsinki-NLP/opus-mt-afa-en',
 'Helsinki-NLP/opus-mt-alv-en',
 'Helsinki-NLP/opus-mt-am-sv',
 'Helsinki-NLP/opus-mt-ar-de',
 'Helsinki-NLP/opus-mt-ar-el',
 'Helsinki-NLP/opus-mt-ar-en',
 'Helsinki-NLP/opus-mt-ar-eo',
 'Helsinki-NLP/opus-mt-ar-es',
 'Helsinki-NLP/opus-mt-ar-fr',
 'Helsinki-NLP/opus-mt-ar-he',
 'Helsinki-NLP/opus-mt-ar-it',
 'Helsinki-NLP/opus-mt-ar-pl',
 'Helsinki-NLP/opus-mt-ar-ru',
 'Helsinki-NLP/opus-mt-ar-tr',
 'Helsinki-NLP/opus-mt-art-en',
 'Helsinki-NLP/opus-mt-ase-d
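
Most ids in this list follow an `opus-mt-{source}-{target}` pattern, so the language pair can often be recovered from the id itself. The sketch below groups targets by source for a handful of ids copied from the output above. `parse_pair` is a hypothetical helper, and splitting on the first hyphen is an assumption that may not hold for every id on the Hub.

```python
from collections import defaultdict

PREFIX = "Helsinki-NLP/opus-mt-"

# A few ids copied from the listing above.
sample_ids = [
    "Helsinki-NLP/opus-mt-af-de",
    "Helsinki-NLP/opus-mt-af-en",
    "Helsinki-NLP/opus-mt-af-es",
    "Helsinki-NLP/opus-mt-ar-en",
    "Helsinki-NLP/opus-mt-ar-es",
    "Helsinki-NLP/opus-mt-ROMANCE-en",
]


def parse_pair(model_id):
    """Split an id into (source, target), assuming one hyphen separates them."""
    pair = model_id[len(PREFIX):]
    source, target = pair.split("-", 1)
    return source, target


# Map each source language to the targets available for it.
targets_by_source = defaultdict(list)
for model_id in sample_ids:
    source, target = parse_pair(model_id)
    targets_by_source[source].append(target)

print(targets_by_source["af"])  # ['de', 'en', 'es']
print(targets_by_source["ar"])  # ['en', 'es']
```

Applied to the full `model_ids` list, the same grouping would show which target languages are available for a given source before any weights are downloaded.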

<br>
<br>

### **Installed Library Information**

In [15]:
pip show transformers torch huggingface_hub

Name: transformers
Version: 3.5.1
Summary: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
Home-page: https://github.com/huggingface/transformers
Author: Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors
Author-email: thomas@huggingface.co
License: Apache
Location: /usr/local/lib/python3.7/dist-packages
Requires: tokenizers, sacremoses, requests, regex, protobuf, sentencepiece, tqdm, numpy, packaging, filelock
Required-by: 
---
Name: torch
Version: 1.4.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.7/dist-packages
Requires: 
Required-by: torchvision, torchtext, fastai
---
Name: huggingface-hub
Version: 0.1.1
Summary: Cl