# Helsinki-NLP/opus-mt-en-es & es-en
https://huggingface.co/Helsinki-NLP/opus-mt-es-en  
https://huggingface.co/Helsinki-NLP/opus-mt-en-es


## Load model

In [1]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# English to Spanish
tokenizer_en_es = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")
model_en_es = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-es")

# Spanish to English
tokenizer_es_en = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model_es_en = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")

## Toy example

In [2]:
# Define your Spanish text
spanish_text = "Tu texto en español aquí."

# Tokenize the input text
inputs = tokenizer_es_en(spanish_text, return_tensors="pt")

# Perform translation
outputs = model_es_en.generate(**inputs)

# Decode the generated output
translated_text = tokenizer_es_en.decode(outputs[0], skip_special_tokens=True)

print("Translated text:", translated_text)

Translated text: Your Spanish text here.


## Spanish train data (to English)

In [3]:
import pandas as pd
import os

In [7]:
%cd

C:\Users\Nadia Timoleon


### Load data to pandas DataFrame

In [4]:
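def load_pickle_to_df(data_type):
  """Load a pickled dataset (file name in `data_type`) and keep only 'id' and 'text'."""
  # Relative path: resolved against the current working directory (set by %cd above)
  pickle_path = "./Documents/GitHub/pan-clef-2024/data/pickle/"
  data_path = os.path.join(pickle_path, data_type)
  df = pd.read_pickle(data_path)
  df = df[['id', 'text']].copy() # select only the 'id' and 'text' columns
  # df.set_index('id', inplace=True) # set 'id' column as index
  return df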


### Add column with translation

In [5]:
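def translate_text(text, tokenizer, model):
  """Translate a single string; the input is not truncated, so very long texts can fail."""
  inputs = tokenizer(text, return_tensors="pt") # tokenize to PyTorch tensors
  outputs = model.generate(**inputs) # generate target-language token ids
  translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
  return translated_text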
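
A batched variant of `translate_text` can be sketched as below. This is a minimal sketch, not the notebook's method: the name `translate_batch` is hypothetical, and `max_length=512` is an assumption about the model's input limit (check `tokenizer_es_en.model_max_length` before relying on it). With `truncation=True`, anything beyond the limit is silently dropped, so long texts come back only partially translated.

```python
def translate_batch(texts, tokenizer, model, max_length=512):
    """Translate a list of strings in one generate() call.

    max_length=512 is an assumed input limit, not a value taken from the notebook.
    Inputs longer than max_length tokens are truncated, so their tail is lost.
    """
    inputs = tokenizer(
        list(texts),
        return_tensors="pt",
        padding=True,        # pad to the longest text in the batch
        truncation=True,     # cut inputs that exceed max_length
        max_length=max_length,
    )
    outputs = model.generate(**inputs)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `translate_batch(["Hola.", "Buenos días."], tokenizer_es_en, model_es_en)` would return a list with one English string per input.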

In [8]:
df_train_es = load_pickle_to_df('dataset_es_train.pkl')
df_train_es.rename(columns={'text': 'text_es'}, inplace=True)

In [None]:
# Crashes: some texts are too long for the model's maximum input length
df_train_es['text_en'] = df_train_es['text_es'].apply(translate_text, tokenizer=tokenizer_es_en, model=model_es_en)

df_train_es.head()
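
One possible workaround for the crash is to split each text into sentence-based chunks that fit the model's input limit and translate them one by one. The sketch below rests on stated assumptions: `split_into_chunks` and `translate_long_text` are hypothetical helpers, the regex sentence split is naive, and `max_tokens=400` is an arbitrary budget chosen to stay below an assumed limit of 512 tokens.

```python
import re


def split_into_chunks(text, count_tokens, max_tokens=400):
    """Greedily pack sentences into chunks of at most max_tokens tokens.

    count_tokens is any callable that returns the token count of a string.
    A single sentence longer than max_tokens is kept as its own chunk.
    """
    # Naive split after ., ! or ? followed by whitespace (an assumption, not a real segmenter)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)   # close the current chunk
            current = sentence       # start a new one with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text, translate_fn, count_tokens, max_tokens=400):
    """Translate each chunk separately and join the results with spaces."""
    chunks = split_into_chunks(text, count_tokens, max_tokens)
    return " ".join(translate_fn(chunk) for chunk in chunks)


# Possible use with the objects defined in this notebook:
# count = lambda s: len(tokenizer_es_en(s)["input_ids"])
# translate = lambda s: translate_text(s, tokenizer_es_en, model_es_en)
# df_train_es['text_en'] = df_train_es['text_es'].apply(translate_long_text, translate_fn=translate, count_tokens=count)
```

Translating chunks independently loses context across chunk boundaries, so the result may read less smoothly than a single-pass translation.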

In [9]:
# Average-sized sequence
text = df_train_es['text_es'].iloc[390]
print("Text length:", len(text))
translated_text = translate_text(text, tokenizer=tokenizer_es_en, model=model_es_en)
print("Translated text:", translated_text)

Text length: 813
Translated text: The vaccine is to blame for the increase in cases : clear conclusion when seeing the graphs in these countries : in the case of Nigeria it is very clear, in Ethiopia the same happens : they start to vaccinate and the cases rise exponentially. In Ethiopia less than 2% of the population has the two doses. The cases increased a few weeks after the vaccination increased there And now I ask you : if the vaccine does not stop anything but rather increases the number of positives, the passport COVID What is it? What is it for? What guarantees does it have? Why do the vaccinated get sick more? Guarantee to get sick? Why restrict rights to the non-VACONEED if we also know that the Vaccination DOES NOT SINCE? If you are also against the passport covid us in : t. me / NEWS _ DISIDENTES
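
The row indices in these cells are picked by hand. As a hedged sketch of how rows longer than the mean plus one standard deviation (in characters) could be located, the hypothetical helper `long_text_indices` below uses only the standard library; it is not necessarily how the examples here were chosen.

```python
from statistics import mean, stdev


def long_text_indices(texts):
    """Return (threshold, positions) of texts longer than mean + 1 std, in characters."""
    lengths = [len(t) for t in texts]
    # stdev() is the sample standard deviation, the same default as pandas' .std()
    threshold = mean(lengths) + stdev(lengths)
    return threshold, [i for i, n in enumerate(lengths) if n > threshold]


# Possible use with the DataFrame from this notebook:
# threshold, positions = long_text_indices(df_train_es['text_es'].tolist())
```

The returned positions are zero-based, so they can be passed directly to `.iloc`.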


In [10]:
# Sequence longer than mean + 1 std (length in characters)
text = df_train_es['text_es'].iloc[1372]
print("Text length:", len(text))
translated_text = translate_text(text, tokenizer=tokenizer_es_en, model=model_es_en)
print("Translated text:", translated_text)

Text length: 1631
Translated text: By dismounting the lies of Risto Mejide. The vaccinated people get sick more and die more in relation to covid. From Report 508 of the Ministry of Health? it is deduced that 65.1% of those hospitalized, 50.8% of those admitted to UCI, and 77.1% of the deceased were fully vaccinated. And the difference is still much greater than that reflected by this data, as this report states, I copy textually " A person is considered fully vaccinated 7 days after receiving a second dose of Comirnaty ( Pfizer / BioNTech ) or 14 days after the second dose of Vaxzevria ( Oxford / AstraZeneca ) or Moderna, and if between the first and second dose there has been a minimum interval of 19 days if the first dose of Comurnaty ( Pfizer / BioNTech ) or 21 days after the second dose of Vaxzevria or 25 days after the second dose of Moderna. Also is considered fully vaccinated a person 14 days after the first and second dose of the second dose of vaccine has been given of Jansse