# Translation Pipeline in Hugging Face
* Notebook by Adam Lang
* Date: 12/4/2024

# Overview
* In this notebook I implement a translation pipeline with the Hugging Face `transformers` library.

In [1]:
## install dependencies
!pip install -U transformers #upgrades
!pip install -U sentencepiece #upgrades
!pip install -U sacremoses #upgrades

Collecting transformers
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Downloading transformers-4.46.3-py3-none-any.whl (10.0 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.46.2
    Uninstalling transformers-4.46.2:
      Successfully uninstalled transformers-4.46.2
Successfully installed transformers-4.46.3
Collecting sacremoses
  Downloading sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Downloading sacremoses-0.1.1-py3-none-any.whl (897 kB)
Installing collected packages: sacremoses
Successfully installed sac

In [2]:
## imports - transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) #set seed for consistency


import pandas as pd
import numpy as np

# Translation Pipeline
* We will demo a translation pipeline.
* If no model name is given, the `translation_en_to_de` task defaults to `google-t5/t5-base` at revision `a9723ea`.
  * model card: https://huggingface.co/google-t5/t5-base

In [4]:
## text to translate
text = """
Dear Amazon, last week I ordered a new pair of alpine skis
from your online store in Seattle. Unfortunately when I opened
the package, I discovered that I had accidentally been sent a Snowboard instead!
"""

In [5]:
## setup translator pipeline -- english to german
translator = pipeline("translation_en_to_de")

# run the translation and show the raw output
outputs = translator(text)
outputs

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'translation_text': 'Dear Amazon, last week I ordered a new pair of alpine skis from your online store in Seattle, unfortunately when I opened the package, I discovered that I had accidentally been sent a Snowboard instead!'}]

Summary:
* The default T5-base model did not translate the text from English to German: the output is the English input, returned essentially unchanged.
* To get a German translation, we need to pass a specific English-to-German model by name.
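The check above was done by eye. A minimal sketch of the same check in code, assuming a word-level similarity ratio is a good enough signal; the helper name `looks_untranslated` and the `0.9` threshold are hypothetical choices for illustration, not part of `transformers`:

```python
from difflib import SequenceMatcher


def looks_untranslated(source: str, translated: str, threshold: float = 0.9) -> bool:
    """Return True when the output is nearly the same word sequence as the input.

    Hypothetical helper: the 0.9 threshold is an assumption, not a tuned value.
    """
    # Lowercase and split on whitespace so line breaks and casing are ignored.
    src_words = source.lower().split()
    out_words = translated.lower().split()
    # autojunk=False disables difflib's heuristic for long sequences,
    # so the ratio reflects every word.
    ratio = SequenceMatcher(None, src_words, out_words, autojunk=False).ratio()
    return ratio >= threshold
```

Calling `looks_untranslated(text, outputs[0]["translation_text"])` on the T5-base result above should flag it, since almost every word is shared with the input.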

# Using a specific Translation Model: English to German
* We will use a model from the Helsinki-NLP family, which I have used quite a bit before.
* The specific model is: `Helsinki-NLP/opus-mt-en-de`
  * model card: https://huggingface.co/Helsinki-NLP/opus-mt-en-de

In [7]:
## setup translator pipeline -- English to German, with an explicit model
from IPython.display import Markdown

translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")

# run the translation and render the translated text as Markdown
outputs = translator(text)
Markdown(outputs[0]["translation_text"])

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Liebe Amazon, letzte Woche habe ich ein neues Paar Alpinski in Ihrem Online-Shop in Seattle bestellt. Leider habe ich beim Öffnen des Pakets entdeckt, dass ich versehentlich ein Snowboard geschickt worden war!
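
Both runs above log that a GPU is available but the model is placed on CPU because no `device` argument was passed. A minimal sketch of one way to address that, assuming PyTorch is the backend; `pick_device` and `build_translator` are hypothetical helper names, while `device` is the `pipeline` argument named in the warning (`-1` for CPU, `0` for the first GPU):

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot probe for a GPU, so fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_translator(model_name: str = "Helsinki-NLP/opus-mt-en-de"):
    """Create the English-to-German pipeline on the selected device."""
    # Imported here so pick_device() can be used on its own.
    from transformers import pipeline

    return pipeline("translation_en_to_de",
                    model=model_name,
                    device=pick_device())
```

With this in place, `translator = build_translator()` followed by `translator(text)` would run the same translation as above, on the GPU when one is detected.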