In [1]:
# Uncomment the line below if you are using Google Colab.
# !pip install transformers

### Use Transformers `AutoTokenizer` and `TFAutoModelForSeq2SeqLM` to Translate Each News Headline.

In [2]:
# Import the AutoTokenizer and TFAutoModelForSeq2SeqLM classes from the transformers package.
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

In [3]:
# Load the tokenizer for the t5-base checkpoint with AutoTokenizer, setting the maximum input length to 256 tokens.
tokenizer = AutoTokenizer.from_pretrained("t5-base", model_max_length=256)

In [4]:
# The news headlines to translate.
headlines = [
    'How To Spend More Time With Your Family While Working Remotely',
    'NCAA Football Playoffs Should be Like the NFL',
    'Hacker Pleads Guilty To Stealing Over 100,000 Passwords for Reddit',
    'Lawmakers Want To Boost School Funding To Address Teacher Walkouts',
    'The Best Sub Shops in the Caribbean You Should Visit This Summer',
    'The Dark Side Of The Bitcoin Mining',
    'Treasury Secretary is Confirmed Today',
    'The 5 Best Restaurants In The World',
    'How to Build a Brand for Your Small Business',
    'NY Giants Quarterback Injured After Being Punched By Teammate']

In [5]:
# Create a list to hold the input IDs.
headline_input_ids = []
# Tokenize one headline with the translation prompt and store its input IDs.
def create_input_ids(headline):
    # Prefix the headline with the translation prompt and tokenize it into a TensorFlow tensor.
    input_ids = tokenizer(f"translate English to French: {headline}", return_tensors="tf").input_ids
    # Append the input IDs to the list, then return the list (list.append itself returns None).
    headline_input_ids.append(input_ids)
    return headline_input_ids

In [6]:
# Use a for loop to pass each headline to the `create_input_ids` function to create the input ids.
for headline in headlines:
    create_input_ids(headline)

2023-12-14 12:01:50.375770: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [7]:
# Print the list of input ID tensors, one tensor per headline.
print(headline_input_ids)

[<tf.Tensor: shape=(1, 18), dtype=int32, numpy=
array([[13959,  1566,    12,  2379,    10,   571,   304, 13115,  1537,
         2900,   438,   696,  3712,   818,  7301, 19410,   120,     1]],
      dtype=int32)>, <tf.Tensor: shape=(1, 16), dtype=int32, numpy=
array([[13959,  1566,    12,  2379,    10, 20711, 10929,  2911,  1647,
            7,  5066,    36,  2792,     8, 10439,     1]], dtype=int32)>, <tf.Tensor: shape=(1, 29), dtype=int32, numpy=
array([[13959,  1566,    12,  2379,    10, 12715,    49, 14164,     9,
           26,     7,  2846,   173,    17,    63,   304,  3557,     9,
          697,  2035, 18829,  3424,  6051,     7,    21,  1624,    26,
          155,     1]], dtype=int32)>, <tf.Tensor: shape=(1, 21), dtype=int32, numpy=
array([[13959,  1566,    12,  2379,    10,  2402,  8910,  6834,   304,
            3, 16481,  1121,  3563,    53,   304, 13246, 17476, 10801,
          670,     7,     1]], dtype=int32)>, <tf.Tensor: shape=(1, 19), dtype=int32, numpy=
array([[13959,
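
Every tensor printed above starts with the same five IDs and ends with `1`. The plain-Python sketch below checks this using the first two ID lists, copied from that output. The shared IDs presumably encode the prompt `translate English to French:`, and the trailing `1` is T5's end-of-sequence ID. The helper name `common_prefix` is made up for this sketch.

```python
# Input IDs copied from the first two tensors in the printed output.
first_ids = [13959, 1566, 12, 2379, 10, 571, 304, 13115, 1537,
             2900, 438, 696, 3712, 818, 7301, 19410, 120, 1]
second_ids = [13959, 1566, 12, 2379, 10, 20711, 10929, 2911, 1647,
              7, 5066, 36, 2792, 8, 10439, 1]

def common_prefix(a, b):
    """Return the leading items that two lists share (hypothetical helper)."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix

# The IDs both headlines share at the start come from the identical prompt text.
shared = common_prefix(first_ids, second_ids)
print(shared)
# The last ID of each sequence is the end-of-sequence marker.
print(first_ids[-1], second_ids[-1])
```

Because the prompt is part of the tokenized input, its tokens count toward the `model_max_length` limit along with the headline's.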

In [8]:
# Create an instance of the TFAutoModelForSeq2SeqLM class using the t5-base model.
translation_model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [9]:
# Create a list to hold the translated headlines.
translated_headlines = []
# Generate a translation for one set of input IDs and decode it back into text.
def decode(input_id):
    # Generate the output IDs from the input IDs, allowing up to 100 new tokens.
    output_id = translation_model.generate(input_id, max_new_tokens=100)
    # Decode the output IDs into text, dropping special tokens, and append the translation to the list.
    translated_headlines.append(tokenizer.decode(output_id[0], skip_special_tokens=True))
    # Return the list.
    return translated_headlines
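
The function above runs one `generate` call per headline. As an alternative sketch, the tokenizer can pad a whole list of prompts into a single batch so the model runs once. `translate_batch` is a hypothetical helper, not part of the notebook. It takes the tokenizer and model as arguments and assumes they behave like the `tokenizer` and `translation_model` objects used here. Batched output is not guaranteed to match the one-at-a-time results exactly.

```python
def translate_batch(headlines, tokenizer, model, max_new_tokens=100):
    """Translate a list of headlines in one generate call (hypothetical helper)."""
    # Build one prompt per headline, using the same prefix as the notebook.
    prompts = [f"translate English to French: {h}" for h in headlines]
    # padding=True pads shorter prompts so the batch forms a rectangular tensor.
    inputs = tokenizer(prompts, return_tensors="tf", padding=True)
    # Pass the attention mask so the model ignores the padding positions.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # Decode every row of output IDs back into text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

It would be called as `translate_batch(headlines, tokenizer, translation_model)` and returns a new list rather than appending to a module-level one.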

In [10]:
# Use a for loop to pass each input id to the decode function. 
for input_id in headline_input_ids:
    decode(input_id)

2023-12-14 12:02:26.320940: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe79a633350 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-12-14 12:02:26.321154: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-12-14 12:02:26.324377: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-14 12:02:26.423769: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-12-14 12:02:26.783672: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of 

In [11]:
# Print out each translated headline.
for translation in translated_headlines:
    print(translation)

Comment passer plus de temps avec votre famille en travaillant à distance
Les épreuves de football de la NCAA devraient ressembler à celles de la NFL
Un hacker plaide coupable de vol de plus de 100 000 mots de passe pour Reddit
Les législateurs veulent augmenter le financement des écoles pour lutter contre les marches à pied des enseignants
Les meilleurs magasins de sous-marins des Carabes à visiter cet été
Le côté sombre de l’extraction de Bitcoin
Le Secrétaire du Trésor est confirmé aujourd'hui
Les 5 meilleurs restaurants du monde
Comment créer une marque pour votre petite entreprise
NY Giants : un quart-back blessé après avoir été frappé par un coéquipier


### Use the Transformers `pipeline` to Translate Each Headline.

In [12]:
# Import the pipeline function from the transformers package.
from transformers import pipeline
# Initialize a translation pipeline that uses the t5-base model.
translator = pipeline("translation", model="t5-base")

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [13]:
# Create a list to hold the translated headlines.
translated_headlines = []
# Translate one English headline into French and store the result.
def translate(headline):
    # Build the translation prompt for the headline.
    text = f"translate English to French: {headline}"
    # Pass the prompt to the translator pipeline.
    results = translator(text)
    # Append the translated text to the list, then return the list (list.append itself returns None).
    translated_headlines.append(results[0]['translation_text'])
    return translated_headlines
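
A `pipeline` object can also be called with a list of strings, in which case it is expected to return one result per input. The sketch below uses that to collect translations without a module-level list. `translate_all` is a hypothetical name, and its `translator` argument is assumed to be the pipeline created above.

```python
def translate_all(headlines, translator):
    """Translate every headline with one pipeline call (hypothetical helper)."""
    # Build the prompts with the same prefix used in the notebook.
    prompts = [f"translate English to French: {h}" for h in headlines]
    # Called with a list, the pipeline is expected to return one dict per prompt.
    results = translator(prompts)
    # Pull the translated string out of each result.
    return [result["translation_text"] for result in results]
```

It would be called as `translate_all(headlines, translator)`. Returning a fresh list also means re-running the cell does not append duplicates.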

In [14]:
# Use a for loop to pass each headline to the translate function.
for headline in headlines:
    translate(headline)

In [15]:
# Print out each translated headline.
for translation in translated_headlines:
    print(translation)

Comment passer plus de temps avec votre famille tout en travaillant à distance
Les épreuves de football de la NCAA devraient ressembler à celles de la NFL
Un hacker plaide coupable de vol de plus de 100 000 mots de passe pour Reddit
Les législateurs veulent accroître le financement des écoles pour faire face aux manifestations des enseignants
Les meilleurs magasins de sous-marins des Carabes à visiter cet été
Le côté sombre de l’extraction de Bitcoin
Le secrétaire du Trésor est confirmé aujourd'hui
Les 5 meilleurs restaurants du monde
Comment bâtir une marque pour votre petite entreprise
Les Giants de New York sont blessés après avoir été frappés par un membre de leur équipe
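
The two approaches agree on only some of the headlines. The short comparison below, with both lists copied from the outputs above, shows which ones differ. The variable names are illustrative. One plausible cause is that `generate` and the pipeline use different default generation settings, though that is an assumption and is not verified here.

```python
# Translations printed by the AutoTokenizer + generate approach.
generate_output = [
    "Comment passer plus de temps avec votre famille en travaillant à distance",
    "Les épreuves de football de la NCAA devraient ressembler à celles de la NFL",
    "Un hacker plaide coupable de vol de plus de 100 000 mots de passe pour Reddit",
    "Les législateurs veulent augmenter le financement des écoles pour lutter contre les marches à pied des enseignants",
    "Les meilleurs magasins de sous-marins des Carabes à visiter cet été",
    "Le côté sombre de l’extraction de Bitcoin",
    "Le Secrétaire du Trésor est confirmé aujourd'hui",
    "Les 5 meilleurs restaurants du monde",
    "Comment créer une marque pour votre petite entreprise",
    "NY Giants : un quart-back blessé après avoir été frappé par un coéquipier",
]

# Translations printed by the pipeline approach.
pipeline_output = [
    "Comment passer plus de temps avec votre famille tout en travaillant à distance",
    "Les épreuves de football de la NCAA devraient ressembler à celles de la NFL",
    "Un hacker plaide coupable de vol de plus de 100 000 mots de passe pour Reddit",
    "Les législateurs veulent accroître le financement des écoles pour faire face aux manifestations des enseignants",
    "Les meilleurs magasins de sous-marins des Carabes à visiter cet été",
    "Le côté sombre de l’extraction de Bitcoin",
    "Le secrétaire du Trésor est confirmé aujourd'hui",
    "Les 5 meilleurs restaurants du monde",
    "Comment bâtir une marque pour votre petite entreprise",
    "Les Giants de New York sont blessés après avoir été frappés par un membre de leur équipe",
]

# Collect the zero-based positions where the two translations are not identical.
differing = [
    i for i, (a, b) in enumerate(zip(generate_output, pipeline_output)) if a != b
]
print(differing)
```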
