# Lesson 3: Translation and Summarization 🎯 

Requirements: to run this code on your own machine, install the following libraries:

- The `transformers` library is needed to use the `pipeline` API (available on the Hugging Face website).
- The `torch` library (PyTorch) is the deep learning framework the models run on. It supports hardware acceleration (e.g., GPUs) and is widely used for machine learning research and development.


In [1]:
%pip install transformers 
%pip install torch

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
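To check that both packages are installed (and, if needed, that the kernel has been restarted), you can query their versions with the standard library. This is a minimal sketch; the helper name `installed_version` is our own, not part of either library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("transformers", "torch"):
    v = installed_version(pkg)
    print(f"{pkg}: {v if v is not None else 'not installed'}")
```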


- Here is some code that suppresses warning messages.

In [2]:
# Suppress non-critical log messages
from transformers.utils import logging
logging.set_verbosity_error()

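`logging.set_verbosity_error()` raises the threshold of the library's logger so that only ERROR and CRITICAL messages are shown. As far as we know, 🤗 Transformers logs through the standard-library logger named `"transformers"`, so a roughly equivalent sketch using only Python's `logging` module looks like this (treat the logger name as an assumption):

```python
import logging

# Assumption: Transformers emits its messages through the "transformers" logger.
hf_logger = logging.getLogger("transformers")
hf_logger.setLevel(logging.ERROR)

# Messages below ERROR are now filtered out; errors still get through.
print(hf_logger.isEnabledFor(logging.WARNING))  # False
print(hf_logger.isEnabledFor(logging.ERROR))    # True
```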


### Build the `translation` pipeline using 🤗 Transformers Library

In [3]:
from transformers import pipeline 
import torch

Model info: NLLB (No Language Left Behind), ['nllb-200-distilled-600M'](https://huggingface.co/facebook/nllb-200-distilled-600M).

To get the code that loads a model, open the model's page on the Hugging Face Hub and look at its Transformers section.

In [4]:
translator = pipeline(task="translation",
                      model="facebook/nllb-200-distilled-600M",
                      # to load a local copy of the model instead:
                      #model="./models/facebook/nllb-200-distilled-600M",
                      torch_dtype=torch.bfloat16)  # use bfloat16 precision for the weights and tensor calculations
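Loading the weights in `bfloat16` stores each parameter in 2 bytes instead of the 4 bytes used by the default `float32`, which roughly halves the memory taken by the weights. A back-of-the-envelope sketch, assuming about 600 million parameters (taken from the model name) and ignoring activations and framework overhead:

```python
# Bytes needed to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 600e6  # assumption: ~600M parameters, as the model name suggests
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.1f} GB")
```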

- Example 1: Translation of a text from English to French

In [5]:
text = """\
My puppy is adorable, \
Your kitten is cute.
Her panda is friendly.
His llama is thoughtful. \
We all have nice pets!"""

In [6]:
print(text)

My puppy is adorable, Your kitten is cute.
Her panda is friendly.
His llama is thoughtful. We all have nice pets!
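The printed text shows the effect of the trailing backslashes: inside a triple-quoted string, a backslash at the end of a line removes that line break, so "adorable," and "Your kitten" end up on the same line, while lines without a backslash keep their newline (the opening `"""\` removes the very first line break in the same way). A minimal standalone illustration:

```python
joined = """first line \
second line"""
kept = """first line
second line"""

print(repr(joined))  # 'first line second line'
print(repr(kept))    # 'first line\nsecond line'
```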


In [7]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="fra_Latn")

In [8]:
text_translated

[{'translation_text': 'Mon chiot est adorable, ton chaton est mignon, son panda est ami, sa lamme est attentive, nous avons tous de beaux animaux de compagnie.'}]

In [9]:
# Extract just the translated string from the pipeline output and print it in a clean format
import textwrap

# Extract the translation text
translation_text = text_translated[0]['translation_text']

# Wrap the text into lines of at most 50 characters so it spans multiple lines
wrapped_text = textwrap.fill(translation_text, width=50)

# Print the nicely formatted text
print(f"The translation of the given text is:\n{wrapped_text}")

The translation of the given text is:
Mon chiot est adorable, ton chaton est mignon, son
panda est ami, sa lamme est attentive, nous avons
tous de beaux animaux de compagnie.
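The extract-and-wrap steps are repeated for every example in this lesson. One way to factor them out is a small helper; `format_result` is a hypothetical name, and it only assumes the output shape shown above (a list containing one dictionary per input text):

```python
import textwrap


def format_result(results, key, label, width=50):
    """Wrap the text stored under `key` in the first pipeline result, below a label."""
    text = results[0][key]
    return f"{label}\n{textwrap.fill(text, width=width)}"


# Sample output copied from the French translation above.
sample = [{'translation_text': 'Mon chiot est adorable, ton chaton est mignon, '
           'son panda est ami, sa lamme est attentive, nous avons tous de '
           'beaux animaux de compagnie.'}]
print(format_result(sample, "translation_text",
                    "The translation of the given text is:"))
```

The same helper would work for the summaries later in the lesson, e.g. `format_result(summary, "summary_text", "Summary:")`.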


To translate from or into other languages, look up their codes in [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).
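FLORES-200 codes combine a language code and a script name, such as the `eng_Latn`, `fra_Latn`, and `nld_Latn` codes used in this lesson. A small lookup table can keep the calls readable; the dictionary and the helper `lang_kwargs` below are illustrative, and you would extend the table with codes from the FLORES-200 list:

```python
# Illustrative subset of FLORES-200 codes (the three used in this lesson).
FLORES_CODES = {
    "English": "eng_Latn",
    "French": "fra_Latn",
    "Dutch": "nld_Latn",
}


def lang_kwargs(source: str, target: str) -> dict:
    """Build the src_lang/tgt_lang keyword arguments for the translation pipeline."""
    return {"src_lang": FLORES_CODES[source], "tgt_lang": FLORES_CODES[target]}


print(lang_kwargs("English", "Dutch"))
# With the pipeline built above: translator(text, **lang_kwargs("English", "French"))
```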



- Example 2: Translation of a text from English to Dutch

In [10]:
text1 = """\
What are you doing today?"""

In [11]:
text_translated1 = translator(text1,
                              src_lang="eng_Latn",
                              tgt_lang="nld_Latn")  # Dutch

In [12]:
text_translated1

[{'translation_text': 'Wat doe je vandaag?'}]

In [13]:
# Extract the translation text
translation_text = text_translated1[0]['translation_text']

# Wrap the text into lines of a specified width (e.g., 50 characters per line)
wrapped_text = textwrap.fill(translation_text, width=50)

# Print the nicely formatted text
print(f"The translation of the given text is:\n{wrapped_text}")

The translation of the given text is:
Wat doe je vandaag?


## Free up some memory before continuing
- To make sure there is enough free memory for the rest of the code, run the following cells to delete the translation pipeline and release the memory it uses.

In [14]:
import gc

del translator

In [16]:
gc.collect()

0
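`del translator` removes only the name; the pipeline and its model weights are freed once nothing else refers to them, and `gc.collect()` additionally reclaims objects caught in reference cycles (its return value, `0` above, is the number of unreachable objects it found). Deleting a name that no longer exists raises `NameError`, which is why `del translator` should run only once. A small sketch with a hypothetical stand-in class instead of a real model:

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a large model object."""


model = FakeModel()
probe = weakref.ref(model)  # a weak reference does not keep the object alive

del model        # drop the only strong reference
gc.collect()     # also collect anything left in reference cycles
print(probe() is None)  # True: the object has been freed

try:
    del model    # the name is already gone
except NameError:
    print("NameError: 'model' was already deleted")
```

On a GPU, PyTorch's caching allocator may keep freed memory reserved; `torch.cuda.empty_cache()` returns it to the device.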

### Build the `summarization` pipeline using 🤗 Transformers Library

Model info: ['bart-large-cnn'](https://huggingface.co/facebook/bart-large-cnn)

In [17]:
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      #model="./models/facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

- Example 1: Summarization of a text about Paris

In [18]:
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""
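Because the string is indented inside the triple quotes, it contains newlines and runs of spaces. Judging by the clean summary it produces, the pipeline copes with this, but if you want tidy input (for example, for logging or counting words), collapsing the whitespace is a one-liner. A minimal sketch on a shortened copy of the text:

```python
raw = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018."""

# split() with no argument splits on any run of whitespace, including newlines.
clean = " ".join(raw.split())
print(clean)
```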

In [19]:
summary = summarizer(text,
                     min_length=10,
                     max_length=100)

In [20]:
summary

[{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]

In [21]:
# Extract the summary text
summary_text = summary[0]['summary_text']

# Wrap the text into lines of a specified width (e.g., 50 characters per line)
wrapped_text = textwrap.fill(summary_text, width=50)

# Print the nicely formatted text
print(f"Summary:\n{wrapped_text}")


Summary:
Paris is the capital and most populous city of
France, with an estimated population of 2,175,601
residents as of 2018. The City of Paris is the
centre and seat of the government of the region
and province of Île-de-France.
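`min_length` and `max_length` bound the length of the generated summary in tokens, not in words or characters. To get a rough feel for how much the model condensed the input, you can compare word counts. A minimal sketch, where the word count is only a proxy for tokens and the variable names are our own:

```python
def compression_ratio(source: str, summary: str) -> float:
    """Ratio of summary words to source words (lower means more condensed)."""
    return len(summary.split()) / len(source.split())


paris_text = ("Paris is the capital and most populous city of France, with an estimated "
              "population of 2,175,601 residents as of 2018, in an area of more than 105 "
              "square kilometres (41 square miles). The City of Paris is the centre and "
              "seat of government of the region and province of Île-de-France, or Paris "
              "Region, which has an estimated population of 12,174,880, or about 18 "
              "percent of the population of France as of 2017.")
paris_summary = ("Paris is the capital and most populous city of France, with an estimated "
                 "population of 2,175,601 residents as of 2018. The City of Paris is the "
                 "centre and seat of the government of the region and province of "
                 "Île-de-France.")
print(f"The summary keeps about {compression_ratio(paris_text, paris_summary):.0%} of the words.")
```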


- Example 2: Summarization of a text about Amsterdam

In [22]:
text2 = """Amsterdam, the capital of the Netherlands, is known for its picturesque canals,
        historic architecture, and vibrant cultural scene. Founded in the 12th century as 
        a small fishing village, it grew into a major global trading hub during the Dutch Golden 
        Age. Today, Amsterdam is a cosmopolitan city, famous for its rich artistic heritage, 
        with museums like the Van Gogh Museum and Rijksmuseum. The city is also known for its 
        liberal attitudes, such as tolerance of cannabis use and legal sex work, as well 
        as its cycling culture, eco-friendly initiatives, and diverse, international population."""

In [23]:
summary2 = summarizer(text2,
                     min_length=10,
                     max_length=100)

In [24]:
summary2

[{'summary_text': 'Amsterdam, the capital of the Netherlands, is known for its picturesque canals, historic architecture, and vibrant cultural scene. Founded in the 12th century as a small fishing village, it grew into a major global trading hub. Today, Amsterdam is a cosmopolitan city, famous for its rich artistic heritage, with museums like the Van Gogh Museum and Rijksmuseum.'}]

In [25]:
# Extract the summary text
summary_text = summary2[0]['summary_text']

# Wrap the text into lines of a specified width (e.g., 50 characters per line)
wrapped_text = textwrap.fill(summary_text, width=50)

# Print the nicely formatted text
print(f"Summary:\n{wrapped_text}")

Summary:
Amsterdam, the capital of the Netherlands, is
known for its picturesque canals, historic
architecture, and vibrant cultural scene. Founded
in the 12th century as a small fishing village, it
grew into a major global trading hub. Today,
Amsterdam is a cosmopolitan city, famous for its
rich artistic heritage, with museums like the Van
Gogh Museum and Rijksmuseum.


### Try it yourself! 
- Try this model with your own texts!
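When trying your own texts, keep in mind that `facebook/bart-large-cnn` reads at most 1,024 input tokens, so longer inputs need to be truncated or split. One simple approach is to cut the text into chunks and summarize each one. The sketch below splits on words with a hypothetical `max_words` budget, chosen conservatively because a word often maps to more than one token; a tokenizer-based split would be more precise:

```python
def chunk_words(text: str, max_words: int = 500):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


long_text = " ".join(f"word{i}" for i in range(1200))  # placeholder long input
chunks = chunk_words(long_text, max_words=500)
print([len(c.split()) for c in chunks])  # [500, 500, 200]

# With the summarizer built above (not run here):
# partial = [summarizer(c, min_length=10, max_length=100)[0]["summary_text"] for c in chunks]
# combined = " ".join(partial)
```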