# Translation and Summarization 🎯 

## Introduction
This notebook demonstrates how to use the Hugging Face Transformers library for translation and summarization tasks. We will go through the steps of setting up the environment, loading pre-trained models into pipelines, running them on example texts, and inspecting the results.

### Install and update the necessary libraries

Here is a brief description of the required libraries:

- The transformers library by Hugging Face provides state-of-the-art pre-trained models for natural language processing (NLP) tasks such as text classification, translation, summarization, and more. 

- The pipeline function from the transformers library by Hugging Face is used to easily access pre-trained models for various natural language processing (NLP) tasks. By calling pipeline, you can quickly load models for tasks like sentiment analysis, text generation, translation, question answering, and more.

- PyTorch is an open-source deep learning framework developed by Facebook's AI Research lab. It provides flexibility and speed for building, training, and deploying deep learning models. 

- The textwrap module is part of Python's standard library, so it needs no installation. It provides utilities for formatting and wrapping plain text: it breaks long strings into lines of a specified width, which makes printed model output easier to read.
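As a quick illustration of the last item, the sketch below wraps a long string to a fixed width with `textwrap.fill`. The sample sentence and the width of 40 are arbitrary choices made for this example only.

```python
import textwrap

# Arbitrary sample sentence, chosen only for illustration
long_text = (
    "The textwrap module breaks a long string into several shorter lines "
    "so that printed output stays readable."
)

# fill() returns a single string whose lines are at most `width` characters long
wrapped = textwrap.fill(long_text, width=40)
print(wrapped)
```

The same call is used later in this notebook, with `width=50`, to print translations and summaries.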

In [1]:
%pip install --upgrade transformers torch

Collecting transformers
  Downloading transformers-4.47.1-py3-none-any.whl.metadata (44 kB)
Collecting huggingface-hub<1.0,>=0.24.0 (from transformers)
  Downloading huggingface_hub-0.27.1-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2024.11.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting tqdm>=4.27 (from transformers)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Downloading transformers-4.47.1-py3-none-any.whl (10.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.1/10.1 MB[0m [31m65.0 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading huggin

In [2]:
# Suppress warning messages such as non-critical log messages
from transformers.utils import logging
logging.set_verbosity_error()


In [3]:
# Import the required libraries
from transformers import pipeline
import torch
import textwrap

### Build the `translation` pipeline using 🤗 Transformers Library

Build the translation pipeline from a pre-trained model using the Transformers library. No training is performed here.

### Load Translation Model

The NLLB (No Language Left Behind) model has been selected because a single model covers about 200 languages, including many low-resource ones. More information is available on the ['nllb-200-distilled-600M'](https://huggingface.co/facebook/nllb-200-distilled-600M) model page.

In [4]:
# Load the translation pipeline using the NLLB model
translator = pipeline(task="translation", 
                      model="facebook/nllb-200-distilled-600M", 
                      torch_dtype=torch.bfloat16)

### Translation Examples
Perform translation on example texts.

 - Example 1: Translation from English to French

In [5]:
# Define the input text
text = """\
My puppy is adorable, \
Your kitten is cute.
Her panda is friendly.
His llama is thoughtful. \
We all have nice pets!"""

In [6]:
print(text)

My puppy is adorable, Your kitten is cute.
Her panda is friendly.
His llama is thoughtful. We all have nice pets!
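The printed text has three lines rather than five because a backslash at the end of a line inside a triple-quoted string removes the line break that follows it. A minimal sketch of that behavior, using a made-up string:

```python
# A trailing backslash inside a triple-quoted string joins the next line
joined = """\
first line, \
still the first line
second line"""

print(joined)
```

The backslash right after the opening quotes also prevents the string from starting with an empty line.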


In [7]:
# Translate the input text from English to French
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="fra_Latn")

# Uncomment the following line to print the text_translated
#text_translated

# Extract and print the translation text
translation_text = text_translated[0]['translation_text']
wrapped_text = textwrap.fill(translation_text, width=50)
print(f"The translation of the given text is:\n{wrapped_text}")

The translation of the given text is:
Mon chiot est adorable, ton chaton est mignon, son
panda est ami, sa lamme est attentive, nous avons
tous de beaux animaux de compagnie.
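The extract-and-wrap steps are repeated for every example in this notebook, so they could be collected in a small helper. The sketch below is one possible way to do that; `format_translation` and `mock_result` are hypothetical names, and the mock list only imitates the list-of-dicts shape that the translation pipeline returns above.

```python
import textwrap

def format_translation(result, width=50):
    """Return the wrapped translation from a pipeline-style result.

    `result` is assumed to be a list with one dict per input text,
    each holding a 'translation_text' key, as in the cell above.
    """
    translation_text = result[0]["translation_text"]
    return textwrap.fill(translation_text, width=width)

# Placeholder standing in for real pipeline output, so no model is needed here
mock_result = [{"translation_text": "Bonjour le monde."}]
print(format_translation(mock_result))
```

With the real pipeline, the call would be `format_translation(text_translated)`.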


To choose other languages, see more info on the page: [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)
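To avoid retyping language codes, they can be kept in a small lookup table. The sketch below is an illustration only: `FLORES_CODES` and `language_kwargs` are hypothetical names, the first three codes are the ones used in this notebook, and the German and Spanish entries are taken from the FLORES-200 list.

```python
# Small lookup of FLORES-200 codes; the first three appear in this notebook
FLORES_CODES = {
    "English": "eng_Latn",
    "French": "fra_Latn",
    "Dutch": "nld_Latn",
    "German": "deu_Latn",
    "Spanish": "spa_Latn",
}

def language_kwargs(source, target):
    """Build the src_lang/tgt_lang keyword arguments for the translator."""
    try:
        return {"src_lang": FLORES_CODES[source], "tgt_lang": FLORES_CODES[target]}
    except KeyError as err:
        raise ValueError(f"No FLORES-200 code stored for {err.args[0]!r}") from None

kwargs = language_kwargs("English", "German")
print(kwargs)
```

While the translation pipeline is loaded, the result could be passed on as `translator(text, **kwargs)`.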



- Example 2: Translation from English to Dutch

In [8]:
# Define the input text
text1 = """
What are you doing today?
"""
print(text1)

# Translate the input text from English to Dutch
text_translated1 = translator(text1, src_lang="eng_Latn", tgt_lang="nld_Latn")  # Dutch

# Uncomment the following line to print the text_translated
#text_translated1

# Extract and print the translation text
translation_text = text_translated1[0]['translation_text']
wrapped_text = textwrap.fill(translation_text, width=50)
print(f"The translation of the given text is:\n{wrapped_text}")


What are you doing today?

The translation of the given text is:
Wat doe je vandaag?


- Free up some memory before continuing with the summarization tasks.

In [9]:
import gc

# Delete the translator model and free up memory
del translator
gc.collect()

85
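The number printed above is the count of unreachable objects that `gc.collect()` found. To see that `del` followed by `gc.collect()` really releases an object, a weak reference can be used as a probe. The sketch below uses a hypothetical `FakeModel` class as a stand-in for the pipeline, so it runs without loading any model.

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a large model object."""
    def __init__(self):
        self.weights = [0.0] * 1_000_000

model = FakeModel()

# A weak reference lets us observe the object without keeping it alive
model_ref = weakref.ref(model)

del model      # drop the only strong reference
gc.collect()   # also clean up any objects caught in reference cycles

# The weak reference returns None once the object has been freed
print(model_ref() is None)
```

When a model runs on a GPU, calling `torch.cuda.empty_cache()` afterwards may additionally release memory that PyTorch has cached. This notebook does not need it on CPU.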

### Build the `summarization` pipeline using 🤗 Transformers Library

Build the summarization pipeline from a pre-trained model using the Transformers library. No training is performed here.

The BART (Bidirectional and Auto-Regressive Transformers) model has been chosen because it is highly effective for abstractive summarization and other NLP tasks. This checkpoint is fine-tuned on CNN/Daily Mail news articles. More information is available on the ['bart-large-cnn'](https://huggingface.co/facebook/bart-large-cnn) model page.

In [10]:
# Load the summarization pipeline using the BART model
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

### Summarization Examples
Perform summarization on example texts.

 - Example 1: Summarization of a text about Paris

In [11]:
# Define the input text
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""

In [12]:
# Summarize the input text
summary = summarizer(text,
                     min_length=10,
                     max_length=100)

# Uncomment the following line to print the summary
#summary

# Extract and print the summary text
summary_text = summary[0]['summary_text']
wrapped_text = textwrap.fill(summary_text, width=50)
print(f"Summary:\n{wrapped_text}")

Summary:
Paris is the capital and most populous city of
France, with an estimated population of 2,175,601
residents as of 2018. The City of Paris is the
centre and seat of the government of the region
and province of Île-de-France.
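Note that `min_length` and `max_length` are counted in tokens, not in characters or words. When inputs vary a lot in size, the bounds could be derived from the input instead of being fixed. The helper below is a rough sketch only: `suggest_length_bounds` and its ratios are assumptions made for this example, and it uses the word count as a stand-in for the token count.

```python
def suggest_length_bounds(text, min_ratio=0.1, max_ratio=0.5, floor=10):
    """Suggest (min_length, max_length) from a rough word count.

    Word count is only a proxy: the summarizer counts tokens, and a
    tokenizer usually produces more tokens than there are words.
    """
    n_words = len(text.split())
    min_length = max(floor, int(n_words * min_ratio))
    max_length = max(min_length + 1, int(n_words * max_ratio))
    return min_length, max_length

sample = "word " * 80  # placeholder input of 80 words
min_length, max_length = suggest_length_bounds(sample)
print(min_length, max_length)
```

The two values could then be passed to the pipeline as `summarizer(text, min_length=min_length, max_length=max_length)`.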


- Example 2: Summarization of a text about Amsterdam

In [13]:
# Define the input text
text2 = """Amsterdam, the capital of the Netherlands, is known for its picturesque canals,
        historic architecture, and vibrant cultural scene. Founded in the 12th century as 
        a small fishing village, it grew into a major global trading hub during the Dutch Golden 
        Age. Today, Amsterdam is a cosmopolitan city, famous for its rich artistic heritage, 
        with museums like the Van Gogh Museum and Rijksmuseum. The city is also known for its 
        liberal attitudes, such as tolerance of cannabis use and legal sex work, as well 
        as its cycling culture, eco-friendly initiatives, and diverse, international population."""

In [14]:
# Summarize the input text
summary2 = summarizer(text2,
                      min_length=10,
                      max_length=100)

# Uncomment the following line to print the summary2
#summary2

# Extract and print the summary text
summary_text = summary2[0]['summary_text']
wrapped_text = textwrap.fill(summary_text, width=50)
print(f"Summary:\n{wrapped_text}")

Summary:
Amsterdam, the capital of the Netherlands, is
known for its picturesque canals, historic
architecture, and vibrant cultural scene. Founded
in the 12th century as a small fishing village, it
grew into a major global trading hub. Today,
Amsterdam is a cosmopolitan city, famous for its
rich artistic heritage, with museums like the Van
Gogh Museum and Rijksmuseum.
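A simple way to inspect a summary is to compare it with its source. The sketch below computes two rough, word-based measures: how much shorter the summary is, and how much of its wording is reused from the source. Both function names are hypothetical, the sample strings are short placeholders, and neither measure is a substitute for a proper metric or human review.

```python
def compression_ratio(source, summary):
    """Return summary length divided by source length, counted in words."""
    source_words = source.split()
    if not source_words:
        raise ValueError("source text is empty")
    return len(summary.split()) / len(source_words)

def copied_fraction(source, summary):
    """Return the share of summary words that also occur in the source."""
    source_vocab = {w.strip(".,").lower() for w in source.split()}
    summary_words = [w.strip(".,").lower() for w in summary.split()]
    if not summary_words:
        return 0.0
    return sum(w in source_vocab for w in summary_words) / len(summary_words)

# Short placeholder strings, used only to exercise the two functions
source_sample = (
    "Amsterdam is the capital of the Netherlands. "
    "It is known for its canals and museums."
)
summary_sample = "Amsterdam is the capital of the Netherlands."

print(compression_ratio(source_sample, summary_sample))
print(copied_fraction(source_sample, summary_sample))
```

A copied fraction close to 1.0 suggests that a summary mostly reuses the source wording. For the examples in this notebook, the calls would be `compression_ratio(text2, summary_text)` and `copied_fraction(text2, summary_text)`.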


### Conclusion
This notebook demonstrated how to use the Hugging Face Transformers library for translation and summarization tasks.
- The NLLB model is valuable for translating low-resource languages, but it has limitations. It covers a wide range of languages in a single model, delivers good-quality translations, and is openly available. However, it may carry biases, translation quality can drop for languages with little training data, and its documentation is limited. It suits research projects but is less suitable for specialized translations such as medical or legal texts.

- The BART model excels at abstractive summarization and other NLP tasks, offering high-quality, flexible, and customizable results backed by an active community. However, its high computational demands, hallucination risk, and struggles with domain-specific or poorly structured inputs are limitations. BART works best for summarizing moderately structured inputs like news articles or reports. It is less suited for unstructured inputs like raw comments or fragmented notes.

### Next Steps
- Try these models with your own texts!
- Experiment with different models and parameters.
