# Lesson 3: Translation and Summarization

- To run this code on your own machine, install the following packages (the leading `!` runs the command from a notebook cell):

```
    !pip install transformers 
    !pip install torch
```

- Here is some code that suppresses warning messages.

In [1]:
from transformers.utils import logging
logging.set_verbosity_error()

### Build the `translation` pipeline using 🤗 Transformers Library

In [2]:
from transformers import pipeline 
import torch

In [3]:
translator = pipeline(task="translation",
                      model="./models/facebook/nllb-200-distilled-600M",
                      torch_dtype=torch.bfloat16) 

NLLB stands for No Language Left Behind. Model info: ['nllb-200-distilled-600M'](https://huggingface.co/facebook/nllb-200-distilled-600M).

In [4]:
text = """\
Korean is hard, \
You need to study everyday.
If you have a hard time studying.
Best to find a friend to help you. \
Good luck on your journey!"""

In [5]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="fra_Latn")

To translate between other languages, look up their codes on this page: [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)

For example:
- Afrikaans: afr_Latn
- Chinese: zho_Hans
- Egyptian Arabic: arz_Arab
- French: fra_Latn
- German: deu_Latn
- Greek: ell_Grek
- Hindi: hin_Deva
- Indonesian: ind_Latn
- Italian: ita_Latn
- Japanese: jpn_Jpan
- Korean: kor_Hang
- Persian: pes_Arab
- Portuguese: por_Latn
- Russian: rus_Cyrl
- Spanish: spa_Latn
- Swahili: swh_Latn
- Thai: tha_Thai
- Turkish: tur_Latn
- Vietnamese: vie_Latn
- Zulu: zul_Latn
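
The codes above can be kept in a small lookup table so that a call names the language rather than its code. This is a minimal sketch and not part of the original lesson: `FLORES_CODES` and `translate_to` are hypothetical names, and the `translator` argument is assumed to be the pipeline built in cell 3.

```python
# Hypothetical lookup table: readable language names -> FLORES-200 codes.
FLORES_CODES = {
    "English": "eng_Latn",
    "French": "fra_Latn",
    "German": "deu_Latn",
    "Korean": "kor_Hang",
    "Spanish": "spa_Latn",
}


def translate_to(translator, text, target, source="English"):
    """Translate `text` using a translation pipeline supplied by the caller."""
    try:
        src_code = FLORES_CODES[source]
        tgt_code = FLORES_CODES[target]
    except KeyError as err:
        raise ValueError(f"No FLORES-200 code registered for {err.args[0]!r}") from err
    result = translator(text, src_lang=src_code, tgt_lang=tgt_code)
    # The pipeline returns a list with one dict per input text.
    return result[0]["translation_text"]
```

With the pipeline loaded, `translate_to(translator, text, "German")` would request a German translation of the same text. Add further entries to the table from the FLORES-200 page as needed.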

In [6]:
text_translated

[{'translation_text': 'Le coréen est dur, il faut étudier tous les jours. Si vous avez du mal à étudier. Il vaut mieux trouver un ami pour vous aider. Bonne chance sur votre voyage!'}]

## Free up some memory before continuing
- The summarization model loaded below also needs memory, so delete the translation pipeline and run the garbage collector before continuing.

In [7]:
import gc

In [8]:
del translator

In [9]:
gc.collect()

25
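
The collection step can be folded into one reusable function. This is a minimal sketch and not part of the original lesson: `free_memory` is a hypothetical name, and the call to `torch.cuda.empty_cache()` only matters under the assumption that a model was placed on a CUDA GPU. The `del translator` statement still has to run in the notebook itself, because a function cannot remove a variable from the caller's namespace.

```python
import gc


def free_memory():
    """Run the garbage collector and, if a CUDA GPU is in use, release cached GPU memory."""
    collected = gc.collect()  # number of unreachable objects found
    try:
        import torch
        if torch.cuda.is_available():
            # Returns cached, unused GPU blocks to the driver.
            torch.cuda.empty_cache()
    except ImportError:
        # torch is not installed; only the Python-level collection applies.
        pass
    return collected
```

A typical use is `del translator` followed by `free_memory()`.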

### Build the `summarization` pipeline using 🤗 Transformers Library

In [10]:
summarizer = pipeline(task="summarization",
                      model="./models/facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

Model info: ['bart-large-cnn'](https://huggingface.co/facebook/bart-large-cnn)

In [11]:
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""

In [12]:
summary = summarizer(text,
                     min_length=10,
                     max_length=100)

In [13]:
summary

[{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]
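
Like the translation pipeline, the summarizer returns a list with one dict per input. The sketch below pulls out the summary string and compares its word count with the source. The input and output are copied from the cells above with whitespace normalized. Note that `min_length` and `max_length` are measured in model tokens rather than words, so the word counts are only a rough check.

```python
# Passage from cell 11 and output from cell 13, whitespace normalized.
text = (
    "Paris is the capital and most populous city of France, with "
    "an estimated population of 2,175,601 residents as of 2018, "
    "in an area of more than 105 square kilometres (41 square "
    "miles). The City of Paris is the centre and seat of "
    "government of the region and province of Île-de-France, or "
    "Paris Region, which has an estimated population of "
    "12,174,880, or about 18 percent of the population of France "
    "as of 2017."
)
summary = [{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]

# The summary string sits under the 'summary_text' key of the first result.
summary_text = summary[0]["summary_text"]

source_words = len(text.split())
summary_words = len(summary_text.split())
print(f"Summary keeps {summary_words} of {source_words} words")
```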

### Try it yourself! 
- Try these models with your own texts!

In [14]:
from transformers import pipeline 
import torch

In [15]:
# Load Facebook's NLLB translation model
translator = pipeline(task="translation",
                      model="./models/facebook/nllb-200-distilled-600M",
                      torch_dtype=torch.bfloat16) 

In [16]:
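# Text to translate.
# Note: each trailing backslash joins the next line with no space in between,
# so the model receives "newJust", "littleCan" and "lifeEnjoy" as single words.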
text_2 = """\
Today is the day to learn something new\
Just learning a new skill little by little\
Can have profound effects on your life\
Enjoy the learning journey"""

In [17]:
# Set the source and target languages
text_translated_Kor = translator(text_2,
                                 src_lang="eng_Latn",
                                 tgt_lang="kor_Hang")

In [18]:
# Check the translation
text_translated_Kor

[{'translation_text': '오늘 새로운 것을 배울 날이에요. 새로운 기술을 조금씩 배우면, 당신의 삶에 큰 영향을 미칠 수 있습니다.'}]

In [19]:
## Free up memory 

In [20]:
import gc

In [21]:
del translator

In [22]:
gc.collect()

238

--------------------------------------------------------------

In [23]:
# Load Facebook's BART model to summarize text
summarizer = pipeline(task="summarization",
                      model="./models/facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

In [24]:
text_korean = """\
옛날에 큰 호랑이 한 마리가 숲 속에 살았다.
어느 날 호랑이는 배가 고파서 마을로 갔다.
마을 옆 밭에 소 한 마리가 서 있었다.
호랑이는 소를 잡아 먹고 싶은데 갑자기 시끄러운 아기 울음소리를 들었다.
밭 옆에 있는 집에서 아기가 울고 있었다.
호랑이는 집으로 다가갔다.
‘아기가 맛있을 것 같아.’
호랑이는 생각했다."""

In [25]:
# Summarize the Korean text
summary = summarizer(text_korean,
                     min_length=10,
                     max_length=100)

In [26]:
summary

[{'summary_text': '‘‘   \xa0‘소 한  하 학 같’ ‘’ \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0호’ 혹  홉 큰   ‘혼  ’” ‘œ’: ‘ታ’, ‘˚˚’. ’ ’‘'}]

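## Summary model didn't work as expected
- `bart-large-cnn` was fine-tuned on English news articles (CNN/DailyMail), so it produces garbled output for Korean input.
- To summarize Korean text, load a different model from Hugging Face that supports Korean and run the code again.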
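
Another option that stays within the two models from this lesson is to translate the Korean text into English first and then summarize the English. This is a minimal sketch and not something the original notebook ran: `summarize_via_english` is a hypothetical name, both pipelines are passed in by the caller, and it assumes the machine has enough memory to hold the translator and the summarizer at the same time. The quality of the summary will depend on the quality of the translation.

```python
def summarize_via_english(text, translator, summarizer,
                          src_lang="kor_Hang",
                          min_length=10, max_length=100):
    """Translate `text` to English with NLLB, then summarize the English text."""
    translated = translator(text, src_lang=src_lang, tgt_lang="eng_Latn")
    english = translated[0]["translation_text"]
    summary = summarizer(english, min_length=min_length, max_length=max_length)
    return summary[0]["summary_text"]
```

With both pipelines loaded, the call would be `summarize_via_english(text_korean, translator, summarizer)`. The English summary could then be passed back through the translator with `src_lang="eng_Latn"` and `tgt_lang="kor_Hang"` if a Korean summary is wanted.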