In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras

keras.utils.set_random_seed(42)

## NLP Tasks with Pre-trained Models

In this colab, we will show how to use pre-trained models from [Hugging Face](https://huggingface.co/) for various natural language processing tasks.

Hugging Face provides a popular open-source Python library called "transformers" that is useful for a variety of NLP tasks. Hugging Face is also an AI community where people can easily access others' pre-trained models and datasets, or share their own.

Let's install the package first.

In [None]:
!pip install transformers



In [None]:
from transformers import pipeline

We will start with text-generation examples.

## Text Generation

We will use the GPT-2 model (see its [model card](https://huggingface.co/gpt2)). Other text-generation models are listed [here](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads).

In [None]:
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')


In [None]:
set_seed(42)
result = generator("""The mission of the MIT Sloan School of Management is 
to develop principled, innovative leaders who """, max_length=30, num_return_sequences=1)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
print(result[0]['generated_text'])

The mission of the MIT Sloan School of Management is 
to develop principled, innovative leaders who  are more disciplined, ethical and empathetic


Not bad!!

In [None]:
set_seed(42)
result = generator("""I want to play basketball but """, max_length=30, num_return_sequences=1)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
print(result[0]['generated_text'])

I want to play basketball but  I've never played volleyball in my life so I'd like to play basketball again."
"So, if


Next, we will use a text-generation model ***without any fine-tuning on summarization (input, output) examples*** to summarize the article we saw in the PPT.

The results will be poor; the intent here is simply to demonstrate how a text-generation model can be adapted for a summarization task using delimiters.

In the next section, we will use a model ***specifically fine-tuned on summarization (input, output) examples*** to demonstrate the impact of and need for fine-tuning.

In [None]:
article = """The only thing crazier than a guy in snowbound Massachusetts 
boxing up the powdery white stuff 
and offering it for sale online? People are actually buying it. 
For $89, self-styled entrepreneur Kyle Waring will ship you 6 pounds 
of Boston-area snow in an insulated Styrofoam box – 
enough for 10 to 15 snowballs, he says.
But not if you live in New England or surrounding states. 
His website and social media accounts claim to have filled more than 133 orders 
for snow – more than 30 on Tuesday alone, his busiest day yet. 
With more than 45 total inches, Boston has set a record this winter 
for the snowiest month in its history. Most residents see the huge piles 
of snow choking their yards and sidewalks as a nuisance, but Waring saw an 
opportunity. According to Boston.com, it all started a few weeks ago, 
when Waring and his wife were shoveling deep snow from their yard in 
Manchester-by-the-Sea, a coastal suburb north of Boston. He joked about 
shipping the stuff to friends and family in warmer states, and an 
idea was born. His business slogan: “Our nightmare is your dream!” 
At first, ShipSnowYo sold snow packed into empty 16.9-ounce water 
bottles for $19.99, but the snow usually melted before it 
reached its destination.
"""


For comparison, this is a human summary of the article:

>Kyle Waring will ship you 6 pounds of Boston-area snow in an insulated Styrofoam box – enough
for 10 to 15 snowballs, he says. But not if you live in New England or surrounding states.

We will try adding "TL;DR" at the end of the article to "guide" the text-generation model. If it has seen lots of English text with "TL;DR", it may "know" to summarize.
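The delimiter trick amounts to simple string handling around the generator call. The sketch below assumes the text-generation pipeline returns the prompt followed by the continuation, which is its default behavior. The helper names `build_prompt` and `strip_prompt` are hypothetical, and the toy strings are made up purely to exercise the helpers; they are not model output.

```python
def build_prompt(article: str, delimiter: str = " TL;DR: ") -> str:
    """Append a delimiter that signals 'a summary comes next'."""
    return article + delimiter


def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the newly generated continuation.

    If the generated text does not start with the prompt, return it unchanged.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text


# Toy stand-ins for illustration only; not real model output.
toy_article = "A man in Massachusetts is selling boxed snow online."
toy_prompt = build_prompt(toy_article)
toy_generated = toy_prompt + "Snow is for sale."

print(strip_prompt(toy_generated, toy_prompt))
```

With the real pipeline, `strip_prompt(result[0]['generated_text'], article + " TL;DR: ")` would isolate the model's attempt at a summary from the echoed article.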

In [None]:
set_seed(42)
result = generator(article + " TL;DR: ", max_length=400, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
print(result[0]['generated_text'])

The only thing crazier than a guy in snowbound Massachusetts 
boxing up the powdery white stuff 
and offering it for sale online? People are actually buying it. 
For $89, self-styled entrepreneur Kyle Waring will ship you 6 pounds 
of Boston-area snow in an insulated Styrofoam box – 
enough for 10 to 15 snowballs, he says.
But not if you live in New England or surrounding states. 
His website and social media accounts claim to have filled more than 133 orders 
for snow – more than 30 on Tuesday alone, his busiest day yet. 
With more than 45 total inches, Boston has set a record this winter 
for the snowiest month in its history. Most residents see the huge piles 
of snow choking their yards and sidewalks as a nuisance, but Waring saw an 
opportunity. According to Boston.com, it all started a few weeks ago, 
when Waring and his wife were shoveling deep snow from their yard in 
Manchester-by-the-Sea, a coastal suburb north of Boston. He joked about 
shipping the stuff to friends and fami

In [None]:
print(result[1]['generated_text'])

The only thing crazier than a guy in snowbound Massachusetts 
boxing up the powdery white stuff 
and offering it for sale online? People are actually buying it. 
For $89, self-styled entrepreneur Kyle Waring will ship you 6 pounds 
of Boston-area snow in an insulated Styrofoam box – 
enough for 10 to 15 snowballs, he says.
But not if you live in New England or surrounding states. 
His website and social media accounts claim to have filled more than 133 orders 
for snow – more than 30 on Tuesday alone, his busiest day yet. 
With more than 45 total inches, Boston has set a record this winter 
for the snowiest month in its history. Most residents see the huge piles 
of snow choking their yards and sidewalks as a nuisance, but Waring saw an 
opportunity. According to Boston.com, it all started a few weeks ago, 
when Waring and his wife were shoveling deep snow from their yard in 
Manchester-by-the-Sea, a coastal suburb north of Boston. He joked about 
shipping the stuff to friends and fami

That was bad. 

Let's try adding "In summary, " as a prompt instead of TL;DR. 

In [None]:
set_seed(42)
result = generator(article + " In summary,  ", max_length=400, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
print(result[0]['generated_text'])

The only thing crazier than a guy in snowbound Massachusetts 
boxing up the powdery white stuff 
and offering it for sale online? People are actually buying it. 
For $89, self-styled entrepreneur Kyle Waring will ship you 6 pounds 
of Boston-area snow in an insulated Styrofoam box – 
enough for 10 to 15 snowballs, he says.
But not if you live in New England or surrounding states. 
His website and social media accounts claim to have filled more than 133 orders 
for snow – more than 30 on Tuesday alone, his busiest day yet. 
With more than 45 total inches, Boston has set a record this winter 
for the snowiest month in its history. Most residents see the huge piles 
of snow choking their yards and sidewalks as a nuisance, but Waring saw an 
opportunity. According to Boston.com, it all started a few weeks ago, 
when Waring and his wife were shoveling deep snow from their yard in 
Manchester-by-the-Sea, a coastal suburb north of Boston. He joked about 
shipping the stuff to friends and fami

In [None]:
print(result[1]['generated_text'])

The only thing crazier than a guy in snowbound Massachusetts 
boxing up the powdery white stuff 
and offering it for sale online? People are actually buying it. 
For $89, self-styled entrepreneur Kyle Waring will ship you 6 pounds 
of Boston-area snow in an insulated Styrofoam box – 
enough for 10 to 15 snowballs, he says.
But not if you live in New England or surrounding states. 
His website and social media accounts claim to have filled more than 133 orders 
for snow – more than 30 on Tuesday alone, his busiest day yet. 
With more than 45 total inches, Boston has set a record this winter 
for the snowiest month in its history. Most residents see the huge piles 
of snow choking their yards and sidewalks as a nuisance, but Waring saw an 
opportunity. According to Boston.com, it all started a few weeks ago, 
when Waring and his wife were shoveling deep snow from their yard in 
Manchester-by-the-Sea, a coastal suburb north of Boston. He joked about 
shipping the stuff to friends and fami

The summaries are poor but this is expected behavior since we didn't fine-tune the model.

Perhaps GPT-2 didn't see enough passages with "In summary" in its training data.

(Spoiler alert: As we will see in the slides, GPT-3 does a better job even without fine-tuning because it has been trained on more and better data.)

---

Now, let's see what happens if we use models **fine-tuned specifically for summarization**.

## Summarization

In [None]:
# Load a summarizer by task name; since no model is specified, the pipeline falls back to a default.
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)



Generate the summaries.

In [None]:
result = summarizer(article, max_length=30, min_length=30, num_return_sequences=4)  # four candidate summaries, each constrained to 30 tokens

In [None]:
result[0]['summary_text']

' For $89, Kyle Waring will ship 6 pounds of Boston-area snow in an insulated Styrofoam box . Waring'

In [None]:
result[1]['summary_text']

' For $89, Kyle Waring will ship 6 pounds of Boston-area snow in an insulated Styrofoam box . His website'

You can use other pre-trained models (e.g., [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum) below). You can find plenty of summarization models [here](https://huggingface.co/models?pipeline_tag=summarization).

In [None]:
summarizer = pipeline("summarization", model="google/pegasus-xsum")  # change the model to your favorite one
summarizer(article, max_length=30)  # summarize the same article with the new model


The output is better than before! For low-level usage and task-specific fine-tuning, see the [task summary documentation](https://huggingface.co/docs/transformers/task_summary#summarization).



---
Many other NLP tasks are supported out of the box. We will look at two: 
* Question Answering
* Mask Filling


## Question Answering

In [None]:
model_name = "deepset/roberta-base-squad2"

nlp = pipeline('question-answering', model=model_name)




In [None]:
QA_input = {
    'question': 'Why is model conversion important?',
    'context': '''The option to convert models between FARM and transformers 
    gives freedom to the user and let people easily switch 
    between frameworks.'''
}


In [None]:
nlp(QA_input)

{'answer': 'gives freedom to the user',
 'end': 89,
 'score': 0.44885993003845215,
 'start': 64}
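The `start` and `end` fields in this output appear to be character offsets into `context`, so the answer can be recovered by slicing. A small sketch that checks this against the values printed above; the context string is rebuilt line by line so that the trailing spaces and indentation of the original triple-quoted string are preserved:

```python
# Same context as in QA_input, with whitespace made explicit.
context = (
    "The option to convert models between FARM and transformers \n"
    "    gives freedom to the user and let people easily switch \n"
    "    between frameworks."
)

# Values copied from the pipeline output shown above.
prediction = {"answer": "gives freedom to the user", "start": 64, "end": 89}

# Slice the context with the reported offsets (end is exclusive).
span = context[prediction["start"]:prediction["end"]]
print(span)
print(span == prediction["answer"])
```

This is handy when you want to highlight the answer inside the original passage rather than just display the extracted string.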

## Fill Masks

In [None]:
unmasker = pipeline('fill-mask', model='bert-base-uncased')


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



In [None]:
unmasker("Paris is the [MASK] of France.")

[{'score': 0.9969369173049927,
  'sequence': 'paris is the capital of france.',
  'token': 3007,
  'token_str': 'capital'},
 {'score': 0.0005914870998822153,
  'sequence': 'paris is the heart of france.',
  'token': 2540,
  'token_str': 'heart'},
 {'score': 0.00043787728645838797,
  'sequence': 'paris is the center of france.',
  'token': 2415,
  'token_str': 'center'},
 {'score': 0.00033783717663027346,
  'sequence': 'paris is the centre of france.',
  'token': 2803,
  'token_str': 'centre'},
 {'score': 0.0002699585456866771,
  'sequence': 'paris is the city of france.',
  'token': 2103,
  'token_str': 'city'}]
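The result is a plain list of dictionaries, so ordinary Python is enough to work with it. The sketch below copies the scores printed above (other fields omitted for brevity) and assumes, as is usual for this pipeline, that each `score` is a probability over the whole vocabulary, so the top five need not sum to 1:

```python
# Scores and tokens copied from the fill-mask output above.
predictions = [
    {"score": 0.9969369173049927, "token_str": "capital"},
    {"score": 0.0005914870998822153, "token_str": "heart"},
    {"score": 0.00043787728645838797, "token_str": "center"},
    {"score": 0.00033783717663027346, "token_str": "centre"},
    {"score": 0.0002699585456866771, "token_str": "city"},
]

# Most likely filler for the [MASK] position.
best = max(predictions, key=lambda p: p["score"])

# Probability mass covered by the five candidates that were returned.
top5_mass = sum(p["score"] for p in predictions)

print(best["token_str"])
print(round(top5_mass, 4))
```

The small remainder of the probability mass is spread over every other token in the vocabulary.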

## Fine-tuning GPT-2 

We will quickly show how to run a sentence through GPT-2 and get the final embeddings. 

You can set up a downstream NN and use these final embeddings as input and your problem's labels as output (like we did with BERT in HW2).
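As a rough illustration of such a downstream network, here is a minimal sketch that uses a single sigmoid unit trained with plain gradient descent in NumPy rather than a full Keras model. Everything in it is an assumption made for illustration: the 768-wide random vectors stand in for sentence embeddings, and the labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 200 "sentence embeddings" of width 768,
# labelled by a hidden linear rule so that there is something to learn.
n_examples, dim = 200, 768
features = rng.normal(size=(n_examples, dim))
true_direction = rng.normal(size=dim)
labels = (features @ true_direction > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One dense unit with a sigmoid: the smallest possible classification head.
weights = np.zeros(dim)
bias = 0.0
learning_rate = 0.1

for _ in range(200):
    probs = sigmoid(features @ weights + bias)
    error = probs - labels  # gradient of the log loss with respect to the logits
    weights -= learning_rate * features.T @ error / n_examples
    bias -= learning_rate * error.mean()

predicted = sigmoid(features @ weights + bias) > 0.5
train_accuracy = (predicted == labels.astype(bool)).mean()
print(f"training accuracy: {train_accuracy:.2f}")
```

In practice you would replace `features` with embeddings computed by the pre-trained model and `labels` with your own task's labels, and you would evaluate on held-out data rather than on the training set.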

In [None]:
from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')


All model checkpoint layers were used when initializing TFGPT2Model.

All the layers of TFGPT2Model were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2Model for predictions without further training.


In [None]:
text = ["cat sat on the mat"]
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)

In [None]:
encoded_input

{'input_ids': <tf.Tensor: shape=(1, 5), dtype=int32, numpy=array([[9246, 3332,  319,  262, 2603]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 5), dtype=int32, numpy=array([[1, 1, 1, 1, 1]], dtype=int32)>}

In [None]:
temp = output.last_hidden_state

In [None]:
temp.shape

TensorShape([1, 5, 768])

Note that we get a 768-dimensional contextual embedding for each of the 5 tokens in the input (in this sentence, each word happens to map to exactly one token).

We can use these to do sequence classification.

(Question: Since there's no CLS token in the input, how will we come up with an embedding for the whole sentence?)
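Two common ways to get a single vector per sentence are sketched below: averaging the token embeddings, or taking the embedding of the last token. These are generic options, not necessarily the answer intended in class. The random array is only a stand-in with the same shape as `output.last_hidden_state`, and the last-token variant assumes any padding sits on the right.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for output.last_hidden_state: (batch, tokens, hidden size).
hidden_states = rng.normal(size=(1, 5, 768))
attention_mask = np.array([[1, 1, 1, 1, 1]])

# Option 1: mean pooling, ignoring padded positions via the attention mask.
mask = attention_mask[..., np.newaxis]  # shape (1, 5, 1)
mean_pooled = (hidden_states * mask).sum(axis=1) / mask.sum(axis=1)

# Option 2: last real token. GPT-2 attends left to right, so the final
# position is the only one that has "seen" the whole sentence.
last_index = attention_mask.sum(axis=1) - 1  # shape (1,)
batch_index = np.arange(hidden_states.shape[0])
last_token = hidden_states[batch_index, last_index]

print(mean_pooled.shape, last_token.shape)
```

Either vector has shape `(1, 768)` and can be fed to a classification head.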

We can also use these embeddings to fine-tune GPT-2 on our own datasets, for example for summarization.