# Workshop 1 - Summarization 

In [1]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, GenerationConfig

## T5 Models

The <code>flan-t5</code> is a Text-To-Text Transfer Transformer (T5) that is capable of performing zero-shot NLP task such as summary, simple reasoninig, answering questions, etc. 

Some T5 models from Huggingface
- [<code>google/flan-t5-base</code>](https://huggingface.co/google/flan-t5-base)
- [<code>google/flan-t5-small</code>](https://huggingface.co/google/flan-t5-small)
- [<code>google/flan-t5-xl</code>](https://huggingface.co/google/flan-t5-xl)
- [<code>google/flan-t5-xxl</code>](https://huggingface.co/google/flan-t5-xxl) - full model

Complete list of [T5 models](https://huggingface.co/models?search=google/flan) on Huggingface.

In [2]:
model_name = 'google/flan-t5-base'

In [3]:
# TODO: Load tokenizer and model
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
# TODO: Print the model
print(model)

In [None]:
text = """ 
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

In [20]:
text = """ 
When a traveler in north central Massachusetts takes the wrong fork
at the junction of the Aylesbury pike just beyond Dean's Corners he
comes upon a lonely and curious country. The ground gets higher, and
the brier-bordered stone walls press closer and closer against the ruts
of the dusty, curving road. The trees of the frequent forest belts
seem too large, and the wild weeds, brambles, and grasses attain a
luxuriance not often found in settled regions. At the same time the
planted fields appear singularly few and barren; while the sparsely
scattered houses wear a surprizing uniform aspect of age, squalor, and
dilapidation. Without knowing why, one hesitates to ask directions
from the gnarled, solitary figures spied now and then on crumbling
doorsteps or in the sloping, rock-strewn meadows. Those figures are
so silent and furtive that one feels somehow confronted by forbidden
things, with which it would be better to have nothing to do. When a
rise in the road brings the mountains in view above the deep woods,
the feeling of strange uneasiness is increased. The summits are too
rounded and symmetrical to give a sense of comfort and naturalness, and
sometimes the sky silhouettes with especial clearness the queer circles
of tall stone pillars with which most of them are crowned.
"""

In [None]:
#text = "Mr. Dursley was the director of a ﬁrm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors. The Dursleys had a small son called Dudley and in their opinion there was no ﬁner boy anywhere."

In [26]:
text = "The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that."

## Templates
The following Github repository contains a list of prompt templates that you can use with T5.

[https://github.com/google-research/FLAN/blob/main/flan/templates.py](https://github.com/google-research/FLAN/blob/main/flan/templates.py)

Look through them and select a summarization template.

In [27]:
# TODO: Create a prompt
prompt = f"Write a short summary for this text: {text}"

print(prompt)


Write a short summary for this text: The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that.


In [28]:
# TODO: tokenize the text
enc_prompt = tokenizer(prompt, return_tensors='pt')
print(enc_prompt)

{'input_ids': tensor([[ 8733,     3,     9,   710,  9251,    21,    48,  1499,    10,    37,
          8633,  8887,     7,   141,   762,    79,  1114,     6,    68,    79,
            92,   141,     3,     9,  2829,     6,    11,    70,  4016,  2971,
            47,    24, 10843,   133,  2928,    34,     5,   328,   737,    22,
            17,   317,    79,   228,  4595,    34,     3,    99,  1321,   435,
            91,    81,     8, 16023,     7,     5,  8667,     5, 16023,    47,
          8667,     5,  8633,  8887,    22,     7,  4806,     6,    68,    79,
         12381,    22,    17,  1736,    21,   633,   203,   117,    16,   685,
             6,  8667,     5,  8633,  8887,   554, 15443,   255,   737,    22,
            17,    43,     3,     9,  4806,     6,   250,   160,  4806,    11,
           160,   207,    18,  1161,    18,    29,    32,  8052,  2553,   130,
            38,    73,   308,   450,  8887,  1273,    38,    34,    47,   487,
            12,    36,     5,    37,  

In [16]:
# TODO: Decode the token
print(tokenizer.decode(enc_prompt.input_ids[0]))

Write a short summary for this text: The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that.</s>


In [29]:
# TODO: Generate summary with model 
enc_summary = model.generate(enc_prompt.input_ids)

In [25]:
print(enc_summary)

tensor([[    0,    37,   684,    19,     3,     9,     3,     7, 28426,    11,
            73, 20905,   286,     5,     1]])


In [30]:
# TODO: Decode the summary
print(enc_summary)
summary = tokenizer.decode(enc_summary[0], skip_special_tokens=True)
print(summary)

tensor([[    0,    37,  8633,  8887,     7,   141,     3,     9,  2829,     6,
            11,    70,  4016,  2971,    47,    24, 10843,   133,  2928,    34,
             5]])
The Dursleys had a secret, and their greatest fear was that somebody would discover it.


## Configuration
- Set the temperate 
- Top P
- Top K

In [None]:
# TODO: Configure the LLM
config = GenerationConfig(
   do_sample=True,
   temperature = 4.1,
   max_new_token=100,
   min_new_token=100
)

enc_summary = model.generate(enc_prompt.input_ids, generation_config=config)
summary = tokenizer.decode(enc_summary[0])
print(summary)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


<pad> The Dursleys had a secret, and their greatest fear was that somebody would discover it.
