# Workshop 1 - Summarization 

In [6]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, GenerationConfig

## T5 Models

<code>flan-t5</code> is an instruction-tuned Text-To-Text Transfer Transformer (T5) that can perform zero-shot NLP tasks such as summarization, simple reasoning, and question answering.

Some FLAN-T5 models on Hugging Face:
- [<code>google/flan-t5-base</code>](https://huggingface.co/google/flan-t5-base)
- [<code>google/flan-t5-small</code>](https://huggingface.co/google/flan-t5-small)
- [<code>google/flan-t5-xl</code>](https://huggingface.co/google/flan-t5-xl)
- [<code>google/flan-t5-xxl</code>](https://huggingface.co/google/flan-t5-xxl) - largest variant

Complete list of [FLAN models](https://huggingface.co/models?search=google/flan) on Hugging Face.

In [3]:
model_name = 'google/flan-t5-base'

In [7]:
# TODO: Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

In [8]:
# TODO: Print the model
print(model)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):
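The printout lists layer shapes but not the overall size of the model. A minimal sketch for counting parameters, assuming a PyTorch-style model whose `parameters()` yields tensors exposing `numel()` and `requires_grad`. The helper name `count_parameters` is hypothetical.

```python
from typing import Any, Tuple


def count_parameters(model: Any) -> Tuple[int, int]:
    """Return (total, trainable) parameter counts for a PyTorch-style model."""
    total = 0
    trainable = 0
    for param in model.parameters():
        n = param.numel()          # number of scalar values in this tensor
        total += n
        if param.requires_grad:    # only these are updated during training
            trainable += n
    return total, trainable


# In the notebook, with `model` loaded in the cell above:
# total, trainable = count_parameters(model)
# print(f"{total:,} parameters, {trainable:,} trainable")
```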

In [None]:
text = """ 
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

In [14]:
text = """When a traveler in north central Massachusetts takes the wrong fork
at the junction of the Aylesbury pike just beyond Dean's Corners he
comes upon a lonely and curious country. The ground gets higher, and
the brier-bordered stone walls press closer and closer against the ruts
of the dusty, curving road. The trees of the frequent forest belts
seem too large, and the wild weeds, brambles, and grasses attain a
luxuriance not often found in settled regions. At the same time the
planted fields appear singularly few and barren; while the sparsely
scattered houses wear a surprizing uniform aspect of age, squalor, and
dilapidation. Without knowing why, one hesitates to ask directions
from the gnarled, solitary figures spied now and then on crumbling
doorsteps or in the sloping, rock-strewn meadows. Those figures are
so silent and furtive that one feels somehow confronted by forbidden
things, with which it would be better to have nothing to do. When a
rise in the road brings the mountains in view above the deep woods,
the feeling of strange uneasiness is increased. The summits are too
rounded and symmetrical to give a sense of comfort and naturalness, and
sometimes the sky silhouettes with especial clearness the queer circles
of tall stone pillars with which most of them are crowned.
"""

In [15]:
# TODO: Create a prompt
prompt = f"Write a short summary for this text: {text}"

print(prompt)


Write a short summary for this text: When a traveler in north central Massachusetts takes the wrong fork
at the junction of the Aylesbury pike just beyond Dean's Corners he
comes upon a lonely and curious country. The ground gets higher, and
the brier-bordered stone walls press closer and closer against the ruts
of the dusty, curving road. The trees of the frequent forest belts
seem too large, and the wild weeds, brambles, and grasses attain a
luxuriance not often found in settled regions. At the same time the
planted fields appear singularly few and barren; while the sparsely
scattered houses wear a surprizing uniform aspect of age, squalor, and
dilapidation. Without knowing why, one hesitates to ask directions
from the gnarled, solitary figures spied now and then on crumbling
doorsteps or in the sloping, rock-strewn meadows. Those figures are
so silent and furtive that one feels somehow confronted by forbidden
things, with which it would be better to have nothing to do. When a
rise i

In [18]:
# TODO: tokenize the text
prompt_enc = tokenizer(prompt, return_tensors='pt')
print(prompt_enc)

{'input_ids': tensor([[ 8733,     3,     9,   710,  9251,    21,    48,  1499,    10,   366,
             3,     9,  1111,    49,    16,  3457,  2069,  9777,  1217,     8,
          1786,    21,   157,    44,     8, 23704,    13,     8,    71,    63,
           965,  7165,  2816,  1050,   131,  1909, 12738,    31,     7, 15143,
             7,     3,    88,   639,  1286,     3,     9, 23633,    11,  9865,
           684,     5,    37,  1591,  2347,  1146,     6,    11,     8,     3,
          2160,    49,    18, 24678,    15,    26,  3372,  4205,  2785,  4645,
            11,  4645,   581,     8,     3,  6830,     7,    13,     8,  5784,
            63,     6,  5495,  3745,  1373,     5,    37,  3124,    13,     8,
          8325,  5827,  6782,     7,  1727,   396,   508,     6,    11,     8,
          3645,     3,  8578,     7,     6,  3858,    51,  2296,     7,     6,
            11,  5956,    15,     7, 14568,     3,     9,     3,  8387,   459,
           663,    59,   557,   435,  

In [None]:
# TODO: Decode the tokens
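One way to fill in this step, sketched on the assumption that `tokenizer` and `prompt_enc` are the objects created in the earlier cells. `decode` and `convert_ids_to_tokens` are standard Hugging Face tokenizer methods; the helper name `inspect_tokens` and its `limit` parameter are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def inspect_tokens(tokenizer: Any, ids: Sequence[int], limit: int = 10) -> Tuple[List[str], str]:
    """Return the first `limit` sub-word pieces and the fully decoded text."""
    ids = list(ids)
    # Map ids back to vocabulary entries to see how the text was split.
    pieces = tokenizer.convert_ids_to_tokens(ids[:limit])
    # Join all ids back into a plain string, dropping markers such as </s>.
    text = tokenizer.decode(ids, skip_special_tokens=True)
    return pieces, text


# In the notebook:
# pieces, decoded = inspect_tokens(tokenizer, prompt_enc.input_ids[0].tolist())
# print(pieces)
# print(decoded)
```

Comparing `decoded` with the original `prompt` is a quick check that tokenization round-trips as expected.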


In [29]:
# Sample instead of greedy decoding; temperature=4.0 is very high, so expect incoherent output
config = GenerationConfig(
    do_sample=True,
    temperature=4.0,
    min_new_tokens=20
)
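To see why the choice of `temperature` matters, here is a small standalone sketch of temperature scaling: logits are divided by the temperature before the softmax, so larger values flatten the distribution and make unlikely tokens more probable. The logits below are made-up numbers for illustration, and `softmax_with_temperature` is a hypothetical helper, not a `transformers` function.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 4.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

With a low temperature almost all probability mass sits on the top-scoring token; at 4.0 the three candidates become much closer to equally likely, which is why sampled text tends to drift.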

In [30]:
# TODO: Generate summary with model 
summary_enc = model.generate(prompt_enc.input_ids, generation_config=config)
print(summary_enc)

tensor([[    0,   290,  3223,     6,    28,  6765,     3,  2160,    40,   449,
            11,  8227,     7,     5,   366,   728,    46, 13876,   286,  8247,
          2347]])


In [31]:
# TODO: Decode the summary
summary = tokenizer.decode(summary_enc[0], skip_special_tokens=True)
print(summary)

There exist, with strange brilter and fences. When once an abandoned place suddenly gets


In [40]:
# question = "Where is the traveler traveling?"
question = "What is the traveler's occupation?"
prompt = f"{text}\n{question} (If the question is unanswerable, say 'unanswerable')" 
print(prompt)

When a traveler in north central Massachusetts takes the wrong fork
at the junction of the Aylesbury pike just beyond Dean's Corners he
comes upon a lonely and curious country. The ground gets higher, and
the brier-bordered stone walls press closer and closer against the ruts
of the dusty, curving road. The trees of the frequent forest belts
seem too large, and the wild weeds, brambles, and grasses attain a
luxuriance not often found in settled regions. At the same time the
planted fields appear singularly few and barren; while the sparsely
scattered houses wear a surprizing uniform aspect of age, squalor, and
dilapidation. Without knowing why, one hesitates to ask directions
from the gnarled, solitary figures spied now and then on crumbling
doorsteps or in the sloping, rock-strewn meadows. Those figures are
so silent and furtive that one feels somehow confronted by forbidden
things, with which it would be better to have nothing to do. When a
rise in the road brings the mountains in vi

In [41]:
prompt_enc = tokenizer(prompt, return_tensors='pt').input_ids

In [42]:
answer_enc = model.generate(prompt_enc)

In [43]:
answer = tokenizer.decode(answer_enc[0], skip_special_tokens=True)
print(answer)

unanswerable
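The prompt pattern used here (context, then the question, then an explicit way out for unanswerable questions) can be wrapped in a small helper so that several questions can be tried against the same passage. This is only a sketch: `build_qa_prompt` and its parameters are hypothetical names, and unlike the cell above it strips surrounding whitespace from the context.

```python
def build_qa_prompt(context: str, question: str, fallback: str = "unanswerable") -> str:
    """Build a question-answering prompt with an explicit fallback answer."""
    return (
        f"{context.strip()}\n"
        f"{question.strip()} "
        f"(If the question is unanswerable, say '{fallback}')"
    )


# Short stand-in passage; in the notebook, pass `text` instead.
context = "When a traveler in north central Massachusetts takes the wrong fork..."
questions = [
    "Where is the traveler traveling?",
    "What is the traveler's occupation?",
]
for q in questions:
    print(build_qa_prompt(context, q))
    print("---")
```

Each prompt can then be tokenized and passed to `model.generate` exactly as in the cells above.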
