## Test: T5 vs FLAN-T5
* Dataset: Ocean_15

In [19]:
def prompt(i):
  return f"Given a statement of you: \"You {i}\". Please choose from the following options to identify how accurately this statement describes you. \nOptions: \nModerately Accurate \nVery Accurate \nVery Inaccurate \nModerately Inaccurate \nNeither Accurate Nor Inaccurate\n\nAnswers:"
items = [
    "worry a lot",
    "get nervous easily",
    "remain calm in tense situations,"
    "are talkative",
    "are outgoing, sociable",
    "are reserved,"
    "are original, come up with new ideas",
    "value artistic, aesthetic experiences",
    "have an active imagination",
    "are sometimes rude to others,"
    "have a forgiving nature",
    "are considerate and kind to almost everyone",
    "do a thorough job",
    "tend to be lazy,"
    "do things efficiently",
]

In [23]:
import torch
from torch.nn import functional as F
import numpy as np
from icecream import ic
# HuggingFace & Torch
from transformers import AutoTokenizer, T5ForConditionalGeneration, BartForConditionalGeneration, AutoModelForSeq2SeqLM

## What is FLAN-T5?

In one sentence: **FLAN-T5 is better T5** with same number of parameters!

FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract:

> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger  models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.

* Paper Link: https://arxiv.org/pdf/2210.11416.pdf
* HF Link (Base): https://huggingface.co/google/flan-t5-base



## T5-small

In [None]:
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [None]:
inputs = tokenizer(prompt, return_tensors='pt')
input_ids = inputs.input_ids
response = model.generate(input_ids,
              top_p=0.95,
              temperature=0.1,
              max_new_tokens=100)
output = tokenizer.decode(response[0])

In [None]:
output

'<pad><extra_id_0>: Moderately Accurate Very Accurate Very Inaccurate Moderately Accurate Very Inaccurate Moderately Inaccurate Neither Accurate nor Inaccurate Answers:</s>'

## FLAN-T5-small

In [28]:
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/308M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [31]:
for i in items:
  inputs = tokenizer(prompt(i), return_tensors='pt')
  input_ids = inputs.input_ids
  response = model.generate(input_ids,
                top_p=0.95,
                temperature=0.1,
                max_new_tokens=100)
  output = tokenizer.decode(response[0][1:-1]) # remove <pad> and <\s>
  print(output)

Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Neither Accurate Nor Inaccurate
Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate


## T5-base

In [None]:
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

Downloading (…)ve/main/spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


Downloading pytorch_model.bin:   0%|          | 0.00/892M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [None]:
inputs = tokenizer(prompt, return_tensors='pt')
input_ids = inputs.input_ids
response = model.generate(input_ids,
              top_p=0.95,
              temperature=0.1,
              max_new_tokens=100)
output = tokenizer.decode(response[0])

In [None]:
output

'<pad> Choose from the following options to identify how accurately this statement describes you. Options: Moderately Accurate Very Accurate Very Inaccurate Moderately Inaccurate Options: Moderately Accurate Moderately Inaccurate Options: Moderately Accurate Moderately Inaccurate Options: Moderately Accurate Moderately Inaccurate Options: Moderately Accurate Moderately Inaccurate Options'

## FLAN-T5-base

In [24]:
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [27]:
for i in items:
  inputs = tokenizer(prompt(i), return_tensors='pt')
  input_ids = inputs.input_ids
  response = model.generate(input_ids,
                top_p=0.95,
                temperature=0.1,
                max_new_tokens=100)
  output = tokenizer.decode(response[0][1:-1])
  print(output)

Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Moderately Accurate
Very Accurate
Moderately Accurate


## T5-large

In [None]:
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

Downloading (…)ve/main/spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


Downloading pytorch_model.bin:   0%|          | 0.00/2.95G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [None]:
inputs = tokenizer(prompt, return_tensors='pt')
input_ids = inputs.input_ids
response = model.generate(input_ids,
              top_p=0.95,
              temperature=0.1,
              max_new_tokens=100)
output = tokenizer.decode(response[0])

In [None]:
output

'<pad><extra_id_0> "You worry a lot."<extra_id_1> Moderately Accurate Very Accurate Very Inaccurate<extra_id_2> "You worry a lot."<extra_id_3> "You worry a lot." Options: Moderately Accurate Very Inaccurate Very Inaccurate Answers: None<extra_id_4> Correct<extra_id_5> Correct<extra_id_6> Correct<extra_id_7> Correct<extra_id_8>.<extra_id_9> "You worry a lot."<extra_id_10> Very<extra_id_11> Correct<extra_id_12> Very<extra_id_13> Accurate<extra_id_14> Very<extra_id_15> Accurate<extra_id_16> Very<extra_id_17> Mode'

## 

## FLAN-T5-large

In [32]:
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/662 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.13G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [33]:
for i in items:
  inputs = tokenizer(prompt(i), return_tensors='pt')
  input_ids = inputs.input_ids
  response = model.generate(input_ids,
                top_p=0.95,
                temperature=0.1,
                max_new_tokens=100)
  output = tokenizer.decode(response[0][1:-1]) # remove <pad> and <\s>
  print(output)

Moderately Accurate
Moderately Inaccurate
Moderately Accurate
Very Accurate
Moderately Accurate
Moderately Accurate
Very Accurate
Moderately Accurate
Moderately Accurate
Very Accurate
Very Inaccurate
