# T5 model

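Text-To-Text Transfer Transformer

"Transfer" here refers to transfer learning: T5 follows the pretrain-then-transfer paradigm. T5 casts every NLP task as a text-to-text task, so both the input and the output are plain text.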

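* cola (Corpus of Linguistic Acceptability): binary classification of whether a given sentence is grammatically correct and natural.
* stsb (Semantic Textual Similarity Benchmark): measures the semantic similarity between two sentences. The input is a sentence pair and the output is a similarity score.
* copa (Choice of Plausible Alternatives): given a premise and a question, choose the more plausible of two alternatives. Each question involves an event and two candidate causes or effects; the model must infer the causal relation between the premise and each alternative and pick the better one.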

![](../images/t5_nlp_task.webp)
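
To make the text-to-text framing concrete, here is a minimal sketch of how the three tasks above could be serialized into (source, target) string pairs. The dictionary keys, the exact prefix strings, and the COPA target format are assumptions chosen for illustration and are not taken from this notebook; the STS-B rounding to the nearest 0.2 follows the approach described in the T5 paper.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Turn a labeled example into a (source text, target text) pair."""
    if task == "cola":
        source = f"cola sentence: {example['sentence']}"
        target = "acceptable" if example["label"] == 1 else "not acceptable"
    elif task == "stsb":
        source = (
            f"stsb sentence1: {example['sentence1']} "
            f"sentence2: {example['sentence2']}"
        )
        # Regression becomes string prediction: round to the nearest 0.2.
        target = f"{round(example['score'] * 5) / 5:.1f}"
    elif task == "copa":
        source = (
            f"copa choice1: {example['choice1']} choice2: {example['choice2']} "
            f"premise: {example['premise']} question: {example['question']}"
        )
        # Assumed target format: name the chosen alternative.
        target = f"choice{example['label'] + 1}"
    else:
        raise ValueError(f"unknown task: {task}")
    return source, target


pairs = [
    to_text_to_text("cola", {"sentence": "The course is jumping well.", "label": 0}),
    to_text_to_text(
        "stsb",
        {"sentence1": "A man is playing a guitar.",
         "sentence2": "A person plays an instrument.",
         "score": 3.75},
    ),
    to_text_to_text(
        "copa",
        {"premise": "The man broke his toe.",
         "choice1": "He got a hole in his sock.",
         "choice2": "He dropped a hammer on his foot.",
         "question": "cause",
         "label": 1},
    ),
]
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

Because every task ends up as a string pair, the same model, loss, and decoding procedure can be reused across classification, regression, and multiple-choice tasks.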


In [1]:
from transformers import AutoConfig, AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

# Model structure


|model        |parameters  |attention inner dim (d_kv\*num_heads) -> d_model |  encoder/decoder layers |
| ----------- |----------- |-------------------------  | ------------------------|
|t5-small     | 61M        |     512 (64\*8)  ->  512 |                        6|
|t5-base      |223M        |    768  (64\*12) ->  768 |                       12|
|t5-large     |738M        |   1024  (64\*16) -> 1024 |                       24|
|t5-3b        |2.85B       |   4096 (128\*32) -> 1024 |                       24|
|t5-11b       |  11B       | 16384 (128\*128) -> 1024 |                       24|
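
The hidden-dim column can be checked with a little arithmetic. The sketch below copies `d_kv`, `num_heads`, and `d_model` from the table and derives the attention inner dimension and the resulting projection shapes; the helper name `attention_shapes` is hypothetical. It shows that for t5-3b and t5-11b the inner dimension is larger than `d_model`, so `q`/`k`/`v` project up and `o` projects back down.

```python
# (d_kv, num_heads, d_model) per checkpoint, copied from the table above.
T5_SIZES = {
    "t5-small": (64, 8, 512),
    "t5-base": (64, 12, 768),
    "t5-large": (64, 16, 1024),
    "t5-3b": (128, 32, 1024),
    "t5-11b": (128, 128, 1024),
}


def attention_shapes(d_kv: int, num_heads: int, d_model: int) -> dict:
    """Return the inner dim and the (in_features, out_features) of each projection."""
    inner_dim = d_kv * num_heads
    return {
        "inner_dim": inner_dim,
        "q/k/v": (d_model, inner_dim),  # project into the per-head space
        "o": (inner_dim, d_model),      # project back to the residual stream
    }


for name, dims in T5_SIZES.items():
    shapes = attention_shapes(*dims)
    print(name, shapes)
```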

In [11]:
model_name = "t5-base"

t5_config = AutoConfig.from_pretrained(model_name)
t5_tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
t5_model = AutoModel.from_pretrained(model_name)

In [12]:
print(
    f"d_model: {t5_config.d_model}",
    f"ff_hidden_size: {t5_config.d_ff}",
    f"head_size: {t5_config.d_kv}",
    f"num_heads: {t5_config.num_heads}",
    f"num_layers: {t5_config.num_layers}",
    sep="\n",
)

d_model: 768
ff_hidden_size: 3072
head_size: 64
num_heads: 12
num_layers: 12
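
These config values are enough to reproduce the parameter count listed for t5-base. The following is a back-of-the-envelope sketch, not library code: the function name `t5_param_count` is hypothetical, and it assumes bias-free linear layers, a scale-only layer norm, a shared embedding of 32128 tokens, and one 32-bucket relative attention bias table per stack.

```python
def t5_param_count(vocab_size, d_model, d_ff, d_kv, num_heads, num_layers,
                   num_buckets=32):
    """Estimate the parameter count of an encoder-decoder T5 model."""
    inner_dim = d_kv * num_heads
    embedding = vocab_size * d_model      # shared by encoder and decoder
    attention = 4 * d_model * inner_dim   # q, k, v, o (no biases)
    feed_forward = 2 * d_model * d_ff     # wi, wo (no biases)
    layer_norm = d_model                  # scale vector only
    rel_bias = num_buckets * num_heads    # one table per stack (first block)

    # Encoder block: self-attention + feed-forward, each with a layer norm.
    encoder = num_layers * (attention + feed_forward + 2 * layer_norm)
    # Decoder block: self-attention + cross-attention + feed-forward.
    decoder = num_layers * (2 * attention + feed_forward + 3 * layer_norm)
    # Each stack also has a final layer norm and its relative bias table.
    per_stack_extra = layer_norm + rel_bias
    return embedding + encoder + decoder + 2 * per_stack_extra


n_params = t5_param_count(
    vocab_size=32128, d_model=768, d_ff=3072, d_kv=64, num_heads=12, num_layers=12
)
print(f"{n_params:,}")  # 222,903,552, i.e. roughly 223M
```

The estimate can be compared against the loaded model with `sum(p.numel() for p in t5_model.parameters())`.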


![](../images/t5.drawio.svg)

In [13]:
t5_model

T5Model(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace

# Tokenizer

In [51]:
input_text = "Studies have been shown that owning a dog is good for you"
input_ids = t5_tokenizer(input_text, return_tensors="pt").input_ids
print(input_ids[0].tolist())
print(t5_tokenizer.convert_ids_to_tokens(input_ids[0]))
print(t5_tokenizer.decode(input_ids[0]))

[6536, 43, 118, 2008, 24, 293, 53, 3, 9, 1782, 19, 207, 21, 25, 1]
['▁Studies', '▁have', '▁been', '▁shown', '▁that', '▁own', 'ing', '▁', 'a', '▁dog', '▁is', '▁good', '▁for', '▁you', '</s>']
Studies have been shown that owning a dog is good for you</s>
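
The `▁` character (U+2581) in the tokens above is the SentencePiece marker for a word boundary. As a rough illustration of what `decode` does with it, the sketch below rebuilds the text from the printed pieces. It is a simplification: the helper `pieces_to_text` is hypothetical and ignores the special-token handling and cleanup that the real tokenizer performs.

```python
SPIECE_UNDERLINE = "\u2581"  # the "▁" word-boundary marker

# Pieces copied from the tokenizer output above.
pieces = ['▁Studies', '▁have', '▁been', '▁shown', '▁that', '▁own', 'ing',
          '▁', 'a', '▁dog', '▁is', '▁good', '▁for', '▁you', '</s>']


def pieces_to_text(pieces):
    """Concatenate pieces, then turn each boundary marker into a space."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").lstrip()


print(pieces_to_text(pieces))
```

Pieces without a leading `▁`, such as `ing`, attach to the previous piece, which is how `▁own` + `ing` becomes `owning`.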


# Forward

In [26]:
decoder_input_ids = t5_tokenizer("Studies show that", return_tensors="pt").input_ids
decoder_input_ids = t5_model._shift_right(decoder_input_ids)
print(decoder_input_ids)
print(t5_tokenizer.decode(decoder_input_ids[0]))

tensor([[   0, 6536,  504,   24]])
<pad> Studies show that
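
The shift exists for teacher forcing: at each position the decoder should see only the target tokens before it. A minimal list-based sketch of the idea follows, assuming the decoder start token and the pad token are both id 0 (consistent with the `<pad>` shown above) and that `-100` marks ignored label positions; it is a standalone illustration, not the library's implementation.

```python
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Prepend the start token and drop the last token of the target."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Positions masked out of the loss (-100) are not valid input ids.
    return [pad_token_id if token == -100 else token for token in shifted]


# "Studies show that" plus the end-of-sequence id 1 added by the tokenizer.
labels = [6536, 504, 24, 1]
print(shift_right(labels))  # [0, 6536, 504, 24]
```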


In [45]:
# Run the full forward pass (encoder + decoder) in one call

model_out = t5_model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
model_out.keys()

odict_keys(['last_hidden_state', 'past_key_values', 'encoder_last_hidden_state'])

In [52]:
# Run the forward pass in two stages: encoder first, then decoder
encoder_output = t5_model.encoder(input_ids)
print(encoder_output.keys())
last_hidden_state = encoder_output["last_hidden_state"]
decoder_out = t5_model.decoder(
    input_ids=decoder_input_ids, encoder_hidden_states=last_hidden_state
)
print(decoder_out.keys())
print("last_hidden_state: ", decoder_out.last_hidden_state.shape)
print(
    "past_key_values(layers, #key-value):",
    len(decoder_out.past_key_values),
    len(decoder_out.past_key_values[0]),
)

odict_keys(['last_hidden_state'])
odict_keys(['last_hidden_state', 'past_key_values'])
last_hidden_state:  torch.Size([1, 4, 768])
past_key_values(layers, #key-value): 12 4
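
The `12 4` above reads as 12 decoder layers with 4 cached tensors each. In the tuple format printed here, those four are the self-attention key and value followed by the cross-attention key and value. The sketch below computes the shapes one would expect for this example; the helper `expected_cache_shapes` is hypothetical, and newer `transformers` releases may wrap the cache in a dedicated object rather than nested tuples.

```python
def expected_cache_shapes(batch, num_heads, d_kv, dec_len, enc_len):
    """Shapes of (self_key, self_value, cross_key, cross_value) for one layer."""
    self_kv = (batch, num_heads, dec_len, d_kv)   # grows with decoded tokens
    cross_kv = (batch, num_heads, enc_len, d_kv)  # fixed by the encoder input
    return (self_kv, self_kv, cross_kv, cross_kv)


# t5-base: 12 heads of size 64; 4 decoder tokens; 15 encoder tokens.
layer_cache = expected_cache_shapes(batch=1, num_heads=12, d_kv=64,
                                    dec_len=4, enc_len=15)
for name, shape in zip(("self_key", "self_value", "cross_key", "cross_value"),
                       layer_cache):
    print(name, shape)
```

During generation only the self-attention entries grow, one position per step; the cross-attention entries depend solely on the encoder output and can be reused unchanged.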


In [None]:
from transformers import T5For