https://huggingface.co/docs/transformers/main/en/model_doc/ul2


The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, and Donald Metzler.

Tips:
- UL2 is an encoder-decoder model pre-trained on a mixture of denoising functions as well as fine-tuned on an array of downstream tasks.
- UL2 has the same architecture as [T5v1.1](https://huggingface.co/docs/transformers/main/en/model_doc/t5v1.1) but uses the Gated-SiLU activation function instead of Gated-GELU.
- The authors release checkpoints of one architecture which can be seen [here](https://github.com/google-research/google-research/tree/master/ul2).
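
To make the activation difference in the tips above concrete, here is a minimal scalar sketch of a gated unit with SiLU versus GELU. It is an illustration only: the function names are hypothetical, the exact (erf-based) GELU is used for simplicity, and real implementations apply these functions element-wise to projected tensors rather than to single floats.

```python
import math

def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gated_unit(gate: float, value: float, activation) -> float:
    """Scalar gated linear unit: activation(gate) * value."""
    return activation(gate) * value

# Compare the two gated variants on a few gate values (value fixed at 1.5).
for g in (-2.0, 0.0, 2.0):
    print(g, gated_unit(g, 1.5, silu), gated_unit(g, 1.5, gelu))
```

Both variants multiply an activated "gate" projection by a linear "value" projection; only the activation applied to the gate differs.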

In [None]:
import torch
from transformers import T5Model, T5Tokenizer

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

In [None]:
version = "google/flan-ul2" # google/ul2
encoder_input = "Studies have been shown that owning a dog is good for you"
decoder_input = "Studies show that"

# T5Tokenizer

In [None]:
tokenizer: T5Tokenizer = T5Tokenizer.from_pretrained(version)
tokenizer

## tokenizer([sequence])

In [None]:
tokenizer.tokenize(encoder_input)

In [None]:
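encoder_inputs = tokenizer(
    encoder_input,                      # a single sentence or a batch of sentences
    truncation = True,                  # truncate sequences longer than max_length
    padding = True,                     # padding strategy: one of [True, 'longest', 'max_length', 'do_not_pad']
    # max_length = max_length,          # maximum length; defaults to the model's maximum length when unset
    add_special_tokens = True,          # add special tokens (e.g. the end-of-sequence token) to the text
    return_length = True,               # also return the length of the encoded sequence
    return_overflowing_tokens = False,  # if True, return the token chunks beyond the truncation length instead of dropping them (dropping is the default)
    return_tensors = "pt"               # output tensor format: np, pt, tf, or jax
).to(device)                            # move tensors to the device only; input_ids must stay integer, so do not cast to float16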

print(encoder_inputs.keys())
print(encoder_inputs["input_ids"])
print(encoder_inputs["attention_mask"]) # 1 for real tokens, 0 for padding
print(encoder_inputs["length"])         # number of tokens in the encoded sequence
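
With a single input string, `padding = True` has nothing to pad. The sketch below illustrates, in plain Python, what right-padding to the longest sequence and the matching attention mask look like for a batch. The helper `pad_batch` and the token IDs are hypothetical; the pad ID of 0 is an assumption that matches the T5 vocabulary convention. This is not the tokenizer's actual implementation.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every ID list to the longest one and build the attention mask."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token IDs; 1 stands in for the end-of-sequence token.
padded = pad_batch([[5, 6, 7, 1], [8, 1]])
print(padded["input_ids"])
print(padded["attention_mask"])
```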

In [None]:
tokenizer.tokenize(decoder_input)

In [None]:
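decoder_inputs = tokenizer(
    decoder_input,                      # a single sentence or a batch of sentences
    truncation = True,                  # truncate sequences longer than max_length
    padding = True,                     # padding strategy: one of [True, 'longest', 'max_length', 'do_not_pad']
    # max_length = max_length,          # maximum length; defaults to the model's maximum length when unset
    add_special_tokens = True,          # add special tokens (e.g. the end-of-sequence token) to the text
    return_length = True,               # also return the length of the encoded sequence
    return_overflowing_tokens = False,  # if True, return the token chunks beyond the truncation length instead of dropping them (dropping is the default)
    return_tensors = "pt"               # output tensor format: np, pt, tf, or jax
).to(device)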

print(decoder_inputs.keys())
print(decoder_inputs["input_ids"])
print(decoder_inputs["attention_mask"]) # 1 for real tokens, 0 for padding
print(decoder_inputs["length"])         # number of tokens in the encoded sequence

# T5Model

The bare T5 Model transformer outputting raw hidden-states without any specific head on top.

In [None]:
model: T5Model = T5Model.from_pretrained(version, torch_dtype=torch.float16).to(device)
model

In [None]:
model.eval()
with torch.inference_mode():
    outputs = model(
        input_ids = encoder_inputs["input_ids"],
        attention_mask = encoder_inputs["attention_mask"],
        decoder_input_ids = decoder_inputs["input_ids"],
        decoder_attention_mask = decoder_inputs["attention_mask"],
    )
outputs
# Seq2SeqModelOutput
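
The call above passes the tokenized decoder text directly as `decoder_input_ids`. T5-style decoders normally expect the sequence to start with a decoder start token, and the Transformers T5 documentation applies `model._shift_right(...)` to the decoder IDs before calling the bare model. A simplified plain-Python sketch of that shift follows; the helper name `shift_right` and the token IDs are hypothetical, and the start token ID of 0 (the pad token) is an assumption based on the T5 convention. The library version additionally handles masked label positions, which this sketch omits.

```python
def shift_right(token_ids, decoder_start_token_id=0):
    """Prepend the start token and drop the last token, keeping the length unchanged."""
    return [decoder_start_token_id] + token_ids[:-1]

# Hypothetical IDs for a short decoder prompt; 1 stands in for end-of-sequence.
decoder_ids = [100, 200, 300, 1]
print(shift_right(decoder_ids))
```

In the notebook this would correspond to shifting `decoder_inputs["input_ids"]` before the forward pass, assuming the loaded model exposes `_shift_right`.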

In [None]:
# Output of the last decoder layer
outputs.last_hidden_state.shape

In [None]:
outputs.last_hidden_state

In [None]:
len(outputs.past_key_values)

In [None]:
for past_key in outputs.past_key_values:
    for past in past_key:
        print(past.shape)
    print("-" * 25)
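
To help read the shapes printed by the loop above, the following sketch computes the shapes one would expect, assuming the legacy tuple cache format in which each decoder layer holds four tensors: self-attention key and value, then cross-attention key and value. The helper name and the configuration values are illustrative, not read from the checkpoint; newer Transformers versions may return a cache object instead of nested tuples.

```python
def expected_cache_shapes(batch_size, num_heads, d_kv, encoder_len, decoder_len, num_layers):
    """Return, per decoder layer, the expected shapes of the four cached tensors."""
    # Self-attention keys/values cover the decoder positions seen so far.
    self_shape = (batch_size, num_heads, decoder_len, d_kv)
    # Cross-attention keys/values cover the encoder positions.
    cross_shape = (batch_size, num_heads, encoder_len, d_kv)
    return [(self_shape, self_shape, cross_shape, cross_shape) for _ in range(num_layers)]

# Illustrative (hypothetical) sizes, not the real checkpoint configuration.
shapes = expected_cache_shapes(batch_size=1, num_heads=4, d_kv=8,
                               encoder_len=14, decoder_len=5, num_layers=2)
for layer in shapes:
    for shape in layer:
        print(shape)
    print("-" * 25)
```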