- references
    - https://zhuanlan.zhihu.com/p/675273498?utm_id=0
- padding_side:
    - By default, the transformers Tokenizer pads on the right.

In [1]:
import os
os.environ['http_proxy'] = 'http://127.0.0.1:7890'
os.environ['https_proxy'] = 'http://127.0.0.1:7890'

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from torch.distributions import Categorical
import torch as t

model_name = "gpt2"
lm = AutoModelForCausalLM.from_pretrained(model_name)

In [9]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side

'right'

In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.padding_side

'left'

### padding_side

- https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side
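- decoder-only => left padding (batched inputs generally contain padding)
    - Setting left padding is not enough: `model.generate` must also receive `attention_mask` (not just `input_ids`).
    - Left padding aligns the real tokens to the right edge, so the model processes each sequence left to right and every row ends on a real token.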
    - This is because the output is a continuation of the input prompt -- there would be gaps in the output without left padding.
- If `tokenizer.padding_side` is set to `right` when a decoder-only model runs `generate`, transformers emits a warning:

    ```
    A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
    ```
- Position encoding (position ids)
    - Absolute vs. relative position encoding:
        - gpt2 uses absolute position encoding (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L1246C1-L1247C62)
        ```
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
        ```
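The notes above can be sketched without loading a model. The following is a minimal illustration, not the transformers implementation: `pad_batch` and `position_ids_from_mask` are hypothetical helpers, plain Python lists stand in for tensors, and the token ids are the gpt2 ids of the two prompts used in the cells below. It pads the same batch on each side and mirrors the `cumsum(-1) - 1` / `masked_fill_` snippet.

```python
from itertools import accumulate

def pad_batch(seqs, pad_id, side="left"):
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        mask_pad = [0] * len(pad)
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append(mask_pad + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + mask_pad)
    return input_ids, attention_mask

def position_ids_from_mask(mask_row):
    """Mirror `mask.cumsum(-1) - 1` followed by `masked_fill_(mask == 0, 1)`."""
    running = accumulate(mask_row)
    return [c - 1 if m else 1 for c, m in zip(running, mask_row)]

PAD_ID = 50256  # gpt2 eos_token_id, reused as the pad token in this notebook
prompts = [
    [14261, 8593, 29373, 1936, 11241, 6152, 220],  # "big unpadded five token prompt "
    [79, 29373, 1115, 11241, 220],                 # "padded three token "
]

left_ids, left_mask = pad_batch(prompts, PAD_ID, side="left")
right_ids, right_mask = pad_batch(prompts, PAD_ID, side="right")
left_pos = [position_ids_from_mask(m) for m in left_mask]
right_pos = [position_ids_from_mask(m) for m in right_mask]

print("left :", left_ids[1], left_pos[1])
print("right:", right_ids[1], right_pos[1])
```

With left padding the last column of every row is a real token, so the next generated token directly continues the prompt; with right padding the shorter row ends in pad tokens, which is the gap the warning refers to.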

### padding right (default)

In [39]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
prompt = tokenizer(["big unpadded five token prompt ", 
                    "padded three token "], 
                   return_tensors='pt', padding=True, add_special_tokens=True)

#generate with plain sampling (https://huggingface.co/blog/how-to-generate)

result = lm.generate(prompt["input_ids"], 
                     attention_mask=prompt["attention_mask"], 
                     do_sample=True, output_scores=True, return_dict_in_generate=True, 
                     top_k=0, max_length=10)

x, logits_gen = result.sequences, result.scores
logits_gen = t.stack(logits_gen, 1)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [40]:
x

tensor([[14261,  8593, 29373,  1936, 11241,  6152,   220,  2171, 11584,    85],
        [   79, 29373,  1115, 11241,   220, 50256, 50256,    42,  2635,  2162]])

In [41]:
print(tokenizer.decode(x[0]))
print(tokenizer.decode(x[1]))

big unpadded five token prompt illsSIv
padded three token <|endoftext|><|endoftext|>K White ;


In [37]:
# logits_gen

In [42]:
x_attention_mask = (x != tokenizer.eos_token_id).to(dtype=t.int64)
x_attention_mask

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]])

In [43]:
position_ids = x_attention_mask.cumsum(-1)-1
position_ids

tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        [0, 1, 2, 3, 4, 4, 4, 5, 6, 7]])

In [35]:
position_ids.masked_fill_(x_attention_mask == 0, 1)
position_ids

tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        [0, 1, 2, 3, 4, 1, 1, 5, 6, 7]])

### padding left

In [26]:
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")


tokenizer.pad_token = tokenizer.eos_token
prompt = tokenizer(["big unpadded five token prompt ", 
                    "padded three token "], return_tensors='pt', padding=True, add_special_tokens=True)

#generate with plain sampling (https://huggingface.co/blog/how-to-generate)

result = lm.generate(prompt["input_ids"], attention_mask=prompt["attention_mask"], do_sample=True, output_scores=True, return_dict_in_generate=True, top_k=0, max_length=10)
x, logits_gen = result.sequences, result.scores
logits_gen = t.stack(logits_gen, 1)

print(tokenizer.decode(x[0]))
print(tokenizer.decode(x[1]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


big unpadded five token prompt urn:bank
<|endoftext|><|endoftext|>padded three token ids, new


In [27]:
x_attention_mask = (x != tokenizer.eos_token_id).to(dtype=t.int64)
position_ids = x_attention_mask.cumsum(-1)-1
position_ids.masked_fill_(x_attention_mask == 0, 1)
print("Attention mask for prompt + generated text")
print(x_attention_mask)
print("Position IDs")
print(position_ids)

Attention mask for prompt + generated text
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]])
Position IDs
tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        [1, 1, 0, 1, 2, 3, 4, 5, 6, 7]])
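One caveat about the cell above: rebuilding the mask with `x != tokenizer.eos_token_id` only works while the pad token never appears as real content. Because `pad_token` is set to `eos_token` here, a genuine `<|endoftext|>` would also be masked out. A hedged alternative is to extend the tokenizer's prompt mask with ones for the generated positions. The sketch below uses hypothetical helper names, plain lists instead of tensors, and a made-up row whose prompt contains `<|endoftext|>` as a separator; it also assumes the row did not finish early (finished rows are filled with pad tokens by `generate`).

```python
def mask_by_comparison(sequence, pad_id):
    """Mask every position equal to pad_id (what the cell above does)."""
    return [int(tok != pad_id) for tok in sequence]

def mask_by_extension(prompt_mask, total_len):
    """Keep the tokenizer's prompt mask and mark generated positions as real."""
    return prompt_mask + [1] * (total_len - len(prompt_mask))

PAD_ID = 50256  # pad_token == eos_token in this notebook

# Hypothetical left-padded row: one pad, then a prompt that itself contains
# <|endoftext|> (50256) as a separator, then two generated tokens.
prompt_mask = [0, 1, 1, 1, 1, 1]
sequence = [50256, 79, 29373, 50256, 1115, 11241, 220, 11]

compared = mask_by_comparison(sequence, PAD_ID)
extended = mask_by_extension(prompt_mask, len(sequence))

print("comparison:", compared)
print("extension :", extended)
```

The comparison-based mask drops the separator at index 3, which shifts every later position id by one, while the extended mask keeps it.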


### pad_token

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", 
                                             torch_dtype=torch.float16).to(device)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

In [7]:
print(tokenizer.bos_token, tokenizer.bos_token_id)
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.pad_token, tokenizer.pad_token_id)

<s> 1
</s> 2
None None


In [8]:
# Most LLMs ship without a pad_token by default; padding a batch raises an error unless one is set
tokenizer.pad_token = tokenizer.eos_token

In [9]:
print(tokenizer.bos_token, tokenizer.bos_token_id)
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.pad_token, tokenizer.pad_token_id)

<s> 1
</s> 2
</s> 2


In [10]:
inputs = tokenizer(
    ["hello llama", "who are you?"], padding=True, return_tensors="pt"
)
inputs = {k: v.to(device) for k, v in inputs.items()}
inputs

{'input_ids': tensor([[    1, 22172, 11148,  3304,     2],
         [    1,  1058,   526,   366, 29973]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1]], device='cuda:0')}
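Note that the batch above is right-padded: the first row ends in the pad id `2` with an attention mask of `0`. As a rough sketch of what the left-padded equivalent looks like, the hypothetical helper below moves the padding to the front using plain lists. In practice, passing `padding_side="left"` to `AutoTokenizer.from_pretrained`, as in the gpt2 cells earlier, achieves the same result without any manual conversion.

```python
def to_left_padded(input_ids, attention_mask, pad_id):
    """Move padding from the right edge to the left edge of each row."""
    new_ids, new_mask = [], []
    for ids, mask in zip(input_ids, attention_mask):
        real = [tok for tok, m in zip(ids, mask) if m == 1]  # drop padded slots
        n_pad = len(ids) - len(real)
        new_ids.append([pad_id] * n_pad + real)
        new_mask.append([0] * n_pad + [1] * len(real))
    return new_ids, new_mask

# Values copied from the tokenizer output above (pad_token_id == eos_token_id == 2).
input_ids = [[1, 22172, 11148, 3304, 2],
             [1, 1058, 526, 366, 29973]]
attention_mask = [[1, 1, 1, 1, 0],
                  [1, 1, 1, 1, 1]]

left_ids, left_mask = to_left_padded(input_ids, attention_mask, pad_id=2)
print(left_ids)
print(left_mask)
```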

In [22]:
GenerationConfig.from_pretrained('meta-llama/Llama-2-7b-hf')

GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9
}

In [11]:
generated_ids = model.generate(**inputs)

In [15]:
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[1])

who are you? (part 2)
What does it mean to be a human? How do we find out who we are? What does it mean to be a human?
I’m not a very good person, but I’m not a bad person either. I’m just a person.
I’m not a good person, but I’m not a bad person either. I’m just a person.
I’m not a bad person, but I’m not a good person either. I’m just a person.
I’m not a person, but I’m not a bad person either. I’m just a person.
I’m not a good person, but I’m not a bad person either. I’m just a person.
I’m not a bad person, but I’m not a good person either. I’m just a person.
I’m not a person, but I’m not a bad person either. I’m just a person.
I’m not a good person, but I’m not a bad person either. I’m just a person.
I’m not a bad person, but I’m not a good person either. I’m just a person.
I’m not a person, but I’m not a bad person either. I’m just a person.
I’m not a good person, but I’m not a bad person either. I’m just a person.
I’m not a bad person, but I’m not a good person either. I’m just a