position_ids related PPO bug #217

Closed
tianhao-nexusflow opened this issue Feb 21, 2024 · 2 comments

Comments

tianhao-nexusflow commented Feb 21, 2024

I'm working to reproduce results and align them with trlx. After careful code review, I've identified this issue:

Concern (link): precomputing log probabilities using only token_ids and attention_mask may not fully account for the model's use of positional embeddings. For a Llama model, the same token_ids produce different results with and without left padding, even when the attention_mask is passed in explicitly. A common strategy to correct this is to pass in the position_ids explicitly, as in link.

Potential impact: the same padding-related issue could also affect generation quality, although I need more time to look into the code carefully. Will it also affect the critic and reward modules?
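
For illustration, here is a minimal sketch of the strategy described above (the function name and argument shapes are illustrative, not OpenRLHF's actual code): derive position_ids from the attention mask before precomputing the per-token log probabilities, so that left padding does not shift the positions seen by the real tokens.

import torch

def logprobs_with_position_ids(model, input_ids, attention_mask):
    # Positions count only the real tokens, so a left-padded and a right-padded
    # copy of the same sequence assign identical position ids to its real tokens.
    position_ids = attention_mask.long().cumsum(-1) - 1

    logits = model(input_ids=input_ids,
                   attention_mask=attention_mask,
                   position_ids=position_ids).logits

    # Log-probability of each token given the preceding tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    return log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)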

hijkzzz (Collaborator) commented Feb 21, 2024

This issue is very interesting, and I will confirm it as soon as possible. If it holds, the bug comes from the Hugging Face side, and we will fix it.

@tianhao-nexusflow by the way, Hugging Face TRL also does not use position_ids:
https://github.com/huggingface/trl/blob/a46cd84a6405312837f0d0e56fd1cf4d45585770/trl/trainer/ppo_trainer.py#L920

hijkzzz added a commit that referenced this issue Feb 21, 2024

hijkzzz (Collaborator) commented Feb 21, 2024

Fixed the position_ids. However, small precision differences remain between left padding and right padding.

Llama-7B

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

modelname = "OpenLLMAI/Llama-2-7b-sft-model-ocra-500k"
model = AutoModelForCausalLM.from_pretrained(modelname).cuda()

# The same two sequences, once padded on the left with token id 2 ...
inputs = {'input_ids': torch.tensor([[    1,  7251,   727, 29901, 29871],
                                     [    2,     2,     1, 29871, 29896]]).cuda(),
          'attention_mask': torch.tensor([[1, 1, 1, 1, 1],
                                          [0, 0, 1, 1, 1]]).cuda()}
# ... and once padded on the right.
inputs2 = {'input_ids': torch.tensor([[    1,  7251,   727, 29901, 29871],
                                      [    1, 29871, 29896,     2,     2]]).cuda(),
           'attention_mask': torch.tensor([[1, 1, 1, 1, 1],
                                           [1, 1, 1, 0, 0]]).cuda()}

# baseline: no explicit position_ids
output = model(**inputs)
output2 = model(**inputs2)

# difference between the logits of the real (non-pad) tokens of the second sequence
output2.logits[1][:3] - output.logits[1][-3:]
# tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
#         [ 0.0020, -0.0010, -0.0001,  ...,  0.0006,  0.0013,  0.0007],
#         [ 0.0025,  0.0040, -0.0005,  ...,  0.0025,  0.0015,  0.0008]],
#        device='cuda:0', grad_fn=<SubBackward0>)

# fixed: derive position_ids from the attention mask, so positions count only
# the real tokens and left padding no longer shifts them
position_ids = inputs['attention_mask'].long().cumsum(-1) - 1
position_ids2 = inputs2['attention_mask'].long().cumsum(-1) - 1

output = model(**inputs, position_ids=position_ids)
output2 = model(**inputs2, position_ids=position_ids2)

output2.logits[1][:3] - output.logits[1][-3:]
# tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
#           0.0000e+00,  0.0000e+00],
#         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
#           0.0000e+00,  0.0000e+00],
#         [ 9.3555e-04, -8.1062e-06, -8.5831e-05,  ...,  7.9441e-04,
#           5.7936e-04,  4.6229e-04]], device='cuda:0', grad_fn=<SubBackward0>)
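
One small caveat on the snippet above (an observation, not part of the fix): with cumsum(-1) - 1, the padded positions of the left-padded batch end up with position id -1. Those positions are masked out of the attention, so the real tokens are unaffected, but the convention inside transformers (e.g. in prepare_inputs_for_generation) is to overwrite them with a dummy value:

# Assumption: `inputs` is the left-padded batch from the snippet above.
position_ids = inputs['attention_mask'].long().cumsum(-1) - 1
# Give the masked (padding) positions a harmless dummy position id.
position_ids.masked_fill_(inputs['attention_mask'] == 0, 1)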
