GPTRewardModel class

Why are the rewards in the "GPTRewardModel" class truncated to start at the first index where the chosen and rejected sequences diverge? Where can I find more information about this?

            # Retrieve first index where trajectories diverge
            divergence_ind = (chosen[i] != rejected[i]).nonzero()[0]
            assert divergence_ind > 0

            # Index into the correct rewards
            c_truncated_reward = chosen_rewards[i][divergence_ind:end_ind]
            r_truncated_reward = rejected_rewards[i][divergence_ind:end_ind]
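To make the question concrete, here is a minimal sketch of what I understand the snippet above to do, using plain Python lists instead of tensors. The helper names (`first_divergence`, `truncate_rewards`), the toy data, and the way `end_ind` is chosen are my own assumptions for illustration and are not taken from the repository.

```python
def first_divergence(chosen, rejected):
    """Return the first index at which the two token lists differ."""
    for idx, (c, r) in enumerate(zip(chosen, rejected)):
        if c != r:
            return idx
    raise ValueError("sequences are identical over their common length")


def truncate_rewards(chosen, rejected, chosen_rewards, rejected_rewards, end_ind):
    """Keep only the per-token rewards from the divergence point up to end_ind."""
    divergence_ind = first_divergence(chosen, rejected)
    # Mirrors the assert in the snippet: the pair must share a non-empty prefix.
    assert divergence_ind > 0
    return (
        chosen_rewards[divergence_ind:end_ind],
        rejected_rewards[divergence_ind:end_ind],
    )


# Toy data (made up): both sequences share the prefix [5, 6, 7].
chosen = [5, 6, 7, 8, 9, 0]
rejected = [5, 6, 7, 3, 0, 0]
chosen_rewards = [0.1, 0.2, 0.3, 0.9, 0.8, 0.0]
rejected_rewards = [0.1, 0.2, 0.3, -0.4, 0.0, 0.0]
end_ind = 5  # assumed cutoff; how the class actually derives it is not shown above

c_trunc, r_trunc = truncate_rewards(
    chosen, rejected, chosen_rewards, rejected_rewards, end_ind
)
print(c_trunc)  # [0.9, 0.8]
print(r_trunc)  # [-0.4, 0.0]
```

If this reading is right, the rewards for the shared prefix (indices 0 to 2 here) are dropped and only the positions where the two sequences can differ are compared.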

Thanks in advance.


GPTRewardModel class #8
