
Question about the KL divergence in the code #35

Open
rigorosyangffff opened this issue Nov 4, 2023 · 1 comment
Comments

@rigorosyangffff

Hello!
I looked through the code and noticed that the KL penalty added to the token-level reward does not seem to be computed as a standard KL divergence, which should be taken over the two full distributions. Instead, the code only uses the ratio of the two models' probabilities for the single label token (a standard KL divergence is guaranteed to be non-negative, but the current implementation can yield a negative value). Why is that? I noticed that approx_kl is computed the same way as well.

@kxzxvbk

kxzxvbk commented Apr 3, 2024

Same question. This leads to a strange situation. The final KL loss is computed like:
kl_penalty = -self.kl_penalty_weight * (logprobs - ref_logprob)
However, ref_logprob does not require grad, so it could arguably be removed from the computation graph. As it stands, the regularization acts more like "limit the label token's log-probability and prevent it from becoming too large" than like a proper KL divergence.
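
To make that point concrete, here is a rough sketch (variable names and the kl_penalty_weight value are assumptions, not the repository's exact code): because ref_logprob is a detached constant, the gradient of the penalty term flows only through the policy's log-probability of the label token.

```python
import torch
import torch.nn.functional as F

kl_penalty_weight = 0.1  # assumed value

# Hypothetical policy logits (require grad) and a frozen reference log-prob of the label token.
logits = torch.randn(8, requires_grad=True)
label = 3
logprob = F.log_softmax(logits, dim=-1)[label]
ref_logprob = torch.tensor(-2.0)  # detached constant: no gradient flows through it

kl_penalty = -kl_penalty_weight * (logprob - ref_logprob)
kl_penalty.backward()

# The gradient equals -kl_penalty_weight * d(logprob)/d(logits); ref_logprob contributes
# nothing, so the term only constrains the label token's own log-probability.
print(logits.grad)
```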
