
Question about the emo loss #7

Open
Vincent-Poirot opened this issue Nov 28, 2023 · 1 comment

Comments

@Vincent-Poirot

Vincent-Poirot commented Nov 28, 2023

gt_q = (q_grad * one_hot).detach()
q_final = q_grad - gt_q

In theory the loss value should stay the same after this change. However, computed with the formula $1 - (EQ_{\theta})^T EP$ the results before and after the modification differ, whereas the formula $Q_{\theta}^T C P$ gives the same result before and after. The cause seems to be that $Q_{\theta}$ itself has changed and no longer sums to 1, which breaks the equivalence between the two formulas. Should
emo_loss = (1 - torch.sum(p_contextual_repr * q_contextual_repr, dim=-1)) therefore subtract an additional term torch.sum(gt_q * stable_onehot, dim=-1)?
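
For reference, here is a minimal numerical check of that gap. It is only a toy sketch: `E`, `q`, `one_hot`, and `C` below are stand-ins with made-up shapes, not the repo's actual tensors, and the embeddings are row-normalized so that the diagonal of the cost matrix is zero.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

V, d = 10, 4                                       # toy vocab size / embedding dim
E = F.normalize(torch.randn(V, d), dim=-1)         # row-normalized token embeddings
q = torch.softmax(torch.randn(V), dim=-1)          # model distribution Q_theta
one_hot = F.one_hot(torch.tensor(3), V).float()    # one-hot data distribution P
C = 1.0 - E @ E.T                                  # cosine cost matrix, zero diagonal

def loss_inner(q_vec):
    # 1 - <E^T q, E^T p>: the inner-product form that emo_loss computes
    return 1.0 - torch.dot(E.T @ q_vec, E.T @ one_hot)

def loss_cost(q_vec):
    # q^T C p: the explicit cost-matrix form
    return q_vec @ C @ one_hot

# the two lines in question
gt_q = (q * one_hot).detach()
q_final = q - gt_q

print(loss_inner(q), loss_cost(q))               # equal, since q sums to 1
print(loss_inner(q_final), loss_cost(q_final))   # differ by exactly sum(gt_q)
print(loss_inner(q_final) - torch.sum(gt_q))     # matches loss_cost(q_final) again
```

If this sketch is right, the discrepancy between the two formulas after the modification is exactly torch.sum(gt_q), which is what motivates the extra term above.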

Also, at the ground-truth token the diagonal of $1 - E^T E$ is always 0, and the gradient gets multiplied by this value anyway, so the gradient at the ground-truth token is effectively 0 and there is nothing to update there. In that case, are these two lines of code still necessary?
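
The zero-diagonal point can be checked the same way. Again a toy sketch with stand-in names rather than the repo's variables; it only shows that the gradient of $Q_{\theta}^T C P$ with respect to the ground-truth entry of $Q_{\theta}$ passes through the zero diagonal of the cost matrix:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, d, gt = 10, 4, 3                                # toy sizes and ground-truth index
E = F.normalize(torch.randn(V, d), dim=-1)
C = 1.0 - E @ E.T                                  # C[i, i] = 0 for normalized rows
one_hot = F.one_hot(torch.tensor(gt), V).float()

q = torch.softmax(torch.randn(V), dim=-1).requires_grad_(True)
(q @ C @ one_hot).backward()
print(q.grad[gt])                                  # ~0: no gradient at the gt token
```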

@DRSY
Owner

DRSY commented Nov 28, 2023

In theory these two lines are indeed unnecessary, and that is how our earlier version was implemented, but we found that under certain training setups the loss kept growing. After some adjustments we found that the implementation currently in the repo combines well with MLE and also gives good downstream results. We are currently debugging and testing the original implementation and will post an update as soon as we have results.
