    gt_q = (q_grad * one_hot).detach()
    q_final = q_grad - gt_q
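In theory, the loss value should be unchanged after this modification. However, computing it with the formula $1 - (EQ_{\theta})^T EP$ gives different results before and after the modification. I also checked the formula $Q_{\theta}^T CP$, and it gives the same result before and after. Presumably $Q_{\theta}$ itself has changed, so that $Q_{\theta}^T P$ no longer equals 1. Given that, should `emo_loss = (1 - torch.sum(p_contextual_repr * q_contextual_repr, dim=-1))` have an additional term `torch.sum(gt_q * stable_onehot, dim=-1)` subtracted from it?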
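A short derivation may help pin down where the discrepancy comes from. It is only a sketch under stated assumptions: the columns $e_i$ of $E$ have unit norm, $P$ is exactly one-hot at the ground-truth index $y$ (ignoring the small smoothing in `stable_onehot`), $C = \mathbf{1}\mathbf{1}^T - E^T E$, and $\tilde{Q}_{\theta}$ (a name introduced here, not in the repo) denotes the value of `q_final`, that is, $Q_{\theta}$ with its ground-truth entry zeroed.

$$
\begin{aligned}
Q_{\theta}^T C P &= \sum_i Q_{\theta,i}\,(1 - e_i^T e_y) = \sum_i Q_{\theta,i} - (EQ_{\theta})^T EP = 1 - (EQ_{\theta})^T EP, \\
\tilde{Q}_{\theta}^T C P &= Q_{\theta}^T C P - Q_{\theta,y}\,C_{yy} = Q_{\theta}^T C P, \\
1 - (E\tilde{Q}_{\theta})^T EP &= \tilde{Q}_{\theta}^T C P + \Big(1 - \sum_i \tilde{Q}_{\theta,i}\Big) = Q_{\theta}^T C P + Q_{\theta,y}.
\end{aligned}
$$

Under these assumptions, the two formulas differ by exactly $Q_{\theta,y}$ once the ground-truth entry is subtracted, which matches the proposed term `torch.sum(gt_q * stable_onehot, dim=-1)` up to the smoothing constant.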
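In addition, at the ground-truth token the diagonal of $1 - E^T E$ is always 0, and the gradient is multiplied by this value. The gradient at the ground-truth token is therefore effectively 0 and needs no update. Are these two lines of code still necessary?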
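The question about whether the two lines affect gradients can be probed on a toy example. The sketch below is not the repo's full loss: the vocabulary size, embedding dimension, label, random embeddings and the helper names `emo_loss` and `loss_and_grad` are all hypothetical, `torch.softmax` is assumed as the source of `q_grad`, and an exact one-hot is used in place of `stable_onehot`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, d = 6, 4                 # hypothetical vocabulary size and embedding dim
label = torch.tensor([2])   # hypothetical ground-truth token id

# Unit-norm rows, so the cost 1 - E E^T has a zero diagonal.
embedding_matrix = F.normalize(torch.randn(V, d, dtype=torch.float64), dim=-1)
cost = 1 - embedding_matrix @ embedding_matrix.T

one_hot = F.one_hot(label, num_classes=V).to(torch.float64)
p_contextual_repr = one_hot @ embedding_matrix


def emo_loss(logits, subtract_gt):
    """Toy version of 1 - <p_repr, q_repr>, with or without the two lines."""
    q_grad = torch.softmax(logits, dim=-1)  # assumption: q_grad is a softmax
    if subtract_gt:
        gt_q = (q_grad * one_hot).detach()
        q_final = q_grad - gt_q
    else:
        q_final = q_grad
    q_contextual_repr = q_final @ embedding_matrix
    return (1 - torch.sum(p_contextual_repr * q_contextual_repr, dim=-1)).sum()


def loss_and_grad(base_logits, subtract_gt):
    """Return the loss value and its gradient with respect to the logits."""
    logits = base_logits.clone().requires_grad_(True)
    loss = emo_loss(logits, subtract_gt)
    (grad,) = torch.autograd.grad(loss, logits)
    return loss.detach(), grad


base_logits = torch.randn(1, V, dtype=torch.float64)
loss_plain, grad_plain = loss_and_grad(base_logits, subtract_gt=False)
loss_sub, grad_sub = loss_and_grad(base_logits, subtract_gt=True)
q_gt = torch.softmax(base_logits, dim=-1)[0, label.item()]

# Because gt_q is detached, the gradients should match; only the value shifts.
print("same gradients:", torch.allclose(grad_plain, grad_sub))
print("value shift equals q[gt]:", torch.allclose(loss_sub - loss_plain, q_gt))
```

In this isolated setting the detached subtraction shifts the loss value by the ground-truth probability while leaving the gradient with respect to the logits unchanged. How that shifted value interacts with the rest of the training objective is not covered by this sketch.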
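In theory these two lines are indeed unnecessary, and our earlier implementation did not have them. However, we found that under certain specific training settings the loss kept growing. After some adjustments, we found that the implementation currently in the repo combines fairly well with MLE and also gives better downstream results. We are debugging and testing the original implementation and will post an update as soon as we have results.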