FEA: add DirectAU and fix some bugs #74

Merged: 4 commits, Oct 23, 2023

Conversation

downeykking
Contributor

Added the DirectAU algorithm. Experiments on the dataset provided by the authors give results roughly consistent with the paper: https://arxiv.org/pdf/2206.12811.pdf

[('recall@10', 0.0953), ('recall@20', 0.1301), ('recall@50', 0.191), ('mrr@10', 0.0472), ('mrr@20', 0.0497), ('mrr@50', 0.0517), ('ndcg@10', 0.0566), ('ndcg@20', 0.0657), ('ndcg@50', 0.0783), ('hit@10', 0.104), ('hit@20', 0.14), ('hit@50', 0.2027), ('precision@10', 0.0113), ('precision@20', 0.0077), ('precision@50', 0.0046)]

Fixed a hypertuning bug and corrected the storage path issue that appears in recbole.

@downeykking
Contributor Author

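There is also a rather strange issue when setting the model's input type (PAIRWISE or POINTWISE):
31837e4#diff-f83d6ec723fb5149c4ebca886cc2b1a089b6f63866b49dbe42ebbdd9bd610603R25
If I change pointwise to pairwise, the results get worse. My understanding, though, is that this setting only affects negative sampling? Pairwise samples one negative item for each (u, i) pair, while pointwise only considers (u, i). What do you think could be causing the change in results?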

@hyp1231
Member

hyp1231 commented Oct 20, 2023

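Thanks for the contribution!! DirectAU is indeed very important work right now.

In POINTWISE mode, the data has an extra label column that marks whether each user-item interaction is a positive example, for example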

user_id:token item_id:token label:float
u1 i1 1
u2 i2 1
u3 i3 1
u1 i4 0
u2 i5 0
u3 i6 0

whereas in PAIRWISE mode

user_id:token item_id:token neg_item_id:token
u1 i1 i4
u2 i2 i5
u3 i3 i6

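So in effect, setting POINTWISE makes DirectAU optimize on user-item interactions that do not exist (the rows with label 0)? Though I don't know why that would actually improve the results [facepalm]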
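
To make the difference between the two layouts concrete, here is a minimal sketch in plain Python that rebuilds the two toy tables above and lists which (user, item) pairs would reach a loss that reads only the first two columns. The helper names (`to_pointwise`, `to_pairwise`, `pairs_seen_by_label_blind_loss`) are hypothetical and are not RecBole APIs, and whether the DirectAU implementation in this PR actually ignores the label column is an assumption of the sketch.

```python
from typing import List, Tuple

# Hypothetical toy data mirroring the tables above: observed (user, item)
# interactions and one sampled negative item per interaction.
positives: List[Tuple[str, str]] = [("u1", "i1"), ("u2", "i2"), ("u3", "i3")]
sampled_negatives: List[str] = ["i4", "i5", "i6"]


def to_pointwise(pos, negs):
    """Return (user, item, label) rows: positives first, then negatives."""
    rows = [(u, i, 1.0) for u, i in pos]
    rows += [(u, n, 0.0) for (u, _), n in zip(pos, negs)]
    return rows


def to_pairwise(pos, negs):
    """Return (user, item, neg_item) rows, one per observed interaction."""
    return [(u, i, n) for (u, i), n in zip(pos, negs)]


def pairs_seen_by_label_blind_loss(rows):
    """Pairs a loss would use if it reads only the first two columns."""
    return [(row[0], row[1]) for row in rows]


pointwise_rows = to_pointwise(positives, sampled_negatives)
pairwise_rows = to_pairwise(positives, sampled_negatives)

pointwise_pairs = pairs_seen_by_label_blind_loss(pointwise_rows)
pairwise_pairs = pairs_seen_by_label_blind_loss(pairwise_rows)

# Pairs that were never observed but still reach the loss in pointwise mode.
spurious = [p for p in pointwise_pairs if p not in positives]
print(spurious)
```

Under that assumption, the pairwise layout only ever exposes the observed pairs, while the pointwise layout with sampled negatives also exposes the three label-0 pairs.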

@downeykking
Contributor Author

downeykking commented Oct 23, 2023

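Got the difference now! I adapted the authors' code, which is built on recbole, and I found that although it uses POINTWISE, it sets neg_sample=0
https://github.com/THUwangcy/DirectAU/blob/main/recbole/properties/model/DirectAU.yaml#L3
As I understand it, this case is then directly equivalent to PAIRWISE? I ran an experiment and the two settings give the same results, so I revised the code once more to follow the authors' intended implementation~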
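
A rough sketch of why the two settings can coincide, assuming the loss reads only the user and item columns: with no sampled negatives, the POINTWISE rows contain exactly the observed pairs, and the PAIRWISE rows contain the same pairs plus a negative item that is never read. The loss below follows the alignment and uniformity idea from the paper in plain Python. The embeddings, the `gamma` default, and all function names are illustrative assumptions, not the code from this PR or from the authors' repository.

```python
import math
from itertools import combinations

# Hypothetical 2-D embeddings, chosen only for illustration.
user_emb = {"u1": (1.0, 0.0), "u2": (0.0, 2.0), "u3": (-3.0, 0.0)}
item_emb = {"i1": (1.0, 1.0), "i2": (0.0, 1.0), "i3": (-1.0, 1.0)}


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(users, items):
    """Mean squared distance between paired user and item embeddings."""
    return sum(sq_dist(u, i) for u, i in zip(users, items)) / len(users)


def uniformity(vectors):
    """Log of the mean Gaussian potential over all distinct pairs."""
    pairs = list(combinations(vectors, 2))
    total = sum(math.exp(-2.0 * sq_dist(a, b)) for a, b in pairs)
    return math.log(total / len(pairs))


def direct_au_style_loss(pairs, gamma=1.0):
    """Alignment plus gamma-weighted average uniformity on (user, item) pairs."""
    users = [normalize(user_emb[u]) for u, _ in pairs]
    items = [normalize(item_emb[i]) for _, i in pairs]
    return alignment(users, items) + gamma * (uniformity(users) + uniformity(items)) / 2


# POINTWISE rows with no negative sampling: every row is a positive.
pointwise_rows = [("u1", "i1", 1.0), ("u2", "i2", 1.0), ("u3", "i3", 1.0)]
# PAIRWISE rows: the same positives plus a sampled negative the loss never reads.
pairwise_rows = [("u1", "i1", "i4"), ("u2", "i2", "i5"), ("u3", "i3", "i6")]

loss_pointwise = direct_au_style_loss([(u, i) for u, i, _ in pointwise_rows])
loss_pairwise = direct_au_style_loss([(u, i) for u, i, _ in pairwise_rows])
print(loss_pointwise == loss_pairwise)
```

Both calls receive the same list of positive pairs, so the loss values match. This only illustrates the data-level equivalence; it says nothing about batching or random seeds in a real training run.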

@hyp1231
Member

hyp1231 commented Oct 23, 2023

OK! That does seem to be the case, thanks!

@hyp1231 hyp1231 merged commit a31626a into RUCAIBox:main Oct 23, 2023
1 check passed