[Paddle-Pipelines] Add matryoshka representation learning #8165

Merged
merged 4 commits into from Mar 28, 2024


w5688414
Contributor

PR types

PR changes

Description


paddle-bot bot commented Mar 21, 2024

Thanks for your contribution!

w5688414 requested a review from sijunhe on March 21, 2024 04:36
w5688414 self-assigned this on March 21, 2024

codecov bot commented Mar 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.44%. Comparing base (b6dcb4e) to head (36a79b2).
Report is 5 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8165      +/-   ##
===========================================
+ Coverage    55.37%   55.44%   +0.07%     
===========================================
  Files          596      596              
  Lines        91622    91464     -158     
===========================================
- Hits         50732    50713      -19     
+ Misses       40890    40751     -139     


pipelines/examples/constrative_train/README.md (3 outdated review threads, resolved)
q_reps = self._dist_gather_tensor(q_reps)
p_reps = self._dist_gather_tensor(p_reps)

if self.matryoshka_dims:
Collaborator

matryoshka shouldn't be mutually exclusive with inbatch_neg, should it? Can the two be combined?

Contributor Author

w5688414, Mar 21, 2024

Done. Matryoshka is now compatible with the inbatch_neg strategy.
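
For readers following this thread, here is a minimal sketch of how a matryoshka loss can be combined with in-batch negatives. It is not the code merged in this PR: the function names, the plain-Python list representation, and the choice to average the per-dimension losses with equal weight are all assumptions made for illustration. Only `q_reps`, `p_reps`, and `matryoshka_dims` are names taken from the diff context.

```python
import math


def l2_normalize(vec):
    # Guard against a zero vector so the division below stays defined.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def in_batch_loss(q_reps, p_reps, temperature=1.0):
    """Cross-entropy where passage i is the positive for query i and
    every other passage in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(q_reps):
        scores = [sum(a * b for a, b in zip(q, p)) / temperature for p in p_reps]
        # Numerically stable log-sum-exp over all passages in the batch.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(q_reps)


def matryoshka_in_batch_loss(q_reps, p_reps, matryoshka_dims, temperature=1.0):
    """Hypothetical helper: apply the in-batch loss to each truncated
    embedding prefix and average the results (equal weights assumed)."""
    losses = []
    for dim in matryoshka_dims:
        # Truncate to the first `dim` components, then re-normalize.
        q = [l2_normalize(v[:dim]) for v in q_reps]
        p = [l2_normalize(v[:dim]) for v in p_reps]
        losses.append(in_batch_loss(q, p, temperature))
    return sum(losses) / len(losses)
```

The point of the sketch is that the two ideas are orthogonal: in-batch negatives decide which pairs are scored against each other, while matryoshka decides which embedding prefixes those scores are computed on.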

from paddlenlp.trainer import Trainer


class BiTrainer(Trainer):
Collaborator

This doesn't look like it does much. Why do we need a custom trainer?

Contributor Author

w5688414, Mar 21, 2024

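Because we overrode the default collator so that it handles the query and the passage together. The collator output contains both the query input_ids and the passage input_ids, which form a new dict {"query": q_collated, "passage": d_collated}, but the model only accepts a single dict of inputs. That said, we could also leave the trainer unchanged and instead adapt the forward inputs of BiEncoderModel to fit the trainer.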
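
The alternative mentioned at the end of this reply (adapting the model's forward signature instead of customizing the trainer) can be sketched as follows. This is a framework-free illustration, not the PR's code: `bi_collate`, `ToyBiEncoder`, `run_step`, and the feature keys `query_ids` and `passage_ids` are hypothetical, and it assumes the training loop passes the collated batch to the model as keyword arguments.

```python
def bi_collate(features):
    """Hypothetical collator: batch queries and passages separately and
    return them under the two keys described in the comment above."""
    q_collated = {"input_ids": [f["query_ids"] for f in features]}
    d_collated = {"input_ids": [f["passage_ids"] for f in features]}
    return {"query": q_collated, "passage": d_collated}


class ToyBiEncoder:
    """Stand-in for BiEncoderModel. Its forward accepts the two
    sub-batches by name, so a nested batch can be unpacked directly."""

    def forward(self, query=None, passage=None):
        # A real model would encode both sides; here we only report sizes.
        return {
            "num_queries": len(query["input_ids"]),
            "num_passages": len(passage["input_ids"]),
        }

    def __call__(self, **inputs):
        return self.forward(**inputs)


def run_step(model, batch):
    # Assumption: the trainer unpacks the batch as keyword arguments.
    return model(**batch)
```

With a forward signature whose parameter names match the collator's keys, no trainer subclass is needed for input handling; the trade-off is that the model becomes coupled to the collator's output format.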

Collaborator

sijunhe left a comment

lgtm

sijunhe merged commit 57b22e7 into PaddlePaddle:develop on Mar 28, 2024
8 of 10 checks passed
2 participants