Add agent example use case to generate query, positive and negative examples #451
…on text embedding finetuning
My apologies for the late review! Just had a look over the feature and it is looking great overall! Just a minor point to consider for the examples:
I am still getting my feet wet with code reviews for open-source projects, so if you see anything off in my suggestions, just hit me back. Thanks a lot!
Thanks @zechengz! The example is awesome. I really like it!
nit: a naming suggestion to make the name more specific
Thanks @zechengz !
Description
Add an agent example use case that generates query, positive, and negative document examples, which can be used for contrastive learning of a text embedding model (text encoder).
Also add the response_format option to camel/configs/openai_config.py, which can force the model to return a JSON object, among other formats.
Motivation and Context
A recent paper [link] uses an "agent" to generate tasks and the corresponding query, positive, and (hard) negative document examples. These examples can then be used to finetune text embedding models with contrastive learning. Text encoders trained this way, including SFR-Embedding-Mistral and e5-mistral-7b-instruct, achieve strong text embedding performance on the MTEB leaderboard.
The generation process has two steps.
The first is task generation:
The second is document generation:
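As a rough illustration of the two steps, here is a minimal sketch and not the code added in this PR. It assumes a hypothetical `ChatFn` callable that takes a prompt and a `response_format` value and returns the raw model reply; the prompts, the `generate_tasks` and `generate_example` helpers, and the `query` / `positive` / `hard_negative` field names are all invented for illustration. The only part taken from a real API is the `{"type": "json_object"}` value, which OpenAI's Chat Completions endpoint accepts for `response_format`. A canned `fake_chat` stands in for the model so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List

# Value accepted by OpenAI's Chat Completions API for `response_format`
# to request a JSON object as the reply.
JSON_MODE: Dict[str, str] = {"type": "json_object"}

# Hypothetical signature: (prompt, response_format) -> raw model reply.
ChatFn = Callable[[str, Dict[str, str]], str]

# Assumed field names for one training example (not taken from the PR).
REQUIRED_KEYS = {"query", "positive", "hard_negative"}


def generate_tasks(chat: ChatFn, num_tasks: int) -> List[str]:
    """Step 1: ask the model to brainstorm retrieval tasks."""
    prompt = (
        f"Brainstorm {num_tasks} text retrieval tasks. "
        'Reply with a JSON object of the form {"tasks": [...]}.'
    )
    reply = json.loads(chat(prompt, JSON_MODE))
    return [str(task) for task in reply["tasks"]][:num_tasks]


def generate_example(chat: ChatFn, task: str) -> Dict[str, str]:
    """Step 2: ask for a query plus positive and hard negative documents."""
    prompt = (
        f"Task: {task}\n"
        "Reply with a JSON object with the keys "
        "query, positive, and hard_negative."
    )
    reply = json.loads(chat(prompt, JSON_MODE))
    missing = REQUIRED_KEYS - set(reply)
    if missing:
        raise ValueError(f"model reply is missing keys: {sorted(missing)}")
    example = {key: str(reply[key]) for key in sorted(REQUIRED_KEYS)}
    example["task"] = task
    return example


def fake_chat(prompt: str, response_format: Dict[str, str]) -> str:
    """Canned replies so the sketch runs without network access."""
    if prompt.startswith("Brainstorm"):
        return json.dumps({"tasks": ["Retrieve papers for a research question"]})
    return json.dumps({
        "query": "contrastive learning for text embeddings",
        "positive": "A survey of contrastive objectives for sentence encoders.",
        "hard_negative": "A survey of contrastive objectives for image encoders.",
    })


if __name__ == "__main__":
    for task in generate_tasks(fake_chat, num_tasks=1):
        print(generate_example(fake_chat, task))
```

Passing the chat function in as a parameter keeps the parsing and validation logic testable without an API key; in real use, `fake_chat` would be replaced by a call to whatever model backend is configured.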
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!