Replace ColBERT with jina-colbert-v1-en #151

Closed
psykhi opened this issue Feb 21, 2024 · 2 comments

Comments

psykhi commented Feb 21, 2024

Sorry if this is a silly request, but I have been following the release of the Jina ColBERT model and tried to use it as a drop-in replacement for ColBERT, mainly because of its longer document length:

RAG = RAGPretrainedModel.from_pretrained("jinaai/jina-colbert-v1-en")

I am not building indexes with RAGatouille, just reranking candidates with RAG.rerank. I get much, much worse evaluation results after that switch so I assume that something isn't quite right here. Any guidance?

Reading the RAGatouille and ColBERT code I see there are some "auto" parameters for max tokens and max document length. Does Jina require a different config or did I simply misunderstand that it can replace ColBERT?

Thanks!

Owner

bclavie commented Feb 22, 2024

Hey!

Thank you for flagging. I'm very short on time, so diagnosing this will take a little longer than usual, but the results shouldn't be drastically different. Initial evals showed relative parity between the two models, albeit only on 5-10 test cases.

I'm wondering if this is due to (1) a problem loading the model properly, or (2) something with the rerank() function (it uses the same internal functions as other, better-tested functions, but a bug there is still possible).

What versions of ragatouille and colbert-ai are you on? Jina ColBERT only loads properly with colbert-ai >=0.2.19; previous versions of colbert-ai initialised the weights incorrectly.
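
A quick way to answer the version question is to read the installed package metadata. The snippet below is only a minimal sketch: the helper names (parse_version, installed_version, colbert_ai_is_new_enough) are made up for illustration and are not part of ragatouille or colbert-ai, and the 0.2.19 threshold is simply the one mentioned in this comment.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum colbert-ai version mentioned in this thread for Jina ColBERT.
MIN_COLBERT_AI = (0, 2, 19)


def parse_version(text):
    """Turn a string like '0.2.19' into (0, 2, 19).

    Hypothetical helper: it keeps only the leading digits of each
    dot-separated piece and stops at the first piece without digits.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def colbert_ai_is_new_enough(installed):
    """True if the given colbert-ai version string is >= 0.2.19."""
    if installed is None:
        return False
    return parse_version(installed) >= MIN_COLBERT_AI


if __name__ == "__main__":
    for name in ("ragatouille", "colbert-ai"):
        print(name, installed_version(name))
    if not colbert_ai_is_new_enough(installed_version("colbert-ai")):
        print("colbert-ai is missing or older than 0.2.19; try upgrading it.")
```

If the check reports an older colbert-ai, upgrading it (for example with pip install -U colbert-ai) and re-running the evaluation is the first thing to try.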

If that's not the issue, would you mind sharing some example code/documents where the issue occurs? Thank you!

Author

psykhi commented Feb 22, 2024

You are totally right, the upgrade brought the results back to where I was expecting them to be. Thank you so much! 🙏

psykhi closed this as completed Feb 22, 2024