Question: OpenAI ada-002 embedding #1897
Comments
On MTEB you can find the performance for ada-002. The results are good, but not excellent compared to the alternatives. OpenAI embeddings only work well for English, but Cohere provides an embedding model as an API call that works well across 100+ languages.
@nreimers that is exactly what I wanted to know. Very good and many thanks!!
Very interesting info. Thanks. Is there a reference anywhere? Or do users just have to find that out on their own? :-)
They used to have a section stating that the model is trained only on English data. In tests on other languages it performs comparably to BM25 on Wikipedia data, so not really great.
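For readers unfamiliar with the baseline mentioned here, the following is a minimal sketch of BM25 scoring in plain Python. It is not the Elasticsearch implementation: the function name `bm25_scores`, the pre-tokenized inputs, and the toy documents are assumptions made for illustration, and the parameters `k1=1.2`, `b=0.75` are simply commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with an Okapi BM25 variant.

    Hypothetical helper for illustration: inputs are lists of tokens,
    and the IDF uses the log(1 + ...) form so it never goes negative.
    """
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by length via b.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy corpus (made up for this example).
docs = [
    ["embedding", "models", "for", "search"],
    ["cooking", "pasta", "recipes"],
    ["search", "engines", "rank", "documents", "for", "search"],
]
scores = bm25_scores(["search"], docs)
```

A purely lexical scorer like this needs no training data, which is why an embedding model that only matches it on non-English text is considered weak.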
And now they deleted the section? Doh! How would you suggest testing ada-002?
Yes, sadly it was deleted sometime between March and today. It mainly said that the model was only trained on English data and that they don't expect it to work well on other languages. We tested ada-002 on the MIRACL dataset for some languages, as we are primarily interested in search. For English it was OK (it had issues connected to cosine similarity); for other languages performance was not really good (on par with or worse than BM25 from Elasticsearch).
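A test of this kind can be sketched roughly as follows: rank documents by cosine similarity to the query embedding, then score the ranking with a retrieval metric such as nDCG@10. This is only an illustration under assumptions, not the evaluation code used above. The helper names, the toy 2-dimensional vectors, and the binary relevance labels are all made up; in practice the vectors would come from the embedding API and the labels from the dataset's relevance judgments.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """nDCG@k with binary relevance: 1 if the document is relevant, else 0."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy embeddings (hypothetical); real ones would have far more dimensions.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
query_vec = [1.0, 0.1]

ranking = rank_by_cosine(query_vec, doc_vecs)
score = ndcg_at_k(ranking, relevant_ids={"d1"})
```

Averaging this score over all queries, once with the embedding ranking and once with a BM25 ranking, gives the kind of side-by-side comparison described in the comment.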
How does instructorXL compare to Sentence Transformers?
Good question!
Yep, you seem to be right @nreimers, the web archive snapshots prove it :) Is this legally/ethically OK, by the way: disclosing the limitations properly and then removing part of that disclosure without changing anything about the embedding model? They also don't explicitly mention non-English or cross-lingual support.
Hi @nreimers ,
your blog post about OpenAI embeddings is very interesting:
https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9
Now that OpenAI has released ada-002, my question is: did you do a comparison of ada-002 vs. the embedding models provided by SBERT?
Another question: do you know of any company other than OpenAI that provides "multilingual text embeddings as an API call"? :-) How do they compare?
Many thanks
Philip