Yannael/multilingual-embeddings

OpenAI vs open-source multilingual embeddings models

This notebook provides example code for assessing which embedding model works best on your data. The example task is retrieval (as in RAG, retrieval-augmented generation) on multilingual data. See the associated Medium article here.
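As a rough illustration of what a retrieval assessment can look like, the sketch below ranks text chunks by cosine similarity to each query and reports the fraction of queries whose relevant chunk appears in the top k. This is a minimal sketch under stated assumptions, not the notebook's actual code: the metric (hit rate at k), the function names, and the toy vectors are all hypothetical, and in practice the vectors would come from whichever embedding model is being assessed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_indices(query_vec, chunk_vecs, k):
    """Indices of the k chunks most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, c) for c in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def hit_rate_at_k(query_vecs, chunk_vecs, relevant_idx, k=1):
    """Fraction of queries whose relevant chunk is among the top-k retrieved chunks."""
    hits = sum(
        1
        for q, rel in zip(query_vecs, relevant_idx)
        if rel in top_k_indices(q, chunk_vecs, k)
    )
    return hits / len(query_vecs)


# Toy stand-ins for real embeddings (hypothetical values, 3 dimensions only).
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.6, 0.5, 0.0]]
relevant_idx = [0, 1, 1]  # index of the chunk each query should retrieve

print("hit rate @1:", hit_rate_at_k(query_vecs, chunk_vecs, relevant_idx, k=1))
print("hit rate @2:", hit_rate_at_k(query_vecs, chunk_vecs, relevant_idx, k=2))
```

Running the same evaluation with vectors produced by each candidate model, on the same queries and chunks, gives one comparable score per model.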

The data source is the European AI Act, and the models compared include some of the latest OpenAI and open-source embedding models for multilingual data (as of 02/2024):

OpenAI released two models in January 2024:

  • text-embedding-3-small (released 25/01/2024)
  • text-embedding-3-large (released 25/01/2024)

We compare with the following open-source models
