Displaying coordinates of text embeddings retrieved using the OpenAI Python library shows more digits than when the embeddings are retrieved directly from the API endpoint or using most other libraries. This repository explores why that is, how to get the same behavior (by the same mechanism) when working in other languages, and why one should not usually bother to do so.
More specifically, this repository is a collection of code examples and
documentation for the encoding_format argument to the OpenAI embeddings API,
which, when set to base64, will send raw floats encoded in Base64. The OpenAI
Python library uses that under the hood.
Beware that encoding_format is not officially documented. It could be removed
or changed in the future!
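
To illustrate what "raw floats encoded in Base64" means, here is a minimal sketch of decoding such a string. It assumes the payload is a sequence of little-endian 32-bit floats. The function name decode_base64_embedding and the sample values are hypothetical, and the sample string is built locally rather than fetched from the API.

```python
import base64
import struct


def decode_base64_embedding(encoded: str) -> list:
    """Decode a Base64 string of packed little-endian float32 values.

    Assumption: each coordinate occupies exactly 4 bytes.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Build a stand-in for the "embedding" field of a response from known values.
sample = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
print(decode_base64_embedding(sample))  # [0.5, -0.25, 1.0]
```

The three sample values are exactly representable in 32 bits, so they round-trip unchanged. Real embedding coordinates generally are not, which is where the extra displayed digits come from.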
This project is licensed under 0BSD, which is a "public-domain equivalent"
license. See LICENSE for details.
These materials arose out of conversations with RonaldGRuckus on the OpenAI Discord server. If not for Ronald's observations about embeddings from the Python library, and the conversations that followed, this repository and its contents would not exist.
See Why embeddings via the Python library show more digits for a fully detailed explanation of this.
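
As a rough illustration of the effect, the following sketch shows how a value stored as a 32-bit float displays more digits once it is widened to Python's 64-bit float. It assumes that widening is the relevant mechanism. The helper to_float32 and the value 0.1 are hypothetical stand-ins, not data from the API.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, returned as a 64-bit Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


json_style = 0.1  # stand-in for a short decimal as it might appear in JSON text
widened = to_float32(json_style)

print(repr(json_style))  # 0.1
print(repr(widened))     # 0.10000000149011612

# The extra digits carry no extra information: rounding the widened value
# back to 32 bits gives the same 32-bit float.
print(to_float32(widened) == widened)  # True
```

The longer decimal is only what it takes to identify that same 32-bit value uniquely among 64-bit floats. It does not indicate a more precise embedding.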
The example code in this repository is in three directories:
- In Bash, using curl, jq, and base64. See the shell scripts demo and demo-short.
- In Python, using Requests. See the notebooks ada-002.ipynb and several-models.ipynb.
- In Java, using OkHttp and Jackson. See Embedder.java (and Main.java for use).
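
For orientation, a request of the kind these examples make might look roughly like the sketch below, written with Requests. It is not taken from the notebooks. The function names, the default model, and the 30-second timeout are assumptions made for illustration, and the network call needs a real API key.

```python
API_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(text, api_key, model="text-embedding-ada-002"):
    """Return (url, headers, payload) for a request asking for Base64 output."""
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {
        "input": text,
        "model": model,
        "encoding_format": "base64",  # undocumented; may change or disappear
    }
    return API_URL, headers, payload


def fetch_embedding_base64(text, api_key):
    """Send the request and return the raw "embedding" field of the first item."""
    import requests  # third-party; imported here so the builder works without it

    url, headers, payload = build_embedding_request(text, api_key)
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["data"][0]["embedding"]
```

If the server honors encoding_format, the returned field should be a Base64 string rather than a JSON array of numbers.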
Note that the reason to use encoding_format, if there is one, would not
ordinarily be increased precision, but instead the optimization in speed and
network usage, which appears to be why the OpenAI Python library uses it.
Furthermore, to reiterate the above warning, encoding_format is not
officially documented, and it could potentially be removed, or changed, at any
point in the future. The OpenAI Python library's source code shows how one
might approach using it in a way that partially avoids depending on its future
existence.
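
One defensive approach, sketched here without reproducing the library's code, is to check the type of the returned field and decode only when it is a string. This assumes a server that ignores encoding_format would fall back to sending an ordinary JSON array of numbers. The name normalize_embedding is hypothetical.

```python
import base64
import struct


def normalize_embedding(value):
    """Return a list of floats from either a Base64 string or a numeric list.

    Assumption: Base64 payloads hold little-endian float32 values.
    """
    if isinstance(value, str):
        raw = base64.b64decode(value)
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    # Already a JSON array, e.g. if encoding_format was ignored by the server.
    return [float(x) for x in value]


encoded = base64.b64encode(struct.pack("<2f", 0.5, -2.0)).decode("ascii")
print(normalize_embedding(encoded))      # [0.5, -2.0]
print(normalize_embedding([0.5, -2.0]))  # [0.5, -2.0]
```

This protects only against the argument being ignored or removed. A change to the encoding itself would still require updating the decoding step.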