embeddings endpoint understanding
#443
Unanswered
wei-ann-Github asked this question in Q&A
Replies: 1 comment · 3 replies
-
I think this is probably a bug. It should return a list of two embedding representations.
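As a rough sketch of the expected behaviour: one embedding vector per input string. The `/v1/embeddings` path and the bare JSON-list request body below are assumptions based on the Swagger link in the question, not confirmed for this deployment.

```python
# Sketch of the expected response shape: one embedding vector per input string.
# The endpoint path and request format here are assumptions, not confirmed.
import requests

texts = ["first sentence", "second sentence"]
resp = requests.post("http://localhost:3000/v1/embeddings", json=texts, timeout=60)
resp.raise_for_status()
embeddings = resp.json()

# Expect a list with exactly len(texts) entries, each a flat list of floats
# (e.g. 4096 values per string if the service returns Llama-2-7b hidden states).
assert len(embeddings) == len(texts)
assert all(isinstance(vec, list) for vec in embeddings)
```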
-
Hi,
I am trying to understand the output of the embeddings endpoint, and how it relates to the requests.
Deployed model: Llama-2-7b-chat
Request platform: http://localhost:3000/#operations-Service_APIs-llm-llama-service__embeddings_v1
I am making the requests through the BentoServer UI.
In one example, I used a list of two strings as the request body and got a set of embeddings back.
When my request body is a list containing just a single string:
[ "Hey Jude, welcome to the jungle!" ]
The embeddings I get back are much longer than the ones returned for the request consisting of two strings.
How does one interpret the embeddings and use them if the size differs between requests?
I was expecting the output from using two input strings to consist of a list of two sets of embeddings, but this does not seem to be the case.
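Roughly, the two requests I am making through the UI should be equivalent to the sketch below. The `/v1/embeddings` path and the bare JSON list as request body are assumptions based on the Swagger link above, and the second string is only a placeholder since I have not pasted the original body here, so adjust as needed:

```python
# Minimal sketch of the same requests made outside the BentoServer UI.
# Assumptions: the Swagger operation above maps to POST /v1/embeddings and
# takes a bare JSON list of strings; the second input string is a placeholder.
import requests

BASE_URL = "http://localhost:3000"

def get_embeddings(texts):
    resp = requests.post(f"{BASE_URL}/v1/embeddings", json=texts, timeout=60)
    resp.raise_for_status()
    return resp.json()

single = get_embeddings(["Hey Jude, welcome to the jungle!"])
double = get_embeddings(["Hey Jude, welcome to the jungle!",
                         "A second, placeholder sentence."])

# Compare the top-level structure and sizes of the two responses.
print(type(single), len(single))
print(type(double), len(double))
```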