
Conversation

@iamlemec
Contributor

Now that llama.cpp supports BERT embedding models (ggml-org/llama.cpp#5423), this PR modifies embed / create_embedding to truncate and combine inputs so they fit within the model context size, allowing for efficient high-volume embedding. I tried to keep the changes minimal, and am definitely open to alternate suggestions. What we have now:

  • Add reset and add_sequence methods to _LlamaBatch to make multi-sequence batches
  • Tokens are truncated to n_ctx by default, and without truncation an error is thrown if n_tokens > n_ctx
  • Embeddings are normalized by default but this can be turned off

One could eke out some gains from grouping batches more intelligently, but I wanted to keep it simple.
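
For context, here's a minimal usage sketch of the behavior described above. The model path is a placeholder, and the normalize / truncate keyword names follow the description in this PR rather than a fixed API reference:

from llama_cpp import Llama

# load a GGUF embedding model in embedding mode (path is a placeholder)
llm = Llama(model_path="./bge-small-en-v1.5-q8_0.gguf", embedding=True)

# a list of inputs is packed into multi-sequence batches; each input is
# truncated to n_ctx by default, and the results are normalized by default
docs = ["first document", "second document", "third document"]
vectors = llm.embed(docs)

# either behavior can be turned off; without truncation, an input longer
# than n_ctx raises an error instead of being silently cut
raw = llm.embed("a single string", normalize=False, truncate=False)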

@abetlen
Owner

abetlen commented Feb 14, 2024

@iamlemec absolute legend, thank you!

I'll test that this doesn't interfere with the current prefix-matching kv logic, and once that looks good I should be able to merge. Regarding normalizing the embeddings, what do you think are the arguments for / against?

@iamlemec
Contributor Author

Actually, I found a bug in the normalization (the normalize flag gets overwritten by the function normalize). Will push a fix in a minute.
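
(Purely illustrative, not the actual diff; the bug pattern was roughly a nested definition rebinding the name of the keyword argument:)

def embed(self, input, normalize=True):
    ...
    # defining a helper with the same name rebinds `normalize` in this scope,
    # so the boolean flag passed by the caller is lost
    def normalize(embedding, norm):
        return [v / norm for v in embedding]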

@iamlemec
Contributor Author

I could go either way on the default for normalize. It looks like sentence_transformers defaults to normalize_embeddings=False. But actually, their results come back the same regardless of what you set this flag to, so I guess it's getting normalized anyway somewhere else?

From a practical perspective, I assume most people are using these for cosine similarities, in which case you'd want to normalize them. But normalization is also a destructive process, so in some sense not normalizing is the safer bet.

Also, I should note that the llama.cpp code returns the pooled sum over token embeddings, so we also divide by the number of tokens here to make it a pooled mean, which is more standard.
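
Concretely, the post-processing amounts to something like this (a numpy sketch with made-up dimensions, not the library code itself):

import numpy as np

token_embeddings = np.random.rand(12, 384)            # 12 tokens, hypothetical 384-dim model
pooled_sum = token_embeddings.sum(axis=0)             # what llama.cpp returns: sum over tokens
pooled_mean = pooled_sum / token_embeddings.shape[0]  # divide by token count -> mean pooling
unit = pooled_mean / np.linalg.norm(pooled_mean)      # L2-normalize, ready for cosine similarity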

@abetlen
Owner

abetlen commented Feb 14, 2024

Defaulting normalize to true sounds like the best approach then. I've made a minor adjustment to the embed method so that it returns a list[float] if a string is passed; this should avoid any breaking changes.

I'll test a little more tomorrow; for good measure, we may want to add a kv_cache_clear / reset to clean up any sequences taking up space in the cache.
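
As a usage note, with normalized embeddings cosine similarity reduces to a dot product (sketch only; the model path is a placeholder):

import numpy as np
from llama_cpp import Llama

llm = Llama(model_path="./embedding-model.gguf", embedding=True)
a = np.array(llm.embed("the cat sat on the mat"))     # list[float] when a single string is passed
b = np.array(llm.embed("a cat is sitting on a rug"))
similarity = float(a @ b)                             # unit-length vectors, so dot product == cosine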

@abetlen
Owner

abetlen commented Feb 14, 2024

Actually I'll just go ahead and do that so we can merge.

Looks good to me, thank you so much for your work on this!

@abetlen abetlen merged commit d7a6791 into abetlen:main Feb 14, 2024
@iamlemec
Contributor Author

Wonderful, thanks for a great library! Will keep testing.

s_sizes = []

# add to batch
self._batch.add_sequence(tokens, len(s_sizes), False)
Contributor

@iamlemec There is a null pointer access in this function if n_batch < n_ctx and the prompt exceeds n_batch.

Contributor Author

Yup! Check out #1194. It changes the bounds checking from n_ctx to n_batch.
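
(Schematically, the guard moves from the context limit to the batch limit; this is a sketch of the idea, not the exact change in #1194:)

# before: only inputs longer than the context window were caught
if truncate and len(tokens) > n_ctx:
    tokens = tokens[:n_ctx]

# after: decoding fills a buffer of at most n_batch tokens, so the check
# must use n_batch to avoid reading past the end of that buffer
if truncate and len(tokens) > n_batch:
    tokens = tokens[:n_batch]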
