
llama2 embedding vectors from Python and llm.el don't seem to match #15

Closed
jkitchin opened this issue Jan 3, 2024 · 4 comments

@jkitchin

jkitchin commented Jan 3, 2024

I noticed that the embedding vectors for the same input from llm.el and from Python, both using llama2 via ollama, don't match. It is possible there is some small setting I have overlooked, and maybe they don't even use the same libraries, but I thought they would. They are at least the same length. I only show the first and last elements of each embedding below for brevity.

elisp:

#+BEGIN_SRC emacs-lisp
(require 'llm)
(require 'llm-ollama)

(let* ((model (make-llm-ollama :embedding-model "llama2"))
       (v (llm-embedding model "test")))
  (list (length v) (elt v 0) (elt v 4095)))
#+END_SRC

#+RESULTS:
| 4096 | -0.9325962662696838 | -0.05561749264597893 |

Python:

#+BEGIN_SRC jupyter-python
from langchain.embeddings import OllamaEmbeddings
import numpy as np

ollama = OllamaEmbeddings(model="llama2")
vector = np.array(ollama.embed_documents(["test"]))

len(vector[0]), vector[0][0], vector[0][-1]
#+END_SRC

#+RESULTS:
:RESULTS:
| 4096 | 0.8642910718917847 | 0.4629634618759155 |
:END:

Any tips on tracking down why they are different?
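Beyond comparing the first and last elements, cosine similarity collapses the comparison into a single number. A minimal sketch, assuming the two embeddings have been loaded into Python; the short vectors below are made-up placeholders, not real llama2 output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical placeholders standing in for the llm.el and langchain
# outputs; with the real 4096-dim vectors, a score near 1.0 would mean
# the two stacks agree, and a score near 0 would confirm the mismatch.
v_elisp = [0.1, -0.2, 0.3]
v_python = [0.3, 0.1, -0.2]
print(cosine_similarity(v_elisp, v_python))
```

If the two real vectors score well below 1.0, they genuinely encode different things rather than differing by a harmless normalization.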

@ahyatt
Owner

ahyatt commented Jan 4, 2024

According to a conversation I had with @s-kostyaev, there have been some issues with ollama embeddings, so he may have some ideas on what might be going on here.

@s-kostyaev
Contributor

Hi @ahyatt @jkitchin
Ollama embeddings are broken at the moment. See ollama/ollama#834, ollama/ollama#327, and ggerganov/llama.cpp#2872.

@s-kostyaev
Contributor

@jkitchin You can reproduce this with curl and open an issue in the ollama upstream.
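For reference, a curl reproduction might look like the following. This is a sketch assuming a local ollama server on its default port (11434) and the =/api/embeddings= endpoint ollama exposed at the time; adjust to match your setup:

```shell
# Ask the local ollama server for a llama2 embedding of "test",
# then print the vector length and its first element.
# Requires a running ollama instance; jq is used only for readability.
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "llama2", "prompt": "test"}' \
  | jq '.embedding | length, .[0]'
```

If the numbers from curl already disagree with one of the client libraries, that narrows the bug down before involving llm.el or langchain at all.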

@ahyatt
Owner

ahyatt commented Jan 8, 2024

Thanks @s-kostyaev for the update. I think this is likely to be upstream as well (there's nothing in llm that would cause this). Still, it would be good to add tests that check the embeddings give sane results, so we can catch this kind of problem in the future. For now I'm going to close this, since the issue is (as far as we can tell) not in the llm library. Thank you @jkitchin for your investigation and for reporting this!

@ahyatt ahyatt closed this as completed Jan 8, 2024