
llama2 embedding vectors from Python and llm.el don't seem to match #15

Closed
jkitchin opened this issue Jan 3, 2024 · 4 comments

@jkitchin

jkitchin commented Jan 3, 2024

I noticed that the embedding vectors for the same input from llm.el and from Python, both using llama2 via ollama, don't match. It is possible there is some small setting I have overlooked, and maybe they don't even use the same libraries, but I thought they would. They are at least the same length. I only show the first and last elements of each embedding below for brevity.

elisp:

#+BEGIN_SRC emacs-lisp
(require 'llm)
(require 'llm-ollama)

(let* ((model (make-llm-ollama :embedding-model "llama2"))
       (v (llm-embedding model "test")))
  (list (length v) (elt v 0) (elt v 4095)))
#+END_SRC

#+RESULTS:
| 4096 | -0.9325962662696838 | -0.05561749264597893 |

Python:

#+BEGIN_SRC jupyter-python
from langchain.embeddings import OllamaEmbeddings
import numpy as np

ollama = OllamaEmbeddings(model="llama2")
vector = np.array(ollama.embed_documents(["test"]))

len(vector[0]), vector[0][0], vector[0][-1]
#+END_SRC

#+RESULTS:
:RESULTS:
| 4096 | 0.8642910718917847 | 0.4629634618759155 |
:END:

Any tips on tracking down why they are different?
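Beyond comparing the first and last elements, cosine similarity collapses the comparison into a single number. A minimal sketch, assuming the two embeddings have been loaded into Python; the short vectors below are made-up placeholders, not real llama2 output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical placeholders standing in for the llm.el and langchain
# outputs; with the real 4096-dim vectors, a score near 1.0 would mean
# the two stacks agree, and a score near 0 would confirm the mismatch.
v_elisp = [0.1, -0.2, 0.3]
v_python = [0.3, 0.1, -0.2]
print(cosine_similarity(v_elisp, v_python))
```

If the two real vectors score well below 1.0, they genuinely encode different things rather than differing by a harmless normalization.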

@ahyatt
Owner

ahyatt commented Jan 4, 2024

According to a conversation I had with @s-kostyaev, there have been some issues with ollama embeddings, so he may have some ideas on what might be going on here.

@s-kostyaev
Contributor

Hi @ahyatt @jkitchin
Ollama embeddings are broken at the moment. See ollama/ollama#834, ollama/ollama#327, and ggerganov/llama.cpp#2872.

@s-kostyaev
Contributor

@jkitchin You can reproduce this with curl and open an issue in the ollama upstream.
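For reference, a curl reproduction might look like the following. This is a sketch assuming a local ollama server on its default port (11434) and the =/api/embeddings= endpoint ollama exposed at the time; adjust to match your setup:

```shell
# Ask the local ollama server for a llama2 embedding of "test",
# then print the vector length and its first element.
# Requires a running ollama instance; jq is used only for readability.
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "llama2", "prompt": "test"}' \
  | jq '.embedding | length, .[0]'
```

If the numbers from curl already disagree with one of the client libraries, that narrows the bug down before involving llm.el or langchain at all.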

@ahyatt
Owner

ahyatt commented Jan 8, 2024

Thanks @s-kostyaev for the update. I think this is likely to be upstream as well (there's nothing in llm that would cause this). Still, it would be good to add tests that check the embeddings give sane results, so we can catch this kind of problem in the future. For now I'm going to close this, since the issue is (as far as we can tell) not in the llm library. Thank you @jkitchin for your investigation and for reporting this!

@ahyatt ahyatt closed this as completed Jan 8, 2024