Description
I'm working on a question-answering chatbot over my personal document store, using LangChain's LlamaCppEmbeddings, the LlamaCpp LLM, and the Chroma vectorstore.
When I use LlamaCppEmbeddings as the embedding_function (with Vicuna v1 13B 4-bit quantized weights), the performance of similarity_search is extremely poor: often the best result is ranked last or second to last. This happens even when the text query is an exact substring of one of the input texts and no other input text contains that string.
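To make the expectation concrete, here is a minimal, stdlib-only sanity check of what a sane similarity ranking should do in the exact-substring case. The bag-of-words "embedding" and the sample documents are illustrative stand-ins, not LangChain or Chroma APIs: with any reasonable embedding, a query that appears verbatim in exactly one document should rank that document first under cosine similarity.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": token counts stand in for a real model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    # Sort documents by similarity to the query, best first.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = [
    "the quick brown fox jumps over the lazy dog",
    "langchain wires embeddings into a chroma vectorstore",
    "vicuna is a fine-tuned llama model",
]
# The query is an exact substring of docs[1] only, so a sane
# embedding should rank that document first, not last.
print(rank("chroma vectorstore", docs)[0])
```

With LlamaCppEmbeddings I see the opposite of this: the matching document lands at or near the bottom of the ranking.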
Switching to the default embedding function, SentenceTransformerEmbeddingFunction, with no other changes vastly improves my chatbot's behavior by returning a much saner ranking of input documents. This workaround is acceptable for me, but LlamaCppEmbeddings should be able to far outperform this much smaller transformer model, so I suspect a not-so-subtle bug is lurking.
The consistency of these poor results makes me wonder whether a sign or a less-than/greater-than comparison is flipped somewhere. I looked at the implementation of LlamaCppEmbeddings, and it is so simple that I don't see how the error could originate there rather than in llama-cpp-python (or llama.cpp).
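The "best result comes out last" symptom is exactly what a flipped comparison produces, since confusing a distance (lower is better) with a similarity (higher is better) reverses the sort order. A minimal illustration with hypothetical scores (not taken from any of the libraries involved):

```python
# Hypothetical similarity scores for one query; higher = more similar.
scores = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.12}

# Correct: rank by similarity, descending -> best match first.
by_similarity = sorted(scores, key=scores.get, reverse=True)

# Bug pattern: treating a similarity as if it were a distance
# (or vice versa) sorts ascending, so the best match lands last.
as_if_distance = sorted(scores, key=scores.get)

print(by_similarity[0])    # best match first
print(as_if_distance[-1])  # same document, now ranked last
```

A single sign flip or inverted comparator anywhere along the embedding-to-ranking path would produce consistently inverted results like the ones I'm seeing.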
I'd be happy to help troubleshoot this, and I have a self-contained, simple, and easily redistributable example that demonstrates the issue consistently.