[FEATURE] Can the vector index type and distance metric used by Faiss be changed? #3115
🤖 Hello, @liuchuan01! Nice to meet you. I'm Dosu, a bot that can help you solve problems, answer questions, and become a contributor. I'm here to help while you wait for a reply from a human maintainer. I'll look into your issue right away and get you a complete answer as soon as possible.
I've now dug into the source code and can see that faiss.py uses the flat engine and inner-product distance at initialization. How can I change this?
🤖 Based on your question, it seems you're looking to modify the initialization parameters for Faiss in the Langchain-Chatchat source code. These parameters can be found and modified in `server/knowledge_base/kb_service/faiss_kb_service.py`. Here's a snippet of the `load_vector_store` method:

```python
def load_vector_store(self) -> ThreadSafeFaiss:
    return kb_faiss_pool.load_vector_store(kb_name=self.kb_name,
                                           vector_name=self.vector_name,
                                           embed_model=self.embed_model)
```

To modify the initialization parameters for Faiss, you would adjust the values passed here. Here's a snippet of the `do_init` method:

```python
def do_init(self):
    self.vector_name = self.vector_name or self.embed_model
    self.kb_path = self.get_kb_path()
    self.vs_path = self.get_vs_path()
```

To modify the initialization parameters, you could directly set these attributes (`vector_name`, `kb_path`, `vs_path`). You can find more details in the source code at server/knowledge_base/kb_service/faiss_kb_service.py.

Regarding the "Loading faiss with AVX2 support" and "Successfully loaded faiss with AVX2 support" messages, these are likely coming from the underlying Faiss library itself when it is being imported or initialized, and not directly from the Langchain-Chatchat codebase. The exact source of these messages would depend on the version and configuration of the Faiss library you are using.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
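To illustrate the pattern in the `do_init` snippet above, here is a minimal, self-contained sketch. `BaseKBService` and `CustomKBService` are hypothetical stand-ins for the real Langchain-Chatchat classes (and `"some-embed-model"` a placeholder model name), used only to show how overriding `do_init` changes the attributes that are later handed to the vector-store loader:

```python
class BaseKBService:
    # Hypothetical stand-in for the Langchain-Chatchat service base class
    def __init__(self, kb_name, embed_model="some-embed-model"):
        self.kb_name = kb_name
        self.embed_model = embed_model
        self.vector_name = None
        self.do_init()

    def do_init(self):
        # Mirrors the snippet above: vector_name defaults to embed_model
        self.vector_name = self.vector_name or self.embed_model


class CustomKBService(BaseKBService):
    def do_init(self):
        # Override: use a dedicated vector-store name instead
        self.vector_name = "my_custom_vectors"


print(BaseKBService("docs").vector_name)    # some-embed-model
print(CustomKBService("docs").vector_name)  # my_custom_vectors
```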
@dosu-bot After your mention, I found this in faiss_cache.py:

```python
vector_store = FAISS.load_local(vs_path, embeddings, normalize_L2=True,
                                distance_strategy="METRIC_INNER_PRODUCT")
```
🤖 Based on the information you provided, it seems you are trying to modify the type of index used by Faiss in the Langchain-Chatchat source code. To change the type of Faiss index, you would likely need to modify the call that creates or loads the index. Here's a general approach to how you might change the index type:

```python
import faiss

# Example of creating an IVF index
dimension = 128  # Example dimension, adjust based on your actual data
nlist = 100      # Number of clusters
quantizer = faiss.IndexFlatL2(dimension)  # The quantizer assigns vectors to clusters
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_L2)
```

Without access to the full implementation details, I can only give this general direction. In addition, I found a similar solved issue in the Langchain-Chatchat repository: [FEATURE] langchain-chatchat Faiss向量库的索引配置在哪看呢. The suggested solution points to where the Faiss vector library's index configuration can be found.

I hope this helps! If you have any other questions, feel free to ask.
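The `IndexIVFFlat` suggestion above can be understood with a toy, faiss-free sketch of the IVF idea: assign vectors to coarse clusters at add time, then at query time search exhaustively only inside the probed cluster. Everything here (dimensions, counts, using raw data points as centroids instead of running k-means) is purely illustrative:

```python
import random

random.seed(0)
dim, n, nlist = 4, 60, 3
data = [[random.random() for _ in range(dim)] for _ in range(n)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# "Train": pick nlist coarse centroids (real IVF runs k-means here)
centroids = data[:nlist]

def nearest_centroid(v):
    return min(range(nlist), key=lambda c: sqdist(v, centroids[c]))

# "Add": build inverted lists mapping centroid id -> member vector ids
lists = {c: [] for c in range(nlist)}
for i, v in enumerate(data):
    lists[nearest_centroid(v)].append(i)

# "Search": probe only the query's nearest cluster, exact search inside it
query = data[7]
best = min(lists[nearest_centroid(query)], key=lambda i: sqdist(data[i], query))
print(best)  # 7: the query is its own nearest neighbour in the probed list
```

The speed/recall trade-off of real IVF comes from searching only a few probed lists instead of all `n` vectors; `faiss.IndexIVFFlat` exposes this via its `nprobe` parameter.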
@zRzRzRzRzRzRzR Hello maintainer. After reading the source code in depth, I now understand the mechanism for loading the knowledge base and have sketched a flow chart (loading at the outermost layer of the knowledge-base QA function differs slightly, because the version I read had been modified by other members of my team).

```python
vector_store = FAISS.from_documents([doc], embeddings, normalize_L2=True,
                                    distance_strategy="METRIC_INNER_PRODUCT")
```

corresponds to the scenario of creating a new knowledge base. The corresponding index-selection code reads:

```python
if distance_strategy == DistanceStrategy.MAX_INNER_PRODUCT:
    index = faiss.IndexFlatIP(len(embeddings[0]))
else:
    # Default to L2, currently other metric types not initialized.
    index = faiss.IndexFlatL2(len(embeddings[0]))
```

where the constant is defined as `MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"`. This means the parameter passed in the new_vector_store function is not correct, and the final distance metric is still L2. Is my understanding mistaken? The team I'm on is continuously researching optimizations for this project's use of the vector database (Faiss), and we're about to start testing recall precision under inner-product and cosine-similarity matching. We hope to contribute more to this project going forward. Thanks!
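The mismatch described above can be reproduced standalone. The `DistanceStrategy` below is a stand-in mirroring the quoted definition (`MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"`): because the comparison is against that enum member, the string `"METRIC_INNER_PRODUCT"` can never match it, and the `else` branch (`IndexFlatL2`) is always taken.

```python
from enum import Enum

class DistanceStrategy(str, Enum):
    # Stand-in mirroring the constant quoted above
    EUCLIDEAN_DISTANCE = "EUCLIDEAN_DISTANCE"
    MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"

passed = "METRIC_INNER_PRODUCT"  # what the call site passes

# Never equal: wrong string, so the code falls through to IndexFlatL2
print(passed == DistanceStrategy.MAX_INNER_PRODUCT)               # False

# The value that would actually select IndexFlatIP
print("MAX_INNER_PRODUCT" == DistanceStrategy.MAX_INNER_PRODUCT)  # True
```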
This is a tough nut to crack, could really use your help on this, @imClumsyPanda. |
A full-flow analysis is in issue chatchat-space#3115.
May I ask how this turned out in the end? I'm also puzzled when running this code on my side; the parser warns…
The reason for this warning is that when the knowledge base is loaded locally, the distance metric is specified as METRIC_INNER_PRODUCT, but that load does not go through __from, so the metric type is never corrected. When the `__init__` method in faiss.py is called, it contains the following code:

```python
if (
    self.distance_strategy != DistanceStrategy.EUCLIDEAN_DISTANCE
    and self._normalize_L2
):
    warnings.warn(
        "Normalizing L2 is not applicable for metric type: {strategy}".format(
            strategy=self.distance_strategy
        )
    )
```

The warning appears whenever the distance metric is not L2 while L2 normalization is enabled. Changing the call to

```python
vector_store = FAISS.load_local(vs_path, embeddings, normalize_L2=True)
```

removes the manually specified distance metric, so the load falls back to the default L2 distance. Finally, the knowledge-base creation flow was also corrected; see PullRequest#3206.
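The guard quoted above can be reproduced in isolation to show exactly when the warning fires. The `check` helper below is a hypothetical distillation of that `if`, not the real langchain code, with the strategy reduced to a plain string:

```python
import warnings

def check(distance_strategy, normalize_L2):
    # Mirrors the guard in faiss.py: warn when the metric is not Euclidean
    # but L2 normalization was requested.
    if distance_strategy != "EUCLIDEAN_DISTANCE" and normalize_L2:
        warnings.warn(
            "Normalizing L2 is not applicable for metric type: "
            f"{distance_strategy}"
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check("METRIC_INNER_PRODUCT", normalize_L2=True)   # fires the warning
    check("EUCLIDEAN_DISTANCE", normalize_L2=True)     # silent
    check("METRIC_INNER_PRODUCT", normalize_L2=False)  # silent

print(len(caught))  # 1
```

Only the first combination warns, which matches the report above: either drop `normalize_L2=True` or use the Euclidean strategy to silence it.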
In faiss_cache.py, around line 61 in _FaissPool, isn't there also this same call? Doesn't that part need to be changed as well?
Line 61 needs changing too; that is exactly what my open PR changes. Line 61 is the case of creating a new knowledge base, while the load_local call around line 111 is the case of loading locally.
Got it.
Hello OP, I'm likewise researching inner-product and cosine-similarity optimizations for the Faiss vector database. Thank you for the detailed analysis! I'd also like to ask a few follow-up questions.
Thanks OP, I understand now. "Metric Inner product" was probably indeed a typo, so following what you said above, deleting the trailing `distance_strategy="METRIC_INNER_PRODUCT"` on line 61 removes the ambiguity, right?
I think I've roughly figured it out. When using langchain-chatchat with method 1, knowledge-base QA always retrieves a lot of not-very-relevant passages. Have you tried method 2, retrieving similar vectors by inner product? I'm about to try it and compare the results.
After trying method 2: if the distance metric is specified as "MAX_INNER_PRODUCT" (inner-product / cosine-similarity metric), the same warning appears as above, `UserWarning: Normalizing L2 is not applicable for metric type: METRIC_INNER_PRODUCT`. Have you run into this?
After changing the metric to "MAX_INNER_PRODUCT", knowledge-base QA no longer retrieves any documents and falls back to the LLM answering on its own, and I don't know why. Any idea how to solve this? Please help.
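One possible cause worth checking (an assumption on my part, not a confirmed diagnosis) is score direction: with inner product a *larger* score means more similar, while with L2 a *smaller* distance does, so a relevance threshold tuned for L2 distances can filter out every hit after switching metrics. A pure-Python sketch with unit vectors, where the inner product equals cosine similarity:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 0.0]                           # unit-length query
near = [math.cos(0.1), math.sin(0.1)]    # almost parallel to q
far = [math.cos(1.5), math.sin(1.5)]     # nearly orthogonal to q

# Inner product: HIGHER is more similar; L2: LOWER is more similar
print(dot(q, near) > dot(q, far))  # True
print(l2(q, near) < l2(q, far))    # True
```

If the retrieval pipeline applies something like "keep results with score below threshold X" (appropriate for L2 distances), every inner-product score may fail that test, leaving zero documents for the LLM.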
In the config file, I've already seen:

```python
kbs_config = {
    "faiss": {
    },
}
```

I also checked the logs and traced the source of:

```
2024-02-29 14:34:03,340 - loader.py[line:54] - INFO: Loading faiss with AVX2 support.
2024-02-29 14:34:03,358 - loader.py[line:56] - INFO: Successfully loaded faiss with AVX2 support.
```

but I still can't find the parameters used to initialize Faiss. I'd appreciate some clarification.