[BUG] Data initialization fails after switching to the Milvus vector store #3180

Closed
jiyang2024 opened this issue Mar 5, 2024 · 7 comments

@jiyang2024

After switching to the Milvus vector store, running python init_database.py --recreate-vs to initialize the data fails with an error.
Error message:
Traceback (most recent call last):
  File "/root/jgy/Langchain-Chatchat/init_database.py", line 107, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/root/jgy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
    files2vs(kb_name, kb_files)
  File "/root/jgy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
    kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
  File "/root/jgy/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
  File "/root/jgy/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 83, in do_add_doc
    ids = self.milvus.add_documents(docs)
  File "/root/anaconda3/envs/env_rag/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "/root/anaconda3/envs/env_rag/lib/python3.10/site-packages/langchain_community/vectorstores/milvus.py", line 520, in add_texts
    assert isinstance(
AssertionError: A list of valid ids are required when auto_id is False.

jiyang2024 added the bug (Something isn't working) label Mar 5, 2024

dosubot bot commented Mar 5, 2024

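Hello @jiyang2024! Nice to see you again; I hope all is well.

The error "A list of valid ids are required when auto_id is False" that you hit while initializing the database after switching to the Milvus vector store is probably caused by adding documents to Milvus without explicitly specifying their IDs while Milvus's auto_id is set to False.

In Milvus, when you add documents (vectors) to a collection, you can either set auto_id to True and let Milvus generate a unique ID for each document, or set auto_id to False and supply your own unique ID for each document. The error message indicates that auto_id is set to False, but the code that adds documents to the database does not provide a list of IDs.

To resolve this, you need to modify the add_doc method of the MilvusKBService class so that, when auto_id is False, it supplies an ID for every document being added. If the MilvusKBService class resembles the other KB service classes and does not currently handle IDs, you need to adjust it: either generate the IDs in the application, or change the configuration so that Milvus generates them automatically by setting auto_id to True.

If you control how the Milvus collection is created in the application, you can make sure auto_id is set to True when the collection is created. With that change Milvus assigns IDs to documents automatically, so you no longer have to specify them by hand. If, however, you need to keep auto_id as False for a specific reason, you will have to change the document insertion logic so that every document carries a unique ID.

Because the provided context includes neither the implementation details of the MilvusKBService class nor how the Milvus collection is configured, I cannot give specific code changes. You need to review the MilvusKBService class, in particular the add_doc method, and make sure it handles IDs correctly for your auto_id configuration.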
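
Independent of how MilvusKBService is actually written, the "generate the IDs in the application" option can be sketched generically. The helper name make_doc_ids below is hypothetical, and the commented usage assumes that the installed langchain_community Milvus store accepts an ids keyword forwarded from add_documents to add_texts, which the assertion message suggests but which should be checked against the installed version.

```python
import uuid
from typing import List, Sequence


def make_doc_ids(docs: Sequence[object]) -> List[str]:
    """Return one random, unique string ID per document (hypothetical helper)."""
    # uuid4().hex yields a 32-character hexadecimal string
    return [uuid.uuid4().hex for _ in docs]


# Assumed usage inside do_add_doc (needs a running Milvus server, so it is
# left as a comment rather than executed here):
#     ids = make_doc_ids(docs)
#     self.milvus.add_documents(docs, ids=ids)
```

Random UUIDs avoid collisions without coordination, at the cost of not being reproducible; a deterministic scheme would be needed if re-adding the same file should overwrite its earlier entries.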


zRzRzRzRzRzRzR self-assigned this Mar 5, 2024

@jiyang2024
Author

Hello. During initialization I found that neither the Milvus database nor the collection exists yet. How do I set the collection's auto_id to True?

@hollowdjj
Contributor

In server/knowledge_base/kb_service/milvus_kb_service.py, add a parameter after line 56 that sets auto_id to True.
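
A minimal sketch of what that extra parameter could look like, assuming line 56 belongs to the call that constructs the langchain_community Milvus vector store and that the installed version accepts an auto_id keyword. The helper build_milvus_kwargs, the collection name, and the connection values are all hypothetical; the real call in milvus_kb_service.py passes further arguments that are not shown in this thread.

```python
from typing import Any, Dict


def build_milvus_kwargs(collection_name: str,
                        connection_args: Dict[str, Any],
                        auto_id: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for the Milvus constructor (hypothetical helper)."""
    return {
        "collection_name": collection_name,
        "connection_args": connection_args,
        # Let Milvus generate primary keys instead of requiring a list of ids
        "auto_id": auto_id,
    }


# Placeholder values, not taken from the issue
kwargs = build_milvus_kwargs("samples", {"host": "127.0.0.1", "port": "19530"})

# Assumed usage (needs langchain_community and a running Milvus server):
#     Milvus(embedding_function=embeddings, **kwargs)
```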

@MuuuShin

MuuuShin commented Mar 26, 2024

Hello, after setting auto_id to True it still fails with an error. How did you solve it?

Error details (earlier output omitted)
2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
文档切分示例:page_content='ChatGPT是OpenAI开发的一个大型语言模型,可以提供各种主题的信息,\n# 如何向 ChatGPT 提问以获得高质量答案:提示技巧工程完全指南\n## 介绍\n我很高兴欢迎您阅读我的最新书籍《The Art of Asking ChatGPT for High-Quality Answers: A complete Guide to Prompt Engineering Techniques》。本书是一本全面指南,介绍了各种提示技术,用于从ChatGPT中生成高质量的答案。\n我们将探讨如何使用不同的提示工程技术来实现不同的目标。ChatGPT是一款最先进的语言模型,能够生成类似人类的文本。然而,理解如何正确地向ChatGPT提问以获得我们所需的高质量输出非常重要。而这正是本书的目的。' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/test_files/test.txt'}
2024-03-26 19:18:34,833 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
2024-03-26 19:18:34,834 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
2024-03-26 19:18:34,834 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
文档切分示例:page_content='See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/372669736\nCreating Large Language Model Applications Utilizing LangChain: A Primer on\nDeveloping LLM Apps Fast\nArticle\xa0\xa0in\xa0\xa0International Conference on Applied Engineering and Natural Sciences · July 2023\nDOI: 10.59287/icaens.1127\nCITATIONS\n0\nREADS\n47\n2 authors:\nSome of the authors of this publication are also working on these related projects:\nTHALIA: Test Harness for the Assessment of Legacy Information Integration Approaches View project\nAnalysis of Feroresonance with Signal Processing Technique View project\nOguzhan Topsakal' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/test_files/langchain.pdf'}
2024-03-26 19:18:35,221 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
2024-03-26 19:18:35,222 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
2024-03-26 19:18:35,222 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
文档切分示例:page_content='BoolQ\nPIQA\nSIQA\nHella-Swag\nARC-e\nARC-c\nNQ\nTQA\nMMLU\nGSM8K\nHuman-Eval\nMHA\n71.0\n79.3\n48.2\n75.1\n71.2\n43.0\n12.4\n44.7\n28.0\n4.9\n7.9\nMQA\n70.6' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-930255-616209.jpg'}
Batches: 100%|██████████████████████████████████████████████████| 11/11 [00:02<00:00, 4.50it/s]
2024-03-26 19:18:38,421 - decorators.py[line:139] - ERROR: RPC error: [create_index], ,
Traceback (most recent call last):
  File "/data/bch/LLM/Langchain-Chatchat/init_database.py", line 107, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
    files2vs(kb_name, kb_files)
  File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
    kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
  File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 84, in do_add_doc
    ids = self.milvus.add_documents(docs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 586, in add_texts
    insert_list = [insert_dict[x][i:end] for x in self.fields]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 586, in <listcomp>
    insert_list = [insert_dict[x][i:end] for x in self.fields]
                   ~~~~~~~~~~~^^^
KeyError: 'pk'
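
The last frame of the traceback can be reproduced in isolation. The sketch below reuses the list comprehension from line 586 with made-up data: the field list names pk, but the dictionary of values to insert has no pk entry. The field names other than pk and all values are hypothetical, and the thread does not establish why the two disagree in the real run.

```python
from typing import Dict, List


def build_insert_list(insert_dict: Dict[str, list],
                      fields: List[str],
                      i: int,
                      end: int) -> List[list]:
    # Same expression as the failing line in the traceback
    return [insert_dict[x][i:end] for x in fields]


# Hypothetical schema: the collection expects a "pk" column
fields = ["pk", "text", "vector"]
# Hypothetical batch: no "pk" values were prepared for insertion
insert_dict = {"text": ["a", "b"], "vector": [[0.1], [0.2]]}

try:
    build_insert_list(insert_dict, fields, 0, 2)
except KeyError as exc:
    # The missing key is the first field that has no values
    missing = exc.args[0]
```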

@AIdrinkhotWater

I ran into this as well and have the same question.

@MuuuShin

> I ran into this as well and have the same question.

You can take a look at the issue I opened.

@AIdrinkhotWater

langchain 0.0.354
langchain-community 0.0.20
langchain-core 0.1.23
Changing the versions to the ones above fixed it. I am using the Langchain-Chatchat 0.2.10 code, and my Milvus version is 2.2.13.
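
One way to confirm that an environment matches these versions is a small check with the standard library's importlib.metadata. This is only a sketch: the function name find_mismatches and the get_version parameter are hypothetical conveniences, and only the three package versions come from the comment above.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions reported to work in this thread
PINNED = {
    "langchain": "0.0.354",
    "langchain-community": "0.0.20",
    "langchain-core": "0.1.23",
}


def find_mismatches(pinned: Dict[str, str],
                    get_version: Callable[[str], str] = metadata.version) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling find_mismatches(PINNED) in the project's environment returns an empty dict when all three packages match the pins.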
