Milvus vector database error #2999

Closed
hollowdjj opened this issue Feb 18, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@hollowdjj
Contributor

hollowdjj commented Feb 18, 2024

Problem Description
Running python init_database.py --recreate-vs raises an error:
image

Even without running python init_database.py --recreate-vs, starting the project still fails to add files to the knowledge base:

2024-02-18 23:54:43,050 - _client.py[line:1027] - INFO: HTTP Request: POST http://127.0.0.1:7861/knowledge_base/search_docs "HTTP/1.1 200 OK"
2024-02-18 23:54:48,113 - utils.py[line:295] - INFO: UnstructuredMarkdownLoader used for /opt/Langchain-Chatchat/knowledge_base/test/content/在Java中使用协程.md
2024-02-18 23:54:49,798 - xml.py[line:105] - INFO: Reading document from string ...
2024-02-18 23:54:49,800 - html.py[line:151] - INFO: Reading document ...
2024-02-18 23:54:49,875 - common.py[line:601] - INFO: HTML element instance has no attribute type
[the line above repeats about 30 more times, with timestamps from 23:54:49,875 to 23:54:49,877]
文档切分示例:page_content='在Java中使用协程\n1. 背景\n在Java 19中正式发布了一个新的语言特性 —— 虚拟线程。本质上,Java中的虚拟线程就是指协程,只是名称叫法略有不同。\n\u200b 图1-1 jdk19新特性\n协程是继进程、线程后,一种更细颗粒度的控制流。在IO密集型任务中,协程相比线程有着天然的优势,能够通过简单地编程大幅提升系统吞吐量,且所占用的内存资源也更少。目前,有许多主流语言也对协程进行了支持,其中最知名的就是Go语言,这意味着协程是多任务高并发编程的未来。\n\u200b 图1-2 进程、线程与线程间的关系' metadata={'source': '/opt/Langchain-Chatchat/knowledge_base/test/content/在Java中使用协程.md'}
INFO: 127.0.0.1:45880 - "POST /knowledge_base/update_docs HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/applications.py", line 119, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/Langchain-Chatchat/server/knowledge_base/kb_doc_api.py", line 284, in update_docs
    kb.update_doc(kb_file, not_refresh_vs_cache=True)
  File "/opt/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 165, in update_doc
    return self.add_doc(kb_file, docs=docs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 83, in do_add_doc
    ids = self.milvus.add_documents(docs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/LangChainChatChat/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 520, in add_texts
    assert isinstance(
AssertionError: A list of valid ids are required when auto_id is False.
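
For readers tracing the failure: the traceback ends inside add_texts in langchain_community/vectorstores/milvus.py, reached from do_add_doc through add_documents, and the do_add_doc call shown passes no ids. A minimal sketch of that kind of check follows. It uses a simplified stand-in class; StandInStore and try_add are hypothetical names for illustration, not the library's actual code.

```python
from typing import List, Optional, Tuple


class StandInStore:
    """Simplified stand-in for a vector store wrapper (NOT the real Milvus class)."""

    def __init__(self, auto_id: bool = False) -> None:
        self.auto_id = auto_id
        self.rows: List[Tuple[str, str]] = []

    def add_texts(self, texts: List[str], ids: Optional[List[str]] = None) -> List[str]:
        if not self.auto_id:
            # Same message as in the traceback: without auto_id the caller
            # has to supply the primary keys itself.
            assert isinstance(ids, list), (
                "A list of valid ids are required when auto_id is False."
            )
        else:
            # Stand-in for keys that the server would generate.
            start = len(self.rows)
            ids = [str(start + i) for i in range(len(texts))]
        self.rows.extend(zip(ids, texts))
        return ids


def try_add(store: StandInStore, texts: List[str], ids: Optional[List[str]] = None) -> str:
    """Call add_texts and report the outcome as a string."""
    try:
        store.add_texts(texts, ids=ids)
        return "ok"
    except AssertionError as exc:
        return f"AssertionError: {exc}"
```

Under these assumptions, calling try_add(StandInStore(auto_id=False), ["a"]) reproduces the message, while passing ids or enabling auto_id does not.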
Environment Information

  • langchain-ChatGLM version/commit: 0.2.10
  • Deployed with Docker (yes/no): No, deployed in WSL
  • Model used (ChatGLM2-6B / Qwen-7B, etc.): ChatGLM3-6B @ cuda
  • Embedding model used (moka-ai/m3e-base, etc.): bge-large-zh-v1.5 @ cuda
  • Vector store type (faiss / milvus / pg_vector, etc.): milvus-gpu
  • Operating system and version: WSL (Ubuntu-22.04)
  • Python version: 3.11.7
  • Other relevant environment information:
@hollowdjj hollowdjj added the bug Something isn't working label Feb 18, 2024
@hollowdjj hollowdjj changed the title from "Milvus vector database save" to "Milvus vector database error" Feb 18, 2024

dosubot bot commented Feb 18, 2024

🤖

The error "AssertionError: A list of valid ids are required when auto_id is False" you're encountering is likely due to the list of ids you're using to query or delete documents in the Milvus collection being either empty or containing invalid ids.

In the Langchain-Chatchat codebase, the get_doc_by_ids and del_doc_by_ids methods in the MilvusKBService class are the ones that use a list of ids to perform operations on the Milvus collection.

To resolve this issue, you need to ensure that the list of ids you're passing to these methods is not empty and contains valid ids.

Here is a modified version of the get_doc_by_ids method that checks if the list of ids is not empty before trying to query the Milvus collection:

def get_doc_by_ids(self, ids: List[str]) -> List[Document]:
    result = []
    if self.milvus.col and ids:  # Check if ids list is not empty
        data_list = self.milvus.col.query(expr=f'pk in {[int(_id) for _id in ids]}', output_fields=["*"])
        for data in data_list:
            text = data.pop("text")
            result.append(Document(page_content=text, metadata=data))
    return result

And here is a modified version of the del_doc_by_ids method:

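def del_doc_by_ids(self, ids: List[str]) -> bool:
    if self.milvus.col and ids:  # Check if ids list is not empty
        self.milvus.col.delete(expr=f'pk in {ids}')
        return True  # a delete request was issued
    return False  # nothing to delete, or no collection loaded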

Please replace the existing methods with these modified versions and try running your command again. If the error persists, you may need to check the source of your ids to ensure they are valid. You can find these methods in the milvus_kb_service.py file.
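
One way to check the source of the ids before they reach the collection is sketched below. It assumes the primary keys are integers, as the int(_id) conversion above implies; split_numeric_ids is a hypothetical helper, not part of Langchain-Chatchat.

```python
from typing import List, Tuple


def split_numeric_ids(ids: List[str]) -> Tuple[List[int], List[str]]:
    """Separate ids that parse as integers from those that do not."""
    valid: List[int] = []
    rejected: List[str] = []
    for raw in ids:
        try:
            valid.append(int(raw))
        except (TypeError, ValueError):
            # Keep the offending value so it can be logged or inspected.
            rejected.append(raw)
    return valid, rejected
```

Logging the rejected list would show which ids, if any, are malformed instead of failing on the first one.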

As for the python init_database.py --recreate-vs command, it recreates the vector store. It is particularly useful if you have copied document files to the content folder but the vector store has not been populated, or if DEFAULT_VS_TYPE or EMBEDDING_MODEL has changed. You can find more details in the init_database.py file.

I hope this helps! If you have any other questions or if the issue persists, please provide more details so I can assist you further.

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@hollowdjj
Contributor Author


Still getting the same error.


dosubot bot commented Feb 19, 2024

🤖

Based on the error message you're receiving, it seems like the list of ids you're passing to the get_doc_by_ids and del_doc_by_ids methods is either empty or contains invalid ids.

In the get_doc_by_ids method, you're converting the ids to integers before querying the Milvus collection. If any of the ids in the list cannot be converted to an integer, this could be causing the error.

data_list = self.milvus.col.query(expr=f'pk in {[int(_id) for _id in ids]}', output_fields=["*"])

Similarly, in the del_doc_by_ids method, you're not converting the ids to integers before deleting the documents. If the ids in the Milvus collection are stored as integers, this could be causing the error.

self.milvus.col.delete(expr=f'pk in {ids}')

To resolve this issue, you should ensure that the list of ids you're passing to these methods is not empty and contains valid ids that exist in the Milvus collection. If the ids in the Milvus collection are stored as integers, you should also convert the ids to integers before deleting the documents.
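
The difference between the two expressions can be seen without a Milvus server, since it comes down to how the f-string renders the list. A minimal sketch, assuming integer primary keys in a field named pk; build_pk_expr is a hypothetical helper name:

```python
from typing import List, Optional


def build_pk_expr(ids: List[str], field: str = "pk") -> Optional[str]:
    """Return a filter such as 'pk in [1, 2]', or None when there is nothing to match."""
    if not ids:
        return None
    int_ids = [int(_id) for _id in ids]  # raises ValueError on a non-numeric id
    return f"{field} in {int_ids}"


# Without the conversion, the list is rendered with quoted string literals.
unconverted = f"pk in {['1', '2']}"
```

Here build_pk_expr(["1", "2"]) yields pk in [1, 2], whereas the unconverted form yields pk in ['1', '2'], which compares against strings rather than integers.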

Here is the corrected code:

def get_doc_by_ids(self, ids: List[str]) -> List[Document]:
    result = []
    if self.milvus.col and ids:
        data_list = self.milvus.col.query(expr=f'pk in {[int(_id) for _id in ids]}', output_fields=["*"])
        for data in data_list:
            text = data.pop("text")
            result.append(Document(page_content=text, metadata=data))
    return result

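def del_doc_by_ids(self, ids: List[str]) -> bool:
    if self.milvus.col and ids:
        self.milvus.col.delete(expr=f'pk in {[int(_id) for _id in ids]}')
        return True  # a delete request was issued
    return False  # nothing to delete, or no collection loaded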

Please replace the existing get_doc_by_ids and del_doc_by_ids methods with the corrected ones and try executing python init_database.py --recreate-vs again.

You can find the original methods in the MilvusKBService class in the server/knowledge_base/kb_service/milvus_kb_service.py file.

If the issue persists, please provide more details about the ids you're passing to these methods and the state of your Milvus collection.


@jnchen

jnchen commented Feb 19, 2024

error at
image

@tongxm

tongxm commented Feb 24, 2024

I have the same problem, 'AssertionError: A list of valid ids are required when auto_id is False. '

@hollowdjj
Contributor Author

I have the same problem, 'AssertionError: A list of valid ids are required when auto_id is False. '

You can work around it by setting auto_id = True when initializing Milvus in /opt/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py
image
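
An alternative that may be worth testing, instead of enabling auto_id, is to keep auto_id False and pass explicit ids when adding documents. The helper below is hypothetical (make_doc_ids is not part of Langchain-Chatchat), and the commented call site assumes that add_documents forwards an ids keyword to add_texts, as the **kwargs in the traceback suggests. Whether string ids fit the collection's primary-key type, and the int(_id) conversions elsewhere in milvus_kb_service.py, would need to be checked.

```python
import uuid
from typing import List


def make_doc_ids(count: int) -> List[str]:
    """Generate `count` unique string ids (hypothetical helper for illustration)."""
    return [uuid.uuid4().hex for _ in range(count)]


# Hypothetical call site, left as a comment because it needs a running Milvus server:
#   ids = self.milvus.add_documents(docs, ids=make_doc_ids(len(docs)))
```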

However, this somehow causes another problem: #2761

@hollowdjj
Contributor Author

error at image

Any idea on how to fix it?

@smallshallot

I have the same problem as well.

@JM-SLT

JM-SLT commented Apr 5, 2024

Is there a solution yet? I am hitting the same problem.

@dragon0311

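I started Milvus locally, created a new knowledge base, and selected milvus as the vector store; the same problem occurred.
For now I can only switch to faiss.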

7 participants