[Bug]: datacoord list_index error #33124

Open
JamesBonddu opened this issue May 17, 2024 · 7 comments
Labels: kind/bug, triage/accepted

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.6
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 512/1T
- GPU: 3090*8
- Others:

Current Behavior

Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/opt/conda/envs/oassistant/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               │     │   └ 3
               │     └ 209
               └ <function _main at 0x7f6800617d90>
  File "/opt/conda/envs/oassistant/lib/python3.10/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
           │    │          └ 3
           │    └ <function BaseProcess._bootstrap at 0x7f6800806710>
           └ <SpawnProcess name='SpawnProcess-72' parent=30 started>
  File "/opt/conda/envs/oassistant/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7f6800805d80>
    └ <SpawnProcess name='SpawnProcess-72' parent=30 started>
  File "/opt/conda/envs/oassistant/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {'config': <uvicorn.config.Config object at 0x7f6800695e70>, 'target': <bound method Server.run of <uvicorn.server.Server obj...
    │    │        │    │        └ <SpawnProcess name='SpawnProcess-72' parent=30 started>
    │    │        │    └ ()
    │    │        └ <SpawnProcess name='SpawnProcess-72' parent=30 started>
    │    └ <function subprocess_started at 0x7f67fd989630>
    └ <SpawnProcess name='SpawnProcess-72' parent=30 started>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
    target(sockets=sockets)
    │              └ [<socket.socket fd=63, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 30008)>]
    └ <bound method Server.run of <uvicorn.server.Server object at 0x7f66b7339900>>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/uvicorn/server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
           │       │   │    │             └ [<socket.socket fd=63, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 30008)>]
           │       │   │    └ <function Server.serve at 0x7f67fd9888b0>
           │       │   └ <uvicorn.server.Server object at 0x7f66b7339900>
           │       └ <function run at 0x7f67fff28700>
           └ <module 'asyncio' from '/opt/conda/envs/oassistant/lib/python3.10/asyncio/__init__.py'>
  File "/opt/conda/envs/oassistant/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
           │    │                  └ <coroutine object Server.serve at 0x7f66b14d3610>
           │    └ <method 'run_until_complete' of 'uvloop.loop.Loop' objects>
           └ <uvloop.Loop running=True closed=False debug=False>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/base.py", line 69, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
          │    │   │      │                      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.send_no_error at 0x7f66a46b8ee0>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a47b3370>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <starlette.middleware.base.BaseHTTPMiddleware object at 0x7f66b0093130>
          └ <starlette.middleware.base.BaseHTTPMiddleware object at 0x7f66b0093190>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/base.py", line 106, in __call__
    response = await self.dispatch_func(request, call_next)
                     │    │             │        └ <function BaseHTTPMiddleware.__call__.<locals>.call_next at 0x7f66a46b8ca0>
                     │    │             └ <starlette.requests.Request object at 0x7f66a450b9a0>
                     │    └ <function log_exceptions at 0x7f66b009beb0>
                     └ <starlette.middleware.base.BaseHTTPMiddleware object at 0x7f66b0093130>

> File "/data/mlops/Open-Assistant/inference/server/main_upload.py", line 92, in log_exceptions
    response = await call_next(request)
                     │         └ <starlette.requests.Request object at 0x7f66a450b9a0>
                     └ <function BaseHTTPMiddleware.__call__.<locals>.call_next at 0x7f66a46b8ca0>

  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/base.py", line 80, in call_next
    raise app_exc
          └ MilvusException()
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/base.py", line 69, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
          │    │   │      │                      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.send_no_error at 0x7f66a46b8f70>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <starlette.middleware.sessions.SessionMiddleware object at 0x7f66b0092830>
          └ <starlette.middleware.base.BaseHTTPMiddleware object at 0x7f66b0093130>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/sessions.py", line 86, in __call__
    await self.app(scope, receive, send_wrapper)
          │    │   │      │        └ <function SessionMiddleware.__call__.<locals>.send_wrapper at 0x7f66a45bf250>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <starlette.middleware.cors.CORSMiddleware object at 0x7f66b0092620>
          └ <starlette.middleware.sessions.SessionMiddleware object at 0x7f66b0092830>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
          │    │   │      │        └ <function SessionMiddleware.__call__.<locals>.send_wrapper at 0x7f66a45bf250>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x7f66b0092650>
          └ <starlette.middleware.cors.CORSMiddleware object at 0x7f66b0092620>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
          │    │   │      │        └ <function ExceptionMiddleware.__call__.<locals>.sender at 0x7f66a45bf760>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <fastapi.middleware.asyncexitstack.AsyncExitStackMiddleware object at 0x7f66b00925f0>
          └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x7f66b0092650>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
          │    │   │      │        └ <function ExceptionMiddleware.__call__.<locals>.sender at 0x7f66a45bf760>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <fastapi.routing.APIRouter object at 0x7f66b0090ac0>
          └ <fastapi.middleware.asyncexitstack.AsyncExitStackMiddleware object at 0x7f66b00925f0>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
          │     │      │      │        └ <function ExceptionMiddleware.__call__.<locals>.sender at 0x7f66a45bf760>
          │     │      │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │     │      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │     └ <function Route.handle at 0x7f67fefd64d0>
          └ APIRoute(path='/mq_ocr_task', name='mq_ocr_task', methods=['POST'])
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
          │    │   │      │        └ <function ExceptionMiddleware.__call__.<locals>.sender at 0x7f66a45bf760>
          │    │   │      └ <function BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.receive_or_disconnect at 0x7f66a46b8af0>
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('192.168.240.2', 30008)...
          │    └ <function request_response.<locals>.app at 0x7f66b00ac550>
          └ APIRoute(path='/mq_ocr_task', name='mq_ocr_task', methods=['POST'])
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
                     │    └ <starlette.requests.Request object at 0x7f6690448d00>
                     └ <function get_request_handler.<locals>.app at 0x7f66b00ac4c0>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/fastapi/routing.py", line 235, in app
    raw_response = await run_endpoint_function(
                         └ <function run_endpoint_function at 0x7f67fefd5000>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
    return await dependant.call(**values)
                 │         │      └ {'session': <sqlalchemy.orm.session.AsyncSession object at 0x7f6690448eb0>, 'request': <starlette.requests.Request object at ...
                 │         └ <function mq_ocr_task at 0x7f66b00ac430>
                 └ <fastapi.dependencies.models.Dependant object at 0x7f66b0093880>

  File "/data/mlops/Open-Assistant/inference/server/main_upload.py", line 309, in mq_ocr_task
    knowledge_service.batch_process_docs_to_vector(
    │                 └ <function KnowledgeService.batch_process_docs_to_vector at 0x7f66b5a78940>
    └ <oasst_inference_server.service.knowledge_service.KnowledgeService object at 0x7f66a45334c0>

  File "/data/mlops/Open-Assistant/inference/server/oasst_inference_server/service/knowledge_service.py", line 378, in batch_process_docs_to_vector
    vector_db, pks = self.docs_to_vector_store(collection_name, documents, file_path_list)
    │                │    │                    │                │          └ ['/data/turing_ai_jfs/data/chat_data/vector_db/admin/985b64f4-d03f-45b4-99c5-e7a43df1fdab/6e2f3111-bd87-47fa-a633-42e5317c006...
    │                │    │                    │                └ [Document(page_content='昆山市人民法院集约送达报告(张家港拓普五金制  ', metadata={'source': '/data/turing_ai_jfs/data/chat_data/vector_db/admin/98...
    │                │    │                    └ 'knowledge__admin__1762875948452122626'
    │                │    └ <function KnowledgeService.docs_to_vector_store at 0x7f66b5a789d0>
    │                └ <oasst_inference_server.service.knowledge_service.KnowledgeService object at 0x7f66a46cdff0>
    └ None

  File "/data/mlops/Open-Assistant/inference/server/oasst_inference_server/service/knowledge_service.py", line 396, in docs_to_vector_store
    vector_client = VectorStoreConnector(
                    └ <class 'oasst_inference_server.plugins.vectors_db.vector_store.connector.VectorStoreConnector'>

  File "/data/mlops/Open-Assistant/inference/server/oasst_inference_server/plugins/vectors_db/vector_store/connector.py", line 45, in __init__
    self.client = self.connector_class(**self.params)
    │             │    │                 │    └ {'embedding_function': HuggingFaceEmbeddings(client=SentenceTransformer(
    │             │    │                 │        (0): Transformer({'max_seq_length': 512, 'do_lower...
    │             │    │                 └ <oasst_inference_server.plugins.vectors_db.vector_store.connector.VectorStoreConnector object at 0x7f66a4665cf0>
    │             │    └ <class 'oasst_inference_server.plugins.vectors_db.vectors.milvus.Milvus'>
    │             └ <oasst_inference_server.plugins.vectors_db.vector_store.connector.VectorStoreConnector object at 0x7f66a4665cf0>
    └ <oasst_inference_server.plugins.vectors_db.vector_store.connector.VectorStoreConnector object at 0x7f66a4665cf0>

  File "/data/mlops/Open-Assistant/inference/server/oasst_inference_server/plugins/vectors_db/vectors/milvus.py", line 178, in __init__
    self.vector_db: _Milvus = _Milvus(
    │                         └ <class 'langchain.vectorstores.milvus.Milvus'>
    └ <oasst_inference_server.plugins.vectors_db.vectors.milvus.Milvus object at 0x7f66a46cd3f0>

  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/langchain/vectorstores/milvus.py", line 170, in __init__
    self._init()
    │    └ <function Milvus._init at 0x7f66a7009900>
    └ <langchain.vectorstores.milvus.Milvus object at 0x7f66a4533460>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/langchain/vectorstores/milvus.py", line 234, in _init
    self._create_index()
    │    └ <function Milvus._create_index at 0x7f66a7009b40>
    └ <langchain.vectorstores.milvus.Milvus object at 0x7f66a4533460>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/langchain/vectorstores/milvus.py", line 329, in _create_index
    if isinstance(self.col, Collection) and self._get_index() is None:
                  │    │    │               │    └ <function Milvus._get_index at 0x7f66a7009ab0>
                  │    │    │               └ <langchain.vectorstores.milvus.Milvus object at 0x7f66a4533460>
                  │    │    └ <class 'pymilvus.orm.collection.Collection'>
                  │    └ <Collection>:
                  │      -------------
                  │      <name>: knowledge__admin__1762875948452122626
                  │      <description>: 
                  │      <schema>: {'auto_id': True, 'descri...
                  └ <langchain.vectorstores.milvus.Milvus object at 0x7f66a4533460>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/langchain/vectorstores/milvus.py", line 320, in _get_index
    for x in self.col.indexes:
             │    │   └ <property object at 0x7f66c8ce9710>
             │    └ <Collection>:
             │      -------------
             │      <name>: knowledge__admin__1762875948452122626
             │      <description>: 
             │      <schema>: {'auto_id': True, 'descri...
             └ <langchain.vectorstores.milvus.Milvus object at 0x7f66a4533460>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 1117, in indexes
    tmp_index = conn.list_indexes(self._name, **kwargs)
                │    │            │    │        └ {}
                │    │            │    └ 'knowledge__admin__1762875948452122626'
                │    │            └ <Collection>:
                │    │              -------------
                │    │              <name>: knowledge__admin__1762875948452122626
                │    │              <description>: 
                │    │              <schema>: {'auto_id': True, 'descri...
                │    └ <function GrpcHandler.list_indexes at 0x7f66c11a3490>
                └ <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f66a70344c0>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/decorators.py", line 135, in handler
    raise e from e
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/decorators.py", line 131, in handler
    return func(*args, **kwargs)
           │     │       └ {}
           │     └ (<pymilvus.client.grpc_handler.GrpcHandler object at 0x7f66a70344c0>, 'knowledge__admin__1762875948452122626')
           └ <function retry_on_rpc_failure.<locals>.wrapper.<locals>.handler at 0x7f66c11a35b0>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/decorators.py", line 170, in handler
    return func(self, *args, **kwargs)
           │    │      │       └ {}
           │    │      └ ('knowledge__admin__1762875948452122626',)
           │    └ <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f66a70344c0>
           └ <function retry_on_rpc_failure.<locals>.wrapper.<locals>.handler at 0x7f66c11a3520>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/decorators.py", line 110, in handler
    raise e from e
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/decorators.py", line 74, in handler
    return func(*args, **kwargs)
           │     │       └ {}
           │     └ (<pymilvus.client.grpc_handler.GrpcHandler object at 0x7f66a70344c0>, 'knowledge__admin__1762875948452122626')
           └ <function GrpcHandler.list_indexes at 0x7f66c11a31c0>
  File "/opt/conda/envs/oassistant/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 884, in list_indexes
    raise MilvusException(status.code, status.reason, status.error_code)
          │               │      │     │      │       │      └ <field property 'milvus.proto.common.Status.error_code'>
          │               │      │     │      │       └ error_code: NotReadyServe
          │               │      │     │      │         reason: "stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/m...
          │               │      │     │      └ <field property 'milvus.proto.common.Status.reason'>
          │               │      │     └ error_code: NotReadyServe
          │               │      │       reason: "stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/m...
          │               │      └ <field property 'milvus.proto.common.Status.code'>
          │               └ error_code: NotReadyServe
          │                 reason: "stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/m...
          └ <class 'pymilvus.exceptions.MilvusException'>

pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:556 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:570 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/datacoord/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:633 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).DescribeIndex.func1
/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44 github.com/milvus-io/milvus/pkg/util/retry.Do
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:632 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).DescribeIndex
/go/src/github.com/milvus-io/milvus/internal/proxy/task_index.go:619 github.com/milvus-io/milvus/internal/proxy.(*describeIndexTask).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:492 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).definitionLoop: service not ready[datacoord=3424]: Initializing)>
2024-05-17 12:07:19.495 | ERROR    | oasst_inference_server.service.knowledge_service:docs_to_vector_store:404 | trace_id:d2d18bf9f9c548d0a80c6c24efac2284 - Traceback (most recent call last):
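The traceback shows the proxy forwarding `DescribeIndex` to datacoord, which rejects it with `NotReadyServe` because datacoord is still `Initializing`; langchain's `Milvus._get_index` reaches this through `Collection.indexes` on the first request. As a client-side stopgap only (it does not explain why datacoord was still initializing), a caller could retry while that specific error persists. This is a minimal sketch that assumes the substring `service not ready`, copied from the error message above, reliably identifies the condition; `is_not_ready_error` and `call_when_ready` are hypothetical helpers, not pymilvus APIs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: this substring (taken from the MilvusException message above)
# marks a coordinator that is still initializing. It is not a documented contract.
NOT_READY_MARKER = "service not ready"


def is_not_ready_error(exc: BaseException) -> bool:
    """Return True if the exception text looks like a 'service not ready' error."""
    return NOT_READY_MARKER in str(exc)


def call_when_ready(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff while the server is not ready.

    Errors that do not look like 'service not ready' are re-raised immediately;
    a not-ready error on the final attempt is re-raised as well.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_not_ready_error(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With a live server, one might wrap the failing call as `call_when_ready(lambda: Collection("knowledge__admin__1762875948452122626").indexes)` after `connections.connect(...)`. The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.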


```yaml
# Licensed to the LF AI & Data foundation under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Related configuration of etcd, used to store Milvus metadata & service discovery.
etcd:
  endpoints: localhost:2379
  rootPath: by-dev # The root path where data is stored in etcd
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
  log:
    level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
    # path is one of:
    #  - "default" as os.Stderr,
    #  - "stderr" as os.Stderr,
    #  - "stdout" as os.Stdout,
    #  - file path to append server logs to.
    # please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
    path: stdout
  ssl:
    enabled: false # Whether to support ETCD secure connection mode
    tlsCert: /path/to/etcd-client.pem # path to your cert file
    tlsKey: /path/to/etcd-client-key.pem # path to your key file
    tlsCACert: /path/to/ca.pem # path to your CACert file
    # TLS min version
    # Optional values: 1.0, 1.1, 1.2, 1.3.
    # We recommend using version 1.2 and above.
    tlsMinVersion: 1.3
  use:
    embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
  data:
    dir: default.etcd # Embedded Etcd only. please adjust in embedded Milvus: /tmp/milvus/etcdData/

metastore:
  # Default value: etcd
  # Valid values: [etcd, tikv]
  type: etcd

# Related configuration of tikv, used to store Milvus metadata.
# Notice that when TiKV is enabled for metastore, you still need to have etcd for service discovery.
# TiKV is a good option when the metadata size requires better horizontal scalability.
tikv:
  # Note that the default pd port of tikv is 2379, which conflicts with etcd.
  endpoints: 127.0.0.1:2389
  rootPath: by-dev # The root path where data is stored
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath

localStorage:
  path: /var/lib/milvus/data/ # please adjust in embedded Milvus: /tmp/milvus/data/

# Related configuration of MinIO/S3/GCS or any other service that supports the S3 API, which is responsible for data persistence for Milvus.
# We refer to the storage service as MinIO/S3 in the following description for simplicity.
# TODO: after joining the internal network, adjust address, port, accessKeyID, secretAccessKey, and bucketName to match the intranet IP
minio:
  address: 138.99.61.151 # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: turingminio123 # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  bucketName: milvus-bucket # Bucket name in MinIO/S3
  rootPath: milvus # The root path where the message is stored in MinIO/S3
  # Whether to use an IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  # aliyun (ack): https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-rrsa-to-enforce-access-control
  # aliyun (ecs): https://www.alibabacloud.com/help/en/elastic-compute-service/latest/attach-an-instance-ram-role
  useIAM: false
  # Cloud provider of S3. Supports: "aws", "gcp", "aliyun".
  # You can use "aws" for other cloud providers that support the S3 API with signature v4, e.g. MinIO
  # You can use "gcp" for other cloud providers that support the S3 API with signature v2
  # You can use "aliyun" for other cloud providers that use virtual-host-style buckets
  # When useIAM is enabled, only "aws", "gcp", and "aliyun" are supported for now
  cloudProvider: aws
  # Custom endpoint for fetch IAM role credentials. when useIAM is true & cloudProvider is "aws".
  # Leave it empty if you want to use AWS default endpoint
  iamEndpoint:
  # Log level for aws sdk log.
  # Supported level:  off, fatal, error, warn, info, debug, trace
  logLevel: fatal
  # Cloud data center region
  region: ""
  # Whether to use virtual-host-style bucket mode
  useVirtualHost: false
  # timeout for request time in milliseconds
  requestTimeoutMs: 1000000

# Milvus supports four MQs: rocksmq (based on RocksDB), natsmq (embedded nats-server), Pulsar, and Kafka.
# You can change the MQ by setting the mq.type field.
# If mq.type is left as "default" and multiple MQs are configured in this file, the MQ is chosen by the following priority:
# 1. standalone (local) mode: rocksmq (default) > Pulsar > Kafka
# 2. cluster mode: Pulsar (default) > Kafka (rocksmq and natsmq are unsupported in cluster mode)
mq:
  # Default value: "default"
  # Valid values: [default, pulsar, kafka, rocksmq, natsmq]
  type: default

# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
  address: localhost # Address of pulsar
  port: 6650 # Port of Pulsar
  webport: 80 # Web port of Pulsar; if you connect directly without a proxy, use 8080
  maxMessageSize: 20971520 # 20 * 1024 * 1024 bytes, maximum size of each message in Pulsar (default: 5242880, i.e. 5 * 1024 * 1024 bytes)
  tenant: public
  namespace: default
  requestTimeout: 600 # pulsar client global request timeout in seconds
  enableClientMetrics: false # Whether to register pulsar client metrics into milvus metrics path.

# To enable Kafka, comment out the pulsar configs
# kafka:
#   brokerList:
#   saslUsername:
#   saslPassword:
#   saslMechanisms: PLAIN
#   securityProtocol: SASL_SSL
#   readTimeout: 10 # read message timeout in seconds

rocksmq:
  # The path where the message is stored in rocksmq
  # please adjust in embedded Milvus: /tmp/milvus/rdb_data
  path: /var/lib/milvus/rdb_data
  lrucacheratio: 0.06 # rocksdb cache memory ratio
  rocksmqPageSize: 67108864 # 64 MB, 64 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
  retentionTimeInMinutes: 4320 # 3 days, 3 * 24 * 60 minutes, The retention time of the message in rocksmq.
  retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
  compactionInterval: 86400 # 1 day, trigger rocksdb compaction every day to remove deleted data
  # Compaction compression type; only 0 and 7 are supported.
  # 0 means no compression, 7 means zstd.
  # The length of the list is the number of RocksDB levels.
  compressionTypes: [0, 0, 7, 7, 7]

# natsmq configuration.
# more detail: https://docs.nats.io/running-a-nats-service/configuration
natsmq:
  server: # server side configuration for natsmq.
    port: 4222 # 4222 by default, Port for nats server listening.
    storeDir: /var/lib/milvus/nats # /var/lib/milvus/nats by default, directory to use for JetStream storage of nats.
    maxFileStore: 17179869184 # (B) 16GB by default, Maximum size of the 'file' storage.
    maxPayload: 8388608 # (B) 8MB by default, Maximum number of bytes in a message payload.
    maxPending: 67108864 # (B) 64MB by default, Maximum number of bytes buffered for a connection. Applies to client connections.
    initializeTimeout: 4000 # (ms) 4s by default, waiting for initialization of natsmq finished.
    monitor:
      trace: false # false by default, If true enable protocol trace log messages.
      debug: false # false by default, If true enable debug log messages.
      logTime: true # true by default, If set to false, log without timestamps.
      logFile: /tmp/milvus/logs/nats.log # /tmp/milvus/logs/nats.log by default, Log file path relative to .. of milvus binary if use relative path.
      logSizeLimit: 536870912 # (B) 512MB by default, Size in bytes after the log file rolls over to a new one.
    retention:
      maxAge: 4320 # (min) 3 days by default, Maximum age of any message in the P-channel.
      maxBytes: # (B) None by default, maximum number of bytes a single P-channel may contain. The oldest messages are removed if the P-channel exceeds this size.
      maxMsgs: # None by default, maximum number of messages a single P-channel may contain. The oldest messages are removed if the P-channel exceeds this limit.

# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
  dmlChannelNum: 8 # The number of dml channels created at system startup
  maxDatabaseNum: 64 # Maximum number of databases
  maxPartitionNum: 4096 # Maximum number of partitions in a collection
  minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
  importTaskExpiration: 900 # (in seconds) Duration after which an import task will expire (be killed). Default 900 seconds (15 minutes).
  importTaskRetention: 86400 # (in seconds) Milvus will keep the record of import tasks for at least `importTaskRetention` seconds. Default 86400 seconds (24 hours).
  enableActiveStandby: false
  # can specify ip for example
  # ip: 127.0.0.1
  ip: # if not specified, the first unicastable address is used as the local IP
  port: 53100
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456

# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  timeTickInterval: 200 # ms, the interval at which the proxy synchronizes the time tick
  healthCheckTimeout: 3000 # ms, the interval for component health checks
  msgStream:
    timeTick:
      bufSize: 512
  maxNameLength: 255 # Maximum length of name for a collection or alias
  # Maximum number of fields in a collection.
  # As of today (2.2.0 and after) it is strongly DISCOURAGED to set maxFieldNum >= 64.
  # So adjust at your risk!
  maxFieldNum: 684
  maxShardNum: 16 # Maximum number of shards in a collection
  maxDimension: 32768 # Maximum dimension of a vector
  # Whether to produce gin logs.
  # please adjust in embedded Milvus: false
  ginLogging: true
  ginLogSkipPaths: "/" # skipped url path for gin log split by comma
  maxTaskNum: 1024 # max task number of proxy task queue
  accessLog:
    enable: false
    # Log filename, set as "" to use stdout.
    # filename: ""
    # define formatters for access log by XXX:{format: XXX, method:[XXX,XXX]}
    formatters:
      # the "base" formatter cannot set methods
      # all methods use the "base" formatter by default
      base:
        # will not print access log if set as ""
        format: "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status] [code: $error_code] [sdk: $sdk_version] [msg: $error_msg] [traceID: $trace_id] [timeCost: $time_cost]"
      query:
        format: "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status] [code: $error_code] [sdk: $sdk_version] [msg: $error_msg] [traceID: $trace_id] [timeCost: $time_cost] [database: $database_name] [collection: $collection_name] [partitions: $partition_name] [expr: $method_expr]"
          # set formatter owners by method name (methods are all Milvus external interfaces)
          # all methods use the base formatter by default
          # a method can only use one formatter
          # if a method is assigned to multiple formatters, a random one is used.
        methods: ["Query", "Search", "Delete"]
    # localPath: /tmp/milvus_accesslog # log file root path
    # maxSize: 64 # max size (MB) of a single log file; disabled when <= 0.
    # rotatedTime: 0 # max time range of a single log file; disabled when <= 0.
    # maxBackups: 8 # number of backups to keep; a new backup is created when the access log file reaches maxSize or rotatedTime.
    # cacheSize: 10240 # write cache of the access log, in bytes

    # minioEnable: false # upload backups to Milvus MinIO when minioEnable is true.
    # remotePath: "access_log/" # file path for backups uploaded to MinIO
    # remoteMaxTime: 0 # max time range (in hours) of backups in MinIO; 0 disables time-based retention.
  http:
    enabled: true # Whether to enable the http server
    debug_mode: false # Whether to enable http server debug mode
  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 19530
  internalPort: 19529
  grpc:
    serverMaxSendSize: 536870910
    serverMaxRecvSize: 536870910
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456

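The `proxy.accessLog.formatters` comments above describe how a log line format is chosen: every method falls back to `base`, and a method listed under another formatter's `methods` uses that formatter instead. A minimal Python sketch of that lookup follows. It assumes the `$name` placeholders behave as plain text substitutions and uses the standard library's `string.Template` for them. `FORMATTERS`, `pick_formatter` and `render` are hypothetical names, and this is not Milvus's Go implementation.

```python
from string import Template

# A trimmed copy of proxy.accessLog.formatters from the config above.
FORMATTERS = {
    "base": {
        "format": "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status]",
        "methods": [],  # "base" cannot declare methods; it is the fallback for every method
    },
    "query": {
        "format": "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name "
                  "[status: $method_status] [collection: $collection_name]",
        "methods": ["Query", "Search", "Delete"],
    },
}


def pick_formatter(method_name: str) -> str:
    """Return the formatter that lists method_name, or "base" if none does."""
    for name, spec in FORMATTERS.items():
        if name != "base" and method_name in spec["methods"]:
            return name  # this sketch takes the first match deterministically
    return "base"


def render(method_name: str, fields: dict) -> str:
    """Fill the $placeholders; unknown ones are left as-is instead of raising KeyError."""
    fmt = FORMATTERS[pick_formatter(method_name)]["format"]
    return Template(fmt).safe_substitute(fields, method_name=method_name)
```

According to the config, a method assigned to several formatters gets a random one. The sketch takes the first match instead so that its output is deterministic.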
# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  autoHandoff: true # Enable auto handoff
  autoBalance: true # Enable auto balance
  balancer: ScoreBasedBalancer # Balancer to use
  globalRowCountFactor: 0.1 # expert parameters, only used by scoreBasedBalancer
  scoreUnbalanceTolerationFactor: 0.05 # expert parameters, only used by scoreBasedBalancer
  reverseUnBalanceTolerationFactor: 1.3 #expert parameters, only used by scoreBasedBalancer
  overloadedMemoryThresholdPercentage: 90 # The memory usage threshold percentage for overload
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
  checkInterval: 1000
  channelTaskTimeout: 60000 # 1 minute
  segmentTaskTimeout: 120000 # 2 minutes
  distPullInterval: 500
  heartbeatAvailableInterval: 10000 # 10s, Only QueryNodes which fetched heartbeats within the duration are available
  loadTimeoutSeconds: 600
  checkHandoffInterval: 5000
  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 19531
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456
  taskMergeCap: 1
  taskExecutionCap: 256
  enableActiveStandby: false # Enable active-standby
  brokerTimeout: 5000 # broker rpc timeout in milliseconds

# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
# TODO: add the mmap config item mmapDirPath
queryNode:
  mmapDirPath: /var/lib/milvus/mmap
  dataSync:
    flowGraph:
      maxQueueLength: 16 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  stats:
    publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
  segcore:
    cgoPoolSizeRatio: 2.0 # cgo pool size ratio to max read concurrency
    knowhereThreadPoolNumRatio: 4
    # Use more threads to make better use of SSD throughput in disk index.
    # This parameter is only useful when enable-disk = true.
    # And this value should be a number greater than 1 and less than 32.
    chunkRows: 128 # The number of vectors in a chunk.
    interimIndex: # build a temporary vector index for growing segments or binlogs to accelerate search
      enableIndex: true
      nlist: 128 # segment index nlist
      nprobe: 16 # nprobe to search segments, based on your accuracy requirement; must be smaller than nlist
      memExpansionRate: 1.15 # the ratio of building interim index memory usage to raw data
  loadMemoryUsageFactor: 1 # The multiply factor of calculating the memory usage while loading segments
  enableDisk: false # enable querynode load disk index, and search on disk index
  maxDiskUsagePercentage: 95
  cache:
    enabled: true # deprecated, TODO: remove it
    memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024 # deprecated, TODO: remove it
    readAheadPolicy: willneed # The read ahead policy of chunk cache, options: `normal, random, sequential, willneed, dontneed`
  grouping:
    enabled: true
    maxNQ: 1000
    topKMergeRatio: 20
  scheduler:
    receiveChanSize: 10240
    unsolvedQueueSize: 10240
    # maxReadConcurrentRatio is the concurrency ratio of read task (search task and query task).
    # Max read concurrency would be the value of runtime.NumCPU * maxReadConcurrentRatio.
    # It defaults to 2.0, which means max read concurrency would be the value of runtime.NumCPU * 2.
    # Max read concurrency must be greater than or equal to 1, and less than or equal to runtime.NumCPU * 100.
    # (0, 100]
    maxReadConcurrentRatio: 1
    cpuRatio: 10 # ratio used to estimate read task cpu usage.
    maxTimestampLag: 86400
    # read task schedule policy: fifo(by default), user-task-polling.
    scheduleReadPolicy:
      # fifo: A FIFO queue support the schedule.
      # user-task-polling:
      #     The user's tasks will be polled one by one and scheduled.
      #     Scheduling is fair on task granularity.
      #     The policy is based on the username for authentication.
      #     And an empty username is considered the same user.
      #     When there are no multiple users, the policy decays into FIFO.
      name: fifo
      maxPendingTask: 10240
      # user-task-polling configure:
      taskQueueExpire: 60 # 1 min by default, expire time of inner user task queue since queue is empty.
      enableCrossUserGrouping: false # false by default. Enable cross-user grouping when using the user-task-polling policy (disable it if a user's tasks cannot be merged with other users' tasks).
      maxPendingTaskPerUser: 1024 # 50 by default, max pending task in scheduler per user.

  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 21123
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456

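The `scheduleReadPolicy` comments in the queryNode section describe `user-task-polling` in prose. Tasks are grouped by user name, and users are polled one task at a time. An empty user name counts as a single user, and with only one user the policy behaves like FIFO. A minimal Python sketch of just that round-robin rule follows. `UserTaskPolling` is a hypothetical class, and it deliberately ignores `taskQueueExpire`, `enableCrossUserGrouping` and `maxPendingTaskPerUser`.

```python
from collections import OrderedDict, deque


class UserTaskPolling:
    """Hypothetical model of the user-task-polling read policy: one FIFO queue per user, served round-robin."""

    def __init__(self) -> None:
        self.queues: "OrderedDict[str, deque]" = OrderedDict()  # rotation order of users with pending tasks

    def push(self, user: str, task) -> None:
        # The empty user name is an ordinary key, so all anonymous tasks share one queue.
        self.queues.setdefault(user, deque()).append(task)

    def pop(self):
        """Take one task from the user at the front of the rotation, or return None when idle."""
        if not self.queues:
            return None
        user = next(iter(self.queues))
        queue = self.queues[user]
        task = queue.popleft()
        if queue:
            self.queues.move_to_end(user)  # user still has work: go to the back of the line
        else:
            del self.queues[user]
        return task
```

With a single user, tasks come out in exactly the order they were pushed, which matches the "decays into FIFO" remark.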
indexCoord:
  bindIndexNodeMode:
    enable: false
    address: localhost:22930
    withCred: false
    nodeID: 0
  segment:
    minSegmentNumRowsToEnableIndex: 1024 # It's a threshold. When the segment num rows is less than this value, the segment will not be indexed

indexNode:
  scheduler:
    buildParallel: 1
  enableDisk: true # enable index node build disk vector index
  maxDiskUsagePercentage: 95
  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 21121
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456

dataCoord:
  channel:
    watchTimeoutInterval: 300 # Timeout on watching channels (in seconds). Datanode tickler update watch progress will reset timeout timer.
    balanceSilentDuration: 300 # The duration before the channelBalancer on datacoord starts running
    balanceInterval: 360 # The interval at which the channelBalancer on datacoord checks balance status
  segment:
    maxSize: 512 # Maximum size of a segment in MB
    diskSegmentMaxSize: 2048 # Maximum size of a segment in MB for collection which has Disk index
    sealProportion: 0.23
    # The time of the assignment expiration in ms
    # Warning! this parameter is an expert variable and closely related to data integrity. Without specific
    # target and solid understanding of the scenarios, it should not be changed. If it's necessary to alter
    # this parameter, make sure that the newly changed value is larger than the previous value used before restart
    # otherwise there could be a large possibility of data loss
    assignmentExpiration: 2000
    maxLife: 86400 # The max lifetime of segment in seconds, 24*60*60
    # If a segment didn't accept dml records in maxIdleTime and the size of segment is greater than
    # minSizeFromIdleToSealed, Milvus will automatically seal it.
    # The max idle time of segment in seconds, 10*60.
    maxIdleTime: 600
    minSizeFromIdleToSealed: 16 # The minimum size in MB for an idle segment to be sealed.
    # The max number of binlog files for one segment; the segment will be sealed if
    # the number of binlog files reaches the max value.
    maxBinlogFileNumber: 32
    smallProportion: 0.5 # The segment is considered a "small segment" when its # of rows is smaller than
    # (smallProportion * segment max # of rows).
    # A compaction will happen on small segments if the segment after compaction will have
    # over (compactableProportion * segment max # of rows) rows.
    # MUST BE GREATER THAN OR EQUAL TO <smallProportion>!!!
    compactableProportion: 0.85
    # During compaction, a segment's # of rows may exceed the segment max # of rows by (expansionRate-1) * 100%.
    expansionRate: 1.25
  enableCompaction: true # Enable data segment compaction
  compaction:
    enableAutoCompaction: true
    rpcTimeout: 10 # compaction rpc request timeout in seconds
    maxParallelTaskNum: 10 # max parallel compaction task number
    indexBasedCompaction: true

  enableGarbageCollection: true
  gc:
    interval: 3600 # gc interval in seconds
    missingTolerance: 3600 # file meta missing tolerance duration in seconds, 3600
    dropTolerance: 10800 # file belongs to dropped entity tolerance duration in seconds. 10800
  enableActiveStandby: false
  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 13333
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456

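Two groups of `dataCoord.segment` comments above are easier to follow as predicates: one says when a growing segment gets sealed, the other says when small segments are worth compacting. The Python sketch below paraphrases those comments, using this config's values as defaults. It is a reading of the comment text, not datacoord's policy code. The `Segment` type and the function names are hypothetical. Treating `maxLife` as a seal trigger is an assumption, and so is using `expansionRate` as a hard cap in `worth_compacting`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_mb: float
    age_s: int        # seconds since the segment was created
    idle_s: int       # seconds since it last accepted a DML record
    binlog_files: int
    num_rows: int


def should_seal(seg: Segment, max_life=86400, max_idle=600, min_size_idle_mb=16, max_binlogs=32) -> bool:
    """Seal triggers paraphrased from the comments (maxLife sealing is an assumption)."""
    if seg.idle_s >= max_idle and seg.size_mb > min_size_idle_mb:
        return True  # idle for maxIdleTime and larger than minSizeFromIdleToSealed
    if seg.age_s >= max_life:
        return True
    return seg.binlog_files >= max_binlogs  # maxBinlogFileNumber reached


def is_small(seg: Segment, max_rows: int, small_proportion=0.5) -> bool:
    return seg.num_rows < small_proportion * max_rows


def worth_compacting(small_segments: list, max_rows: int, compactable_proportion=0.85, expansion_rate=1.25) -> bool:
    """Merge only if the result exceeds compactableProportion * max rows, capped at expansionRate * max rows."""
    merged = sum(s.num_rows for s in small_segments)
    return compactable_proportion * max_rows < merged <= expansion_rate * max_rows
```

The cap in `worth_compacting` reflects the comment that a compacted segment may exceed the max row count by at most (expansionRate - 1) * 100%.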
dataNode:
  dataSync:
    flowGraph:
      maxQueueLength: 16 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
    maxParallelSyncTaskNum: 6 # Maximum number of sync tasks executed in parallel in each flush manager
    skipMode:
      # when there are only timetick msg in flowgraph for a while (longer than coldTime),
      # flowGraph will turn on skip mode to skip most timeticks to reduce cost, especially when there are many channels
      enable: true
      skipNum: 4
      coldTime: 60
  segment:
    insertBufSize: 16777216 # Max buffer size to flush for a single segment.
    deleteBufBytes: 67108864 # Max buffer size to flush del for a single channel
    syncPeriod: 600 # The period to sync segments if buffer is not empty.
  # an IP can be specified, for example:
  # ip: 127.0.0.1
  ip: # if no address is specified, the first unicastable address is used as the local IP
  port: 21124
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 536870912
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 268435456
  memory:
    forceSyncEnable: true # `true` to force sync if memory usage is too high
    forceSyncSegmentNum: 1 # number of segments to sync, segments with top largest buffer will be synced.
    watermarkStandalone: 0.2 # memory watermark for standalone, upon reaching this watermark, segments will be synced.
    watermarkCluster: 0.5 # memory watermark for cluster, upon reaching this watermark, segments will be synced.
  timetick:
    byRPC: true
  channel:
    # specify the size of global work pool of all channels
    # if this parameter <= 0, will set it as the maximum number of CPUs that can be executing
    # a larger value is suggested when there are many collections, to avoid blocking
    workPoolSize: -1
    # specify the size of global work pool for channel checkpoint updating
    # if this parameter <= 0, will set it as 1000
    # a larger value is suggested when there are many collections, to avoid blocking
    updateChannelCheckpointMaxParallel: 1000

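The `dataNode.memory` comments describe a simple rule. Once memory usage reaches the watermark (0.5 of memory in cluster mode here), the `forceSyncSegmentNum` segments with the largest buffers are synced. A Python sketch of that selection follows. It assumes usage is measured as a fraction of total node memory and that ties may be broken arbitrarily. The function name and the segment-ID-to-bytes map are hypothetical.

```python
import heapq


def pick_segments_to_sync(buffers: dict, used_bytes: int, total_bytes: int,
                          watermark: float = 0.5, segment_num: int = 1, enabled: bool = True) -> list:
    """buffers maps segment ID -> bytes buffered in memory; returns the IDs to force-sync, largest first."""
    if not enabled or used_bytes < watermark * total_bytes:
        return []  # below the watermark (or forceSyncEnable is false): nothing to do
    # heapq.nlargest iterates the dict's keys and ranks them by their buffered size.
    return heapq.nlargest(segment_num, buffers, key=buffers.get)
```

With the default `forceSyncSegmentNum: 1`, only the single largest buffer is flushed each time the watermark is hit.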
# Configures the system log output.
log:
  level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  file:
    rootPath: # root dir path to put logs, default "" means no log file will print. please adjust in embedded Milvus: /tmp/milvus/logs
    maxSize: 300 # MB
    maxAge: 10 # Maximum time for log retention in days.
    maxBackups: 20
  format: text # text or json
  stdout: true # Whether to write logs to stdout

grpc:
  log:
    level: WARNING
  serverMaxSendSize: 536870912
  serverMaxRecvSize: 536870912
  client:
    compressionEnabled: false
    dialTimeout: 200
    keepAliveTime: 10000
    keepAliveTimeout: 20000
    maxMaxAttempts: 10
    initialBackOff: 0.2 # seconds
    maxBackoff: 10 # seconds
    backoffMultiplier: 2.0 # deprecated
  clientMaxSendSize: 268435456
  clientMaxRecvSize: 268435456

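The `grpc.client` retry settings above (`maxMaxAttempts`, `initialBackOff`, `maxBackoff`, `backoffMultiplier`) describe an exponential backoff. The Python sketch below computes the schedule they imply under the convention of gRPC's retry policy (gRFC A6). Under that convention, each wait is drawn uniformly between 0 and min(initialBackoff * multiplier^n, maxBackoff). Some details are assumptions: whether Milvus still applies `backoffMultiplier`, which is marked deprecated, and whether `maxMaxAttempts` counts the first attempt. The function names are hypothetical.

```python
import random


def backoff_ceilings(max_attempts=10, initial=0.2, maximum=10.0, multiplier=2.0) -> list:
    """Upper bound (seconds) of the wait before each retry; the first attempt itself has no wait."""
    return [min(initial * multiplier ** n, maximum) for n in range(max_attempts - 1)]


def jittered_delays(rng: random.Random, **kwargs) -> list:
    # gRPC-style full jitter: each actual wait is drawn uniformly from [0, ceiling].
    return [rng.uniform(0, c) for c in backoff_ceilings(**kwargs)]
```

With the values above, the ceilings double from 0.2 s to 6.4 s and then stay at 10 s for the remaining retries.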
# Configure proxy TLS.
tls:
  serverPemPath: configs/cert/server.pem
  serverKeyPath: configs/cert/server.key
  caPemPath: configs/cert/ca.pem

common:
  chanNamePrefix:
    cluster: by-dev
    rootCoordTimeTick: rootcoord-timetick
    rootCoordStatistics: rootcoord-statistics
    rootCoordDml: rootcoord-dml
    replicateMsg: replicate-msg
    rootCoordDelta: rootcoord-delta
    search: search
    searchResult: searchResult
    queryTimeTick: queryTimeTick
    dataCoordStatistic: datacoord-statistics-channel
    dataCoordTimeTick: datacoord-timetick-channel
    dataCoordSegmentInfo: segment-info-channel
  subNamePrefix:
    proxySubNamePrefix: proxy
    rootCoordSubNamePrefix: rootCoord
    queryNodeSubNamePrefix: queryNode
    dataCoordSubNamePrefix: dataCoord
    dataNodeSubNamePrefix: dataNode
  defaultPartitionName: _default # default partition name for a collection
  defaultIndexName: _default_idx # default index name
  entityExpiration: -1 # Entity expiration in seconds, CAUTION -1 means never expire
  indexSliceSize: 16 # MB
  threadCoreCoefficient:
    highPriority: 7 # Number of threads in the high-priority thread pool, as a multiple of the number of CPU cores
    middlePriority: 5 # Number of threads in the middle-priority thread pool, as a multiple of the number of CPU cores
    lowPriority: 1 # Number of threads in the low-priority thread pool, as a multiple of the number of CPU cores
  DiskIndex:
    MaxDegree: 56
    SearchListSize: 100
    PQCodeBudgetGBRatio: 0.125
    BuildNumThreadsRatio: 1
    SearchCacheBudgetGBRatio: 0.1
    LoadNumThreadRatio: 8
    BeamWidthRatio: 4
  consistencyLevelUsedInDelete: "Bounded"
  gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.
  gracefulStopTimeout: 1800 # seconds. it will force quit the server if the graceful stop process is not completed during this time.
  storageType: remote # please adjust in embedded Milvus: local, available values are [local, remote, opendal], value minio is deprecated, use remote instead
  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
  simdType: auto
  security:
    authorizationEnabled: false
    # The superusers will ignore some system check processes,
    # like the old password verification when updating the credential
    # superUsers: root
    tlsMode: 0
  session:
    ttl: 30 # ttl value when session granting a lease to register service
    retryTimes: 30 # retry times when session sending etcd requests

  # preCreatedTopic decides whether to use existing topics
  preCreatedTopic:
    enabled: false
    # support pre-created topics
    # the name of pre-created topics
    names: ["topic1", "topic2"]
    # a separate topic is needed to track the currently consumed timestamp for each channel
    timeticker: "timetick-channel"

  ImportMaxFileSize: 17179869184 # 16 * 1024 * 1024 * 1024
  # max file size to import for bulkInsert

  locks:
    metrics:
      enable: false
    threshold:
      info: 500 # minimum milliseconds for printing durations in info level
      warn: 1000 # minimum milliseconds for printing durations in warn level
  ttMsgEnabled: true # Whether the instance sends ts (time tick) messages

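Two comments in this config describe thread counts as multiples of the CPU count. The first is `common.threadCoreCoefficient`, where threads = coefficient * cores for each priority pool. The second is `queryNode.scheduler.maxReadConcurrentRatio`, where max read concurrency = runtime.NumCPU * ratio, kept within [1, NumCPU * 100]. The arithmetic is worth doing for a 512-CPU host like the reporter's. A Python sketch follows. Clamping out-of-range values, rather than rejecting them, is an assumption, and the function names are hypothetical.

```python
def pool_sizes(num_cpu: int, high: int = 7, middle: int = 5, low: int = 1) -> dict:
    """Thread-pool sizes implied by common.threadCoreCoefficient: threads = coefficient * cores."""
    return {"high": high * num_cpu, "middle": middle * num_cpu, "low": low * num_cpu}


def max_read_concurrency(num_cpu: int, ratio: float = 1.0) -> int:
    """queryNode.scheduler.maxReadConcurrentRatio: NumCPU * ratio, clamped (assumption) to [1, NumCPU * 100]."""
    value = int(num_cpu * ratio)
    return max(1, min(value, num_cpu * 100))
```

If each process sees all 512 cores, this gives pools of 3584, 2560 and 512 threads, and a read concurrency of 512 with the configured ratio of 1.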
# QuotaConfig, configurations of Milvus quota and limits.
# By default, we enable:
#   1. TT protection;
#   2. Memory protection.
#   3. Disk quota protection.
# You can enable:
#   1. DML throughput limitation;
#   2. DDL, DQL qps/rps limitation;
#   3. DQL Queue length/latency protection;
#   4. DQL result rate protection;
# If necessary, you can also manually force to deny RW requests.
quotaAndLimits:
  enabled: true # `true` to enable quota and limits, `false` to disable.
  limits:
    maxCollectionNum: 65536
    maxCollectionNumPerDB: 65536
  # quotaCenterCollectInterval is the time interval that quotaCenter
  # collects metrics from Proxies, Query cluster and Data cluster.
  # seconds, (0 ~ 65536)
  quotaCenterCollectInterval: 3
  ddl:
    enabled: false
    collectionRate: -1 # qps, default no limit, rate for CreateCollection, DropCollection, LoadCollection, ReleaseCollection
    partitionRate: -1 # qps, default no limit, rate for CreatePartition, DropPartition, LoadPartition, ReleasePartition
  indexRate:
    enabled: false
    max: -1 # qps, default no limit, rate for CreateIndex, DropIndex
  flushRate:
    enabled: false
    max: -1 # qps, default no limit, rate for flush
  compactionRate:
    enabled: false
    max: -1 # qps, default no limit, rate for manualCompaction
  dml:
    # dml limit rates, default no limit.
    # The maximum rate will not be greater than max.
    enabled: enable
    insertRate:
      collection:
        max: 500 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    upsertRate:
      collection:
        max: 500 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    deleteRate:
      collection:
        max: 500 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    bulkLoadRate:
      collection:
        max: 500 # MB/s, default no limit, not support yet. TODO: limit bulkLoad rate
      max: -1 # MB/s, default no limit, not support yet. TODO: limit bulkLoad rate
  dql:
    # dql limit rates, default no limit.
    # The maximum rate will not be greater than max.
    enabled: false
    searchRate:
      collection:
        max: -1 # vps (vectors per second), default no limit
      max: -1 # vps (vectors per second), default no limit
    queryRate:
      collection:
        max: -1 # qps, default no limit
      max: -1 # qps, default no limit
  limitWriting:
    # forceDeny false means dml requests are allowed (except under some
    # specific conditions, such as node memory reaching the watermark); true means always reject all dml requests.
    forceDeny: false
    ttProtection:
      enabled: false
      # maxTimeTickDelay indicates the backpressure for DML Operations.
      # DML rates would be reduced according to the ratio of time tick delay to maxTimeTickDelay,
      # if time tick delay is greater than maxTimeTickDelay, all DML requests would be rejected.
      # seconds
      maxTimeTickDelay: 300
    memProtection:
      # When memory usage > memoryHighWaterLevel, all dml requests would be rejected;
      # When memoryLowWaterLevel < memory usage < memoryHighWaterLevel, reduce the dml rate;
      # When memory usage < memoryLowWaterLevel, no action.
      enabled: true
      dataNodeMemoryLowWaterLevel: 0.6 # (0, 1], memoryLowWaterLevel in DataNodes
      dataNodeMemoryHighWaterLevel: 0.65 # (0, 1], memoryHighWaterLevel in DataNodes
      queryNodeMemoryLowWaterLevel: 0.6 # (0, 1], memoryLowWaterLevel in QueryNodes
      queryNodeMemoryHighWaterLevel: 0.65 # (0, 1], memoryHighWaterLevel in QueryNodes
    growingSegmentsSizeProtection:
      # No action will be taken if the growing segments size is less than the low watermark.
      # When the growing segments size exceeds the low watermark, the dml rate will be reduced,
      # but the rate will not be lower than `minRateRatio * dmlRate`.
      enabled: false
      minRateRatio: 0.5
      lowWaterLevel: 0.2
      highWaterLevel: 0.4
    diskProtection:
      enabled: true # When the total file size of object storage is greater than `diskQuota`, all dml requests would be rejected;
      diskQuota: -1 # MB, (0, +inf), default no limit
      diskQuotaPerCollection: -1 # MB, (0, +inf), default no limit
  limitReading:
    # forceDeny false means dql requests are allowed (except for some
    # specific conditions, such as collection has been dropped), true means always reject all dql requests.
    forceDeny: false
    queueProtection:
      enabled: false
      # nqInQueueThreshold indicated that the system was under backpressure for Search/Query path.
      # If NQ in any QueryNode's queue is greater than nqInQueueThreshold, search&query rates would gradually cool off
      # until the NQ in queue no longer exceeds nqInQueueThreshold. We think of the NQ of query request as 1.
      # int, default no limit
      nqInQueueThreshold: -1
      # queueLatencyThreshold indicated that the system was under backpressure for Search/Query path.
      # If dql latency of queuing is greater than queueLatencyThreshold, search&query rates would gradually cool off
      # until the latency of queuing no longer exceeds queueLatencyThreshold.
      # The latency here refers to the averaged latency over a period of time.
      # milliseconds, default no limit
      queueLatencyThreshold: -1
    resultProtection:
      enabled: false
      # maxReadResultRate indicated that the system was under backpressure for Search/Query path.
      # If dql result rate is greater than maxReadResultRate, search&query rates would gradually cool off
      # until the read result rate no longer exceeds maxReadResultRate.
      # MB/s, default no limit
      maxReadResultRate: -1
    # coolOffSpeed is the speed at which search&query rates cool off.
    # (0, 1]
    coolOffSpeed: 0.9

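The `ttProtection` and `memProtection` comments above describe how DML rates are throttled. A Python sketch of the factors they imply follows. For the time tick, the comment only says rates are reduced "according to the ratio" of delay to maxTimeTickDelay, so the sketch assumes one natural reading of that: (maxTimeTickDelay - delay) / maxTimeTickDelay, dropping to 0 at the limit. For memory, the comment only says the rate is "reduced" between the two water levels, so the linear ramp is an assumption. Taking the stricter of the two factors is also an assumption. All function names are hypothetical.

```python
def tt_factor(delay_s: float, max_delay_s: float = 300) -> float:
    """Fraction of the configured DML rate still allowed for a given time-tick delay (assumed formula)."""
    if delay_s >= max_delay_s:
        return 0.0  # delay past maxTimeTickDelay: reject all DML
    return (max_delay_s - delay_s) / max_delay_s


def mem_factor(usage: float, low: float = 0.6, high: float = 0.65) -> float:
    """usage is the fraction of node memory in use; the linear ramp between water levels is an assumption."""
    if usage > high:
        return 0.0  # above the high water level: reject all DML
    if usage < low:
        return 1.0  # below the low water level: no action
    return (high - usage) / (high - low)


def effective_dml_rate(configured_rate: float, delay_s: float, usage: float) -> float:
    # Apply whichever protection is stricter (assumption).
    return configured_rate * min(tt_factor(delay_s), mem_factor(usage))
```

With `collection.max: 500` MB/s, a 150 s time-tick delay and memory below the low water level, the sketch allows 250 MB/s.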
trace:
  # trace exporter type, default is stdout,
  # optional values: ['stdout', 'jaeger', 'otlp']
  exporter: stdout
  # fraction of traceID based sampler,
  # optional values: [0, 1]
  # Fractions >= 1 will always sample. Fractions < 0 are treated as zero.
  sampleFraction: 0
  otlp:
    endpoint: # "127.0.0.1:4318"
    secure: true
  jaeger:
    url: # "http://127.0.0.1:14268/api/traces"
    # set the Jaeger URL when the exporter is jaeger

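`trace.sampleFraction` is described as a trace-ID-based sampler in which fractions >= 1 always sample and fractions < 0 count as 0. That wording matches OpenTelemetry's TraceIDRatioBased sampler. The Python sketch below mirrors that sampler's logic: it compares the low 8 bytes of the trace ID, shifted right by one, against fraction * 2^63. That this is what Milvus uses underneath is an assumption, and `make_sampler` is a hypothetical name.

```python
def make_sampler(fraction: float):
    """Return should_sample(trace_id: bytes) -> bool for a trace-ID ratio sampler."""
    if fraction >= 1:
        return lambda trace_id: True  # fractions >= 1 always sample
    fraction = max(fraction, 0.0)     # fractions < 0 are treated as zero
    upper_bound = int(fraction * (1 << 63))

    def should_sample(trace_id: bytes) -> bool:
        # Use the low 8 bytes of the 16-byte trace ID, shifted right so the value fits in 63 bits.
        x = int.from_bytes(trace_id[8:16], "big") >> 1
        return x < upper_bound

    return should_sample
```

With `sampleFraction: 0`, as configured above, the ratio check never passes. A higher fraction would be needed for traces to be useful while debugging.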
autoIndex:
  params:
    build: '{"M": 18,"efConstruction": 240,"index_type": "HNSW", "metric_type": "IP"}'

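`autoIndex.params.build` is a JSON object stored as a single-quoted YAML string, so a typo in it only shows up when Milvus parses it. A small Python check of the value before restarting the cluster is sketched below. Requiring `index_type` and `metric_type` is an assumption for illustration, not a documented Milvus rule, and `check_build_params` is a hypothetical name.

```python
import json

# The value of autoIndex.params.build, copied verbatim from the config above.
BUILD_PARAMS = '{"M": 18,"efConstruction": 240,"index_type": "HNSW", "metric_type": "IP"}'


def check_build_params(raw: str) -> dict:
    """Parse the JSON string and fail loudly on mistakes that are easy to make by hand."""
    params = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(params, dict):
        raise ValueError("build params must be a JSON object")
    missing = {"index_type", "metric_type"} - set(params)  # required keys are an assumption
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return params
```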
Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

cron_upload_warn.log

Anything else?

Why does datacoord list_index report this error, and what should I configure to fix it?

@JamesBonddu JamesBonddu added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 17, 2024
@yanliang567
Contributor

It looks like the Milvus cluster is not healthy, but I don't know what you did before that. Please share more info and the full Milvus pod logs for investigation. Could you please refer to this doc to export the whole Milvus logs for investigation?

/assign @JamesBonddu
/unassign

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 17, 2024
@yanliang567 yanliang567 added this to the 2.4.2 milestone May 17, 2024
@JamesBonddu
Author

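It looks like the Milvus cluster is not healthy, but I don't know what you did before that. Please share more info and the full Milvus pod logs for investigation. Could you please refer to this doc to export the whole Milvus logs for investigation?

/assign @JamesBonddu /unassign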

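I deployed with docker-compose. How do I export these logs?
The Milvus cluster currently has more than 40,000 collections. When restarting, should I run docker-compose up -d directly, or first run docker-compose down and then docker-compose up -d? After every restart the cluster goes into a recovering state, and it takes a long time to become queryable again. It only recovers after I delete the backlog of data in Milvus's Pulsar queues.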

@JamesBonddu
Author

@yanliang567
Contributor

yanliang567 commented May 20, 2024

It looks like there is a failure in describeIndex, as there are 40K collections. @yiwangdr, could you please help take a look?

/assign @yiwangdr
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels May 20, 2024
@yiwangdr
Contributor

@JamesBonddu Datacoord is not ready.

  1. How much data do you have per collection?
  2. Please check if datacoord constantly restarts. Do you have enough resources (CPU/memory usage) on datacoord and datanode?

@JamesBonddu
Author

@JamesBonddu Datacoord is not ready.

  1. How much data do you have per collection?
  2. Please check if datacoord constantly restarts. Do you have enough resources (CPU/memory usage) on datacoord and datanode?

Each collection has 6,000 entities.

CPU: 512
Memory: 1 TB


@xiaofan-luan
Contributor

Why not put everything in one collection and use the partition key feature?
It isn't a good design to have a collection with only 6,000 entities.

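For reference on the partition key suggestion: pymilvus lets a scalar field be declared with `is_partition_key=True` in `FieldSchema`. Milvus then keeps all tenants in one collection and routes each entity to one of a fixed number of partitions by hashing that field. This would replace the reporter's 40,000+ collections, which at 6,000 entities each hold roughly 240 million entities or more. The Python sketch below only illustrates the bucketing idea with the standard library. CRC32 stands in for Milvus's own hash function, and the tenant names, the 64-partition count and the function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(tenant: str, num_partitions: int = 64) -> int:
    """Map a partition-key value to a bucket index (CRC32 is a stand-in for Milvus's hash)."""
    return zlib.crc32(tenant.encode("utf-8")) % num_partitions


def bucket_counts(tenants, num_partitions: int = 64) -> Counter:
    """How many tenants land in each bucket."""
    return Counter(partition_for(t, num_partitions) for t in tenants)
```

In this model, a given tenant always maps to the same bucket, so tens of thousands of tiny collections collapse into a fixed, small number of partitions. Check the Milvus partition key documentation for the real defaults and limits.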
@yanliang567 yanliang567 modified the milestones: 2.4.2, 2.4.3, 2.4.4 May 24, 2024