Poor performance with a large collection and simultaneous queries #17166

simon-fauconnier · 2022-05-23T09:42:19Z

Hello,

We use Milvus in our solution to search for similar vectors.

Our collection contains 2 fields:

a key (int64)
a FloatVector of 128 dimensions (indexed in HNSW -> metric_type: IP, M: 32, efConstruction: 200)

This is our query (Python):

collection.search(
    query_vectors,  # list of query vectors (at most 600)
    "collection",   # anns_field: name of the vector field to search
    param={
        "metric_type": "IP",
        "params": {
            "nprobe": 256,  # IVF-only parameter; HNSW ignores it
            "ef": 1024,
        },
    },
    limit=5,
    expr=None,
)
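
The query above mixes parameters from two index families: for an HNSW index the search-time knob is `ef`, while `nprobe` belongs to IVF indexes and has no effect. As far as I know, the Milvus index documentation lists the valid HNSW range for `ef` as [top_k, 32768], so `ef` must be at least `limit`. The helper below is hypothetical (not part of pymilvus); it is a minimal sketch that builds the `param` dict for `collection.search()` under those assumptions and rejects out-of-range values.

```python
def hnsw_search_param(limit: int, ef: int, metric_type: str = "IP") -> dict:
    """Build a search `param` dict for an HNSW-indexed vector field.

    Hypothetical helper. Assumes ef must lie in [limit, 32768], the range
    the Milvus docs give for HNSW. `nprobe` is deliberately omitted because
    it only applies to IVF indexes.
    """
    if not limit <= ef <= 32768:
        raise ValueError(f"ef={ef} must be in [limit={limit}, 32768]")
    return {"metric_type": metric_type, "params": {"ef": ef}}


# The original query used ef=1024 for limit=5. A smaller ef explores a
# shorter candidate list, which is usually faster at some cost in recall.
print(hnsw_search_param(limit=5, ef=64))
```

The returned dict can be passed as `param=` to `collection.search()`; lowering `ef` is the cheapest latency lever to try first, and recall should be re-measured after each change.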

Milvus is deployed in cluster mode with 2 data nodes (4 CPU, 24Gi), 2 index nodes (4 CPU, 24Gi) and 3 query nodes (8 CPU, 32Gi).
The collection currently contains 12,383,090 entities, but this number is expected to grow significantly and will certainly reach about 42,000,000,000 entities.

We have observed latency that increases exponentially with the number of concurrent requests:

The first request completes in under 5 seconds (which already looks slow compared to some published benchmarks).
Subsequent requests take longer and longer until they stabilize at about 1 min 30 s.

We run several jobs in parallel that query Milvus simultaneously.
Milvus is up to date.

What can we do to reduce Milvus's response time?
We would like to increase both the number of entities and the number of jobs issuing requests (expected to exceed 300 per second).

Can Milvus handle a collection of several billion entities under such a query load?

xiaofan-luan · 2022-05-27T03:26:35Z

Maintainer

Hi Simon, 42B entities is a huge number and will require a great amount of resources.
So far our benchmarks only cover 50M entities; the results are published here: https://milvus.io/docs/v2.0.x/benchmark.md
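
To put the resource point in numbers, here is a back-of-envelope estimate of HNSW memory for the collection described in the question. It assumes float32 vectors (4 bytes per component) plus a layer-0 neighbor list of 2*M 4-byte ids per vector, which is the hnswlib layout; upper graph layers, the int64 key, and Milvus overheads are ignored, so treat the result as a rough lower bound rather than a measured figure.

```python
def hnsw_memory_bytes(num_vectors: int, dim: int, m: int) -> int:
    """Rough lower bound on HNSW memory for float32 vectors.

    Assumes 4 bytes per float32 component plus 2*M neighbor ids of
    4 bytes each on layer 0 (hnswlib-style). Upper layers, scalar
    fields and system overheads are ignored.
    """
    vector_bytes = dim * 4
    graph_bytes = 2 * m * 4
    return num_vectors * (vector_bytes + graph_bytes)


for n in (12_383_090, 42_000_000_000):
    size = hnsw_memory_bytes(n, dim=128, m=32)
    print(f"{n:>14,} vectors -> {size / 1e9:,.1f} GB")
```

With dim=128 and M=32 this comes to 768 bytes per vector: about 9.5 GB for the current 12.4M entities, but about 32 TB for 42B entities, far beyond the 3 × 32Gi of query-node memory in the original deployment.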

To increase performance, there are a few things you may want to do:

Wait for the 2.1 release (low-hanging fruit: we did a lot of throughput optimization in 2.1).
Tune your HNSW parameters: M, efConstruction and ef could be smaller for faster retrieval.
Tune the segment size: larger segments are faster, but each segment is strongly recommended to be no larger than 1/8 of your machine's memory.
Partition your data. For instance, if your data can be partitioned by date and your search can be limited to a single date, partitioning would help a lot.
Change your SDK: the Go/Java SDKs could be much faster than Python in our real-world test cases.
Use in-memory replicas in 2.1. They replicate your data, so with 2 replicas your concurrency doubles, but again you will have to consider cost efficiency.
Reduce your data; TTL might be the feature you are looking for.
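
The segment-size rule of thumb above (no more than 1/8 of machine memory per segment) can be turned into concrete numbers. The sketch below assumes the 32Gi query nodes from the original question and about 768 bytes per vector (128 float32 components = 512 bytes, plus roughly 256 bytes of layer-0 HNSW links at M=32, an hnswlib-style approximation). I believe the corresponding cap in `milvus.yaml` is `dataCoord.segment.maxSize` (in MB), but verify the key against your version's configuration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_segment_bytes(node_memory_gib: float, fraction: float = 1 / 8) -> int:
    """Upper bound on segment size under the 'at most 1/8 of memory' rule."""
    return int(node_memory_gib * GIB * fraction)


def vectors_per_segment(segment_bytes: int, bytes_per_vector: int) -> int:
    """Number of whole vectors that fit in a segment of the given size."""
    return segment_bytes // bytes_per_vector


cap = max_segment_bytes(32)                  # 32Gi query node
per_segment = vectors_per_segment(cap, 768)  # assumed bytes per vector
print(cap // MIB, "MiB per segment,", f"{per_segment:,}", "vectors per segment")
```

For a 32Gi node this gives a 4096 MiB cap, or roughly 5.6M vectors per segment under the stated per-vector assumption.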

majedtaki · 2022-06-18T01:05:00Z

Hi @xiaofan-luan, I have deployed Milvus 2.1 (master-20220613-e9dcda16) and set the replica number to 2, 4, 10 and 16, but throughput increased only slightly. I'm running 20 query nodes with 8 CPU / 32Gi each. I loaded 50M 512-dim vectors and I'm using HNSWPQ, which is the fastest index.

With no replicas, I get 220 QPS. With multiple replicas, the most I got was 300 QPS. In my case, all queries have nq=1.

How can I scale Milvus to 500+ QPS with a p95 latency under 200 ms?
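
Throughput, latency and concurrency are tied together by Little's law: the average number of in-flight requests equals throughput times average latency. The sketch below applies it to the target in this question. It treats the 200 ms figure as if it were the mean latency, which is optimistic for a p95 target, and the function names are illustrative only.

```python
def required_concurrency(target_qps: float, mean_latency_s: float) -> float:
    """Little's law: in-flight requests L = throughput (lambda) * latency (W)."""
    return target_qps * mean_latency_s


def max_qps(concurrency: float, mean_latency_s: float) -> float:
    """Inverse form: QPS ceiling for a fixed number of in-flight requests."""
    return concurrency / mean_latency_s


print(required_concurrency(500, 0.2))  # in-flight requests needed for 500 QPS at 200 ms
print(max_qps(20, 0.2))                # ceiling with only 20 in-flight requests
```

So 500 QPS at 200 ms needs about 100 requests in flight at all times. If the client keeps fewer outstanding (for example 20, capping throughput at 100 QPS), the cluster cannot reach the target regardless of replica count; and if adding client concurrency only raises latency, some tier is already saturated.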

majedtaki Jun 18, 2022

@xiaofan-luan I also noticed that memory is not completely freed after I release a collection from the query nodes. Is that normal?

xiaofan-luan Jun 19, 2022
Maintainer

It's usually hard to return all memory to the OS, especially memory managed by the Go GC and memory allocated through libc.
But if there is a big gap, please let us know.

majedtaki Jun 19, 2022

@xiaofan-luan I tried the RHNSQW index and could only get a throughput of 100 QPS with 20 query nodes and 4 replicas. In the query node Prometheus metrics, the segcore request latency and the search segment latency are around 100 ms each, but the proxy metrics show a search latency of 500 ms. Also, query node CPU usage is very low: only around 100% out of 800%. If I increase client concurrency, I just get higher latency, and CPU utilization does not increase. I can't get above 100 QPS.

I'm not sure whether version 2.1 performs any better than 2.0.2 in terms of scaling throughput :(

xiaofan-luan Jun 20, 2022
Maintainer

What about the proxy CPU usage?

xiaofan-luan Jun 20, 2022
Maintainer

If the proxy CPU usage is high, consider increasing the number of proxies.

rere950303 · 2024-05-11T05:42:03Z

@xiaofan-luan thanks for your work!
Could you explain how a bigger segment improves query performance?

yhmo May 13, 2024
Collaborator

From the description above, you can see:
With more collections, there are more "data channels" to maintain. With a large number of collections, the time-tick synchronization tasks will slow down the entire system.

rere950303 May 13, 2024

@yhmo
Thank you so much for your detailed explanation. So, if I understand correctly, increasing the number of proxy replicas when there are many user query requests will improve performance to some extent. Is that correct? Also, I'm using version 2.3.x and I saw that the default shard number of a collection was 2. Has it changed?

yhmo May 13, 2024
Collaborator

When you call load(), there is a parameter "replica_number" whose default value is 1. This parameter determines how many copies of the collection are loaded in memory. If you set replica_number=2, there will be 2 copies in memory, and Milvus can balance search requests between them. For each copy, one query node leads and manages requests.
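
To illustrate what balancing across replicas buys, here is a toy dispatcher that spreads search requests over `replica_number` in-memory copies. Round-robin is an assumption made for this sketch; Milvus's actual routing policy may differ (for example, it may take node load into account), and `ReplicaRouter` is a hypothetical name, not a Milvus class.

```python
from collections import Counter
from itertools import cycle


class ReplicaRouter:
    """Toy round-robin dispatcher over in-memory replicas (hypothetical)."""

    def __init__(self, replica_number: int = 1):
        if replica_number < 1:
            raise ValueError("replica_number must be >= 1")
        self._next = cycle(range(replica_number))

    def route(self) -> int:
        """Return the index of the replica that serves the next search."""
        return next(self._next)


router = ReplicaRouter(replica_number=2)
load = Counter(router.route() for _ in range(1000))
print(load)  # each of the 2 replicas receives half of the 1000 requests
```

Each replica sees half the request rate, which is why throughput can roughly double while every replica still needs a full copy of the collection in memory.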

When you call search() on the client side:

A search request is sent to a proxy node of Milvus.
The proxy node passes the request to the "leader query node" of a copy. This query node knows the distribution of all the segments of its copy; some segments might be loaded on other query nodes. The leader passes the request to those query nodes, gathers their results, combines them into a final result, and returns it to the proxy.
The proxy returns the final result to the client.
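
The leader's "combine the results" step is essentially a top-k merge. The sketch below assumes each query node returns its local top-k as `(id, score)` pairs and that a higher score is better, as with the IP metric; the function name and data layout are illustrative, not Milvus internals.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[int, float]  # (entity id, similarity score)


def merge_top_k(partial_results: List[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node top-k lists into a global top-k (higher score is better, as with IP)."""
    all_hits = (hit for hits in partial_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


# Local top-3 results from three hypothetical query nodes for one query vector.
node_a = [(11, 0.97), (12, 0.80), (13, 0.42)]
node_b = [(21, 0.91), (22, 0.88), (23, 0.10)]
node_c = [(31, 0.95), (32, 0.30), (33, 0.05)]
print(merge_top_k([node_a, node_b, node_c], k=3))
```

Each node only has to return k hits, because any hit in the global top-k must also be in the top-k of the node that holds it; this keeps the data sent to the leader proportional to k times the number of nodes.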

Increasing the number of proxy nodes lets Milvus accept more requests per second, but request latency mainly depends on the performance of the query nodes.
If you have high-performance query nodes, you can add proxy nodes to get higher QPS. Eventually, as you add more proxy nodes, QPS becomes limited by the performance of the query nodes. If your query nodes perform poorly, adding proxy nodes will not help.
Increasing replica_number can also increase QPS, but with replica_number=2, loading the collection requires 2x the memory.
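
The proxy-versus-query-node argument above can be captured in a toy bottleneck model: end-to-end QPS is capped by whichever tier saturates first, and loaded memory grows linearly with replica_number. The per-proxy and per-replica capacities below are made-up placeholders, not measurements, and the model ignores coordination overhead, so real scaling can be sub-linear.

```python
def cluster_qps(proxies: int, qps_per_proxy: float,
                replicas: int, qps_per_replica: float) -> float:
    """Throughput is limited by the slower tier: proxies or query-node replicas."""
    return min(proxies * qps_per_proxy, replicas * qps_per_replica)


def loaded_memory_bytes(collection_bytes: int, replica_number: int) -> int:
    """Each replica holds a full in-memory copy of the collection."""
    return collection_bytes * replica_number


# Hypothetical capacities: one proxy ~400 QPS, one replica ~250 QPS.
for proxies, replicas in [(1, 1), (4, 1), (1, 2), (4, 2)]:
    print(proxies, "proxies,", replicas, "replicas ->",
          cluster_qps(proxies, 400, replicas, 250), "QPS")
```

With these placeholder numbers, going from 1 to 4 proxies with a single replica changes nothing (the query nodes are the bottleneck), while adding a second replica only pays off once there is enough proxy capacity to feed it.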

yhmo May 13, 2024
Collaborator

In earlier versions of v2.3, the default value of shard_num was 2. In later versions of v2.3, the default value was changed to 1.

rere950303 May 13, 2024

@yhmo I'm using Milvus v2.3.12, so I understand the default value of shard_num is 1 in v2.3.12.

Yherealtita · 2024-05-13T02:25:26Z

Hello,

We use Milvus in our solution to search for similar vectors.

Our collection contains 2 fields:

a key (int64)
a FloatVector of 128 dimensions (indexed in HNSW -> metric_type: IP, M: 32, efConstruction: 200)

This is our query (Python):

collection.search(
    query_vectors,  # list of query vectors (at most 600)
    "collection",   # anns_field: name of the vector field to search
    param={
        "metric_type": "IP",
        "params": {
            "nprobe": 256,  # IVF-only parameter; HNSW ignores it
            "ef": 1024,
        },
    },
    limit=5,
    expr=None,
)

Milvus is deployed in cluster mode with 2 data nodes (4 CPU, 24Gi), 2 index nodes (4 CPU, 24Gi) and 3 query nodes (8 CPU, 32Gi).

The collection currently contains 12,383,090 entities, but this number is expected to grow significantly and will certainly reach about 42,000,000,000 entities.

We have observed latency that increases exponentially with the number of concurrent requests:

The first request completes in under 5 seconds (which already looks slow compared to some published benchmarks).
Subsequent requests take longer and longer until they stabilize at about 1 min 30 s.

We run several jobs in parallel that query Milvus simultaneously.
Milvus is up to date.

What can we do to reduce Milvus's response time?
We would like to increase both the number of entities and the number of jobs issuing requests (expected to exceed 300 per second).

Can Milvus handle a collection of several billion entities under such a query load?

yhmo May 13, 2024
Collaborator

No problem, Milvus can handle a collection with billions of entities and query/search.
