
Poor performance with scaling #4572

Open
rrajp opened this issue Mar 28, 2024 · 9 comments
rrajp commented Mar 28, 2024

How to reproduce this bug?

Ingested 18 million objects
Weaviate version 1.24.4 running on Docker
Persistent data size ~180 GB on disk
Instance configuration: 256 GB RAM, 128 cores

What is the expected behavior?

Average query latency should be < 1 sec

What is the actual behavior?

Every new query takes anywhere between 5 and 15 seconds.
Running the exact same query a second time still takes 3-4 seconds.

Supporting information

I am using the DB with its default configuration.

I first hit this issue on Weaviate 1.23.3 and was advised on the forum to upgrade. Migrating all the data took 24 hours, and the situation is almost the same even after the migration.

Current RAM usage is 120 GB.

I am using a custom vectorizer (XLM-RoBERTa, 1024 dimensions).

Server Version

1.24.4


@rrajp rrajp added the bug label Mar 28, 2024
@rthiiyer82

@rrajp : Thank you for reporting the issue.

Could you provide more information on the type of query being used? Are you able to share a code snippet here?


scd10 commented Apr 12, 2024

Same problem here.
I am running 1.24.8 on Docker with 26 million objects. The machine has 80 cores and 512 GB of memory.
After restarting Docker, memory usage keeps climbing for a long time, 20 minutes or more, finally settling at about 260 GB.
Every new query fails with "weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message Deadline Exceeded." If I retry the same query after a while, it returns results very fast, as if served from a cache.


rrajp commented Apr 12, 2024

I did a bit of experimenting and found that the main cause of the latency is hybrid search. I am using hybrid search with BM25 (alpha=0.5). A pure vector query usually returns in 10 ms. Also, my query is not global but applies to roughly 5,000 data points (restricted with a where filter on document id).
It seems like BM25 is searching across the entire dataset and slowing everything down.
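For context on the alpha parameter: it blends the keyword and vector result sets after each is scored. A minimal pure-Python sketch of this kind of score fusion (illustrative only; the normalization and doc IDs here are assumptions, not Weaviate's exact internals):

```python
def fuse_hybrid(bm25_scores, vector_scores, alpha=0.5):
    """Blend min-max-normalized BM25 and vector scores; alpha weights the vector side."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    fused = {k: alpha * v.get(k, 0.0) + (1 - alpha) * b.get(k, 0.0)
             for k in set(b) | set(v)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# doc "a" ranks first because it scores highest in both result sets
ranked = fuse_hybrid({"a": 2.0, "b": 1.0}, {"a": 0.9, "c": 0.5}, alpha=0.5)
```

With alpha=0.5 both sides contribute equally, which is why a slow BM25 leg drags down the whole hybrid query even when the vector leg alone returns in milliseconds.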


scd10 commented Apr 12, 2024

I've tried both vector search and hybrid search; both time out. I was using 1.23.10 earlier, but it was very slow: hybrid search normally took 10 seconds or more, so I tried the newest version and get timeouts instead. I found some suggestions about raising the timeout in the Python client, but that won't fix the slow responses.

@pommedeterresautee

FWIW, we have the same problems with BM25 search: it is very slow on a moderately large index, and it is clearly a second-class citizen (no stemmer, etc.). We have simply accepted that we can't use it outside of dataset creation and other offline scenarios.


amourao commented Apr 17, 2024

@pommedeterresautee @scd10 @rrajp
I'm sorry you are having issues with BM25 and hybrid search performance.
This is a topic we are actively working on.
I can't promise a timeline, but we should see considerable performance improvements over the next major releases (1.26+).

Until then, there are a couple of tips that can help:

  • The first query timing out and the second merely being slow points to BM25 indices being loaded into the cache on first read. If you are using macOS, Linux, or another Unix-like system, you can try a tool like vmtouch to ensure that the BM25 data is in memory before it has to be queried. Ideally, you'd cache the _searchable folders for the properties you are searching (you can find those folders inside the LSM folder for your index).
  • Depending on how much memory you have, consider testing the environment variable PERSISTENCE_LSM_ACCESS_STRATEGY set to pread or mmap (it is set to mmap by default).
  • By default, Weaviate indexes and searches all textual fields. To improve query-time performance, you can search only specific fields.
  • Removing stopwords can also greatly help with performance.
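The vmtouch tip above can also be approximated in plain Python by reading the relevant segment files once, so the OS page cache already holds their blocks before the first query. A hedged sketch; the `_searchable` glob pattern is an assumption about the directory layout, so check your actual LSM folder paths:

```python
from pathlib import Path

def warm_page_cache(root, pattern="**/*_searchable*/*"):
    """Read every matching file sequentially so the OS page cache
    is populated before the first real query hits the index."""
    total = 0
    for path in Path(root).glob(pattern):
        if path.is_file():
            with open(path, "rb") as f:
                while chunk := f.read(1 << 20):  # 1 MiB at a time
                    total += len(chunk)
    return total  # bytes touched
```

Running this once after a restart (against the index's LSM directory) plays the same role as `vmtouch -t` on the `_searchable` folders.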

About the scale of the data, I have a couple of questions that may help me give further advice:

  • Do you know how many properties you have and how many words there are per property?
  • What is the average query size (in number of words)?

@pommedeterresautee

Thanks for your answer. Our index holds 40M docs, each with 2 fields indexed for BM25 search.
One field is usually short (< 20 tokens) and the other a few hundred tokens at most. Stop words are removed and everything is stemmed (lower vocabulary size in theory, though we don't count vocabulary size). Queries are short questions, usually < 20 tokens after stop-word removal. Each query takes between 0.5 s and 1 s, which is ... unexpected (we are used to ES query times).


scd10 commented Apr 18, 2024

@amourao Thanks for the reply. For your questions:

  • Our schema has about 20 properties; most of them are metadata and set to "indexFilterable", and only one property is set to "indexSearchable". The indexSearchable property is in Chinese, so we tokenize it beforehand and separate the words with whitespace; Weaviate can then tokenize the property with "whitespace" and build the index. Most values of this property have 100~200 separated tokens.
  • As for the query, we also segment it and send the whitespace-separated query, together with an embedding vector, to search in Weaviate. Queries are normally a sentence, about 10~30 tokens.
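The pre-segmentation step described above can be sketched as follows. The `segment` function here is a stand-in for a real Chinese word segmenter (e.g. jieba, not shown); this demo just splits on existing whitespace:

```python
def segment(text):
    """Placeholder segmenter: a real pipeline would call a Chinese
    word segmenter here; this stand-in splits on whitespace."""
    return text.split()

def to_whitespace_tokens(text):
    """Join pre-segmented tokens with single spaces so a 'whitespace'
    tokenization setting can index each token as a separate word."""
    return " ".join(segment(text))
```

The same function is applied both to the stored property at ingest time and to the query string, so the index and the query share one tokenization.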


amourao commented Apr 19, 2024

OK, both your setups are well thought out; the only thing that jumps to mind is the slightly large queries with 10~30 tokens.
We will keep that type of scenario in mind when improving performance, with the goal of getting closer to < 100 ms in most scenarios.

Keyword search in Weaviate is quite disk-heavy and relies on OS memory caching.
This can result in scenarios like the one @scd10 described, with very different cold and hot query performance.
I've used vmtouch in a similar cold-index scenario before, to great effect.
An alternative is to run a set of queries to warm up the cache.
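The query-based warm-up can be sketched generically; `run_query` below stands in for whatever client call issues a real search (the function and the query list are assumptions, not a Weaviate API):

```python
from concurrent.futures import ThreadPoolExecutor

def warm_up(run_query, queries, workers=4):
    """Replay representative queries concurrently at startup so the
    first real user queries hit warm OS and index caches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, queries))
```

A handful of representative queries covering the searched properties is usually enough, since they touch the same `_searchable` segments that real traffic will read.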
