[Bug]: chromadb.api.configuration.InvalidConfigurationError: batch_size must be less than or equal to sync_threshold #2574
Comments
I've just spent three evenings tracking down the same bug and managed to figure it out in the last half hour or so. I think this is a regression introduced by https://github.com/chroma-core/chroma/pull/2526/files. I'm still figuring out the reproduction steps, but I think the process is
This will create the collection with the defaults in 0.5.4 where sync_threshold=100 and batch_size=1000
I haven't read through all of the other changes to the HNSW work in 0.5.5, but it looks like there are some changes to persistent properties and similar. I was actually trying to change the configured properties with different metadata definitions, but was having a lot of trouble. Specifically, this was not fixed by changing that code to
As a short-term workaround, I would suggest downgrading to 0.5.4 (this has worked for me) and waiting for a patch, as 0.5.5 is still in pre-release.
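To make the failure mode concrete, here is a minimal sketch of the constraint named in the error message. It is not Chroma's actual validation code; `check_hnsw_config` and `is_valid` are hypothetical helpers, and the only values used are the 0.5.4 defaults quoted above and the same two numbers swapped.

```python
def check_hnsw_config(batch_size: int, sync_threshold: int) -> None:
    """Raise ValueError when the constraint from the error message is violated.

    Hypothetical stand-in for the library's check, written only to
    illustrate the rule batch_size <= sync_threshold.
    """
    if batch_size > sync_threshold:
        raise ValueError(
            "batch_size must be less than or equal to sync_threshold"
        )


def is_valid(config: dict) -> bool:
    """Return True if the config passes the check, False otherwise."""
    try:
        check_hnsw_config(config["batch_size"], config["sync_threshold"])
    except ValueError:
        return False
    return True


# Defaults reported in this thread for collections created with 0.5.4.
legacy = {"batch_size": 1000, "sync_threshold": 100}

# The same two numbers swapped, which satisfies the constraint.
swapped = {"batch_size": 100, "sync_threshold": 1000}

legacy_ok = is_valid(legacy)
swapped_ok = is_valid(swapped)
```

Under these assumptions, a collection persisted with the legacy values fails the check as soon as a stricter version validates it, which matches the behaviour described in this comment.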
@dddxst and @mikethemerry, thanks for reporting and investigating this. Indeed, it was a bug (#2338) released with 0.5.4, which was fixed (#2526) in 0.5.5. The issue is that any DB created with 0.5.4 results in the validation error you reported. To fix the problem (ideally, we should've added a migration script to do that, but alas):
If in Docker, connect to your Docker container and run:
apt update && apt install sqlite3
sqlite3 /chroma/chroma/chroma.sqlite3 "update collections set config_json_str=json_set(config_json_str,'$.hnsw_configuration.batch_size',100,'$.hnsw_configuration.sync_threshold',1000) where name='test';"
# you don't have to run the below, but for consistency reasons:
sqlite3 /chroma/chroma/chroma.sqlite3 "update collection_metadata set int_value = 100 where key='hnsw:batch_size' and collection_id in (select id from collections where name='test');"
sqlite3 /chroma/chroma/chroma.sqlite3 "update collection_metadata set int_value = 1000 where key='hnsw:sync_threshold' and collection_id in (select id from collections where name='test');"
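For anyone who would rather not install the `sqlite3` CLI, the first update above can be sketched with Python's standard library. This is an illustration, not an official migration: `repair_hnsw_config` is a hypothetical helper, the table and column names are taken from the SQL in this comment, and the demo runs against an in-memory stand-in table rather than the real Chroma schema. Back up `chroma.sqlite3` and stop the server before trying anything like this on real data.

```python
import json
import sqlite3


def repair_hnsw_config(conn, collection_name, batch_size=100, sync_threshold=1000):
    """Rewrite batch_size and sync_threshold in config_json_str.

    Hypothetical helper: edits the JSON in Python instead of relying on
    SQLite's json_set. Returns the number of collections updated.
    """
    rows = conn.execute(
        "SELECT id, config_json_str FROM collections WHERE name = ?",
        (collection_name,),
    ).fetchall()
    for collection_id, raw_config in rows:
        config = json.loads(raw_config)
        hnsw = config.setdefault("hnsw_configuration", {})
        hnsw["batch_size"] = batch_size
        hnsw["sync_threshold"] = sync_threshold
        conn.execute(
            "UPDATE collections SET config_json_str = ? WHERE id = ?",
            (json.dumps(config), collection_id),
        )
    conn.commit()
    return len(rows)


# Demo on an in-memory stand-in table (assumed, simplified columns).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE collections (id TEXT PRIMARY KEY, name TEXT, config_json_str TEXT)"
)
demo.execute(
    "INSERT INTO collections VALUES (?, ?, ?)",
    (
        "c1",
        "test",
        json.dumps({"hnsw_configuration": {"batch_size": 1000, "sync_threshold": 100}}),
    ),
)

updated = repair_hnsw_config(demo, "test")
fixed = json.loads(
    demo.execute(
        "SELECT config_json_str FROM collections WHERE name = ?", ("test",)
    ).fetchone()[0]
)
```

Against a real database you would pass `sqlite3.connect("/chroma/chroma/chroma.sqlite3")` and your own collection name instead of the demo connection.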
@mikethemerry, thanks to you it took me only 3 minutes, not three evenings, to solve my problem...
Thanks, it works after updating to 0.5.5, but an error occurs on Windows ...
Thanks.
Can you share the error you get on Windows?
Hey everyone--I believe this is caused by a version mismatch; this shouldn't happen if your client and server are on the same version. Please make sure that your server and client are both on 0.5.5 and let us know if this is still happening.
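A quick way to rule out the mismatch described above is to compare the two version strings before doing anything else. The helper below is a hypothetical sketch that works on plain strings, so it makes no assumptions about how the versions are obtained.

```python
def versions_match(client_version: str, server_version: str) -> bool:
    """Return True when both version strings are identical after trimming.

    Hypothetical helper: surrounding whitespace and quotes are stripped
    so that '0.5.5' and '"0.5.5"\n' compare as equal.
    """
    def normalize(version: str) -> str:
        return version.strip().strip('"').strip("'")

    return normalize(client_version) == normalize(server_version)


# The combination reported in this issue versus the suggested one.
mismatch = versions_match("0.5.5", "0.5.4")
aligned = versions_match("0.5.5", "0.5.5")
```

With `chromadb` installed, the client side should be available as `chromadb.__version__` and the server side via `client.get_version()` on an `HttpClient`; treat both as assumptions to verify against the version you are running.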
What happened?
from typing import List
import chromadb
from chromadb.api.configuration import HNSWConfiguration
from chromadb.api.models.Collection import Collection
from chromadb.utils.embedding_functions.sentence_transformer_embedding_function import (
    SentenceTransformerEmbeddingFunction,
)
from read_word import extract_titles


class EmbeddingDB:
    def __init__(self, db, embedding_function=None):
        """
        docker pull chromadb/chroma
        docker run -p 8000:8000 chromadb/chroma
        """


embedding_function1 = SentenceTransformerEmbeddingFunction(model_name=m3_model)
client = chromadb.HttpClient(host='xx.xx.xx.xx', port=8000)
eDB = EmbeddingDB(client, embedding_function1)
titles, docs = extract_titles('wt.docx')


def load_data():
    # eDB.delete_collection('docs')
    # eDB.delete_collection('titles')
    ...  # rest of the body is not included in the report


if __name__ == '__main__':
    load_data()
The error occurs on Ubuntu, but not on Windows.
Versions
v0.5.4, Ubuntu 22 (or CentOS 7.9), Python 3.11.9
Relevant log output