[Bug]: Unable to modify (update) collection metadata - "hnsw:space" will be lost. #2515

Open
amaxcz opened this issue Jul 14, 2024 · 1 comment
Labels
bug Something isn't working

amaxcz commented Jul 14, 2024

What happened?

If a collection is created with the "hnsw:space" attribute, there is no way to UPDATE its existing metadata while keeping the "hnsw:space" key, because of this code:

def _validate_modify_request(self, metadata: Optional[CollectionMetadata]) -> None:
    if metadata is not None:
        validate_metadata(metadata)
        if "hnsw:space" in metadata:
            raise ValueError(
                "Changing the distance function of a collection once it is created is not supported currently."
            )

So currently I have no way to update the metadata without breaking the existing metadata.
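The check quoted above can be reproduced in isolation to show why round-tripping the metadata fails. The sketch below is a simplified stand-in, not Chroma's code: it assumes `CollectionMetadata` is a plain dict and omits Chroma's `validate_metadata()`. Re-sending the existing metadata with one key changed is rejected even though "hnsw:space" keeps its original value, while dropping the key passes the check (and `modify()` then stores only the keys you sent).

```python
from typing import Any, Dict, Optional

# Assumption: a plain dict stands in for Chroma's CollectionMetadata type.
CollectionMetadata = Dict[str, Any]


def validate_modify_request(metadata: Optional[CollectionMetadata]) -> None:
    # Simplified copy of the quoted check; Chroma's validate_metadata() is omitted.
    if metadata is not None and "hnsw:space" in metadata:
        raise ValueError(
            "Changing the distance function of a collection once it is "
            "created is not supported currently."
        )


existing = {"test": "test", "hnsw:space": "cosine"}

# Update one key but keep "hnsw:space" unchanged: still rejected.
updated = {**existing, "test": "test2"}
try:
    validate_modify_request(updated)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True

# Omitting "hnsw:space" passes the check, but the key is then absent from what gets stored.
validate_modify_request({"test": "test2"})
```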

Versions

Latest version, Python 3.12

Relevant log output

No response

tazarov (Contributor) commented Jul 15, 2024

Hey @amaxcz, thanks for reporting this. We have ongoing work that addresses this issue: #1637

Additionally, in 0.6.0 we'll split the HNSW index configuration from the collection metadata.

It is important to note that your HNSW config is not lost; otherwise Chroma wouldn't be able to perform its basic operation, semantic search. The actual (i.e. effective) HNSW configuration is kept in a separate table in Chroma's system DB.

As a temporary workaround, I'd suggest using get_or_create_collection to update your metadata:

Using collection.modify():

import chromadb

client = chromadb.PersistentClient("test_metadata_update")
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "test", "hnsw:space":"cosine"})

col.add(ids=["1"], documents=["document 1"])
print(col.metadata)
# {'test': 'test', 'hnsw:space': 'cosine'}
col.modify(metadata={"test": "test2"})
print(col.metadata)  # metadata gets overridden; if you try to add hnsw:space, you get the error above
# {'test': 'test2'}

Using client.get_or_create_collection():

import chromadb

client = chromadb.PersistentClient("test_metadata_update")
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "test", "hnsw:space":"cosine"})

col.add(ids=["1"], documents=["document 1"])
print(col.metadata)

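# Re-declare the collection with new metadata; hnsw:space deliberately differs from the original "cosine"
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "new_value", "hnsw:space":"ip"})
print(col.metadata)
# {'hnsw:space': 'ip', 'test': 'new_value'}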

IMPORTANT: the HNSW space indeed cannot be changed. Even if you enter a different space (e.g. hnsw:space = ip), it is only recorded in the metadata and does not affect how Chroma searches the existing index.
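Because both modify() and get_or_create_collection() store exactly the metadata dict you pass, one way to apply the workaround safely is to merge your changes into the current metadata first and refuse to alter any hnsw:* key. The `merge_metadata` helper below is hypothetical (it is not part of chromadb) and is only a sketch under that assumption; the chromadb call at the end is left as a comment.

```python
from typing import Any, Dict, Mapping, Optional

HNSW_PREFIX = "hnsw:"


def merge_metadata(
    existing: Optional[Mapping[str, Any]], updates: Mapping[str, Any]
) -> Dict[str, Any]:
    # Hypothetical helper (not part of chromadb): start from the current metadata
    # so that keys not mentioned in `updates` are preserved.
    merged: Dict[str, Any] = dict(existing or {})
    for key, value in updates.items():
        # hnsw:* keys are fixed at creation; refuse to record a different value.
        if key.startswith(HNSW_PREFIX) and merged.get(key) != value:
            raise ValueError(f"{key!r} cannot be changed after the collection is created")
        merged[key] = value
    return merged


current = {"test": "test", "hnsw:space": "cosine"}
new_metadata = merge_metadata(current, {"test": "new_value"})
print(new_metadata)  # {'test': 'new_value', 'hnsw:space': 'cosine'}

# With chromadb installed, the merged dict could be passed to the workaround (not run here):
# col = client.get_or_create_collection(col.name, metadata=merge_metadata(col.metadata, {"test": "new_value"}))
```

Raising on a changed hnsw:* value, rather than silently keeping the old one, makes it obvious when a caller expects a distance-function change that Chroma will not apply.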
