According to the docs, adding a document with an existing ID should raise a ValueError.
However, the ValueError is not raised when running the following commands:
import chromadb
c = chromadb.Client()
cc = c.get_or_create_collection("test")
cc.add(ids=['id1'], documents=['some document'], embeddings=[[1,2,3]])
cc.get()
cc.add(ids=['id1'], documents=['updated document'], embeddings=[[2,3,4]])  # this prints a warning instead of throwing an error
cc.get()
The above code only prints a warning message instead of raising the error:
In [14]: cc.add(ids=['id1'], documents=['updated document'], embeddings=[[2,3,4]])
Insert of existing embedding ID: id1
Add of existing embedding ID: id1
If this is expected, then maybe the doc should be modified to reflect that.
Versions
chroma -- 0.4.8
python -- 3.8
os -- ubuntu 20.04
Relevant log output
No response
Thank you for providing a detailed report. I've been able to reproduce the issue based on your description.
According to the documentation, it is expected that inserting a document with an existing ID should raise a ValueError. However, based on the code you've provided, it seems that only a warning is printed instead.
Upon investigation, I have identified two potential solutions:
Implementation Change: The implementation of the seen and dup sets in the validate_ids function located at chromadb/api/types.py needs revision. Currently, these sets become empty every time the function is called, resulting in the behaviour you've observed. A change to persist these sets between function calls or an alternative method to track duplicates should address the issue.
Documentation Update and Clarity:
import chromadb
client = chromadb.Client()
collection = client.create_collection("sample_collection")
collection.add(
    documents=["This is document1", "This is document2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc1"]
)
Running this code does raise a DuplicateIDError, because the duplicate IDs appear within a single add() call. The documentation should be updated to make that distinction clear.
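To illustrate why duplicates are caught within one call but not across calls, here is a minimal sketch. It is not the code in chromadb/api/types.py: the name check_ids, its body, and the use of ValueError are assumptions made for illustration. It only shows that sets created inside the function start empty on every call.

```python
from typing import List, Set


def check_ids(ids: List[str]) -> List[str]:
    """Hypothetical stand-in for a per-call ID validator (not Chroma's code)."""
    seen: Set[str] = set()  # re-created on every call, so it has no memory
    dups: Set[str] = set()
    for id_ in ids:
        if id_ in seen:
            dups.add(id_)
        else:
            seen.add(id_)
    if dups:
        raise ValueError(f"Expected IDs to be unique, found duplicates: {sorted(dups)}")
    return ids


# Duplicates inside one call are rejected.
try:
    check_ids(["doc1", "doc1"])
    rejected = False
except ValueError:
    rejected = True

# The same ID in two separate calls passes, because `seen` starts empty each time.
first = check_ids(["id1"])
second = check_ids(["id1"])
```

Catching the second case needs a check against what is already stored, which a per-call validator like this cannot see.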
I've submitted a pull request for the doc update: chroma-core/docs#130
I'd like to ask whether the change should be done in validate_ids()?
This function seems to be correctly identifying and rejecting duplicate id values in a single call. This looks like the correct implementation when it's called directly from get() and delete() as well as when called indirectly from add().
Shouldn't code changes that prevent adding the same id in separate calls to add() be made elsewhere, without touching this function?
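One way to keep the two concerns separate is to leave per-call validation alone and check for already-stored IDs at the point of insertion. The sketch below is a user-side workaround, not a proposed patch. add_strict is a hypothetical helper, and it assumes the collection's get() accepts an ids argument and returns a dict with an "ids" key. FakeCollection is an in-memory stand-in so the example runs without Chroma.

```python
from typing import Any, Dict, List


def add_strict(collection: Any, ids: List[str], **kwargs: Any) -> None:
    """Hypothetical helper: raise instead of warn when an ID is already stored."""
    existing = collection.get(ids=ids)["ids"]
    if existing:
        raise ValueError(f"IDs already exist in collection: {sorted(existing)}")
    collection.add(ids=ids, **kwargs)


class FakeCollection:
    """In-memory stand-in for a collection, used only to exercise add_strict."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def get(self, ids: List[str]) -> Dict[str, List[str]]:
        # Return only the requested IDs that are already stored.
        return {"ids": [i for i in ids if i in self._docs]}

    def add(self, ids: List[str], documents: List[str]) -> None:
        for i, doc in zip(ids, documents):
            self._docs.setdefault(i, doc)  # existing IDs are left untouched in this stand-in


cc = FakeCollection()
add_strict(cc, ids=["id1"], documents=["some document"])
try:
    add_strict(cc, ids=["id1"], documents=["updated document"])
    raised = False
except ValueError:
    raised = True
```

The check and the add are two separate steps, so this is not atomic: concurrent writers could still insert the same ID between them.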