-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: Fixed test_collections.py property test #1716
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Moved the model update after conditional checks for new_name and metadata.
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
@atroyn PTAL. This should fix the failing test - https://github.com/chroma-core/chroma/actions/runs/7894637194/job/21545660175 |
HammadB
approved these changes
Feb 14, 2024
- Problem to solve: same collection names in multiple tenants/db
tazarov
force-pushed
the
bugfix/prop-test-coll
branch
from
February 15, 2024 01:25
10fda97
to
819685c
Compare
@HammadB, can you re-run the failed test? Seems like an intermittent failure. |
beggers
added a commit
that referenced
this pull request
Feb 24, 2024
## Description of changes *Summarize the changes made by this PR.* - Improvements & Bug fixes - Fix test flakiness mentioned in #1716 and seen after. The problem was subtle: - There are three places Collection state is stored: in Hypothesis's Bundle (== test data ~randomly generated according to our models), in our test's `.model` field, and in Chroma itself. - Each test step gets a random Collection from the Bundle. Our test code is responsible for doing stuff with it, modifying the test model, executing operations on Chroma itself, and verifying that model state matches Chroma's state. - Hypothesis's Bundle has (but no longer will once this PR lands) a full Collection model with all internal bookeeping, incl UUID and other fields. So we could have two Collections in the Bundle with the same name but different UUIDs or other fields. In our test's model and in Chroma, there would only be one Collection. - The bug arose when we updated metadata on one Bundle collection, on our model, and in Chroma; then tried to read metadata from another Bundle collection with the same name but different metadata. The fix to this was to read expected collection metadata from our test model, not from the Bundle. I changed line 204 to `_metadata = self.model[coll.name]` instead of `_metadata = coll.metadata`. - To save people others this grief in the future, I created a new `ExternalCollection` model which only contains externally visible Collection data. This simplifies Bundle state for this test and should make it easier to reason about in the future. - I also added the previously failing test cases to protect us from regressions. Since the model is much simpler now they don't give us much, but I feel better knowing they're there and laying the pattern for us to add future test cases. ## Test plan *How are these changes tested?* - [x] Tests pass locally with `pytest` for python, `yarn test` for js ## Documentation Changes *Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?*
atroyn
pushed a commit
to csbasil/chroma
that referenced
this pull request
Apr 3, 2024
Needed to fix the failing property tests in chroma-core#1715 ## Description of changes *Summarize the changes made by this PR.* - Improvements & Bug fixes - Moved the model update after conditional checks for new_name and metadata. - New functionality - ... ## Test plan *How are these changes tested?* - [ ] Tests pass locally with `pytest` for python, `yarn test` for js ## Documentation Changes Failure logs + Error analysis: ``` > assert c.metadata == self.model[coll.name] E AssertionError: assert {'g': 1.1, 'n...': 31734, ...} == {'3': 'd71IL'...235e-208, ...} E E Left contains 5 more items: E {'g': 1.1, E 'n1dUTalF-MY': -1000000.0, E 'ugXZ_hK': 5494, E 'xVW09xUpDZA': 31734, E 'y': 'G3EtXTZ'} E Right contains 9 more items: E {'3': 'd71IL', E '45227B': '65', E '7DjCkbusc-K': 'vc94', E '8-tD9nJd': 4.8728578364902235e-208, E 'Bpyj': -675165.8688164671, E 'Uy6KZu6abCD9Z': -72, E 'giC': -6.103515625e-05, E 'pO4': -0.0, E 'r3': -41479} E E Full diff: E { E + 'g': 1.1, E + 'n1dUTalF-MY': -1000000.0, E + 'ugXZ_hK': 5494, E + 'xVW09xUpDZA': 31734, E + 'y': 'G3EtXTZ', E - '3': 'd71IL', E - '45227B': '65', E - '7DjCkbusc-K': 'vc94', E - '8-tD9nJd': 4.8728578364902235e-208, E - 'Bpyj': -675165.8688164671, E - 'Uy6KZu6abCD9Z': -72, E - 'giC': -6.103515625e-05, E - 'pO4': -0.0, E - 'r3': -41479, E } E Falsifying example: E state = CollectionStateMachine() E state.initialize() E state.list_collections_with_limit_offset(limit=5, offset=0) E state.list_collections_with_limit_offset(limit=4, offset=5) E (v1,) = state.get_or_create_coll(coll=Collection(name='E60V1ekr9eDcL\n', id=UUID('4435abf2-9fc6-4d5a-bb7b-33177a956d44'), metadata={'_m5jalwo': -228}, dimension=1356, dtype=<class 'numpy.float64'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181bb0590>), new_metadata={'k5o6Q': 'Op', E 'LP': -5.960464477539063e-08, E 'pzHdzczVCn': '81', E '7': False, E 'e4Lz': 999999.0, E '206': False}) E (v2,) = state.get_or_create_coll(coll=v1, new_metadata=None) E (v3,) = state.get_or_create_coll(coll=v1, new_metadata={'4OQN': -2097032423, E 'cW': -0.99999, E 'o6wq3': -147, E 'M8j3KBU': -2.2250738585072014e-308, E 'D8nZrA0': 252, E 'up4P_': 34761, E 'L_win': -6.103515625e-05, E '5kt': '_q', E 'UybO2dJF4': -0.3333333333333333, E 'NfQ83VsmI': 'Qpy', E 'fk': -1.192092896e-07, E 'J1ck': 'ozL'}) E (v4,) = state.get_or_create_coll(coll=Collection(name='nOeHg-OXVl', id=UUID('9c28b027-9f22-409c-b3fd-c5de03b60018'), metadata=None, dimension=1009, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181bdfe50>), new_metadata={'p4isW': 'k8l', E 'k2tFn3v1E': True, E 'R': 'ji-2d5lDGV', E 'K5vdi': False, E 'TZs': False, E 'OgJ_DZ2j': False, E 'ovZjD3': -64297, E '9p': True, E '32f3nw8h2d54LPCzsV': 1733994327, E '4P': 2.896381722565434e-121}) E state.list_collections_with_limit_offset(limit=2, offset=0) E state.list_collections_with_limit_offset(limit=3, offset=0) E state.list_collections_with_limit_offset(limit=5, offset=5) E (v5,) = state.modify_coll(coll=v4, new_metadata=None, new_name=None) E (v6,) = state.get_or_create_coll(coll=Collection(name='A1w5m1l5I\n', id=UUID('606d59a6-6f66-456d-81ca-a8ea029c318c'), metadata={'3': '6Y'}, dimension=1544, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d6d110>), new_metadata=None) E (v7,) = state.get_or_create_coll(coll=v4, new_metadata={'01316': -0.0, '14UwVu': 81, 'C9eMDDdnB0oy': False, 'n964': '0a'}) E state.modify_coll(coll=v7, new_metadata={}, new_name='B-5Z2m2j52121') E state.get_or_create_coll(coll=Collection(name='E31\n', id=UUID('e67426e8-8595-4916-92a6-b2777b52f157'), metadata={'0Kr5Wp': -769, '9xT': 143980.04500299558, '8': True}, dimension=1800, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d6d6d0>), new_metadata={}) E state.list_collections_with_limit_offset(limit=2, offset=1) E state.list_collections_with_limit_offset(limit=2, offset=0) E state.list_collections_with_limit_offset(limit=1, offset=0) E state.list_collections_with_limit_offset(limit=1, offset=1) E (v8,) = state.get_or_create_coll(coll=Collection(name='A00\n', id=UUID('01522a4f-3383-4a58-8b18-0418e38e3ec6'), metadata=None, dimension=1032, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d94bd0>), new_metadata=None) E (v9,) = state.get_or_create_coll(coll=v6, new_metadata=None) E state.list_collections_with_limit_offset(limit=3, offset=2) E (v10,) = state.modify_coll(coll=v3, new_metadata=None, new_name=None) E (v11,) = state.modify_coll(coll=v10, new_metadata=None, new_name=None) E state.modify_coll(coll=v9, new_metadata={}, new_name=None) E (v12,) = state.get_or_create_coll(coll=Collection(name='A10\n', id=UUID('01efb806-fffa-4ce6-b285-b9aae55f50af'), metadata={}, dimension=258, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd183bbe5d0>), new_metadata=None) E state.modify_coll(coll=v11, new_metadata={}, new_name='A01011110\n') E state.list_collections_with_limit_offset(limit=3, offset=1) ------ Problem start here ------ E (v13,) = state.get_or_create_coll(coll=Collection(name='C1030', id=UUID('7858d028-1295-4769-96c1-e58bf242b7bd'), metadata={}, dimension=2, dtype=<class 'numpy.float32'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181bbff10>), new_metadata=None) E (v14,) = state.get_or_create_coll(coll=Collection(name='A01200671\n', id=UUID('f77d01a4-e43f-4b17-9579-daadccad2f71'), metadata={'0': 'L', '01': -4}, dimension=1282, dtype=<class 'numpy.float32'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=False, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d5a9d0>), new_metadata=None) E state.list_collections_with_limit_offset(limit=2, offset=1) E (v15,) = state.modify_coll(coll=v13, new_metadata={'0': '10', '40': '0', 'p1nviWeL7fO': 'qN', '7b': 'YS', 'VYWq4LEMWjCo': True}, new_name='OF5F0MzbQg\n') E (v16,) = state.get_or_create_coll(coll=Collection(name='VS0QGh', id=UUID('c6b85c1d-c3e9-4d37-b9ca-c4b4266193e9'), metadata={'h': 5.681951615025145e-227, 'A1': 61126, 'uhUhLEEMfeC_kN': 2147483647, 'weF': 'pSP', 'B3DSaP': False, '6H533K': 1.192092896e-07}, dimension=1915, dtype=<class 'numpy.float32'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=False, has_embeddings=True, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d202d0>), new_metadata={'xVW09xUpDZA': 31734, E 'g': 1.1, E 'n1dUTalF-MY': -1000000.0, E 'y': 'G3EtXTZ', E 'ugXZ_hK': 5494}) E state.list_collections_with_limit_offset(limit=4, offset=5) E state.modify_coll(coll=v16, new_metadata={'giC': -6.103515625e-05, E '45227B': '65', E 'Uy6KZu6abCD9Z': -72, E 'r3': -41479, E 'pO4': -0.0, E 'Bpyj': -675165.8688164671, E '8-tD9nJd': 4.8728578364902235e-208, E '7DjCkbusc-K': 'vc94', E '3': 'd71IL'}, new_name='OF5F0MzbQg\n') E state.list_collections_with_limit_offset(limit=4, offset=4) E (v17,) = state.modify_coll(coll=v15, new_metadata={'L35J2S': 'K0l026'}, new_name='Ai1\n') E (v18,) = state.get_or_create_coll(coll=v13, new_metadata=None) E state.list_collections_with_limit_offset(limit=3, offset=1) E (v19,) = state.modify_coll(coll=v14, new_metadata=None, new_name='F0K570\n') E (v20,) = state.get_or_create_coll(coll=Collection(name='Ad5m003\n', id=UUID('5e23b560-7f62-4f14-bf80-93f5ff4e906a'), metadata={'3M': 'q_'}, dimension=57, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=False, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d5aad0>), new_metadata={'_000': 852410}) E (v21,) = state.get_or_create_coll(coll=v14, new_metadata=None) E state.list_collections_with_limit_offset(limit=4, offset=1) E (v22,) = state.modify_coll(coll=v21, new_metadata=None, new_name=None) E (v23,) = state.modify_coll(coll=v22, new_metadata=None, new_name=None) E state.list_collections_with_limit_offset(limit=1, offset=1) E state.get_or_create_coll(coll=Collection(name='VS0QGh', id=UUID('ca92837d-3425-436c-bf11-dba969f0f8c7'), metadata=None, dimension=326, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=False, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d6f4d0>), new_metadata=None) E state.teardown() ``` The problem starts in v13 where we create a new collection named `C1030` In v15 we modify the collection `C1030` and rename it to `OF5F0MzbQg\n` In v16 we create a new collection named `VS0QGh` We try to modify the collection `VS0QGh` and rename it to `OF5F0MzbQg\n` which is the same name as the collection `C1030` which is fails in the and we return empty from the rule. However we have already updated the model: ```python if new_metadata is not None: if len(new_metadata) == 0: with pytest.raises(Exception): c = self.api.get_or_create_collection( name=coll.name, metadata=new_metadata, embedding_function=coll.embedding_function, ) return multiple() coll.metadata = new_metadata self.set_model(coll.name, coll.metadata) # <--- here we update the metadata if new_name is not None: if new_name in self.model and new_name != coll.name: with pytest.raises(Exception): # <--- fail here to rename the collection to `OF5F0MzbQg\n` c.modify(metadata=new_metadata, name=new_name) return multiple() prev_metadata = self.model[coll.name] self.delete_from_model(coll.name) self.set_model(new_name, prev_metadata) coll.name = new_name ``` then in `E state.get_or_create_coll(coll=Collection(name='VS0QGh', id=UUID('ca92837d-3425-436c-bf11-dba969f0f8c7'), metadata=None, dimension=326, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=False, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d6f4d0>), new_metadata=None)` We try to create or get collection `VS0QGh` which exists in API and in state. Metadata and new metadata are None so we fall into case 0. Existing collection with old metadata and but we take the metadata from model which has been updated after the failure above. So we have API version of the metadata and partly updated model metadata, which causes the failure.
atroyn
pushed a commit
to csbasil/chroma
that referenced
this pull request
Apr 3, 2024
## Description of changes *Summarize the changes made by this PR.* - Improvements & Bug fixes - Fix test flakiness mentioned in chroma-core#1716 and seen after. The problem was subtle: - There are three places Collection state is stored: in Hypothesis's Bundle (== test data ~randomly generated according to our models), in our test's `.model` field, and in Chroma itself. - Each test step gets a random Collection from the Bundle. Our test code is responsible for doing stuff with it, modifying the test model, executing operations on Chroma itself, and verifying that model state matches Chroma's state. - Hypothesis's Bundle has (but no longer will once this PR lands) a full Collection model with all internal bookeeping, incl UUID and other fields. So we could have two Collections in the Bundle with the same name but different UUIDs or other fields. In our test's model and in Chroma, there would only be one Collection. - The bug arose when we updated metadata on one Bundle collection, on our model, and in Chroma; then tried to read metadata from another Bundle collection with the same name but different metadata. The fix to this was to read expected collection metadata from our test model, not from the Bundle. I changed line 204 to `_metadata = self.model[coll.name]` instead of `_metadata = coll.metadata`. - To save people others this grief in the future, I created a new `ExternalCollection` model which only contains externally visible Collection data. This simplifies Bundle state for this test and should make it easier to reason about in the future. - I also added the previously failing test cases to protect us from regressions. Since the model is much simpler now they don't give us much, but I feel better knowing they're there and laying the pattern for us to add future test cases. ## Test plan *How are these changes tested?* - [x] Tests pass locally with `pytest` for python, `yarn test` for js ## Documentation Changes *Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?*
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Needed to fix the failing property tests in #1715
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for jsDocumentation Changes
Failure logs + Error analysis:
The problem starts in v13 where we create a new collection named
C1030
In v15 we modify the collection
C1030
and rename it toOF5F0MzbQg\n
In v16 we create a new collection named
VS0QGh
We try to modify the collection
VS0QGh
and rename it toOF5F0MzbQg\n
which is the same name as the collectionC1030
which is fails in the and we return empty from the rule. However we have already updated the model:then in
E state.get_or_create_coll(coll=Collection(name='VS0QGh', id=UUID('ca92837d-3425-436c-bf11-dba969f0f8c7'), metadata=None, dimension=326, dtype=<class 'numpy.float16'>, topic='topic', known_metadata_keys={}, known_document_keywords=[], has_documents=True, has_embeddings=False, embedding_function=<chromadb.test.property.strategies.hashing_embedding_function object at 0x7fd181d6f4d0>), new_metadata=None)
We try to create or get collection
VS0QGh
which exists in API and in state. Metadata and new metadata are None so we fall into case 0. Existing collection with old metadata and but we take the metadata from model which has been updated after the failure above.So we have API version of the metadata and partly updated model metadata, which causes the failure.