[TST] Property Test Generation Fixes #2383

Open
wants to merge 18 commits into
base: main

HammadB
Collaborator

@HammadB HammadB commented Jun 19, 2024

Description of changes

Summarize the changes made by this PR.
The primary intent of this PR is to remove the is_metadata_valid invariant. That invariant was a workaround: our metadata strategy generated faulty metadata, and every use of the record set strategy was special-cased to handle the invalid generations. This PR patches metadata generation so that it no longer produces invalid metadata.

  • Adds modes in test_add to add a medium-sized record set. This was initially timing out in Hypothesis's generation. Hypothesis bounds the size of the byte buffer it uses for random generation, so generating larger metadata resulted in examples being marked as OVERRUN by conjecture (gleaned from issues like HypothesisWorks/hypothesis#3999, "Tests fail with StopTest (OVERRUN) when generating a random integer (strategies.randoms)", plus reading the Hypothesis code and stepping through it). This PR adds the ability to generate N fixed metadata entries and distribute them uniformly over the record set, reducing the overall entropy.

  • Fixes a bug where test_embeddings did not handle None as a possible metadata state, since this state was never generated. Adds an explicit test for this.

  • Fixes a bug in the reference filtering implementation in test_filtering that did not handle None metadata since that state was never generated.
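
The None-metadata case from the last two bullets can be illustrated with a minimal sketch. This is not the code from test_filtering; the function name `reference_filter` and the equality-only `where` format are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical alias: a record's metadata is either a dict or None.
Metadata = Optional[Dict[str, Any]]


def reference_filter(
    ids: List[str], metadatas: List[Metadata], where: Dict[str, Any]
) -> List[str]:
    """Return the ids whose metadata equals every key/value pair in `where`."""
    matched = []
    for record_id, metadata in zip(ids, metadatas):
        # A record with no metadata cannot satisfy an equality filter,
        # so skip it instead of indexing into None.
        if metadata is None:
            continue
        if all(metadata.get(key) == value for key, value in where.items()):
            matched.append(record_id)
    return matched


result = reference_filter(["a", "b", "c"], [{"k": 1}, None, {"k": 2}], {"k": 1})
```

Without the explicit `metadata is None` check, the lookup would raise an AttributeError as soon as a record without metadata is generated, which is the kind of failure that stays hidden while that state is never produced.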

This PR is forced to touch types related to metadata, which are incorrect and cause typing errors. I ignored the errors to minimize the surface area of this change and defer those changes to the pass mentioned in #2292.

Test plan

How are these changes tested?
These changes are covered by existing tests.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

No external changes required.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Collaborator Author

HammadB commented Jun 19, 2024

This stack of pull requests is managed by Graphite.

HammadB and others added 5 commits June 19, 2024 12:57
)

Closes #2377 #2379

## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
   - Making dimension and version lookup optional in the Collection model
     creation in fastapi client

## Test plan
*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
N/A
imaffe and others added 7 commits June 20, 2024 08:34
## Description of changes

Fix a typo in comment section in chromadb/db/system.py

```

"""
Create a new collection any (-> and) associated resources
        in the SysDB.

"""
```

## Test plan

No tests needed.

## Documentation Change

Not public facing API documentation change.
@HammadB HammadB changed the title from "[TST] Make metadata strategy return valid metadata and remove invariant in favor of point test" to "[TST] Property Test Generation Fixes" Jun 20, 2024
@HammadB HammadB marked this pull request as ready for review June 20, 2024 20:49
metadatas = []
for i in range(len(ids)):
    metadatas.append(generated_metadatas[i % len(generated_metadatas)])
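
As a standalone sketch of what the loop above does: a small fixed pool of metadata entries is cycled over the ids, so each entry is used a near-equal number of times. The helper name `distribute_metadata` and the pool contents are hypothetical, and the pool is written out by hand here rather than drawn from a Hypothesis strategy.

```python
from typing import Any, Dict, List, Optional

# Hypothetical alias: a record's metadata is either a dict or None.
Metadata = Optional[Dict[str, Any]]


def distribute_metadata(ids: List[str], pool: List[Metadata]) -> List[Metadata]:
    """Assign pool entries to ids round-robin, one entry per id."""
    if not pool:
        raise ValueError("pool must contain at least one entry")
    return [pool[i % len(pool)] for i in range(len(ids))]


ids = [f"id-{i}" for i in range(7)]
# Assumed pool for illustration; None stands in for a record without metadata.
pool = [{"color": "red"}, {"color": "blue"}, None]
metadatas = distribute_metadata(ids, pool)
```

Only the pool has to be generated, so the amount of randomness needed stays fixed as the record set grows.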

Contributor


cool

Contributor

@atroyn atroyn left a comment


All looks good to me.

@@ -75,6 +118,8 @@ def test_add(
)


# Hypothesis struggles to generate large record sets so we explicitly create
# a large record set
def create_large_recordset(
Contributor

@atroyn atroyn Jun 20, 2024


Feel like we could add some more randomization in here. For example, all embeddings are the same, which is guaranteed to produce a bad HNSW graph. Unrelated to the focus of this PR, however.

Collaborator Author


Agreed. I tried to replace this with Hypothesis but still need to do some munging, so I cut myself a task.
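
One possible shape for the randomization suggested in this thread, as a rough sketch only. It is not part of this PR, the helper `random_embeddings` is hypothetical, and how it would plug into create_large_recordset is left open because that function's signature is not shown here.

```python
import random
from typing import List


def random_embeddings(count: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Generate `count` vectors of length `dim` with components in [-1, 1]."""
    # A seeded generator keeps the large record set reproducible across runs.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


embeddings = random_embeddings(count=1000, dim=3)
```

Seeding keeps failures reproducible, while distinct vectors avoid the all-identical embeddings case raised in the review.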

5 participants