Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Update][Ease of Use] Fix Types #2292

Open
atroyn opened this issue Jun 5, 2024 · 1 comment
Open

[Update][Ease of Use] Fix Types #2292

atroyn opened this issue Jun 5, 2024 · 1 comment
Assignees
Labels
by-chroma Local Chroma An improvement to Local (single node) Chroma

Comments

@atroyn
Copy link
Contributor

atroyn commented Jun 5, 2024

Fix Types

Types are currently broken in many places throughout the project. Some of these, like CollectionMetadata have been broken for over a year. A good type system improves code correctness and hence development velocity.

We must have 𝔬𝔯𝔡𝔢𝔯.

API Design

N/A

[Complexity] Subtask

@atroyn atroyn added the Local Chroma An improvement to Local (single node) Chroma label Jun 5, 2024
@atroyn atroyn added this to the Local Chroma v.0.6 milestone Jun 5, 2024
@atroyn atroyn self-assigned this Jun 19, 2024
HammadB added a commit that referenced this issue Jun 26, 2024
## Description of changes

*Summarize the changes made by this PR.*
The primary intent of this PR is to remove the is_metadata_valid
invariant which was a workaround for our metadata strategy generating
faulty metadata and then us special casing all uses of the record set
strategy to handle invalid generations. This PR patches the metadata
generation to not generate invalid metadata.

- Adds modes in test_add to add a medium sized record set. This was
initially timing out in hypothesis's generation. Hypothesis bounds the
buffer size of the bytes it uses to do random generation, so generating
larger metadata was resulting in examples being marked at OVERRUN by
conjecture (gleaned from issues like
HypothesisWorks/hypothesis#3999 + reading
hypothesis code + stepping through it). This PR adds the ability to
generate N fixed metadata entries and uniformly distribute them over the
record set, reducing the overall entropy.

- Fixes a bug that test_embeddings was not handling None as a possible
metadata state, since this state was never generated. Added an explicit
test for this.

- Fixes a bug in the reference filtering implementation in
test_filtering that did not handle None metadata since that state was
never generated.


This PR is forced to touch types related to metadata, which are
incorrect and cause typing errors. I ignored the errors to minimize the
surface area of this change and defer those changes to the pass
mentioned in #2292.

## Test plan
*How are these changes tested?*
These changes are covered by existing tests, and 
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
No external changes required.
Anush008 pushed a commit to Anush008/chroma that referenced this issue Jun 27, 2024
## Description of changes

*Summarize the changes made by this PR.*
The primary intent of this PR is to remove the is_metadata_valid
invariant which was a workaround for our metadata strategy generating
faulty metadata and then us special casing all uses of the record set
strategy to handle invalid generations. This PR patches the metadata
generation to not generate invalid metadata.

- Adds modes in test_add to add a medium sized record set. This was
initially timing out in hypothesis's generation. Hypothesis bounds the
buffer size of the bytes it uses to do random generation, so generating
larger metadata was resulting in examples being marked at OVERRUN by
conjecture (gleaned from issues like
HypothesisWorks/hypothesis#3999 + reading
hypothesis code + stepping through it). This PR adds the ability to
generate N fixed metadata entries and uniformly distribute them over the
record set, reducing the overall entropy.

- Fixes a bug that test_embeddings was not handling None as a possible
metadata state, since this state was never generated. Added an explicit
test for this.

- Fixes a bug in the reference filtering implementation in
test_filtering that did not handle None metadata since that state was
never generated.


This PR is forced to touch types related to metadata, which are
incorrect and cause typing errors. I ignored the errors to minimize the
surface area of this change and defer those changes to the pass
mentioned in chroma-core#2292.

## Test plan
*How are these changes tested?*
These changes are covered by existing tests, and 
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
No external changes required.
@atroyn
Copy link
Contributor Author

atroyn commented Jun 28, 2024

We also alias chromadb.types.Collection as CollectionModel, because we now use it as an interface. We still have chromadb.api.Collection. We should rename the former to disambiguate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
by-chroma Local Chroma An improvement to Local (single node) Chroma
Projects
None yet
Development

No branches or pull requests

2 participants