Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bson builder should handle schema flexibly #2334

Merged
merged 12 commits into from
Jan 2, 2024
Merged

Conversation

tychoish
Copy link
Collaborator

The RecordStructBuilder for bson previously assumed (as is the case
with our MongoDB implementation) that it would never see bson
documents with fields that didn't appear in the schema/projection and
would error otherwise.

While it does mean that the schema inference (which for MongoDB is
done somewhat probabalistically) controls the projection, it does mean
that you will never see documents that have fields that aren't in the
schema.

For handling the pure-BSON document streams, it does mean that it
would be very easy to see a document that has fields that aren't in
the schema. This increases with #2333, but could easily happen
otherwise.

@tychoish tychoish requested a review from scsmithr January 1, 2024 18:12
@scsmithr
Copy link
Member

scsmithr commented Jan 1, 2024

While it does mean that the schema inference (which for MongoDB is
done somewhat probabalistically) controls the projection, it does mean
that you will never see documents that have fields that aren't in the
schema.

Is this saying that the current implementation in main would skip bson documents that have unexpected schemas? And the fix here is we can now sanely handle those documents? What happens to the extra columns/fields? Are those just dropped?

Also would we be able to throw together a quick test for this?

Co-authored-by: Sean Smith <scsmithr@gmail.com>
@tychoish
Copy link
Collaborator Author

tychoish commented Jan 2, 2024

Is this saying that the current implementation in main would skip bson documents that have unexpected schemas? And the fix here is we can now sanely handle those documents? What happens to the extra columns/fields? Are those just dropped?

I believe that the current mongodb implementation doesn't have this problem because we project out the fields that we don't expect to see, so that we're never in a case where a document would have a field we don't expect (caveat: nested documents/arrays are an edge case that I'd want to think about.)

The failure mode, is that the query will fail, because we'll be building a result set, see a field we don't expect to see and return an error, which could happen with bson-files.

For MongoDB if the schema inference does not pick up a field, the query will remove that field from the result set, so we are already dropping data silently (not great, but the solution is to not infer schema, (addressable with #2333),) so this just brings the implementations on par with each other.

Also would we be able to throw together a quick test for this?

So the bug, (which to be fair, isn't released), requires the schema inference to be "wrong," (or incomplete) relative to our expectations. Building a test case that has that would be pretty fragile (all of the new code is hit by the existing test, so I'm not worried about this being worse than baseline.)

@universalmind303
Copy link
Contributor

can we add some tests for this.

tests/tests/dupes.bson Outdated Show resolved Hide resolved
tests/tests/scripts/make_bad_bson.go Outdated Show resolved Hide resolved
@tychoish
Copy link
Collaborator Author

tychoish commented Jan 2, 2024

In any case, I wrote the duplicate fields always picks the first one test as a rust unittest.

@tychoish tychoish requested a review from scsmithr January 2, 2024 21:43
@tychoish tychoish enabled auto-merge (squash) January 2, 2024 23:29
@tychoish tychoish merged commit f1b0523 into main Jan 2, 2024
13 checks passed
@tychoish tychoish deleted the tycho/bson-builder branch January 2, 2024 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants