New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/mongodb): support disabling schemaSamplingSize #9295
Conversation
@@ -100,7 +100,7 @@ class MongoDBConfig( | |||
enableSchemaInference: bool = Field( | |||
default=True, description="Whether to infer schemas. " | |||
) | |||
schemaSamplingSize: Optional[PositiveInt] = Field( | |||
schemaSamplingSize: Optional[NonNegativeInt] = Field( | |||
default=1000, | |||
description="Number of documents to use when inferring schema size. If set to `0`, all documents will be scanned.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the current behavior will already do the right thing if schemaSamplingSize is set to null. I might be missing something, but it seems like it'd make more sense to simply fix the docs here and leave the actual code unchanged
description="Number of documents to use when inferring schema size. If set to `0`, all documents will be scanned.", | |
description="Number of documents to use when inferring schema size. If set to `null`, all documents will be scanned.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a small update on the linked issue on my first message
As far as I have reached, I think that it's imposible to force a full mongodb collection scan, because the yaml parser + validator forces to provide a value greater than 0, but internal code checks for null
I'm gonna try to explain my thoughts on detail:
schemaSamplingSize: Optional[PositiveInt]
does allownull
value, but not0
value so I changed it toNonNegativeInt
schemaSamplingSize
is also asigned to a Pydantic Field with the following default ->default=1000,
. So, if no value is provided, this will have that default value- You can't explicit provide a
null
ornone
value trough the yaml parser (as i also reflect now on the issue)
So, if I need to provide a value to the parser and it can't be null, providing a 0 value could be a good option.
@diegoreico I've made a small tweak to your code - let me know what you think Setting We actually had both in our documentation, so my hope is this makes it consistent and avoids having a "magic value" of 0. |
fix: allow mongodb source connector to have a 0 input for field schemaSamplingSize
Fixes #9287
Checklist