Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix schema inference in case of empty messages in Protobuf/CapnProto formats #39357

Merged
merged 11 commits into from
Jul 26, 2022

Conversation

Avogar
Copy link
Member

@Avogar Avogar commented Jul 18, 2022

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed to create column with empty Tuple type. Closes #39051
Add 2 new settings input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference that allow to skip fields with unsupported types while schema inference for Protobuf and CapnProto formats

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll robot-ch-test-poll added the pr-bugfix Pull request with bugfix, not backported by default label Jul 18, 2022
@rschu1ze rschu1ze self-assigned this Jul 19, 2022
@ekpdt
Copy link

ekpdt commented Jul 19, 2022

Our current workaround for this bug is to use schema inference to get a preliminary schema on our (extremely) complex Protobuf messages and then edit the inferred schema to remove illegal empty tuples.

This patch would make it impossible for us to use schema inference at all, which is unfortunate.

Would you consider allowing some way to DESCRIBE a protobuf-based schema via the File engine but prohibit creating any actual tables? I admit this seems like a weird solution because CH will still create a schema that is illegal.

If you integrate this patch, we will probably pre-process our .proto files to remove empty messages.

@Avogar
Copy link
Member Author

Avogar commented Jul 19, 2022

Our current workaround for this bug is to use schema inference to get a preliminary schema on our (extremely) complex Protobuf messages and then edit the inferred schema to remove illegal empty tuples.

This patch would make it impossible for us to use schema inference at all, which is unfortunate.

Would you consider allowing some way to DESCRIBE a protobuf-based schema via the File engine but prohibit creating any actual tables? I admit this seems like a weird solution because CH will still create a schema that is illegal.

If you integrate this patch, we will probably pre-process our .proto files to remove empty messages.

What if I add special setting like input_format_protobuf_skip_columns_with_unsupported_types_in_schema_inference to silently skip bad messages (like empty tuples) instead of throwing an exception? Will it help?

Like, if we see a field that is an empty message and this setting is enabled, we will just skip this field while schema inference

@ekpdt
Copy link

ekpdt commented Jul 19, 2022

What if I add special setting like input_format_protobuf_skip_columns_with_unsupported_types_in_schema_inference to silently skip bad messages (like empty tuples) instead of throwing an exception? Will it help?

That would be perfect, thanks.

src/Core/Settings.h Outdated Show resolved Hide resolved
@Avogar Avogar merged commit c362551 into ClickHouse:master Jul 26, 2022
@tavplubix
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Protobuf schema inference can make an illegal table
5 participants