fix: reject nullable map keys in schema parsing instead of silently overriding #226
Merged
Pull request overview
This PR updates schema parsing to fail fast when a MAP type defines a nullable key, aligning paimon-cpp behavior with Apache Arrow’s requirement that map keys are non-nullable.
Changes:
- Enforce non-nullable map keys in `DataTypeJsonParser::ParseMapType` by returning an error if the parsed key field is nullable.
- Update test data schema fixtures to explicitly mark map keys as `NOT NULL`.
- Add/adjust unit tests and documentation to reflect the new constraint.
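The core of the parser change can be pictured with a minimal C++17 sketch. `SimpleType`, `SimpleMapType`, `BuildMapType`, and the error text are hypothetical simplifications, not paimon-cpp's actual `DataTypeJsonParser` API. The sketch only illustrates the control flow described above: a nullable key is reported as an error instead of being silently rewritten to `nullable=false`.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical, simplified stand-in for a parsed type; NOT the real
// paimon-cpp DataType class.
struct SimpleType {
    std::string name;      // e.g. "STRING", "INT"
    bool nullable = true;  // nullable unless the schema says NOT NULL
};

// Hypothetical, simplified MAP type holding its key and value types.
struct SimpleMapType {
    SimpleType key;
    SimpleType value;
};

// Either a valid map type or an error message.
using MapParseResult = std::variant<SimpleMapType, std::string>;

// Fail fast: a nullable key produces an error. The old behavior would
// instead have overwritten key.nullable with false and carried on.
MapParseResult BuildMapType(const SimpleType& key, const SimpleType& value) {
    if (key.nullable) {
        return "MAP key must be NOT NULL, but key type '" + key.name +
               "' is nullable";
    }
    return SimpleMapType{key, value};
}
```

Returning an error value rather than mutating the key keeps the schema's declared semantics intact, so a nullable key is surfaced at parse time instead of causing a failure later when data is read.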
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/paimon/common/types/data_type_json_parser.cpp` | Rejects nullable map keys during JSON schema parsing with a clear validation error. |
| `src/paimon/common/types/data_type_json_parser_test.cpp` | Updates the MAP parsing success test to use a `NOT NULL` key. |
| `src/paimon/core/schema/table_schema_test.cpp` | Adds a regression test asserting that schema creation fails when a MAP key is nullable. |
| `docs/source/user_guide/schema.rst` | Documents the MAP key non-null limitation and provides valid/invalid examples. |
| `docs/source/user_guide/data_types.rst` | Updates the MAP type description and adds a note about `NOT NULL` keys in paimon-cpp. |
| `test/test_data/parquet/pk_table_nested_type.db/pk_table_nested_type/schema/schema-0` | Marks the MAP key as `NOT NULL` in the parquet fixture schema. |
| `test/test_data/parquet/parquet_append_table.db/parquet_append_table/schema/schema-0` | Marks the complex (ARRAY) MAP key type as `NOT NULL` in the parquet fixture schema. |
| `test/test_data/parquet/append_complex_build_in_fieldid.db/append_complex_build_in_fieldid/schema/schema-0` | Marks the MAP key as `NOT NULL` in the parquet fixture schema. |
| `test/test_data/orc/pk_table_nested_type.db/pk_table_nested_type/schema/schema-0` | Marks the MAP key as `NOT NULL` in the orc fixture schema. |
| `test/test_data/orc/append_table_with_nested_type.db/append_table_with_nested_type/schema/schema-0` | Marks the complex (ROW) MAP key type as `NOT NULL` in the orc fixture schema. |
| `test/test_data/orc/append_complex_build_in_fieldid.db/append_complex_build_in_fieldid/schema/schema-0` | Marks the MAP key as `NOT NULL` in the orc fixture schema. |
| `test/test_data/avro/pk_with_multiple_type.db/pk_with_multiple_type/schema/schema-0` | Marks the MAP key as `NOT NULL` in the avro fixture schema. |
| `test/test_data/avro/append_with_multiple_map.db/append_with_multiple_map/schema/schema-0` | Marks MAP keys (including a nested MAP key) as `NOT NULL` in the avro fixture schema. |
| `test/test_data/avro/append_simple.db/append_simple/schema/schema-0` | Marks the MAP key as `NOT NULL` in the avro fixture schema. |
| `test/test_data/avro/append_multiple.db/append_multiple/schema/schema-0` | Marks the MAP key as `NOT NULL` in the avro fixture schema. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
lszskye approved these changes on Apr 14, 2026
Purpose
No linked issue.
Previously, `DataTypeJsonParser::ParseMapType` silently forced map keys to `nullable=false` regardless of the schema definition. This hid potential data inconsistencies: if a schema defined a nullable map key, the parser quietly changed its semantics without any warning, and reading a map field that contains a null key could then fail.

This PR changes the behavior to fail fast: if a MAP key is not explicitly marked as `NOT NULL` in the schema, parsing returns an error. This aligns with Apache Arrow's constraint that map keys must be non-nullable and ensures schema authors are aware of this requirement up front.

Tests
TableSchemaTest.MapKeyMustBeNotNull
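The body of `TableSchemaTest.MapKeyMustBeNotNull` is not reproduced here. As a hedged sketch of the rule it exercises, the helpers below assume Paimon's textual type convention that a type string is nullable unless it ends in `NOT NULL` (for example `STRING` versus `STRING NOT NULL`). `IsNullableTypeString` and `IsValidMapKey` are hypothetical names, and the suffix check is deliberately simplified (no handling of extra whitespace or lowercase keywords).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a type string is treated as nullable unless it ends
// with " NOT NULL". Only the outermost type's suffix matters, so
// "ROW<f0 INT NOT NULL>" is still a nullable ROW.
bool IsNullableTypeString(const std::string& type) {
    const std::string suffix = " NOT NULL";
    if (type.size() < suffix.size()) {
        return true;
    }
    return type.compare(type.size() - suffix.size(), suffix.size(), suffix) != 0;
}

// Hypothetical check mirroring the new rule: a MAP key is accepted only if
// its own type is declared NOT NULL.
bool IsValidMapKey(const std::string& key_type) {
    return !IsNullableTypeString(key_type);
}
```

The nested-type case is why the fixtures for complex keys (ARRAY and ROW keys) need `NOT NULL` on the key type itself, not just on fields inside it.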
API and Format
Schemas that previously relied on the silent override will now receive a clear error message guiding them to add `NOT NULL`.

Documentation
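Updated the `schema` and `data_types` user guide pages (`docs/source/user_guide/schema.rst`, `docs/source/user_guide/data_types.rst`).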
Generative AI tooling
Generated-by: Aone Copilot (Claude claude4.6)