fix: reject nullable map keys in schema parsing instead of silently overriding #226

Merged
lxy-9602 merged 4 commits into alibaba:main from lxy-9602:map-key-not-null
Apr 14, 2026

@lxy-9602
Collaborator

Purpose

No linked issue.

Previously, DataTypeJsonParser::ParseMapType silently forced map keys to nullable=false regardless of the schema definition. This hid potential data inconsistencies: if a schema defined a nullable map key, the parser quietly changed its semantics without any warning, and a later read could fail when a map field contained a null key.

This PR changes the behavior to fail fast: if a MAP key is not explicitly marked as NOT NULL in the schema, parsing returns an error. This aligns with Apache Arrow's constraint that map keys must be non-nullable, and ensures schema authors are aware of this requirement upfront.
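The difference between the old and new behavior can be sketched as follows. This is a minimal illustration, not the code from this PR: `ParsedType`, `OverrideMapKey`, and `ValidateMapKey` are hypothetical names, and the error text is made up for the example.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed field type; not paimon-cpp's real class.
struct ParsedType {
    std::string name;  // e.g. "INT"
    bool nullable;     // true unless the schema declares NOT NULL
};

// Old behavior as described above: the key's nullability is silently
// overwritten, so the schema author never learns about the mismatch.
ParsedType OverrideMapKey(ParsedType key) {
    key.nullable = false;
    return key;
}

// New behavior: fail fast. Returns an error message when the key is
// nullable, or std::nullopt when the key is acceptable.
std::optional<std::string> ValidateMapKey(const ParsedType& key) {
    if (key.nullable) {
        return "MAP key type '" + key.name + "' must be declared NOT NULL";
    }
    return std::nullopt;
}
```

With this shape, a caller would stop parsing as soon as `ValidateMapKey` returns a message, instead of continuing with a key type that differs from what the schema file says.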

Tests

TableSchemaTest.MapKeyMustBeNotNull

API and Format

Schemas that previously relied on the silent override now fail to parse, with a clear error message that directs the schema author to add NOT NULL to the map key.
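As a rough illustration of the migration, the fragments below show a map type before and after adding NOT NULL to the key. The JSON layout (`type`, `key`, `value`) is an assumption modeled on the schema-0 fixtures touched by this PR and may not match them exactly; `EndsWithNotNull` is a hypothetical helper for spotting atomic key types that still need the suffix.

```cpp
#include <string_view>

// Assumed schema fragments, for illustration only.
// Rejected after this change: the key type "INT" is nullable by default.
inline constexpr std::string_view kRejectedMap =
    R"({"type": "MAP", "key": "INT", "value": "STRING"})";

// Accepted: the key type is explicitly marked NOT NULL.
inline constexpr std::string_view kAcceptedMap =
    R"({"type": "MAP", "key": "INT NOT NULL", "value": "STRING"})";

// Hypothetical helper: true when an atomic type string such as
// "INT NOT NULL" ends with the NOT NULL suffix.
bool EndsWithNotNull(std::string_view type_string) {
    constexpr std::string_view kSuffix = " NOT NULL";
    return type_string.size() >= kSuffix.size() &&
           type_string.substr(type_string.size() - kSuffix.size()) == kSuffix;
}
```

A check like this only covers keys written as plain type strings; keys with complex types (for example ARRAY or ROW) would need to be inspected in their own nested form.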

Documentation

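Updated the schema and data_types user guides (docs/source/user_guide/schema.rst and docs/source/user_guide/data_types.rst).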

Generative AI tooling

Generated-by: Aone Copilot (Claude claude4.6)

@lxy-9602 lxy-9602 changed the title fix: Reject nullable map keys in schema parsing instead of silently overriding fix: reject nullable map keys in schema parsing instead of silently overriding Apr 13, 2026
@lucasfang lucasfang requested a review from Copilot April 14, 2026 01:00

Copilot AI left a comment

Pull request overview

This PR updates schema parsing to fail fast when a MAP type defines a nullable key, aligning paimon-cpp behavior with Apache Arrow’s requirement that map keys are non-nullable.

Changes:

  • Enforce non-nullable map keys in DataTypeJsonParser::ParseMapType by returning an error if the parsed key field is nullable.
  • Update test data schema fixtures to explicitly mark map keys as NOT NULL.
  • Add/adjust unit tests and documentation to reflect the new constraint.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/paimon/common/types/data_type_json_parser.cpp | Reject nullable map keys during JSON schema parsing with a clear validation error. |
| src/paimon/common/types/data_type_json_parser_test.cpp | Updates MAP parsing success test to use NOT NULL key. |
| src/paimon/core/schema/table_schema_test.cpp | Adds a regression test asserting schema creation fails when a MAP key is nullable. |
| docs/source/user_guide/schema.rst | Documents the MAP key non-null limitation and provides valid/invalid examples. |
| docs/source/user_guide/data_types.rst | Updates MAP type description and adds a note about NOT NULL keys in paimon-cpp. |
| test/test_data/parquet/pk_table_nested_type.db/pk_table_nested_type/schema/schema-0 | Marks MAP key as NOT NULL in parquet fixture schema. |
| test/test_data/parquet/parquet_append_table.db/parquet_append_table/schema/schema-0 | Marks complex (ARRAY) MAP key type as NOT NULL in parquet fixture schema. |
| test/test_data/parquet/append_complex_build_in_fieldid.db/append_complex_build_in_fieldid/schema/schema-0 | Marks MAP key as NOT NULL in parquet fixture schema. |
| test/test_data/orc/pk_table_nested_type.db/pk_table_nested_type/schema/schema-0 | Marks MAP key as NOT NULL in orc fixture schema. |
| test/test_data/orc/append_table_with_nested_type.db/append_table_with_nested_type/schema/schema-0 | Marks complex (ROW) MAP key type as NOT NULL in orc fixture schema. |
| test/test_data/orc/append_complex_build_in_fieldid.db/append_complex_build_in_fieldid/schema/schema-0 | Marks MAP key as NOT NULL in orc fixture schema. |
| test/test_data/avro/pk_with_multiple_type.db/pk_with_multiple_type/schema/schema-0 | Marks MAP key as NOT NULL in avro fixture schema. |
| test/test_data/avro/append_with_multiple_map.db/append_with_multiple_map/schema/schema-0 | Marks MAP keys (including nested MAP key) as NOT NULL in avro fixture schema. |
| test/test_data/avro/append_simple.db/append_simple/schema/schema-0 | Marks MAP key as NOT NULL in avro fixture schema. |
| test/test_data/avro/append_multiple.db/append_multiple/schema/schema-0 | Marks MAP key as NOT NULL in avro fixture schema. |

Comment thread src/paimon/common/types/data_type_json_parser.cpp Outdated
@lxy-9602 lxy-9602 merged commit b69fd7b into alibaba:main Apr 14, 2026
9 checks passed
3 participants