[hive] Fix insert into static partitions on managed Paimon tables by ArnavBalyan · Pull Request #7824 · apache/paimon

ArnavBalyan · 2026-05-12T05:13:00Z

Purpose

Insert into static partition against a Hive managed table fails with ArrayIndexOutOfBoundsException in TableWriteImpl.checkNullability.
This is because Hive sends data rows without the static partition columns and instead passes their values in path/partition_columns serde property.
PaimonOutputFormat.getHiveRecordWriter does not respect these properties, and row arrives incomplete.
Read the partition column names, parse the values and wrap the writer which is aware of the extra properies and can construct the row properly.
Closes user reported bug [Bug] When executing an insert operation on a Hive partitioned table, an array out-of-bounds error occurs #7064.

Tests

UT

ArnavBalyan · 2026-05-12T05:13:07Z

cc @JingsongLi thanks!

ArnavBalyan · 2026-05-16T12:23:43Z

cc @JingsongLi gentle reminder if you could PTAL thanks!

ArnavBalyan · 2026-05-22T07:04:56Z

cc @leaves12138 @JingsongLi gentle reminder thanks ! :)

JingsongLi

Review: [hive] Fix insert into static partitions on managed Paimon tables

Overall this is a well-structured fix for a legitimate bug (#7064). The approach of detecting static partition columns via serde properties and wrapping the writer to reconstruct full rows is sound. A few observations:

Correctness

Partition column ordering assumption: PartitionedRecordWriter uses JoinedRow(source, partitionRow) which assumes partition columns are always appended after data columns. This is correct for Hive-managed tables (Hive always places PARTITIONED BY columns at the end of the schema), but if a table were originally created via Flink/Spark with partition columns in the middle and later accessed from Hive, this could silently produce incorrect rows. Consider adding a defensive check that validates the partition column indices are actually at the tail of the schema in buildStaticPartitionRow.
Mixed static/dynamic partitions: If a user writes PARTITION (region='us', year) (one static, one dynamic), META_TABLE_PARTITION_COLUMNS will list both columns, but the path will only contain region=us. The lookupCaseInsensitive for year returns null, causing buildStaticPartitionRow to return null, which falls through to the unwrapped writer. That writer will then also fail because Hive still strips the static partition column from the data. This is not a regression (it was already broken), but it might be worth a code comment noting this limitation.
Separator in META_TABLE_PARTITION_COLUMNS: The split uses org.apache.paimon.fs.Path.SEPARATOR which is "/". This matches the Hive convention for this property. Correct, but somewhat fragile -- a named constant or comment clarifying why / is the right delimiter here would improve readability.

Design

Direct access to inner.batchTableWrite().write(toWrite): PartitionedRecordWriter bypasses PaimonRecordWriter.write() and calls the underlying BatchTableWrite directly. Currently this is fine since PaimonRecordWriter.write() does nothing beyond unwrapping RowDataContainer and forwarding. However, this creates tight coupling -- if PaimonRecordWriter.write() later gains additional logic (metrics, validation, etc.), PartitionedRecordWriter would silently miss it. A safer pattern would be to accept an InternalRow transformer in PaimonRecordWriter, or at minimum add a package-private writeRow(InternalRow) method that both paths invoke.
forWriteOnly extraction: Nice refactoring to separate the WRITE_ONLY option application from writer creation. Clean.

Minor

The exception wrapping differs: PaimonRecordWriter.write() throws RuntimeException on write failure, while PartitionedRecordWriter.write() wraps in IOException. The inconsistency is minor but could confuse error handling upstream.
Test coverage is good -- unit tests for buildStaticPartitionRow cover the key scenarios (typed values, unpartitioned tables, missing path segments), and the integration test validates end-to-end correctness.

Summary

The fix correctly addresses the reported bug for the common case (fully static partition inserts on Hive-managed tables). The main suggestion is to add a defensive assertion that partition columns are indeed at the schema tail, and to reduce the coupling between PartitionedRecordWriter and PaimonRecordWriter's internals.

JingsongLi

-1 It seems that there are really issues with the testing

ArnavBalyan · 2026-05-24T00:29:38Z

Thanks for the review! Have addressed all comments

1. Partition column ordering assumption
Added a check that throws a clear error when partition columns aren't the trailing columns of the schema. Covered by a new unit test.

2. Mixed static/dynamic partitions
Documented why we fall back to the unwrapped writer when a partition key has no value in the path, and flags this as a known unsupported case.

3. Separator in the partition columns property
Extracted the forward slash into a constant with a comment explaining it's the Hive metastore convention, not the OS path separator.

4. Direct access to the underlying batch write
Added a write method on the inner writer. Both the wrapped and unwrapped paths now go through this, so any future logic added to the write path won't be silently bypassed.

5. Inconsistent exception wrapping
Aligned both write paths to throw the same exception type, so error handling upstream is consistent.

update

13980ce

JingsongLi reviewed May 23, 2026

View reviewed changes

JingsongLi requested changes May 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[hive] Fix insert into static partitions on managed Paimon tables#7824

[hive] Fix insert into static partitions on managed Paimon tables#7824
ArnavBalyan wants to merge 1 commit into
apache:masterfrom
ArnavBalyan:arnavb/hive-static-partition-insert

ArnavBalyan commented May 12, 2026

Uh oh!

ArnavBalyan commented May 12, 2026

Uh oh!

ArnavBalyan commented May 16, 2026

Uh oh!

ArnavBalyan commented May 22, 2026

Uh oh!

JingsongLi left a comment

Uh oh!

JingsongLi left a comment

Uh oh!

ArnavBalyan commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ArnavBalyan commented May 12, 2026

Purpose

Tests

Uh oh!

ArnavBalyan commented May 12, 2026

Uh oh!

ArnavBalyan commented May 16, 2026

Uh oh!

ArnavBalyan commented May 22, 2026

Uh oh!

JingsongLi left a comment

Choose a reason for hiding this comment

Review: [hive] Fix insert into static partitions on managed Paimon tables

Correctness

Design

Minor

Summary

Uh oh!

JingsongLi left a comment

Choose a reason for hiding this comment

Uh oh!

ArnavBalyan commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants