
branch-4.1: [feature](iceberg) Support static partition overwrite for Iceberg tables #58396 #59951 #61372

Merged: morningman merged 4 commits into apache:branch-4.1 from suxiaogang223:codex/pick-58396-59951-branch-4.1 on Mar 16, 2026


@suxiaogang223

Cherry-pick #58396 #59951 to branch-4.1

What problem does this PR solve?

Support static partition overwrite for Iceberg tables on branch-4.1, including the follow-up fix for INSERT OVERWRITE ... PARTITION (...) VALUES ... column count validation.

Cherry-pick commit

…les (apache#58396)

### What problem does this PR solve?

### Proposed changes

This PR implements static partition overwrite functionality for Iceberg
external tables, allowing users to precisely overwrite specific
partitions using the `INSERT OVERWRITE ... PARTITION (col='value', ...)`
syntax.

### Background

Before this PR, Doris supported:
- ✅ `INSERT INTO` with dynamic partition for Iceberg tables
- ✅ `INSERT OVERWRITE` for full table replacement
- ❌ `INSERT OVERWRITE ... PARTITION (...)` for static partition
overwrite

### New Features

1. **Full Static Partition Mode**: Overwrite a specific partition when
all partition columns are specified
   ```sql
   INSERT OVERWRITE TABLE iceberg_db.tbl PARTITION (dt='2025-01-25', region='bj')
   SELECT id, name FROM source_table;
   ```

2. **Hybrid Partition Mode**: Partial static + partial dynamic partition
   ```sql
   -- dt is static, region comes from SELECT dynamically
   INSERT OVERWRITE TABLE iceberg_db.tbl PARTITION (dt='2025-01-25')
   SELECT id, name, region FROM source_table;
   ```
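
The two modes differ only in how many of the table's partition columns are pinned in the `PARTITION (...)` clause. A minimal Java sketch of that classification follows; the class, enum, and method names are hypothetical and are not the `InsertPartitionSpec` introduced by this PR, and column names are assumed to be compared case-sensitively.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Doris InsertPartitionSpec implementation.
final class PartitionModeSketch {
    enum Mode { DYNAMIC, HYBRID, FULL_STATIC }

    /**
     * Classifies an INSERT OVERWRITE by how many partition columns are
     * given fixed values in the PARTITION (...) clause.
     */
    static Mode classify(List<String> partitionColumns, Map<String, String> staticValues) {
        // Every statically specified column must be a partition column of the table.
        for (String name : staticValues.keySet()) {
            if (!partitionColumns.contains(name)) {
                throw new IllegalArgumentException("Not a partition column: " + name);
            }
        }
        if (staticValues.isEmpty()) {
            return Mode.DYNAMIC;
        }
        // Map keys are unique and all valid, so equal sizes means every column is pinned.
        return staticValues.size() == partitionColumns.size() ? Mode.FULL_STATIC : Mode.HYBRID;
    }

    public static void main(String[] args) {
        List<String> partitionColumns = List.of("dt", "region");
        System.out.println(classify(partitionColumns, Map.of("dt", "2025-01-25", "region", "bj")));
        System.out.println(classify(partitionColumns, Map.of("dt", "2025-01-25")));
    }
}
```

With the table from the examples above (partitioned by `dt` and `region`), the first statement would classify as full static and the second as hybrid.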

### Implementation Details

#### FE Changes
- **Parser** (`DorisParser.g4`, `LogicalPlanBuilder.java`): Extended
partition spec parsing to support `PARTITION (col='value', ...)` syntax
- **InsertPartitionSpec**: New unified data structure to represent
partition modes (auto-detect, dynamic, static)
- **UnboundIcebergTableSink**: Added `staticPartitionKeyValues` field to
carry static partition info
- **BindSink**: Added validation for static partition columns and
generation of constant expressions for static partition values
- **IcebergTransaction**: Implemented `commitStaticPartitionOverwrite()`
using Iceberg's `OverwriteFiles.overwriteByRowFilter()` API
- **IcebergUtils**: Added `parsePartitionValueFromString()` utility for
partition value type conversion
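
Static partition values arrive as string literals in the `PARTITION (...)` clause and have to be converted to the partition column's type before they can be used in a row filter. The sketch below illustrates that kind of conversion in plain Java. It is an assumption-laden illustration, not the actual `parsePartitionValueFromString()`: the `ColumnType` enum and the `parse` signature are hypothetical, and only a handful of types are covered. Dates are converted to days since 1970-01-01, which is how Iceberg represents `date` values internally.

```java
import java.time.LocalDate;

// Hypothetical sketch; the real IcebergUtils utility may differ in signature and coverage.
final class PartitionValueParserSketch {
    enum ColumnType { INT, LONG, STRING, BOOLEAN, DATE }

    /** Converts a partition literal such as "2025-01-25" into a typed value. */
    static Object parse(String raw, ColumnType type) {
        if (raw == null) {
            return null; // a null partition value is passed through unchanged
        }
        try {
            switch (type) {
                case INT:
                    return Integer.parseInt(raw);
                case LONG:
                    return Long.parseLong(raw);
                case BOOLEAN:
                    if (raw.equalsIgnoreCase("true")) {
                        return Boolean.TRUE;
                    }
                    if (raw.equalsIgnoreCase("false")) {
                        return Boolean.FALSE;
                    }
                    throw new IllegalArgumentException("not a boolean");
                case DATE:
                    // Days since the Unix epoch, stored as an int.
                    return (int) LocalDate.parse(raw).toEpochDay();
                case STRING:
                default:
                    return raw;
            }
        } catch (RuntimeException e) {
            // Covers NumberFormatException and DateTimeParseException.
            throw new IllegalArgumentException(
                    "Cannot parse partition value '" + raw + "' as " + type, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2025-01-25", ColumnType.DATE));
        System.out.println(parse("bj", ColumnType.STRING));
    }
}
```

Failing fast on an unparseable literal keeps a malformed `PARTITION` clause from silently matching no partition at commit time.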

#### BE Changes
- **VIcebergTableWriter**:
  - Support full static partition mode (all data goes to a single partition)
  - Support hybrid partition mode (static columns from config, dynamic columns from data)
  - Added `_is_full_static_partition` and `_dynamic_partition_column_indices` for mode detection
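
The writer-side logic can be illustrated with a small sketch. The BE code is C++; the version below is Java only to keep the examples in this description in one language, and every name in it is hypothetical. It assumes the dynamic indices are positions in the table's partition column list and that partition paths use the `name=value` form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical Java illustration of the mode detection; not the C++ VIcebergTableWriter code.
final class WriterPartitionPlanSketch {
    /** Positions of partition columns whose values must come from the data. */
    static List<Integer> dynamicIndices(List<String> partitionColumns,
                                        Map<String, String> staticValues) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < partitionColumns.size(); i++) {
            if (!staticValues.containsKey(partitionColumns.get(i))) {
                indices.add(i);
            }
        }
        return indices;
    }

    /** Full static mode: nothing is left to read from the data. */
    static boolean isFullStatic(List<String> partitionColumns, Map<String, String> staticValues) {
        return dynamicIndices(partitionColumns, staticValues).isEmpty();
    }

    /**
     * Builds a partition path for one row. dynamicValues holds one value per
     * dynamic column, in partition-column order.
     */
    static String partitionPath(List<String> partitionColumns,
                                Map<String, String> staticValues,
                                List<String> dynamicValues) {
        int expected = dynamicIndices(partitionColumns, staticValues).size();
        if (dynamicValues.size() != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " dynamic values, got " + dynamicValues.size());
        }
        StringJoiner path = new StringJoiner("/");
        int next = 0;
        for (String name : partitionColumns) {
            String value = staticValues.containsKey(name)
                    ? staticValues.get(name)        // fixed by the PARTITION clause
                    : dynamicValues.get(next++);    // taken from the row
            path.add(name + "=" + value);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        List<String> cols = List.of("dt", "region");
        System.out.println(partitionPath(cols, Map.of("dt", "2025-01-25"), List.of("sh")));
    }
}
```

In full static mode the path is the same for every row, so it only needs to be computed once per writer.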

#### Thrift Changes
- Added `static_partition_values` field to `TIcebergTableSink` for
passing static partition info from FE to BE

(cherry picked from commit 8a974d4)
…se column count validation (apache#59951)

- Related PR: apache#58396
### What problem does this PR solve?

Related: apache#58396

## Problem

When using `INSERT OVERWRITE` with static partition syntax and a `VALUES`
clause, the operation incorrectly failed with:

```
Column count doesn't match value count. Expected: N, but got: M
```

### Example that was failing:
```sql
-- Table has 2 columns: id (int), par (string) - partitioned by par
INSERT OVERWRITE TABLE test_partition_branch
PARTITION (par='a')
VALUES (11), (12);
```

**Error**: Column count doesn't match value count. Expected: 2, but got:
1

### Root Cause

The column count validation in `InsertUtils.normalizePlan()` did not
account for static partition columns. When using `PARTITION
(col='value')` syntax:
- The partition column value is **already fixed** in the PARTITION
clause
- The VALUES should **only provide non-partition column values**
- This matches the static partition insert semantics of Hive and Spark SQL

The validation was comparing VALUES count against **all** table columns
instead of **non-partition** columns only.

## Solution

Modified `InsertUtils.java:363-372` to:
1. Detect when the sink is `UnboundIcebergTableSink`
2. Extract static partition columns from `staticPartitionKeyValues`
3. Filter out static partition columns from the column list before
validation
4. Only compare VALUES count against non-partition columns

```java
// Only applies when the INSERT statement has no explicit column list.
if (unboundLogicalSink instanceof UnboundIcebergTableSink
        && CollectionUtils.isEmpty(unboundLogicalSink.getColNames())) {
    UnboundIcebergTableSink<?> icebergSink = (UnboundIcebergTableSink<?>) unboundLogicalSink;
    Map<String, Expression> staticPartitions = icebergSink.getStaticPartitionKeyValues();
    if (staticPartitions != null && !staticPartitions.isEmpty()) {
        // Static partition columns take their values from the PARTITION clause,
        // so drop them before comparing against the VALUES column count.
        Set<String> staticPartitionColNames = staticPartitions.keySet();
        columns = columns.stream()
                .filter(column -> !staticPartitionColNames.contains(column.getName()))
                .collect(ImmutableList.toImmutableList());
    }
}
```
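
To see the effect of the filter on the failing example, here is a self-contained sketch of the count check. It is a simplified stand-in for the validation, using plain strings for columns; the class and method names are hypothetical and do not appear in `InsertUtils`.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the column count validation.
final class ColumnCountCheckSketch {
    /** Columns that VALUES rows must supply: all table columns minus static partition columns. */
    static List<String> expectedValueColumns(List<String> tableColumns,
                                             Set<String> staticPartitionColumns) {
        return tableColumns.stream()
                .filter(column -> !staticPartitionColumns.contains(column))
                .collect(Collectors.toList());
    }

    /** Throws if any VALUES row has the wrong number of entries. */
    static void validate(List<String> tableColumns,
                         Set<String> staticPartitionColumns,
                         List<? extends List<?>> rows) {
        int expected = expectedValueColumns(tableColumns, staticPartitionColumns).size();
        for (List<?> row : rows) {
            if (row.size() != expected) {
                throw new IllegalArgumentException(
                        "Column count doesn't match value count. Expected: "
                                + expected + ", but got: " + row.size());
            }
        }
    }

    public static void main(String[] args) {
        // Table (id, par), PARTITION (par='a'), VALUES (11), (12)
        validate(List.of("id", "par"), Set.of("par"), List.of(List.of(11), List.of(12)));
        System.out.println("validation passed");
    }
}
```

Passing an empty set of static partition columns reproduces the old behavior: the same single-value rows are rejected because two columns are expected.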

(cherry picked from commit e572788)
@suxiaogang223 requested a review from yiguolei as a code owner on March 16, 2026 06:55
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223

run buildall

@suxiaogang223

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.49% (31941/49530) |
| Region Coverage | 65.34% (15987/24468) |
| Branch Coverage | 55.96% (8513/15214) |

@hello-stephen

FE UT Coverage Report

Increment line coverage 21.76% (42/193) 🎉
Increment coverage report
Complete coverage report

@doris-robot

BE UT Coverage Report

Increment line coverage 6.86% (12/175) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.93% (19243/36355) |
| Line Coverage | 36.18% (179358/495696) |
| Region Coverage | 32.75% (138888/424098) |
| Branch Coverage | 33.78% (60391/178752) |

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 8.57% (15/175) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.53% (25459/35590) |
| Line Coverage | 54.17% (268051/494810) |
| Region Coverage | 51.70% (221452/428372) |
| Branch Coverage | 53.23% (95492/179395) |

@morningman merged commit 41c9623 into apache:branch-4.1 on Mar 16, 2026 (12 of 14 checks passed)