
feat(datafusion): add TRUNCATE TABLE and DROP PARTITION SQL support#292

Merged
JingsongLi merged 2 commits into apache:main from JingsongLi:truncate
Apr 28, 2026

Conversation

@JingsongLi
Contributor

Purpose

Wire existing TableCommit::truncate_table() and truncate_partitions() APIs to the DataFusion SQL layer, supporting:

  • TRUNCATE TABLE db.t
  • TRUNCATE TABLE db.t PARTITION (col = val, ...)
  • ALTER TABLE db.t DROP PARTITION (col = val, ...)


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@jerry-024 jerry-024 left a comment


Three must-fix items, posted inline. The HIGH one (partial partition spec) is silent wrong-data deletion, not just a behavior nuance.

```rust
}

let field = field_map.get(col_name.as_str()).ok_or_else(|| {
    DataFusionError::Plan(format!("Column '{col_name}' not found in table schema"))
```


HIGH: silent wrong-partition deletion on incomplete specs.

This builds `partition` only from keys the user supplied. The map flows through `TableCommit::truncate_partitions` into `partitions_to_bytes`:

```rust
let datum = p.get(key).cloned().flatten();  // missing key → None
```

None is encoded into the partition filter as IS NULL. So on a (dt, region) partitioned table, DROP PARTITION (dt = '2026-04-28') becomes a filter dt='2026-04-28' AND region IS NULL — it does not delete all dt='2026-04-28' partitions like Hive/Spark prefix semantics would. If a NULL-region partition exists, it gets deleted (silent wrong-target). If not, silent no-op. Either way, user intent is mishandled with no error.
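The collapse described above can be reproduced in isolation. `resolve` below is a hypothetical stand-in for the lookup inside `partitions_to_bytes`, not the actual code; it shows that an omitted key and an explicitly NULL key become indistinguishable:

```rust
use std::collections::HashMap;

// Stand-in for the lookup in partitions_to_bytes: a missing key and an
// explicit NULL both collapse to None, which downstream encodes as IS NULL.
fn resolve(spec: &HashMap<String, Option<String>>, key: &str) -> Option<String> {
    spec.get(key).cloned().flatten()
}

fn main() {
    // User wrote DROP PARTITION (dt = '2026-04-28') on a (dt, region) table.
    let spec = HashMap::from([("dt".to_string(), Some("2026-04-28".to_string()))]);
    assert_eq!(resolve(&spec, "dt"), Some("2026-04-28".to_string()));
    // `region` was never supplied, yet it resolves exactly like region = NULL:
    assert_eq!(resolve(&spec, "region"), None);
}
```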

Two acceptable fixes:

  • Reject incomplete specs explicitly: after the loop, if `partition.len() != partition_keys.len()`, return an `Err` listing the missing keys.
  • Implement true prefix-match deletion: requires a partition-filter primitive that matches on a subset of keys; not just a downstream-API tweak.

For reference, Java's Spark PaimonPartitionManagement.toPaimonPartitions does require(partitionFieldCount <= partitionKeys.length) and projects the partition row type so prefix specs work as Hive expects.
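A minimal sketch of the first fix, under assumed names: `check_complete_spec` is hypothetical, and plain `String` errors stand in for `DataFusionError::Plan`:

```rust
// Sketch of "reject incomplete specs": compare the supplied partition
// columns against the table's full partition key list and name what is
// missing, rather than silently encoding missing keys as NULL.
fn check_complete_spec(
    supplied: &[&str],
    partition_keys: &[&str],
) -> Result<(), String> {
    let missing: Vec<&str> = partition_keys
        .iter()
        .filter(|k| !supplied.contains(k))
        .cloned()
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "DROP PARTITION requires all partition keys; missing: {}",
            missing.join(", ")
        ))
    }
}

fn main() {
    assert!(check_complete_spec(&["dt", "region"], &["dt", "region"]).is_ok());
    let err = check_complete_spec(&["dt"], &["dt", "region"]).unwrap_err();
    assert!(err.contains("region"));
}
```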

```rust
    .map_err(to_datafusion_error)?;

let wb = table.new_write_builder();
let commit = wb.new_commit();
```


MEDIUM: empty PARTITION () clause falls through to a full-table truncate.

If truncate.partitions is Some(empty_vec) (whatever AST shape sqlparser produces for a degenerate parser edge case), this branch is skipped and execution drops into commit.truncate_table() below — the entire table is wiped despite the user writing a PARTITION clause. Opposite of intent.

Suggest tightening:

```rust
if let Some(partitions) = &truncate.partitions {
    if partitions.is_empty() {
        return Err(DataFusionError::Plan(
            "PARTITION clause requires at least one column = value".to_string(),
        ));
    }
    // ... existing branch
    return ok_result(...);
}
commit.truncate_table().await?;
```

This way only TRUNCATE TABLE t (no PARTITION clause at all, i.e. truncate.partitions is None) reaches truncate_table().

```rust
}
}
// DropPartitions is a data operation (not a schema change), so we handle it
// separately and return early — it cannot be combined with schema changes.
```


MEDIUM: if_exists dropped at three levels — none honored.

  1. Inner AlterTableOperation::DropPartitions { if_exists: _ } (this match arm): partition-level IF EXISTS is explicitly bound and discarded. The doc comment claims this "matches IF EXISTS semantics" because the underlying overwrite is a no-op on missing partitions (verified — truncate_partitions early-returns on empty resolved entries). But that makes plain DROP PARTITION behave identically to DROP PARTITION IF EXISTS. Spark's AlterTableDropPartitionCommand errors when IF EXISTS is omitted and the partition doesn't exist — this PR diverges silently.

  2. Outer Statement::AlterTable { if_exists, .. }: returning early from this branch into handle_drop_partitions skips whatever outer IF EXISTS handling the rest of handle_alter_table does. ALTER TABLE IF EXISTS missing_table DROP PARTITION (...) will fail with a hard error from catalog.get_table(...) instead of being a silent no-op.

  3. Statement::Truncate { if_exists, .. } (handler at line 570): sqlparser parses the flag (confirmed in sqlparser::ast::Truncate), but handle_truncate_table never reads it. TRUNCATE TABLE IF EXISTS missing_table errors instead of no-oping.

Suggested fix: at each layer, read the flag, and on get_table returning NotFound (or by list_tables first), short-circuit to ok_result when if_exists is set, otherwise propagate the error.
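A sketch of that short-circuit at one layer, with the catalog lookup reduced to a boolean and all names hypothetical (the real code would match on the catalog's NotFound error from `get_table`):

```rust
// Hypothetical sketch of honoring IF EXISTS: a missing table becomes a
// successful no-op when the flag is set, and an error otherwise.
#[derive(Debug, PartialEq)]
enum CatalogError {
    NotFound,
}

fn handle_truncate(
    table_exists: bool, // stand-in for catalog.get_table(...) succeeding
    if_exists: bool,
) -> Result<&'static str, CatalogError> {
    if !table_exists {
        if if_exists {
            // Short-circuit to the empty OK result instead of erroring.
            return Ok("no-op");
        }
        return Err(CatalogError::NotFound);
    }
    Ok("truncated")
}

fn main() {
    assert_eq!(handle_truncate(false, true), Ok("no-op"));
    assert_eq!(handle_truncate(false, false), Err(CatalogError::NotFound));
    assert_eq!(handle_truncate(true, false), Ok("truncated"));
}
```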


@jerry-024 jerry-024 left a comment


+1

@JingsongLi JingsongLi merged commit 07de603 into apache:main Apr 28, 2026
7 of 8 checks passed