Skip to content

Conversation

@ethan-tyler
Copy link

@ethan-tyler ethan-tyler commented Dec 7, 2025

Add infrastructure for row-level DML operations (DELETE/UPDATE) to the TableProvider trait, enabling storage engines to implement SQL-based mutations.

Changes:

  • Add DmlCapabilities struct to declare DELETE/UPDATE support
  • Add TableProvider::dml_capabilities() method (defaults to NONE)
  • Add TableProvider::delete_from() method for DELETE operations
  • Add TableProvider::update() method for UPDATE operations
  • Wire physical planner to route DML operations to TableProvider
  • Add helper functions for extracting filters and assignments

This provides the API surface for downstream projects (iceberg-rust, delta-rs) to implement DML without custom query planners. A reference MemTable implementation follows in a subsequent PR.

Which issue does this PR close?

Rationale for this change

DataFusion parses DELETE/UPDATE but returns NotImplemented("Unsupported logical plan: Dml(Delete)") at physical planning. Downstream projects (iceberg-rust, delta-rs) must implement custom planners to work around this.

What changes are included in this PR?

Adds TableProvider hooks for row-level DML:

  • DmlCapabilities struct with NONE, ALL, DELETE_ONLY, UPDATE_ONLY constants
  • dml_capabilities() returns supported operations (default: NONE)
  • delete_from(state, filters) deletes matching rows
  • update(state, assignments, filters) updates matching rows

Physical planner routes WriteOp::Delete and WriteOp::Update to these methods. Return type matches insert_into(): Arc producing {count: UInt64}.

Are these changes tested?

Yes. Unit tests for DmlCapabilities:

cargo test -p datafusion-expr -- dml_capabilities

A follow-up PR provides MemTable implementation with comprehensive sqllogictest coverage.

Are there any user-facing changes?

New trait methods on TableProvider:

fn dml_capabilities(&self) -> DmlCapabilities;
async fn delete_from(&self, state: &dyn Session, filters: Vec<Expr>) -> Result<Arc<dyn ExecutionPlan>>;
async fn update(&self, state: &dyn Session, assignments: Vec<(String, Expr)>, filters: Vec<Expr>) -> Result<Arc<dyn ExecutionPlan>>;

Fully backward compatible. Defaults return NONE or NotImplemented.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate catalog Related to the catalog crate labels Dec 7, 2025
@ethan-tyler
Copy link
Author

Hi! This is my first contribution. Could a committer please help trigger CI? This addresses the request for DELETE/UPDATE support.

cc @alamb @adriangb

@ethan-tyler ethan-tyler force-pushed the feat/tableprovider-dml-hooks branch from 7505171 to 695716c Compare December 7, 2025 18:29
@ethan-tyler
Copy link
Author

Hi! This is my first contribution. Could a committer please help trigger CI? This addresses the request for DELETE/UPDATE support.

cc @alamb @adriangb

Rebased on latest main. Fixed clippy warnings in tests and updated delete.slt/update.slt expected output to reflect new DmlResultExec physical plans. All tests pass locally. Could you please re-trigger the CI? Thanks!

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 8, 2025
Add infrastructure for row-level DML operations (DELETE/UPDATE) to the
TableProvider trait, enabling storage engines to implement SQL-based
mutations.

Changes:
- Add `DmlCapabilities` struct to declare DELETE/UPDATE support
- Add `TableProvider::dml_capabilities()` method (defaults to NONE)
- Add `TableProvider::delete_from()` method for DELETE operations
- Add `TableProvider::update()` method for UPDATE operations
- Wire physical planner to route DML operations to TableProvider
- Add helper functions for extracting filters and assignments

This provides the API surface for downstream projects (iceberg-rust,
delta-rs) to implement DML without custom query planners. A reference
MemTable implementation follows in a subsequent PR.
Clippy warns about assertions on constants being optimized away.
Changed to assert_eq! for explicit value comparison.
- Change assert_eq!(x, true/false) to assert!(x)/assert!(!x) per clippy
- Update delete.slt and update.slt expected error messages to match
  new physical planner behavior for unsupported DML operations
@ethan-tyler ethan-tyler force-pushed the feat/tableprovider-dml-hooks branch from 22c7273 to b038d04 Compare December 8, 2025 00:45
@ethan-tyler
Copy link
Author

Hi! This is my first contribution. Could a committer please help trigger CI? This addresses the request for DELETE/UPDATE support.
cc @alamb @adriangb

Rebased on latest main. Fixed clippy warnings in tests and updated delete.slt/update.slt expected output to reflect new DmlResultExec physical plans. All tests pass locally. Could you please re-trigger the CI? Thanks!

Rebased on main. Previous CI failures were due to clippy bool_assert_comparison warnings and mismatched expected error messages in test files. Requesting a CI trigger to confirm I've successfully stopped embarrassing myself. 🙏

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @ethan-tyler 🙏

I have a few suggestions for API and then I think this PR needs some tests for the new planning code

Otherwise this is looking very nice 👌

Comment on lines 602 to 623
}) => {
if let Some(provider) =
target.as_any().downcast_ref::<DefaultTableSource>()
{
let capabilities = provider.table_provider.dml_capabilities();
if !capabilities.delete {
return plan_err!(
"Table '{}' does not support DELETE operations",
table_name
);
}
let filters = extract_dml_filters(input)?;
provider
.table_provider
.delete_from(session_state, filters)
.await?
} else {
return exec_err!(
"Table source can't be downcasted to DefaultTableSource"
);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better (smaller API surface) would be to simply try and call provider.delete_from(session_state, filters) and pass along the context

Then you could remove DmlCapabilities entirely

Something like:

            LogicalPlan::Dml(DmlStatement {
                table_name,
                target,
                op: WriteOp::Delete,
                input,
                ..
            }) => {
                if let Some(provider) =
                    target.as_any().downcast_ref::<DefaultTableSource>()
                {
                    let filters = extract_dml_filters(input)?;
                    provider
                        .table_provider
                        .delete_from(session_state, filters)
                        .await
                        .context(format!(
                            "DELETE operation on table '{}'",
                            table_name
                        ))?
                }
            }


/// DML operations supported by a table.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash)]
pub struct DmlCapabilities {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this API is necessary (see above for alternate)

Ok((physical_expr, physical_name))
}

/// Extract filter predicates from a DML input plan (DELETE/UPDATE).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some tests for the planning of delete and update -- specifically that the columns extracted are correct

Perhaps somewhere in https://github.com/apache/datafusion/blob/2db3aeaa53c2fe7cbc709ad517fd6255d5f02074/datafusion/core/tests/custom_sources_cases/mod.rs#L215-L214

The custom provider doesn't actually need to delete anything, just print out the columns that it was passed, for example

@ethan-tyler
Copy link
Author

Thanks so much for the review @alamb! Really appreciate you taking the time And yeah, you're totally right about the redudent DmlCapabilities. The default impl already returns the error, so the capability check is basically just... checking if we're going to get an error before we get the error. lol.

Will rip that out and simplify the planner logic. Also adding .context() for better errors and I'll throw in some tests for the filter/assignment extraction, was thinking a mock TableProvider that just captures what gets passed to it so we can assert on it.

Should have something pushed soon. Thanks again for taking the time on this!

- Remove DmlCapabilities struct and dml_capabilities() trait method
- Simplify physical planner to try-call pattern with .context() errors
- Add planning tests verifying filter/assignment extraction
@ethan-tyler
Copy link
Author

@alamb addressed your feedback and pushed the changes:

  • Removed DmlCapabilities entirely
  • Simplified planner to try-call pattern with .context() error wrapping
  • Added planning tests (dml_planning.rs) - 7 tests with mock providers that capture filters/assignments

Clippy is clean locally. Ready for another look when you get a chance.

Update expected error messages in sqllogictests to match new error
format from .context() wrapping in physical planner. The error
messages now include context about the DML operation being performed.

Old format:
  physical_plan_error Error during planning: Table 't1' does not support DELETE operations

New format:
  physical_plan_error
  01)DELETE operation on table 't1'
  02)caused by
  03)This feature is not implemented: DELETE not supported for Base table
@adriangb
Copy link
Contributor

adriangb commented Dec 9, 2025

@ethan-tyler could you update the PR description since DmlCapabilities is gone?


//! Tests for DELETE and UPDATE planning to verify filter and assignment extraction.
use std::any::Any;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to integrate this (or a different implementation) into our in-memory tables? It would be very cool to see an SLT test showing it working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

catalog Related to the catalog crate core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support delete_from and update in TableProvider

3 participants