-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Add DELETE/UPDATE hooks to TableProvider trait #19142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7505171 to
695716c
Compare
Rebased on latest main. Fixed clippy warnings in tests and updated delete.slt/update.slt expected output to reflect new DmlResultExec physical plans. All tests pass locally. Could you please re-trigger the CI? Thanks! |
Add infrastructure for row-level DML operations (DELETE/UPDATE) to the TableProvider trait, enabling storage engines to implement SQL-based mutations. Changes: - Add `DmlCapabilities` struct to declare DELETE/UPDATE support - Add `TableProvider::dml_capabilities()` method (defaults to NONE) - Add `TableProvider::delete_from()` method for DELETE operations - Add `TableProvider::update()` method for UPDATE operations - Wire physical planner to route DML operations to TableProvider - Add helper functions for extracting filters and assignments This provides the API surface for downstream projects (iceberg-rust, delta-rs) to implement DML without custom query planners. A reference MemTable implementation follows in a subsequent PR.
Clippy warns about assertions on constants being optimized away. Changed to assert_eq! for explicit value comparison.
- Change assert_eq!(x, true/false) to assert!(x)/assert!(!x) per clippy - Update delete.slt and update.slt expected error messages to match new physical planner behavior for unsupported DML operations
22c7273 to
b038d04
Compare
Rebased on main. Previous CI failures were due to clippy bool_assert_comparison warnings and mismatched expected error messages in test files. Requesting a CI trigger to confirm I've successfully stopped embarrassing myself. 🙏 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR @ethan-tyler 🙏
I have a few suggestions for API and then I think this PR needs some tests for the new planning code
Otherwise this is looking very nice 👌
| }) => { | ||
| if let Some(provider) = | ||
| target.as_any().downcast_ref::<DefaultTableSource>() | ||
| { | ||
| let capabilities = provider.table_provider.dml_capabilities(); | ||
| if !capabilities.delete { | ||
| return plan_err!( | ||
| "Table '{}' does not support DELETE operations", | ||
| table_name | ||
| ); | ||
| } | ||
| let filters = extract_dml_filters(input)?; | ||
| provider | ||
| .table_provider | ||
| .delete_from(session_state, filters) | ||
| .await? | ||
| } else { | ||
| return exec_err!( | ||
| "Table source can't be downcasted to DefaultTableSource" | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better (smaller API surface) would be to simply try and call provider.delete_from(session_state, filters) and pass along the context
Then you could remove DmlCapabilities entirely
Something like:
LogicalPlan::Dml(DmlStatement {
table_name,
target,
op: WriteOp::Delete,
input,
..
}) => {
if let Some(provider) =
target.as_any().downcast_ref::<DefaultTableSource>()
{
let filters = extract_dml_filters(input)?;
provider
.table_provider
.delete_from(session_state, filters)
.await
.context(format!(
"DELETE operation on table '{}'",
table_name
))?
}
}|
|
||
| /// DML operations supported by a table. | ||
| #[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash)] | ||
| pub struct DmlCapabilities { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this API is necessary (see above for alternate)
| Ok((physical_expr, physical_name)) | ||
| } | ||
|
|
||
| /// Extract filter predicates from a DML input plan (DELETE/UPDATE). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some tests for the planning of delete and update -- specifically that the columns extracted are correct
Perhaps somewhere in https://github.com/apache/datafusion/blob/2db3aeaa53c2fe7cbc709ad517fd6255d5f02074/datafusion/core/tests/custom_sources_cases/mod.rs#L215-L214
The custom provider doesn't actually need to delete anything, just print out the columns that it was passed, for example
|
Thanks so much for the review @alamb! Really appreciate you taking the time And yeah, you're totally right about the redudent DmlCapabilities. The default impl already returns the error, so the capability check is basically just... checking if we're going to get an error before we get the error. lol. Will rip that out and simplify the planner logic. Also adding .context() for better errors and I'll throw in some tests for the filter/assignment extraction, was thinking a mock TableProvider that just captures what gets passed to it so we can assert on it. Should have something pushed soon. Thanks again for taking the time on this! |
- Remove DmlCapabilities struct and dml_capabilities() trait method - Simplify physical planner to try-call pattern with .context() errors - Add planning tests verifying filter/assignment extraction
|
@alamb addressed your feedback and pushed the changes:
Clippy is clean locally. Ready for another look when you get a chance. |
Update expected error messages in sqllogictests to match new error format from .context() wrapping in physical planner. The error messages now include context about the DML operation being performed. Old format: physical_plan_error Error during planning: Table 't1' does not support DELETE operations New format: physical_plan_error 01)DELETE operation on table 't1' 02)caused by 03)This feature is not implemented: DELETE not supported for Base table
|
@ethan-tyler could you update the PR description since |
|
|
||
| //! Tests for DELETE and UPDATE planning to verify filter and assignment extraction. | ||
| use std::any::Any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to integrate this (or a different implementation) into our in-memory tables? It would be very cool to see an SLT test showing it working.
Add infrastructure for row-level DML operations (DELETE/UPDATE) to the TableProvider trait, enabling storage engines to implement SQL-based mutations.
Changes:
DmlCapabilitiesstruct to declare DELETE/UPDATE supportTableProvider::dml_capabilities()method (defaults to NONE)TableProvider::delete_from()method for DELETE operationsTableProvider::update()method for UPDATE operationsThis provides the API surface for downstream projects (iceberg-rust, delta-rs) to implement DML without custom query planners. A reference MemTable implementation follows in a subsequent PR.
Which issue does this PR close?
delete_fromandupdateinTableProvider#16959Rationale for this change
DataFusion parses DELETE/UPDATE but returns NotImplemented("Unsupported logical plan: Dml(Delete)") at physical planning. Downstream projects (iceberg-rust, delta-rs) must implement custom planners to work around this.
What changes are included in this PR?
Adds TableProvider hooks for row-level DML:
Physical planner routes WriteOp::Delete and WriteOp::Update to these methods. Return type matches insert_into(): Arc producing {count: UInt64}.
Are these changes tested?
Yes. Unit tests for DmlCapabilities:
cargo test -p datafusion-expr -- dml_capabilitiesA follow-up PR provides MemTable implementation with comprehensive sqllogictest coverage.
Are there any user-facing changes?
New trait methods on TableProvider:
Fully backward compatible. Defaults return NONE or NotImplemented.