feat: expose DataFrame.write_table #1264
base: main
Pull Request Overview
This PR exposes the DataFrame.write_table functionality from DataFusion to Python, along with supporting dataframe writer options. It enables users to write DataFrame results directly to registered tables with configurable write operations and formatting options.
- Adds Python wrappers for DataFrameWriteOptions and the InsertOp enum
- Introduces the DataFrame.write_table method for writing to registered tables
- Enhances existing write methods (write_csv, write_json, write_parquet) with optional write options
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/lib.rs | Registers new Python classes for insert operations and write options |
src/dataframe.rs | Implements Rust bindings for write options and insert operations, updates write methods |
python/datafusion/dataframe.py | Adds Python wrapper classes and updates DataFrame write methods with new options |
python/datafusion/__init__.py | Exports new classes in public API |
python/tests/test_dataframe.py | Adds comprehensive test coverage for new functionality |
Comments suppressed due to low confidence (1)
python/tests/test_dataframe.py:1
- The parameter name write_options is inconsistent with the Rust function signature, which expects a positional parameter, not a keyword argument.
# Licensed to the Apache Software Foundation (ASF) under one
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Would anyone be able to review? Maybe @kosiew @crystalxyz @mesejo @kevinjqliu ? This is really just exposing options that already exist upstream.
Found something that you'll want to change.
self._raw_write_options = DataFrameWriteOptionsInternal(
    insert_operation, single_file_output, partition_by, sort_by_raw
)
I think you can't pass insert_operation directly here.
For example, this test will fail:
def test_dataframe_write_options_accepts_insert_op() -> None:
    """DataFrameWriteOptions should accept InsertOp enums."""
    try:
        DataFrameWriteOptions(insert_operation=InsertOp.REPLACE)
    except TypeError as exc:
        pytest.fail(f"DataFrameWriteOptions rejected InsertOp: {exc}")
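One possible fix, sketched with pure-Python stand-ins rather than the real bindings: the wrapper unwraps the public enum to its raw counterpart before calling the internal constructor. Here `RawInsertOp` and `RawWriteOptions` are hypothetical placeholders for the Rust-backed classes, and the `APPEND` and `OVERWRITE` members are assumed from the upstream enum. Only `InsertOp.REPLACE`, `DataFrameWriteOptions`, and the argument names come from this thread. Converting `sort_by` to its raw form is left out.

```python
from __future__ import annotations

from enum import Enum


class RawInsertOp(Enum):
    """Hypothetical stand-in for the Rust-backed insert operation enum."""

    APPEND = "append"
    REPLACE = "replace"
    OVERWRITE = "overwrite"


class RawWriteOptions:
    """Hypothetical stand-in for DataFrameWriteOptionsInternal.

    Like a native binding, it accepts only the raw enum type.
    """

    def __init__(self, insert_operation, single_file_output, partition_by, sort_by):
        if insert_operation is not None and not isinstance(insert_operation, RawInsertOp):
            msg = f"expected RawInsertOp or None, got {type(insert_operation).__name__}"
            raise TypeError(msg)
        self.insert_operation = insert_operation
        self.single_file_output = single_file_output
        self.partition_by = partition_by
        self.sort_by = sort_by


class InsertOp(Enum):
    """Public enum; each member's value is the matching raw member."""

    APPEND = RawInsertOp.APPEND
    REPLACE = RawInsertOp.REPLACE
    OVERWRITE = RawInsertOp.OVERWRITE


class DataFrameWriteOptions:
    """Public wrapper that unwraps InsertOp before building the raw options."""

    def __init__(
        self,
        insert_operation: InsertOp | None = None,
        single_file_output: bool = False,
        partition_by=None,
        sort_by=None,
    ) -> None:
        # Passing the InsertOp member itself would raise TypeError in the raw
        # constructor, so hand over its underlying raw value instead.
        raw_op = insert_operation.value if insert_operation is not None else None
        self._raw_write_options = RawWriteOptions(
            raw_op, single_file_output, partition_by, sort_by
        )


options = DataFrameWriteOptions(insert_operation=InsertOp.REPLACE)
```

With this shape the test above would pass, since the `TypeError` can only come from handing the public enum straight to the raw constructor.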
Which issue does this PR close?
Closes #1005
Rationale for this change
In addition to closing #1005, this exposes an important DataFrame operation: writing to tables. This functionality exists in the upstream DataFusion project but has not previously been exposed to Python. Now that we have external table support and external catalogs, we should make this function accessible to users.
What changes are included in this PR?
- Adds DataFrame.write_table
- Adds optional write options to write_csv, write_json, and write_parquet
Are there any user-facing changes?
There are no breaking changes. The existing write methods gain a new optional parameter; if it is not provided, their behavior is unchanged.