Conversation

@nuno-faria
Contributor

Which issue does this PR close?

N/A.

Rationale for this change

Upgrade to the latest version of datafusion.

What changes are included in this PR?

Fixed breaking changes.

Are there any user-facing changes?

No.

Comment on lines +898 to +901
"List(nullable Int64)",
"List(nullable Int64)",
"List(nullable Int64)",
"List(nullable Int64)",
Contributor Author


Lists have a new render format: apache/arrow-rs#8290

(
f.regexp_replace(column("a"), literal("(ell|orl)"), literal("-")),
- pa.array(["H-o", "W-d", "!"]),
+ pa.array(["H-o", "W-d", "!"], type=pa.string_view()),
Contributor Author


regexp_replace now uses UTF8View: apache/datafusion#17195

impl PyCatalog {
#[new]
- fn new(catalog: PyObject) -> Self {
+ fn new(catalog: Py<PyAny>) -> Self {
Contributor Author


)))?;

- Python::with_gil(|py| {
+ Python::attach(|py| {
Contributor Author


let streams = spawn_future(py, async move { df.execute_stream_partitioned().await })?;

- let mut schema: Schema = self.df.schema().to_owned().into();
+ let mut schema: Schema = self.df.schema().to_owned().as_arrow().clone();
Contributor Author


#[allow(clippy::too_many_arguments)]
#[new]
- #[pyo3(signature = (schema, name, location, file_type, table_partition_cols, if_not_exists, temporary, order_exprs, unbounded, options, constraints, column_defaults, definition=None))]
+ #[pyo3(signature = (schema, name, location, file_type, table_partition_cols, if_not_exists, or_replace, temporary, order_exprs, unbounded, options, constraints, column_defaults, definition=None))]
Contributor Author


impl PyPrepare {
#[new]
- pub fn new(name: String, data_types: Vec<PyDataType>, input: PyLogicalPlan) -> Self {
+ pub fn new(name: String, fields: Vec<PyArrowType<Field>>, input: PyLogicalPlan) -> Self {
Contributor Author


Prepare now stores fields: apache/datafusion#17986

use datafusion::functions_aggregate::all_default_aggregate_functions;
use datafusion::functions_window::all_default_window_functions;
use datafusion::logical_expr::expr::{Alias, FieldMetadata, WindowFunction, WindowFunctionParams};
use datafusion::logical_expr::sqlparser::ast::NullTreatment as DFNullTreatment;
Contributor Author


NullTreatment has been moved so it no longer relies on sqlparser: apache/datafusion#17332

@timsaucer
Member

Thank you for getting started on this! I was hoping we might delay slightly until DF 51.1.0 releases (apache/datafusion#18843). Do you think it's okay to wait a bit before the release? Either way, we can get this PR in and then update the lock.

@nuno-faria
Contributor Author

> Thank you for getting started on this! I was hoping we might delay slightly until DF 51.1.0 releases (apache/datafusion#18843). Do you think it's okay to wait a bit before the release? Either way, we can get this PR in and then update the lock.

Yeah, no problem! Meanwhile, it appears that the CI actions are stuck on test_arrow_c_stream_interrupted, so it's better to stop the workflow. I will have to take a look at it.

@nuno-faria
Contributor Author

I don't know what might be causing the read_all method in the arrow_c_stream_interrupted test to not receive the interrupt. I searched in the datafusion and arrow repos but could not find recent changes that could affect this. @kosiew do you have a clue what might be causing this?

@timsaucer
Member

From a little investigation, I suspect that py.check_signals() is no longer running on the main thread. From the documentation: "If the function is called from a non-main thread, or under a non-main Python interpreter, it does nothing yet still returns Ok(())."

I wonder if the new py.detach() and py.attach() in the 0.25 -> 0.26 upgrade are more than simply a rename of the deprecated functions.
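For background (not code from this PR): CPython queues signals and runs Python-level signal handlers only on the main thread, which is consistent with the check_signals documentation quoted above — if the stream is being consumed off the main thread, the interrupt is never observed there. A minimal stdlib sketch of that behavior:

```python
import signal
import threading

# CPython defers Python-level signal handlers to the main thread; a worker
# thread never executes the handler itself. This mirrors why
# py.check_signals() is documented as a no-op off the main thread.
caught = []

def handler(signum, frame):
    # Record which thread actually ran the handler.
    caught.append(threading.current_thread() is threading.main_thread())

signal.signal(signal.SIGINT, handler)

def worker():
    # Raise SIGINT from a worker thread; the handler still runs on main.
    signal.raise_signal(signal.SIGINT)

t = threading.Thread(target=worker)
t.start()
t.join()

# Back on the main thread, the pending signal is processed at the next
# bytecode boundary, so the handler has run here by now.
print(caught)
```

Running this prints a single True entry: even though SIGINT was raised in the worker, the handler executed on the main thread.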

@kosiew
Contributor

kosiew commented Nov 26, 2025

@nuno-faria
Thanks for the ping.
I am travelling and will not be able to look into this for the rest of the week.
