ComputeError: caught exception during execution of a Python source #128

@mouzkolit

What happens?

First of all, the lazy DataFrame feature is nice. However, I have a problem with it in duckdb v1.4.1 (also tested with 1.4.0) and polars 1.34.0 (also tested with earlier versions).
Usually the initial loading and filtering with pl(lazy=True) works, but operations such as joins with other tables fail with this error:

ComputeError: caught exception during execution of a Python source, exception: InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result.

Full Trace:

File ~/.venv/lib/python3.9/site-packages/polars/_utils/deprecation.py:97, in deprecate_streaming_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     93         kwargs["engine"] = "in-memory"
     95     del kwargs["streaming"]
---> 97     return function(*args, **kwargs)

File ~/.venv/lib/python3.9/site-packages/polars/lazyframe/opt_flags.py:328, in forward_old_opt_flags.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325         optflags = cb(optflags, kwargs.pop(key))  # type: ignore[no-untyped-call,unused-ignore]
    327 kwargs["optimizations"] = optflags
--> 328 return function(*args, **kwargs)

File ~/.venv/lib/python3.9/site-packages/polars/lazyframe/frame.py:2415, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, engine, background, optimizations, **_kwargs)
   2413 # Only for testing purposes
   2414 callback = _kwargs.get("post_opt_callback", callback)
-> 2415 return wrap_df(ldf.collect(engine, callback))

To Reproduce

import duckdb

# db.db_path points to the existing database file
con = duckdb.connect(db.db_path, read_only=True)
df_lab = con.sql("SELECT * FROM data1").pl(lazy=True)
df_main = con.sql("SELECT * FROM data2").pl(lazy=True)
df_lab.join(df_main, on="account_id").collect()  # raises the ComputeError above

This is the query plan of the joined LazyFrame:

naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)
INNER JOIN:
LEFT PLAN ON: [col("account_id")]
PYTHON SCAN []
PROJECT */10 COLUMNS
RIGHT PLAN ON: [col("account_id")]
PYTHON SCAN []
PROJECT */70 COLUMNS
END INNER JOIN

OS:

linux

DuckDB Version:

1.4.1

DuckDB Client:

Python

Hardware:

No response

Full Name:

Maximilian Zeidler

Affiliation:

Helios

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
