Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] dask-sql fails to import with dask-expr enabled #1308

Closed
hendrikmakait opened this issue Feb 23, 2024 · 6 comments
Closed

[BUG] dask-sql fails to import with dask-expr enabled #1308

hendrikmakait opened this issue Feb 23, 2024 · 6 comments
Labels
bug Something isn't working needs triage Awaiting triage by a dask-sql maintainer

Comments

@hendrikmakait
Copy link

What happened:

dask-sql fails to import when trying to use it with dask-expr because dask-expr does not implement dd.Aggregation, which is subclassed in multiple places. We plan to use dask-expr as the default for the dask.dataframe API soon (see discussion issue here: dask/dask#10934), so this will cause problems in the future.

Minimal Complete Verifiable Example:

$ export DASK_DATAFRAME__QUERY_PLANNING=True
$ python
>>> import dask_sql
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/__init__.py", line 8, in <module>
    from .cmd import cmd_loop
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/cmd.py", line 26, in <module>
    from dask_sql.context import Context
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/context.py", line 43, in <module>
    from dask_sql.physical.rel import RelConverter, custom, logical
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/__init__.py", line 1, in <module>
    from .aggregate import DaskAggregatePlugin
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/aggregate.py", line 24, in <module>
    class ReduceAggregation(dd.Aggregation):
TypeError: function() argument 'code' must be code, not str

Environment:

@hendrikmakait hendrikmakait added bug Something isn't working needs triage Awaiting triage by a dask-sql maintainer labels Feb 23, 2024
@hendrikmakait
Copy link
Author

Update: The problem mentioned above has been resolved via dask/dask#10947 and dask/dask-expr#893.

There is now a new error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/__init__.py", line 8, in <module>
    from .cmd import cmd_loop
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/cmd.py", line 26, in <module>
    from dask_sql.context import Context
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/context.py", line 43, in <module>
    from dask_sql.physical.rel import RelConverter, custom, logical
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/__init__.py", line 5, in <module>
    from .filter import DaskFilterPlugin
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/filter.py", line 11, in <module>
    from dask_sql.physical.utils.filter import attempt_predicate_pushdown
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/utils/filter.py", line 309, in <module>
    M.fillna: dd._Frame.fillna,
              ^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute 'fillna'

@hendrikmakait
Copy link
Author

@charlesbluca , @rjzamora: Fixing the usage of dd._Frame shouldn't be much work. Are you aware of any other issues (e.g., other internals dask-sql relies upon) that will get in the way of dask-sql supporting dask-expr? I'm trying to scope the required effort here.

@rjzamora
Copy link
Contributor

Thanks for scoping this out @hendrikmakait !

My understanding is that dask-sql uses a lot of custom HLG-optimization code to implement predicate pushdown. If dask-sql can use simpler dask-expr logic for this, then most of that code can probably go away (need to take a careful look soon to say for sure). I know @phofl found that the TPCh benchmarks were a bit faster without predicate pushdown, but it would be nice if users/dask-sql could still "opt in".

@rjzamora
Copy link
Contributor

@charlesbluca @ayushdg - How high of a priority is it to migrate dask-sql's predicate-pushdown logic to dask-expr? It shouldn't be too much work, but it would also be good to know if simply disabling predicate pushdown is a reasonable temporary workaround?

@hendrikmakait
Copy link
Author

hendrikmakait commented Mar 4, 2024

FYI, we have now flipped the switch and dask-expr has become the default dataframe engine (dask/dask#10967).

@charlesbluca
Copy link
Collaborator

Closed with #1319

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Awaiting triage by a dask-sql maintainer
Projects
None yet
Development

No branches or pull requests

3 participants