Logical plan generation inconsistency #568

Open
emanueledomingo opened this issue Jan 24, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@emanueledomingo

Hi everyone,

I'm not sure this is the right place for this; it's more of a question than a real bug.

Describe the bug
I tried to generate the logical plan of a query through Substrait instead of passing the query text to the `.sql` function. The Substrait round trip fails (converting the Substrait plan back into a DataFusion logical plan raises an error), while `.sql` runs the same query without any problem.

What is the reason behind this behavior?

To Reproduce

```python
import datafusion
from datafusion.substrait import substrait as ss
import pyarrow as pa
import pyarrow.dataset as pda
from faker import Faker

print(f"DF: {datafusion.__version__}\nPA: {pa.__version__}")  # DF: 32.0.0 PA: 14.0.2

fake = Faker()

N_ROWS = 1_000

# Dummy table: integer id plus two random string columns
dummy_table = pa.Table.from_pydict(
    {
        "id": range(N_ROWS),
        "name": (fake.name() for _ in range(N_ROWS)),
        "country_code": (fake.country_code() for _ in range(N_ROWS)),
    }
)

# Self-join whose condition compares a column with a CASE expression
q = """
SELECT
    "t1".*
    , "t2".*
FROM "table" "t1"
INNER JOIN "table" "t2"
    ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
"""

ctx = datafusion.SessionContext()
ctx.register_dataset(name="table", dataset=pda.dataset(dummy_table))

# Plan built directly from SQL: this works
df = ctx.sql(q)
default_plan = df.logical_plan()

# Same query through a Substrait round trip: SQL -> Substrait plan -> logical plan
plan = ss.serde.serialize_to_plan(q, ctx)
logical_plan = ss.consumer.from_substrait_plan(ctx, plan)  # <- Exception here
df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
ss_plan = df.logical_plan()
```
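To make the join condition concrete, here is a plain-Python sketch (no DataFusion involved) of which rows the query pairs up. The helper names `case_key` and `naive_join` and the 20-row id range are illustrative assumptions; the point is that the right-hand side of the equality is a computed expression, not a bare column.

```python
from __future__ import annotations


def case_key(t2_id: int) -> int:
    """Plain-Python mirror of CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END."""
    return t2_id if t2_id < 10 else 10


def naive_join(ids: list[int]) -> list[tuple[int, int]]:
    """Nested-loop self-join on t1.id = case_key(t2.id).

    Returns (t1.id, t2.id) pairs; the name and country_code columns are
    left out because they do not affect which rows match.
    """
    return [(left, right) for left in ids for right in ids if left == case_key(right)]


ids = list(range(20))  # hypothetical small table with ids 0..19
pairs = naive_join(ids)

# t2 ids 0..9 match the t1 row with the same id; every t2 id >= 10 is
# mapped to the key 10 and therefore matches the single t1 row with id 10.
print(len(pairs))  # 20
print([right for left, right in pairs if left == 10] == list(range(10, 20)))  # True
```

With the 1,000-row table from the reproducer, the same reasoning gives 1,000 result rows: 10 pairs for t2 ids 0 to 9 and 990 pairs for t2 ids 10 to 999.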

The exception is:

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[6], line 2
      1 plan = ss.serde.serialize_to_plan(q2, ctx)
----> 2 logical_plan = ss.consumer.from_substrait_plan(ctx, plan)
      3 df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
      4 ss_plan = df.logical_plan()

Exception: DataFusion error: Plan("invalid join condition expression")
```
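Reading the error message, one plausible explanation (an assumption, not verified against the DataFusion source for this version) is that the Substrait consumer only rebuilds equi-join keys when both sides of every `=` in the join condition are plain column references, while the SQL planner appears to accept a computed expression such as the `CASE` above as a join key. The toy model below illustrates that kind of check; `Column`, `Case`, `Eq`, and `to_join_keys` are hypothetical names, not DataFusion APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference such as "t1"."id"."""
    name: str


@dataclass(frozen=True)
class Case:
    """Hypothetical stand-in for any computed expression (here a CASE)."""
    sql: str


Expr = Union[Column, Case]


@dataclass(frozen=True)
class Eq:
    """An equality predicate `left = right` taken from the join condition."""
    left: Expr
    right: Expr


def to_join_keys(predicates: list[Eq]) -> list[tuple[str, str]]:
    """Toy consumer (not DataFusion code): accept only column = column keys.

    Any equality with a computed expression on either side is rejected,
    mimicking the error reported in this issue.
    """
    keys = []
    for p in predicates:
        if isinstance(p.left, Column) and isinstance(p.right, Column):
            keys.append((p.left.name, p.right.name))
        else:
            raise ValueError("invalid join condition expression")
    return keys


# A plain equi-join converts fine.
print(to_join_keys([Eq(Column("t1.id"), Column("t2.id"))]))  # [('t1.id', 't2.id')]

# The join from this issue has a CASE expression on one side and is rejected.
case_expr = Case('CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END')
try:
    to_join_keys([Eq(Column("t1.id"), case_expr)])
except ValueError as err:
    print(err)  # invalid join condition expression
```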

Expected behavior

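I expected the plan obtained through the Substrait round trip to equal the one produced by `ctx.sql`:

```python
assert ss_plan == default_plan  # expected: True
```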
@emanueledomingo emanueledomingo added the bug Something isn't working label Jan 24, 2024