Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x NOT IN y works but NOT (x IN y) doesn't #10830

Closed
lostmygithubaccount opened this issue Jun 7, 2024 · 1 comment · Fixed by #10936
Closed

x NOT IN y works but NOT (x IN y) doesn't #10830

lostmygithubaccount opened this issue Jun 7, 2024 · 1 comment · Fixed by #10936
Assignees
Labels
bug Something isn't working

Comments

@lostmygithubaccount
Copy link

lostmygithubaccount commented Jun 7, 2024

Describe the bug

in running some benchmarking, was getting errors like:

Exception: This feature is not implemented: Physical plan does not support logical expression InSubquery(InSubquery { expr: Column(Column { relation: None, name: "ps_suppkey" }), subquery: <subquery>, negated: false })

tracking this down shows x NOT IN y works but NOT (x IN y) doesn't

To Reproduce

probably a more succinct way, but:

import datafusion

ctx = datafusion.SessionContext()

tables = {
    "part": {
        "p_partkey": [1, 2, 3, 4, 5],
        "p_brand": ["Brand#45", "Brand#45", "Brand#46", "Brand#47", "Brand#48"],
        "p_type": ["MEDIUM POLISHED", "MEDIUM POLISHED", "MEDIUM POLISHED", "MEDIUM POLISHED", "MEDIUM POLISHED"],
        "p_size": [49, 14, 23, 45, 19],
    },
    "partsupp": {
        "ps_partkey": [1, 2, 3, 4, 5],
        "ps_suppkey": [1, 2, 3, 4, 5],
    },
    "supplier": {
        "s_suppkey": [1, 2, 3, 4, 5],
        "s_comment": ["Customer Complaints", "Customer Complaints", "Customer Complaints", "Customer Complaints", "Customer Complaints"],
    },
}

for table_name, data in tables.items():
    ctx.from_pydict(data, name=table_name)

sqlA = """
select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
        p_partkey = ps_partkey
  and p_brand <> 'Brand#45'
  and p_type not like 'MEDIUM POLISHED%'
  and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
  and ps_suppkey not in (
    select
        s_suppkey
    from
        supplier
    where
            s_comment like '%Customer%Complaints%'
)
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;
"""

ctx.sql(sqlA).collect() # this is fine

sqlB = """
SELECT
  p_brand,
  p_type,
  p_size,
  COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp, part
WHERE
  p_partkey = ps_partkey
  AND p_brand <> 'Brand#45'
  AND NOT p_type LIKE 'MEDIUM POLISHED%'
  AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
  AND NOT ps_suppkey IN (
    SELECT
      s_suppkey
    FROM supplier
    WHERE
      s_comment LIKE '%Customer%Complaints%'
  )
GROUP BY
  p_brand,
  p_type,
  p_size
ORDER BY
  supplier_cnt DESC NULLS LAST,
  p_brand,
  p_type,
  p_size;
"""

ctx.sql(sqlB).collect() # this is not fine

# Exception: This feature is not implemented: Physical plan does not support logical expression InSubquery(InSubquery { expr: Column(Column { relation: Some(Bare { table: "partsupp" }), name: "ps_suppkey" }), subquery: <subquery>, negated: false })

Expected behavior

query works in both cases

Additional context

SQLGlot makes the above transformation that should be valid, which is how I hit this initially

@lostmygithubaccount lostmygithubaccount added the bug Something isn't working label Jun 7, 2024
@akoshchiy
Copy link
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants