Skip to content

Uncorrelated subquery in JOIN panics (SQLStorm) #17772

@2010YOUY01

Description

@2010YOUY01

Describe the bug

datafusion-cli is compiled from the latest main commit 5bbdb7e

To reproduce, tpch dataset can be generated via https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli, and update the following paths

CREATE EXTERNAL TABLE customer
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/customer.parquet';

CREATE EXTERNAL TABLE nation
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/nation.parquet';

CREATE EXTERNAL TABLE part
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/part.parquet';

CREATE EXTERNAL TABLE region
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/region.parquet';

CREATE EXTERNAL TABLE lineitem
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/lineitem.parquet';

CREATE EXTERNAL TABLE orders
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/orders.parquet';

CREATE EXTERNAL TABLE partsupp
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/partsupp.parquet';

CREATE EXTERNAL TABLE supplier
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/supplier.parquet';

WITH RankedOrders AS (
  SELECT
    o_orderkey,
    o_orderdate,
    DENSE_RANK() OVER (
      PARTITION BY o_orderpriority
      ORDER BY o_totalprice DESC
    ) AS priority_rank
  FROM orders
  WHERE o_orderstatus = 'F'
),
CustomerNation AS (
  SELECT c_custkey, c_name
  FROM customer
),
OrderLineStats AS (
  SELECT
    l_orderkey,
    SUM(l_extendedprice) AS total_line_value
  FROM lineitem
  GROUP BY l_orderkey
),
TopOrder AS (
  -- Pick one top-priced order (priority_rank = 1)
  SELECT o_orderkey
  FROM RankedOrders
  WHERE priority_rank = 1
  LIMIT 1
)
SELECT DISTINCT
  c.c_name,
  r.o_orderkey,
  COALESCE(o.total_line_value, 0) AS total_order_value
FROM CustomerNation AS c
LEFT JOIN RankedOrders AS r
  ON r.o_orderkey = (SELECT o_orderkey FROM TopOrder)
LEFT JOIN OrderLineStats AS o
  ON o.l_orderkey = r.o_orderkey
WHERE r.o_orderdate >= DATE '1995-01-01'
  AND r.o_orderdate <  DATE '1996-01-01';

thread 'main' panicked at datafusion/common/src/dfschema.rs:365:31:
index out of bounds: the len is 2 but the index is 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To Reproduce

No response

Expected behavior

No response

Additional context

Found by SQLStorm #17698

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions