Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayIndexOutOfBoundsException in complex query #15901

Closed
hlcianfagna opened this issue Apr 23, 2024 · 3 comments · Fixed by #15969
Closed

ArrayIndexOutOfBoundsException in complex query #15901

hlcianfagna opened this issue Apr 23, 2024 · 3 comments · Fixed by #15969
Assignees
Labels
bug Clear identification of incorrect behaviour

Comments

@hlcianfagna
Copy link
Contributor

CrateDB version

5.7.0 and 5.6.3

CrateDB setup information

Number of nodes: 3

Problem description

The query below (an anonymised and simplified version of a much more complex query I was trying to run), fails with an ArrayIndexOutOfBoundsException.

Please note that removing any one of the following elements the query runs without error messages:

  • unused_field1
  • the ORDER BY sorting_field
  • the filters in the sorting_field subquery
  • FROM information_schema.tables limit 1

Steps to Reproduce

WITH cte1
AS (
	SELECT 'def' AS field1
		,5 AS unused_field1
	FROM information_schema.tables limit 1
	)
	,cte2
AS (
	SELECT 'abc' AS field2
		,cte1.field1
		,(
			SELECT count(*)
			FROM (
				SELECT 2 AS id
					,'abc' AS field2
					,'def' AS field1
				) subquery2
			WHERE 'abc' = subquery2.field2
				AND cte1.field1 = subquery2.field1
			) AS sorting_field
	FROM cte1
	)
SELECT (
		SELECT cte2.field2
		FROM cte2
		WHERE cte2.field2 = subquery1.target_field2
		ORDER BY sorting_field LIMIT 1
		) AS field3
FROM (
	SELECT 'abc' AS target_field2
		,'def' AS field1
	) subquery1;

Actual Result

java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at io.crate.data.BiArrayRow.get(BiArrayRow.java:57)
	at io.crate.execution.engine.collect.RowCollectExpression.setNextRow(RowCollectExpression.java:42)
	at io.crate.execution.engine.collect.RowCollectExpression.setNextRow(RowCollectExpression.java:29)
	at io.crate.execution.engine.sort.BoundedSortingLimitAndOffsetCollector.onNextRow(BoundedSortingLimitAndOffsetCollector.java:129)
	at io.crate.data.BatchIterator.lambda$collect$1(BatchIterator.java:228)
	at io.crate.data.BatchIterator.move(BatchIterator.java:164)
	at io.crate.data.BatchIterator.lambda$move$0(BatchIterator.java:182)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
	at io.crate.data.BatchIterators.lambda$collect$1(BatchIterators.java:50)
	at io.crate.data.BatchIterator.move(BatchIterator.java:167)
	at io.crate.data.BatchIterators.collect(BatchIterators.java:47)
	at io.crate.data.CollectingRowConsumer.accept(CollectingRowConsumer.java:40)
	at io.crate.execution.engine.InterceptingRowConsumer.lambda$tryForwardResult$1(InterceptingRowConsumer.java:93)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Expected Result

 field3 
--------
 abc
(1 row)
@hlcianfagna hlcianfagna added the triage An issue that needs to be triaged by a maintainer label Apr 23, 2024
@BaurzhanSakhariev BaurzhanSakhariev added bug Clear identification of incorrect behaviour and removed triage An issue that needs to be triaged by a maintainer labels Apr 25, 2024
@BaurzhanSakhariev BaurzhanSakhariev self-assigned this Apr 25, 2024
@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented May 6, 2024

Original query plan
  [Eval[(SELECT field2 FROM (cte2)) AS field3] (rows=unknown)
  └ CorrelatedJoin[target_field2, field1, (SELECT field2 FROM (cte2))] (rows=unknown)
    └ Rename[target_field2, field1] AS subquery1 (rows=unknown)
      └ Eval['abc' AS target_field2, 'def' AS field1] (rows=unknown)
        └ TableFunction[empty_row | [] | true] (rows=unknown)
    └ SubPlan
      └ Eval[field2] (rows=0)
        └ Rename[field2] AS cte2 (rows=0)
          └ Eval['abc' AS field2] (rows=0)
            └ Limit[2::bigint;0::bigint] (rows=0)
              └ Limit[1::bigint;0] (rows=0)
                └ OrderBy[(SELECT count(*) FROM (subquery2)) AS sorting_field ASC] (rows=0)
                  └ Filter[('abc' AS field2 = target_field2)] (rows=0)
                    └ CorrelatedJoin[field1, unused_field1, (SELECT count(*) FROM (subquery2))] (rows=unknown)
                      └ Rename[field1] AS cte1 (rows=unknown)
                        └ Limit[1::bigint;0] (rows=unknown)
                          └ Collect[information_schema.tables | ['def' AS field1] | true] (rows=unknown)
                      └ SubPlan
                        └ Limit[2::bigint;0::bigint] (rows=1)
                          └ HashAggregate[count(*)] (rows=1)
                            └ Rename[id, field2, field1] AS subquery2 (rows=0)
                              └ Eval[2 AS id, 'abc' AS field2, 'def' AS field1] (rows=0)
                                └ Filter[(('abc' AS field2 = 'abc') AND ('def' AS field1 = field1))] (rows=0)
                                  └ TableFunction[empty_row | [] | true] (rows=unknown)]
Plan of the query without unused field
[Eval[(SELECT field2 FROM (cte2)) AS field3] (rows=unknown)
  └ CorrelatedJoin[target_field2, field1, (SELECT field2 FROM (cte2))] (rows=unknown)
    └ Rename[target_field2, field1] AS subquery1 (rows=unknown)
      └ Eval['abc' AS target_field2, 'def' AS field1] (rows=unknown)
        └ TableFunction[empty_row | [] | true] (rows=unknown)
    └ SubPlan
      └ Eval[field2] (rows=0)
        └ Rename[field2, field1, sorting_field] AS cte2 (rows=0)
          └ Eval['abc' AS field2, field1, (SELECT count(*) FROM (subquery2)) AS sorting_field] (rows=0)
            └ Limit[2::bigint;0::bigint] (rows=0)
              └ Limit[1::bigint;0] (rows=0)
                └ OrderBy[(SELECT count(*) FROM (subquery2)) AS sorting_field ASC] (rows=0)
                  └ Filter[('abc' AS field2 = target_field2)] (rows=0)
                    └ CorrelatedJoin[field1, (SELECT count(*) FROM (subquery2))] (rows=unknown)
                      └ Rename[field1] AS cte1 (rows=unknown)
                        └ Limit[1::bigint;0] (rows=unknown)
                          └ Collect[information_schema.tables | ['def' AS field1] | true] (rows=unknown)
                      └ SubPlan
                        └ Limit[2::bigint;0::bigint] (rows=1)
                          └ HashAggregate[count(*)] (rows=1)
                            └ Rename[id, field2, field1] AS subquery2 (rows=0)
                              └ Eval[2 AS id, 'abc' AS field2, 'def' AS field1] (rows=0)
                                └ Filter[(('abc' AS field2 = 'abc') AND ('def' AS field1 = field1))] (rows=0)
                                  └ TableFunction[empty_row | [] | true] (rows=unknown)]
Plan of the query without FROM information_schema.tables limit 1
[Eval[(SELECT field2 FROM (cte2)) AS field3] (rows=unknown)
  └ CorrelatedJoin[target_field2, field1, (SELECT field2 FROM (cte2))] (rows=unknown)
    └ Rename[target_field2, field1] AS subquery1 (rows=unknown)
      └ Eval['abc' AS target_field2, 'def' AS field1] (rows=unknown)
        └ TableFunction[empty_row | [] | true] (rows=unknown)
    └ SubPlan
      └ Eval[field2] (rows=0)
        └ Rename[field2, field1, sorting_field] AS cte2 (rows=0)
          └ Eval['abc' AS field2, field1, (SELECT count(*) FROM (subquery2)) AS sorting_field] (rows=0)
            └ Limit[2::bigint;0::bigint] (rows=0)
              └ Limit[1::bigint;0] (rows=0)
                └ OrderBy[(SELECT count(*) FROM (subquery2)) AS sorting_field ASC] (rows=0)
                  └ Filter[('abc' AS field2 = target_field2)] (rows=0)
                    └ CorrelatedJoin[field1, unused_field1, (SELECT count(*) FROM (subquery2))] (rows=unknown)
                      └ Rename[field1, unused_field1] AS cte1 (rows=unknown)
                        └ Eval['def' AS field1, 5 AS unused_field1] (rows=unknown)
                          └ TableFunction[empty_row | [] | true] (rows=unknown)
                      └ SubPlan
                        └ Limit[2::bigint;0::bigint] (rows=1)
                          └ HashAggregate[count(*)] (rows=1)
                            └ Rename[id, field2, field1] AS subquery2 (rows=0)
                              └ Eval[2 AS id, 'abc' AS field2, 'def' AS field1] (rows=0)
                                └ Filter[(('abc' AS field2 = 'abc') AND ('def' AS field1 = field1))] (rows=0)
                                  └ TableFunction[empty_row | [] | true] (rows=unknown)]
Plan of the query without filters in the sorting_field subquery
[Eval[(SELECT field2 FROM (cte2)) AS field3] (rows=unknown)
  └ CorrelatedJoin[target_field2, field1, (SELECT field2 FROM (cte2))] (rows=unknown)
    └ Rename[target_field2, field1] AS subquery1 (rows=unknown)
      └ Eval['abc' AS target_field2, 'def' AS field1] (rows=unknown)
        └ TableFunction[empty_row | [] | true] (rows=unknown)
    └ SubPlan
      └ Limit[2::bigint;0::bigint] (rows=0)
        └ Eval[field2] (rows=0)
          └ Limit[1::bigint;0] (rows=0)
            └ OrderBy[sorting_field ASC] (rows=0)
              └ Filter[(field2 = target_field2)] (rows=0)
                └ Rename[field2, sorting_field] AS cte2 (rows=unknown)
                  └ MultiPhase (rows=unknown)
                    └ Eval['abc' AS field2, (SELECT count(*) FROM (subquery2)) AS sorting_field] (rows=unknown)
                      └ Rename[] AS cte1 (rows=unknown)
                        └ Limit[1::bigint;0] (rows=unknown)
                          └ Collect[information_schema.tables | [] | true] (rows=unknown)
                    └ Limit[2::bigint;0::bigint] (rows=1)
                      └ HashAggregate[count(*)] (rows=1)
                        └ Rename[id, field2, field1] AS subquery2 (rows=unknown)
                          └ Eval[2 AS id, 'abc' AS field2, 'def' AS field1] (rows=unknown)
                            └ TableFunction[empty_row | [] | true] (rows=unknown)]

@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented May 6, 2024

Not posting plan without ORDER BY as issue is probably related to ORDER BY + correlated subquery.

Last posted plan (without filters) is quite different from other 3, has only 1 CorrelatedJoin operator.

The most interesting are first 2 workarounds (plan without unused_field and plan without FROM information_schema).

Diff is basically that top-most CorrelatedJoin gets a different cte2 -
both workarounds
have

└ Rename[field2, field1, sorting_field] AS cte2 (rows=0)
          └ Eval['abc' AS field2, field1, (SELECT count(*) FROM (subquery2)) AS sorting_field] (rows=0)

whereas not-working query has

        └ Rename[field2] AS cte2 (rows=0)
          └ Eval['abc' AS field2] (rows=0)          

UPD: Reverting #13895 helps, prunning should not be done in some cases.

@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented May 22, 2024

Not posting plan without ORDER BY as issue is probably related to ORDER BY + correlated subquery.

left side is a plan for not working query, right side is a plan for a query without order by
https://editor.mergely.com/UAQGMEXF

what's interesting is that #15969 targets issue "Correlated join references to a column, that is prunned in the sub-plan" but plan without OrderBy ^ has the same issue and yet it works. I think prunning fix is needed, it only shows that initial idea about the root cause was not correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Clear identification of incorrect behaviour
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants