Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WHERE clause on ARRAY incorrectly applied when GIN index exists #117979

Closed
quentez opened this issue Jan 19, 2024 · 3 comments · Fixed by #118256
Closed

WHERE clause on ARRAY incorrectly applied when GIN index exists #117979

quentez opened this issue Jan 19, 2024 · 3 comments · Fixed by #118256
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner

Comments

@quentez
Copy link

quentez commented Jan 19, 2024

Describe the problem

I'm running a query with multiple conditions on arrays using <@ and && operators in its WHERE clause combined with both AND and OR operators.

  • When there are no indices on the table, the results are correct.

  • After adding a GIN index on the array column, other rows start being included in the results. The incorrect results are consistent with the query conditions if the OR condition was applied at the top-level and not nested in parentheses.

Removing the index restores the correct behavior.

To Reproduce

  1. Spin up a CockroachCloud Serverless cluster running v23.1.14.
  2. Create a table with the following schema:
CREATE TABLE test_bug
(
    id bigint not null
      constraint test_bug_pk primary key,
    links text[]
);
  1. Fill it with the following test data:
INSERT INTO public.test_bug (id, links) VALUES (1, '{str1}');
INSERT INTO public.test_bug (id, links) VALUES (2, '{str2}');
INSERT INTO public.test_bug (id, links) VALUES (3, '{str2,str3}');
INSERT INTO public.test_bug (id, links) VALUES (4, '{str1,str2,str3}');
  1. Run the query:
SELECT * 
FROM test_bug as t
WHERE 
    ARRAY['str1'] <@ t.links 
    AND (
        ARRAY ['str1'] && t.links
        OR (ARRAY ['str2'] && t.links AND ARRAY ['str3'] && t.links)
    );

This returns rows 1 and 4.

  1. Create the GIN index:
CREATE INDEX "idx_links" ON test_bug USING gin (links);
  1. Run the query again (I force the index but it also happens when it gets picked automatically):
SELECT * 
FROM test_bug@{FORCE_INDEX="idx_links"} as t
WHERE 
    ARRAY['str1'] <@ t.links 
    AND (
        ARRAY ['str1'] && t.links
        OR (ARRAY ['str2'] && t.links AND ARRAY ['str3'] && t.links)
    );

This now returns rows 1, 4, and 3.
(Row 3 does not match the mandatory ARRAY['str1'] <@ t.links condition).

  1. Remove the index and try again, the results should be back to normal.

Expected behavior
The query results should always be the same regardless of the index being used.

Environment:

  • CockroachDB version Serverless v23.1.14
  • Server OS: Linux
  • Client app: Any SQL client

Jira issue: CRDB-35441

@quentez quentez added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 19, 2024
Copy link

blathers-crl bot commented Jan 19, 2024

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/disaster-recovery (found keywords: restore)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added A-disaster-recovery O-community Originated from the community X-blathers-triaged blathers was able to find an owner T-disaster-recovery labels Jan 19, 2024
Copy link

blathers-crl bot commented Jan 19, 2024

cc @cockroachdb/disaster-recovery

@dt dt removed this from Triage in Disaster Recovery Backlog Jan 22, 2024
@exalate-issue-sync exalate-issue-sync bot added T-sql-queries SQL Queries Team and removed T-disaster-recovery labels Jan 22, 2024
@yuzefovich yuzefovich self-assigned this Jan 23, 2024
@yuzefovich
Copy link
Member

Thanks for the very detailed report @quentez! I can confirm that this is a bug, and it's present in 23.2.0, 23.1.12, and 22.2.16 versions. Chances are it has been present since #49446 which was merged around 20.2 time frame.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants