Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using match with ^$ as anchors on a primary key column returns incorrect results #53222

Closed
aaron276h opened this issue Aug 9, 2023 · 2 comments · Fixed by #54696
Closed

Using match with ^$ as anchors on a primary key column returns incorrect results #53222

aaron276h opened this issue Aug 9, 2023 · 2 comments · Fixed by #54696
Assignees
Labels
bug Confirmed user-visible misbehaviour in official release major v23.3-affected v23.8-affected

Comments

@aaron276h
Copy link

Describe what's wrong

When running a query that applies a regex expression to the primary key column and uses ^$ as anchors on the first element, Clickhouse incorrectly returns only a portion of the expected results.

Does it reproduce on recent release?

Running release 23.3.8.21.

How to reproduce
Create table and insert data into different parts. Note that this bug appears to apply to only the primary key condition as inserting both data items into same part will return the correct result.

CREATE TABLE example(
    "time" Int64 CODEC(ZSTD(1)),
    "svc" LowCardinality(String) CODEC(ZSTD(1)),
    "title" String CODEC(ZSTD(1)),
 ) ENGINE = MergeTree
 PARTITION BY intDiv("time", 1000)
 ORDER BY ("svc", "time");
 
INSERT INTO example(*) VALUES(toInt64(4500), 'first', 'blah blah')
INSERT INTO example(*) VALUES(toInt64(3500), 'second', 'blah blah blah')

Reading data with ^$ returns only the 1/2 expected rows:

SELECT svc, title FROM example WHERE match(svc, '^first$|^second$')

Running the query with index explain enabled (EXPLAIN indexes = 1) shows the problem right away:

Expression ((Projection + Before ORDER BY))
	  ReadFromMergeTree (7_ca_43_de_8_a_195_42_c_7_b_590_7_a_270_a_945433.example)
	  Indexes:
	    MinMax
	      Condition: true
	      Parts: 2/2
	      Granules: 2/2
	    Partition
	      Condition: true
	      Parts: 2/2
	      Granules: 2/2
	    PrimaryKey
	      Keys: 
	        svc
	      Condition: (svc in ['first', 'firsu')) <-- this is the problem
	      Parts: 1/2
	      Granules: 1/2

Additional context

A simple workaround is to use \A\z for anchoring which returns the expected results:

SELECT svc, title FROM example WHERE match(svc, '\Afirst\z|\Asecond\z')
@aaron276h aaron276h added the potential bug To be reviewed by developers and confirmed/rejected. label Aug 9, 2023
@tavplubix tavplubix added bug Confirmed user-visible misbehaviour in official release major and removed potential bug To be reviewed by developers and confirmed/rejected. labels Sep 12, 2023
@tavplubix
Copy link
Member

tavplubix commented Sep 12, 2023

Reproduced with the latest release https://fiddle.clickhouse.com/b221f8ad-ee2d-41ba-acb4-717eca37bf25

@den-crane
Copy link
Contributor

den-crane commented Sep 12, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release major v23.3-affected v23.8-affected
Projects
None yet
4 participants