
Regex simplification of anchored patterns produces wrong results #22726

@lyne7-sc

Describe the bug

The optimizer rule that simplifies regex match operators has two bugs:

  1. Anchored matches (^...$) on a Utf8View / LargeUtf8 column fail at execution with the Arrow error "Invalid comparison operation: Utf8View == Utf8".

  2. Case-insensitive (~*) anchored matches return wrong (incomplete) results.

To Reproduce

  CREATE TABLE t(s VARCHAR) AS VALUES ('foo'), ('Bazzz'); 

  SELECT * FROM t WHERE s ~ '^Bazzz$';   -- Bug 1: Invalid comparison Utf8View == Utf8
  SELECT * FROM t WHERE s ~* '^foo$';    -- Bug 2: case-insensitive match returns wrong result

Expected behavior

  • Anchored simplification works on Utf8View / LargeUtf8 without errors.
  • ~* anchored matches return correct case-insensitive results.
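
The second expectation can be illustrated outside the engine. The following is a minimal Python sketch, assuming the incomplete results come from the rule rewriting an anchored literal pattern into a plain equality comparison; the report does not confirm that this is the mechanism. The row 'FOO' is hypothetical and was added only to make the difference visible; it is not part of the table in the reproduction.

```python
import re

# Hypothetical rows: 'FOO' is added for illustration and is not in the report's table.
rows = ["foo", "FOO", "Bazzz"]

# Case-insensitive anchored pattern, analogous to: s ~* '^foo$'
pattern = re.compile(r"^foo$", re.IGNORECASE)

# Rows a case-insensitive anchored match should return.
regex_matches = [s for s in rows if pattern.search(s)]

# Rows an (assumed) rewrite to plain equality, s = 'foo', would return.
equality_matches = [s for s in rows if s == "foo"]

print("regex   :", regex_matches)
print("equality:", equality_matches)
```

Under that assumption, the equality form drops 'FOO', which is the kind of incomplete result described above. For the case-sensitive operator (~) the two forms agree on a literal pattern, so only the ~* case is affected by this particular difference.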

Additional context

No response

Labels: bug
