Regex match turns non-breakable space into regular space #10058

Closed
1 task done
ttomasz opened this issue Dec 21, 2023 · 1 comment · Fixed by #10061

Comments

@ttomasz

ttomasz commented Dec 21, 2023

What happens?

When using a regex to match a specific character by its Unicode code point, the non-breakable space (chr: 160) appears to be converted to a regular space (chr: 32).

The RE2 engine seems to support this fine: https://regex101.com/r/7SjXN9/1

To Reproduce

with data(wsc, zipcode) as (
    values
        (32, '00' || chr(32) || '001'),
        (160, '00' || chr(160) || '001')
)
select *
from data
where 1=1
  and regexp_matches(zipcode, '^00\x{00A0}001$')  -- U+00A0, non-breakable space
  and regexp_matches(zipcode, '^00\x{0020}001$')  -- U+0020, regular space
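For reference, the intended semantics of the two patterns can be checked outside DuckDB. The sketch below uses Python's built-in `re` module, which is not RE2 and spells the escape as `\u00A0` rather than `\x{00A0}`, so it only illustrates the expected behavior: each pattern should match exactly one of the two values, and no value should match both.

```python
import re

# The two zipcode values from the reproduction, keyed by the separator's code point.
data = {32: "00" + chr(32) + "001", 160: "00" + chr(160) + "001"}

# Python's re writes code-point escapes as \uXXXX; RE2 writes them as \x{XXXX}.
nbsp_pattern = re.compile(r"^00\u00A0001$")
space_pattern = re.compile(r"^00\u0020001$")

nbsp_hits = [wsc for wsc, zipcode in data.items() if nbsp_pattern.search(zipcode)]
space_hits = [wsc for wsc, zipcode in data.items() if space_pattern.search(zipcode)]
both_hits = [wsc for wsc in nbsp_hits if wsc in space_hits]

print(nbsp_hits, space_hits, both_hits)  # [160] [32] []
```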

OS:

Linux

DuckDB Version:

0.9.2

DuckDB Client:

CLI

Full Name:

Tomasz Taraś

Affiliation:

Orsted

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@Mytherin
Collaborator

Thanks for the report! I've pushed a fix in #10061. This was an issue with the handling of unicode literals in an optimizer that converts static regexp expressions into string comparisons or LIKE expressions.
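To illustrate the kind of rewrite described here, the following is a minimal Python sketch and not DuckDB's actual code. The helper name `literal_for_anchored_pattern` and its structure are hypothetical. It assumes the simplest case: a pattern of the form `^...$` whose body contains only literal characters and `\x{...}` escapes, which can then be replaced by a plain string comparison. The point is that each escape must be decoded to the code point it names, so `\x{00A0}` becomes U+00A0 and not U+0020.

```python
import re

# RE2-style code-point escape: \x{H...} with 1 to 6 hex digits.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")
# Regex metacharacters; if any remain outside the escapes, the body is not a plain literal.
_META = set(".^$*+?()[]{}|\\")


def literal_for_anchored_pattern(pattern):
    """Return the only string an anchored, purely literal pattern can match, else None.

    Hypothetical helper for illustration; it handles just the narrow case above.
    """
    if not (pattern.startswith("^") and pattern.endswith("$")):
        return None
    body = pattern[1:-1]
    # Remove the escapes first, then make sure nothing special is left over.
    if any(ch in _META for ch in _HEX_ESCAPE.sub("", body)):
        return None
    try:
        # Decode each escape to the exact code point it names.
        return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)
    except ValueError:  # code point above U+10FFFF
        return None


nbsp_literal = literal_for_anchored_pattern(r"^00\x{00A0}001$")
space_literal = literal_for_anchored_pattern(r"^00\x{0020}001$")
not_literal = literal_for_anchored_pattern(r"^00.*$")
```

With this decoding, `nbsp_literal` is `"00\u00a0001"` (UTF-8 bytes `b"00\xc2\xa0001"`), `space_literal` is `"00 001"`, and the two stay distinct. A pattern such as `^00.*$` is left to the regex engine because the helper returns `None`.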

Mytherin added a commit that referenced this issue Dec 22, 2023
Fix #10058: correctly handle unicode literals in regexp optimizer