Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] regexp_instr does not validate regex in PostgreSQL #249

Closed
2 tasks done
lookslikeitsnot opened this issue Mar 24, 2023 · 1 comment
Closed
2 tasks done

[BUG] regexp_instr does not validate regex in PostgreSQL #249

lookslikeitsnot opened this issue Mar 24, 2023 · 1 comment

Comments

@lookslikeitsnot
Copy link
Contributor

Is this a new bug in dbt-expectations?

  • I believe this is a new bug in dbt-expectations
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Any call to the regexp_instr macro will be valid for PostgreSQL. This is due to the generated expression and the handling of nulls by Postgres.

Dissecting the generated expression:

  with grouped_expression as (
    select array_length(
        (
          select regexp_matches(postal_code_5_3, '^\d{5}-\d{3}', '')
        ),
        1
      ) > 0 as expression
    from <db_name>.dbt_expectations_integration_tests.data_text
  ),

  validation_errors as (
    select *
    from grouped_expression
    where not(expression = true)
  )

  select *
  from validation_errors
  1. select regexp_matches('<valid_source>', '<valid_regex>', '<any_flags>') will always return either no results or a text[].
  2. Getting the array_length of that will either be null or >0
  3. Comparing that to 0 will return null or true respectively
  4. Comparing that to true will once again return null or true respectively
  5. Negating that will return null or false which have the same full filter effect in a where clause

Expected Behavior

regexp_instr should fail when regex doesn't match.

Steps To Reproduce

Change any regex in "dbt-expectations/integration_tests/models/schema_tests/schema.yml" with anything else (e.g. replace the '^\d{5}-\d{3}' regex with 'definitely_not_a_valid_postal_code') , the tests will still succeed.

Environment

- OS: Manjaro Linux 22.0.2
- Python: 3.10.9
- dbt: 1.4.5
- dbt-expectations: 0.8.3

Which database adapter are you using with dbt?

Postgres

Suggested fix

Fix with the widest Postgres version compatibility range would probably be to coalesce the array_length
i.e. to replace
array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1)
with
coalesce(array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1), 0)
in "dbt-expectations/macros/regex/regexp_instr.sql"

clausherther added a commit that referenced this issue Mar 27, 2023
* Add regex error tests

* Fix postgres__regexp_instr

* cleaned up test regex

* specified target adapters for regex flagged tests

* Add regex config comments

---------

Co-authored-by: Claus Herther <claus@calogica.com>
@clausherther
Copy link
Contributor

Closed via #250

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants