Add regexp_like, improve docs and examples for regexp_match #9137

Merged
merged 8 commits into from
Feb 9, 2024

Conversation

Omega359
Contributor

@Omega359 Omega359 commented Feb 5, 2024

Which issue does this PR close?

Closes #9102

Rationale for this change

Add a new regexp_like function.

What changes are included in this PR?

Code, tests, documentation, perf test

Are these changes tested?

Yes.

Are there any user-facing changes?

A new regexp_like function is available for use.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Feb 5, 2024
@Omega359
Contributor Author

Omega359 commented Feb 5, 2024

FYI - the prettier error is not something I see on my machine

arrow-datafusion on  feature/regexp_like [$!] is 📦 v35.0.0 via 🦀 v1.75.0 on ☁️  (us-east-1)
❯ prettier --version
3.2.4

arrow-datafusion on  feature/regexp_like [$!] is 📦 v35.0.0 via 🦀 v1.75.0 on ☁️  (us-east-1)
❯ prettier -w docs/**/scalar_functions.md
docs/source/user-guide/sql/scalar_functions.md 356ms (unchanged)

regexp_like, regexp_match, regexp_replace,
};

fn data(rng: &mut ThreadRng) -> StringArray {
Contributor

That bench is really needed, considering how many problems we have had with regex performance.

make_scalar_function_inner(func)(args)
}
other => {
internal_err!("Unsupported data type {other:?} for function regexp_like")
Contributor

Suggested change
- internal_err!("Unsupported data type {other:?} for function regexp_like")
+ exec_err!("Unsupported data type {other:?} for function regexp_like")

Contributor Author

There are quite a few instances in the code that have this pattern (I was mirroring the existing regexp_... functions there). I was going to create a follow-up PR that fixes them across the whole code base.

Contributor

@alamb alamb Feb 8, 2024

That would be most appreciated. Cleaning up internal errors that can be triggered by incorrect queries (rather than by a bug) is a very good improvement.
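To illustrate the distinction discussed in this thread, here is a minimal sketch of reporting an unsupported input type as an execution error rather than an internal one. The `QueryError` and `ColumnType` types and the `check_regexp_like_input` function are hypothetical stand-ins written for this example; they are not DataFusion's actual error type, data types, or macros.

```rust
use std::fmt;

/// Hypothetical stand-in for an engine error type (not DataFusion's definition).
#[derive(Debug, PartialEq)]
pub enum QueryError {
    /// An engine invariant was violated: a bug that should be reported.
    Internal(String),
    /// The query was invalid for the given inputs: something the user can fix.
    Execution(String),
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::Internal(msg) => {
                write!(f, "Internal error: {msg}. This is likely a bug")
            }
            QueryError::Execution(msg) => write!(f, "Execution error: {msg}"),
        }
    }
}

/// Hypothetical subset of column types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Utf8,
    LargeUtf8,
    Int64,
}

/// Checks whether a regexp_like-style function can accept the given type.
/// A non-string column is reachable from an ordinary (incorrect) query,
/// so it is reported as an execution error, not an internal error.
pub fn check_regexp_like_input(data_type: ColumnType) -> Result<(), QueryError> {
    match data_type {
        ColumnType::Utf8 | ColumnType::LargeUtf8 => Ok(()),
        other => Err(QueryError::Execution(format!(
            "Unsupported data type {other:?} for function regexp_like"
        ))),
    }
}
```

With this shape, a caller that passes `ColumnType::Int64` gets a message starting with "Execution error:", which points at the query rather than suggesting an engine bug.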

@alamb alamb changed the title Add regexp_like scalar function Add regexp_like and regexp_match scalar functions Feb 8, 2024
@alamb alamb changed the title Add regexp_like and regexp_match scalar functions Add regexp_like, improve docs and examples for regexp_match` Feb 8, 2024
Contributor

@alamb alamb left a comment

🤯 -- this PR is amazing. Thank you so much @Omega359

The thoroughness of test coverage and documentation should serve as a model best practice.


- [`flight_sql_server.rs`](examples/flight/flight_sql_server.rs): Run DataFusion as a standalone process and execute SQL queries from JDBC clients
- [`make_date.rs`](examples/make_date.rs): Examples of using the make_date function
Contributor

Thank you 🙏

datafusion/proto/proto/datafusion.proto
@@ -0,0 +1,303 @@
# Licensed to the Apache Software Foundation (ASF) under one
Contributor

Thank you -- this is much better

return plan_err!("regexp_like() does not support the \"global\" option");
}

let array = arrow_string::regexp::regexp_is_match_utf8(values, regex, Some(flags))
Contributor

👍
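The flag check in the snippet above can be sketched in isolation. This is a simplified illustration, not the PR's code: `reject_global_flag` is a hypothetical helper name, and it assumes the flags arrive as a plain string in which `g` stands for the "global" option. The reasoning is that a boolean match result does not depend on finding every match, so the option has no meaning for a function that returns true or false.

```rust
/// Hypothetical helper: rejects the "global" flag for a function that only
/// reports whether a match exists, and passes all other flags through
/// unchanged. Validation of the remaining flags is left to the regex engine.
pub fn reject_global_flag(flags: &str) -> Result<&str, String> {
    if flags.contains('g') {
        return Err("regexp_like() does not support the \"global\" option".to_string());
    }
    Ok(flags)
}
```

For example, `reject_global_flag("i")` returns `Ok("i")`, while `reject_global_flag("ig")` returns an error before any pattern is compiled.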


// use dataframe and regexp_like function to test col 'values', against patterns in col 'patterns' with flags
let df = ctx
assert_batches_eq!(
Contributor

👍

@alamb alamb merged commit 30d2be9 into apache:main Feb 9, 2024
23 checks passed
@alamb
Contributor

alamb commented Feb 9, 2024

Thanks again @Omega359

Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Add regexp_like scalar function
3 participants