
Expose eq ignore ascii case from arrow-string #9870

@albertlockett

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In otel-arrow, we're building a query engine based on Arrow and DataFusion. There are two features we'd like to support:
a) performing a case-insensitive match on a telemetry attribute's key (a string)
b) performing a case-insensitive match on some other string value

This was implemented in open-telemetry/otel-arrow#2501 using `ilike` and escaping the special LIKE characters (`%`, `_`, `\`) (see here). This is not ideal when the text we're comparing against contains any of these characters: once they are escaped, the pattern no longer qualifies for the fast path, so the comparison is done with a regex match (which is slower) instead of simply using `eq_ignore_ascii_case`:

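```rust
// Excerpt from `impl<'a> Predicate<'a>` in arrow-string: pick a fast path
// for ASCII `ilike` patterns and fall back to a regex otherwise.
pub(crate) fn ilike(pattern: &'a str, is_ascii: bool) -> Result<Self, ArrowError> {
    if is_ascii && pattern.is_ascii() {
        if !contains_like_pattern(pattern) {
            // No LIKE special characters: plain ASCII case-insensitive equality.
            return Ok(Self::IEqAscii(pattern));
        } else if pattern.ends_with('%')
            && !pattern.ends_with("\\%")
            && !contains_like_pattern(&pattern[..pattern.len() - 1])
        {
            // `prefix%`: case-insensitive starts-with.
            return Ok(Self::IStartsWithAscii(&pattern[..pattern.len() - 1]));
        } else if pattern.starts_with('%') && !contains_like_pattern(&pattern[1..]) {
            // `%suffix`: case-insensitive ends-with.
            return Ok(Self::IEndsWithAscii(&pattern[1..]));
        }
    }
    // Everything else falls back to a regex match.
    Ok(Self::Regex(regex_like(pattern, true)?))
}
```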
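To make the fallback concrete, the sketch below pairs a hypothetical escaping helper with a standalone mirror of the branch structure above. None of this is arrow-string or otel-arrow code: `Kind`, `escape_like` and `classify_ilike` are illustrative names, the local `contains_like_pattern` assumes the real helper simply looks for `%`, `_` or `\`, and the `is_ascii` flag is assumed to be true.

```rust
/// Hypothetical stand-in for the `Predicate` variants chosen by `ilike`.
#[derive(Debug, PartialEq)]
pub enum Kind {
    IEqAscii,
    IStartsWithAscii,
    IEndsWithAscii,
    Regex,
}

/// Hypothetical helper: escape `%`, `_` and `\` so that `ilike`
/// matches every character of `literal` literally.
pub fn escape_like(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for c in literal.chars() {
        // Prefix each special character with the `\` escape character.
        if matches!(c, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Assumed equivalent of arrow-string's `contains_like_pattern`:
/// true if the pattern contains `%`, `_` or `\`.
fn contains_like_pattern(pattern: &str) -> bool {
    pattern.contains(|c: char| matches!(c, '%' | '_' | '\\'))
}

/// Mirrors the branches of `Predicate::ilike`, assuming `is_ascii == true`.
pub fn classify_ilike(pattern: &str) -> Kind {
    if pattern.is_ascii() {
        if !contains_like_pattern(pattern) {
            return Kind::IEqAscii;
        } else if pattern.ends_with('%')
            && !pattern.ends_with("\\%")
            && !contains_like_pattern(&pattern[..pattern.len() - 1])
        {
            return Kind::IStartsWithAscii;
        } else if pattern.starts_with('%') && !contains_like_pattern(&pattern[1..]) {
            return Kind::IEndsWithAscii;
        }
    }
    Kind::Regex
}
```

Under these assumptions, a key without special characters such as `service.name` stays on the `IEqAscii` path, while `http_status` escapes to `http\_status` and is routed to `Kind::Regex`, which is the overhead described above.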

If I could simply evaluate `Predicate::IEqAscii` on my arrays, it would be easy to write a `ScalarUDF` that does what I need in my query engine.
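As a rough, std-only sketch of what such a UDF would compute, the function below applies an ASCII case-insensitive equality to every value of a nullable string column against a single literal. A plain slice of `Option<&str>` stands in for an Arrow `StringArray`, and `ieq_ascii_scalar` is a hypothetical name; the per-value comparison is the standard library's `str::eq_ignore_ascii_case`, with nulls propagated as in SQL comparisons.

```rust
/// Hypothetical kernel: compare each value of a nullable string column
/// against `needle`, ignoring ASCII case. A null input yields a null output.
pub fn ieq_ascii_scalar(values: &[Option<&str>], needle: &str) -> Vec<Option<bool>> {
    values
        .iter()
        .map(|&v| v.map(|s| s.eq_ignore_ascii_case(needle)))
        .collect()
}
```

Only ASCII letters are folded here, so `"Ä"` and `"ä"` compare as unequal. That is consistent with `ilike` choosing `IEqAscii` only when the pattern is ASCII and the `is_ascii` flag is set.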

Describe the solution you'd like

I'd like arrow-string to expose a `like::eq_ignore_ascii_case` function that performs an equality comparison on two string `Datum`s using a case-insensitive ASCII match.
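One possible shape for the requested function, sketched without the Arrow crates: `StrDatum` is a hypothetical stand-in for a string `Datum` (either a whole column or a single scalar), and the free function `eq_ignore_ascii_case` is an illustrative signature, not an existing arrow-string API. A scalar operand is broadcast against the other side, and two arrays must have equal lengths.

```rust
/// Hypothetical stand-in for an Arrow string `Datum`: either a whole
/// (nullable) column or a single scalar broadcast to every row.
#[derive(Clone, Copy, Debug)]
pub enum StrDatum<'a> {
    Array(&'a [Option<&'a str>]),
    Scalar(Option<&'a str>),
}

impl<'a> StrDatum<'a> {
    /// Number of rows, or `None` for a scalar (which matches any length).
    fn len(&self) -> Option<usize> {
        match self {
            StrDatum::Array(a) => Some(a.len()),
            StrDatum::Scalar(_) => None,
        }
    }

    /// Value at row `i`; a scalar returns the same value for every row.
    fn value(&self, i: usize) -> Option<&'a str> {
        match self {
            StrDatum::Array(a) => a[i],
            StrDatum::Scalar(s) => *s,
        }
    }
}

/// Sketch of the proposed comparison: ASCII case-insensitive equality of
/// two string datums, with SQL-style null propagation.
pub fn eq_ignore_ascii_case(
    lhs: StrDatum<'_>,
    rhs: StrDatum<'_>,
) -> Result<Vec<Option<bool>>, String> {
    let len = match (lhs.len(), rhs.len()) {
        (Some(l), Some(r)) if l != r => {
            return Err(format!("length mismatch: {l} vs {r}"));
        }
        (Some(l), _) | (_, Some(l)) => l,
        // Scalar vs scalar produces a single result row.
        (None, None) => 1,
    };
    Ok((0..len)
        .map(|i| match (lhs.value(i), rhs.value(i)) {
            (Some(a), Some(b)) => Some(a.eq_ignore_ascii_case(b)),
            _ => None,
        })
        .collect())
}
```

A real kernel would presumably return a `BooleanArray` and an `ArrowError` rather than `Vec<Option<bool>>` and `String`; those are simplifications to keep the sketch self-contained.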

Describe alternatives you've considered

  • `ilike` with escaping (can have performance overhead when the value contains special characters)
  • Duplicating the predicate code in my query engine (fixing it in arrow-string seemed like less work)

Additional context

Metadata

Assignees: No one assigned
Labels: enhancement (Any new improvement worthy of an entry in the changelog)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
