Explore implementing some Comet accelerated expressions in Scala #4153

@andygrove

What is the problem the feature request solves?

Just a crazy idea, but I was thinking about regexp expression support. Java and Rust have different regexp engines with different features and behavior, so we'll never be able to be fully compatible with a native implementation.

In Spark RAPIDS, I spent significant time working on a regexp transpiler to try to translate Java regexp into a format that would be compatible with native code (cuDF in that case). This was a huge effort and did not reach full compatibility.

When we think about accelerating expressions in Comet, we really mean "write a native implementation", but it doesn't really have to be this way in all cases. We could also implement Comet expressions in Scala.

Rather than fall back to Spark for a projection or predicate with a regexp expr, we could have Comet call the same Java code that Spark calls to evaluate the regexp expr, but do this over elements in arrays rather than over rows, avoiding the conversion costs.
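To make the idea a bit more concrete, here is a rough sketch of evaluating a regexp predicate over a whole column of strings on the JVM side using `java.util.regex`. Everything here is hypothetical: the class `ColumnarRegexSketch` and the method `rlike` are made-up names, plain Java arrays stand in for whatever columnar vectors Comet would really pass around, and the `find()` semantics are an assumption about what an RLIKE-style predicate needs. It is written in plain Java to keep it self-contained; the same loop would work in Scala.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Comet or Spark code.
final class ColumnarRegexSketch {

    // Evaluates an RLIKE-style predicate over every element of a column.
    // A null input element produces a null result (SQL-style null handling).
    static Boolean[] rlike(String[] column, String regex) {
        // Compile once per batch, not once per value.
        Pattern pattern = Pattern.compile(regex);
        Boolean[] result = new Boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            String value = column[i];
            if (value == null) {
                result[i] = null;
            } else {
                // Assumption: "matches anywhere in the string" semantics.
                result[i] = pattern.matcher(value).find();
            }
        }
        return result;
    }
}
```

For example, `ColumnarRegexSketch.rlike(new String[] {"hello", null, "world"}, "(\\w)\\1")` uses a backreference, which the Rust `regex` crate does not support, and since the Java engine evaluates it, there is no transpilation step and no compatibility gap to close.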

This is not a well-thought-out idea yet, but I'll try to come up with a more concrete proposal.

Describe the potential solution

No response

Additional context

No response
