What is the problem the feature request solves?
Just a crazy idea, but I was thinking about regexp expression support. Java and Rust have different regexp engines with different features and behavior, so we'll never be able to be fully compatible with a native implementation.
In Spark RAPIDS, I spent significant time working on a regexp transpiler to try to translate Java regexps into a format that would be compatible with native code (cuDF in that case). This was a huge effort and did not reach full compatibility.
When we think about accelerating expressions in Comet, we really mean "write a native implementation", but it doesn't really have to be this way in all cases. We could also implement Comet expressions in Scala.
Rather than fall back to Spark for a projection or predicate with a regexp expr, we could have Comet call the same Java code that Spark calls to evaluate the regexp expr, but do this over the elements of arrays rather than over rows, avoiding the conversion costs.
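To make the idea a bit more concrete, here is a rough sketch of what "evaluate the regexp over the elements of an array" could look like. This is only an illustration under assumptions: `ColumnarRegexSketch` and `rlike` are hypothetical names, the plain `String[]` stands in for whatever columnar string vector Comet would actually hand over, and using `find()` is my approximation of RLIKE-style matching rather than a copy of Spark's code. The point is just that the pattern is compiled once per batch with `java.util.regex`, so the behavior comes from the same engine Spark uses.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ColumnarRegexSketch {

    /**
     * Hypothetical helper: evaluates an RLIKE-style predicate over a whole
     * column of strings. A null input element produces a null result.
     */
    static Boolean[] rlike(String[] column, String regex) {
        // Compile once per batch instead of once per row.
        Pattern pattern = Pattern.compile(regex);
        // Reuse a single Matcher and reset it for each element.
        Matcher matcher = pattern.matcher("");
        Boolean[] result = new Boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            String value = column[i];
            if (value == null) {
                result[i] = null;
            } else {
                // find() matches anywhere in the string (assumed RLIKE-like behavior).
                result[i] = matcher.reset(value).find();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] column = {"abc123", null, "xyz"};
        System.out.println(Arrays.toString(rlike(column, "\\d+")));
    }
}
```

Because this runs on the JVM regex engine, Java-only constructs such as lookbehind or possessive quantifiers would work without any transpilation step.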
This is not a well-thought-out idea yet, but I'll try to come up with a more concrete proposal.
Describe the potential solution
No response
Additional context
No response