-
Notifications
You must be signed in to change notification settings - Fork 556
Description
Description
spark assert_true and raise_error function not compatible with current velox type system and function register interface.
as below query a example,
val df = spark.sql("select assert_true(1 > 0) as v, id from t")
df: org.apache.spark.sql.DataFrame = [v: void, id: bigint]
df.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(v,NullType,true),StructField(id,LongType,true))
it would generate void type and NullType in output schema which would convert to unknown type in velox, for raise_error, it expect void return type but velox function always expect a non-void return type.
One way i tried is convert raise_error from void return type to string type but it need we update not only expression itself but also the output type schema also from NullType to StringType, otherwise would encounter mem allocation issue based on my local test.
Update:
RaiseError's data type is NullType. Ditto for AssertTrue as it is replaced by IF + RaiseError.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala#L82