Describe the bug
Casting a string to decimal can return null on the GPU while the CPU returns a valid value in some edge cases where scale equals precision.
For example, "-9.72792462805176E-15" returns null when cast to decimal(15,15) on the GPU, but -1.0E-14 on the CPU.
This appears to be a very narrow edge case.
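The CPU result comes from Spark rounding the string value to the target scale: a decimal(15,15) can only hold values with absolute value below 1, and -9.72792462805176E-15 rounds at scale 15 to -1.0E-14, which does fit. A minimal sketch of the expected rounding using Python's standard decimal module (HALF_UP, which is the rounding mode Spark uses for decimal casts; this is an illustration, not the actual Spark or plugin code):

```python
from decimal import Decimal, ROUND_HALF_UP

# The input string from the repro case.
v = Decimal("-9.72792462805176E-15")

# Quantize to scale 15 with HALF_UP, mirroring a cast to decimal(15,15):
# -9.7279...E-15 rounds to -10E-15 == -1.0E-14, which is representable.
q = v.quantize(Decimal("1E-15"), rounding=ROUND_HALF_UP)
print(q)  # -1.0E-14, matching the CPU result
```

Since the rounded value is representable in decimal(15,15), the GPU's null result is the incorrect one.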
Steps/Code to reproduce bug
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val data = Seq("-9.72792462805176E-15").toDF
data: org.apache.spark.sql.DataFrame = [value: string]

scala> data.write.mode("OVERWRITE").parquet("TEMP")
24/05/24 08:07:55 WARN GpuOverrides:
*Exec <DataWritingCommandExec> will run on GPU
  *Output <InsertIntoHadoopFsRelationCommand> will run on GPU
  ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
    @Expression <AttributeReference> value#1 could run on GPU

scala> val df = spark.read.parquet("TEMP")
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df.select(col("value").cast(DecimalType(15, 15))).show()
24/05/24 08:08:13 WARN GpuOverrides:
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because CollectLimit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <ProjectExec> will run on GPU
    *Expression <Alias> cast(cast(value#6 as decimal(15,15)) as string) AS value#12 will run on GPU
      *Expression <Cast> cast(cast(value#6 as decimal(15,15)) as string) will run on GPU
        *Expression <Cast> cast(value#6 as decimal(15,15)) will run on GPU
  *Exec <FileSourceScanExec> will run on GPU

+-----+
|value|
+-----+
| null|
+-----+

scala> spark.conf.set("spark.rapids.sql.enabled", "false")

scala> df.select(col("value").cast(DecimalType(15, 15))).show()
+--------+
|   value|
+--------+
|-1.0E-14|
+--------+
A test case:

def test_cast_double_to_string_to_decimal():
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: unary_op_df(spark, double_gen, length=100000).selectExpr(
            "cast(cast(a as string) as decimal(15, 15))"),
        conf={'spark.rapids.sql.castFloatToDecimal.enabled': 'true',
              'spark.rapids.sql.castDecimalToFloat.enabled': 'true',
              'spark.rapids.sql.castFloatToString.enabled': 'true'})
Expected behavior
CPU and GPU results should match.
Environment details (please complete the following information)
Latest code, JDK 8; tested with Spark 3.3.0 and 3.4.1.