
[SPARK-46443][SQL] Decimal precision and scale should be decided by the H2 dialect

### What changes were proposed in this pull request?
This PR fixes a bug by letting the JDBC dialect decide the decimal precision and scale.

**How to reproduce the bug?**
#44397 proposes DS V2 pushdown of `PERCENTILE_CONT` and `PERCENTILE_DISC`.
The bug fires when the SQL below is pushed down to H2 over JDBC.
`SELECT "DEPT",PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "SALARY" ASC NULLS FIRST) FROM "test"."employee" WHERE 1=0 GROUP BY "DEPT"`

**The root cause**
`getQueryOutputSchema` is used to get the output schema of a query by calling `JdbcUtils.getSchema`.
The query sent to the H2 database is shown below.
`SELECT "DEPT",PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "SALARY" ASC NULLS FIRST) FROM "test"."employee" WHERE 1=0 GROUP BY "DEPT"`
We get the following five values from `ResultSetMetaData`:
```
columnName = "PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SALARY NULLS FIRST)"
dataType = 2
typeName = "NUMERIC"
fieldSize = 100000
fieldScale = 50000
```
Then `JdbcUtils.getCatalystType` maps this metadata to a Catalyst type; for `NUMERIC` it ends up calling `DecimalType.bounded(precision, scale)`.
`DecimalType.bounded(100000, 50000)` returns `DecimalType(38, 38)`.
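`DecimalType.bounded` simply clamps both numbers to Spark's maximum of 38, so the ratio between precision and scale is lost. Roughly (a paraphrase of the Spark source, not part of this patch):

```
// Paraphrase of org.apache.spark.sql.types.DecimalType.bounded: precision and scale
// are each clamped to 38 independently.
def bounded(precision: Int, scale: Int): DecimalType =
  DecimalType(math.min(precision, 38), math.min(scale, 38))

bounded(100000, 50000)  // DecimalType(38, 38): all 38 digits are reserved for the fraction
```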
Finally, `makeGetter` throws an exception when reading the result:
```
Caused by: org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 42 exceeds max precision 38. SQLSTATE: 22003
	at org.apache.spark.sql.errors.DataTypeErrors$.decimalPrecisionExceedsMaxPrecisionError(DataTypeErrors.scala:48)
	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:124)
	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:577)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$4(JdbcUtils.scala:408)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:552)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3(JdbcUtils.scala:408)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3$adapted(JdbcUtils.scala:406)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:358)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:339)
```
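The failure follows directly from the inferred `DecimalType(38, 38)`: `makeGetter` converts each fetched value with `Decimal(bigDecimal, 38, 38)`, and rescaling to 38 fractional digits leaves no room for integer digits. A small sketch of the mechanism (the concrete salary value is illustrative; any value with four integer digits yields precision 42):

```
import org.apache.spark.sql.types.Decimal

// Rescaling 9500.0 to scale 38 needs 4 integer digits + 38 fractional digits = 42 > 38,
// so this throws DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION.
Decimal(new java.math.BigDecimal("9500.0"), 38, 38)
```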

### Why are the changes needed?
This PR fixes the bug that `JdbcUtils` can't infer the correct decimal type.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It fixes a bug.

### How was this patch tested?
Manual tests in #44397

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #44398 from beliefer/SPARK-46443.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a921da8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
beliefer authored and cloud-fan committed Dec 22, 2023
1 parent d7534a3 commit 286c469
Showing 1 changed file with 15 additions and 1 deletion.
```
@@ -35,7 +35,7 @@ import org.apache.spark.sql.connector.catalog.functions.UnboundFunction
 import org.apache.spark.sql.connector.catalog.index.TableIndex
 import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, NamedReference}
 import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
-import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, DecimalType, ShortType, StringType}
+import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, DecimalType, MetadataBuilder, ShortType, StringType}
 
 private[sql] object H2Dialect extends JdbcDialect {
   override def canHandle(url: String): Boolean =
@@ -57,6 +57,20 @@ private[sql] object H2Dialect extends JdbcDialect {
   override def isSupportedFunction(funcName: String): Boolean =
     supportedFunctions.contains(funcName)
 
+  override def getCatalystType(
+      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
+    sqlType match {
+      case Types.NUMERIC if size > 38 =>
+        // H2 supports very large decimal precision like 100000. The max precision in Spark is only
+        // 38. Here we shrink both the precision and scale of H2 decimal to fit Spark, and still
+        // keep the ratio between them.
+        val scale = if (null != md) md.build().getLong("scale") else 0L
+        val selectedScale = (DecimalType.MAX_PRECISION * (scale.toDouble / size.toDouble)).toInt
+        Option(DecimalType(DecimalType.MAX_PRECISION, selectedScale))
+      case _ => None
+    }
+  }
+
   override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
     case StringType => Option(JdbcType("CLOB", Types.CLOB))
     case BooleanType => Some(JdbcType("BOOLEAN", Types.BOOLEAN))
```
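With the metadata from the repro above (`fieldSize = 100000`, `fieldScale = 50000`), the new override works out as follows (a quick sketch of the arithmetic, not part of the patch):

```
val size = 100000                // precision reported by H2
val scale = 50000L               // scale reported by H2
// DecimalType.MAX_PRECISION is 38, so: 38 * (50000.0 / 100000.0) = 19
val selectedScale = (38 * (scale.toDouble / size.toDouble)).toInt
// The dialect now reports DecimalType(38, 19) instead of DecimalType(38, 38),
// leaving 19 digits for the integer part of the percentile result.
```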
