
[SPARK-16714][SQL] array should create a decimal array from decimals with different precisions and scales #14353

Closed

Conversation
Conversation

@dongjoon-hyun (Member) commented Jul 25, 2016

What changes were proposed in this pull request?

In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.

Before

scala> sql("select array(0.001, 0.02)")
org.apache.spark.sql.AnalysisException: cannot resolve `array(CAST(0.001 AS DECIMAL(3,3)), CAST(0.02 AS DECIMAL(2,2)))` due to data type mismatch: input to function array should all be the same type, but it's [decimal(3,3), decimal(2,2)]; line 1 pos 7

After

scala> sql("select array(0.001, 0.02)")
res0: org.apache.spark.sql.DataFrame = [array(0.001, 0.02): array<decimal(3,3)>]
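
For illustration, the widened element type in the example above can be derived by keeping the larger scale and enough digits for the larger integral part. A minimal sketch, not the PR's actual code; widerDecimal is a hypothetical helper:

import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: widen two decimal types so values of both fit without loss.
def widerDecimal(a: DecimalType, b: DecimalType): DecimalType = {
  val scale = math.max(a.scale, b.scale)
  val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
  DecimalType(intDigits + scale, scale)
}

widerDecimal(DecimalType(3, 3), DecimalType(2, 2)) // decimal(3,3), matching the result above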

How was this patch tested?

Pass the Jenkins tests with a new test case.

@SparkQA commented Jul 26, 2016

Test build #62848 has finished for PR 14353 at commit d3dd7fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

The review comments below are attached to this hunk in CreateArray.checkInputDataTypes (tail of the hunk reconstructed):

override def checkInputDataTypes(): TypeCheckResult = {
  if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
    TypeCheckResult.TypeCheckSuccess
  } else {
    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
  }
}
@yhuai (Contributor) commented:

I think we cannot just make the check pass. We need to actually cast those elements to the same precision and scale.
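
For instance, making the cast explicit could look like the sketch below, which wraps each child in a Cast to an already-computed target type. castChildrenTo is a hypothetical helper for illustration, not the analyzer rule Spark ended up using:

import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: force every element onto one precision and scale
// by inserting explicit casts, so the array has a single physical layout.
def castChildrenTo(children: Seq[Expression], target: DecimalType): Seq[Expression] =
  children.map(c => if (c.dataType == target) c else Cast(c, target))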

@yhuai (Contributor) commented:

For example, if we access a single element, its data type may not actually be the one shown as the array's data type.

@dongjoon-hyun (Member, Author) commented:

Thank you for the review, @yhuai.
I see. I'll check it further.

@dongjoon-hyun (Member, Author) commented Jul 26, 2016

Hi, @yhuai. I checked the following.

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T")
res4: org.apache.spark.sql.DataFrame = [a[0]: decimal(3,3), a[1]: decimal(3,3)]

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").show()
+-----+-----+
| a[0]| a[1]|
+-----+-----+
|0.001|0.020|
+-----+-----+

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias('a[0], None), unresolvedalias('a[1], None)]
+- 'SubqueryAlias T
   +- 'Project ['array(0.001, 0.02) AS a#54]
      +- OneRowRelation$

== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
Project [a#54[0] AS a[0]#61, a#54[1] AS a[1]#62]
+- SubqueryAlias T
   +- Project [array(0.001, 0.02) AS a#54]
      +- OneRowRelation$

@dongjoon-hyun (Member, Author) commented Jul 26, 2016

scala> sql("create table d1(a DECIMAL(3,2))")
scala> sql("create table d2(a DECIMAL(2,1))")
scala> sql("insert into d1 values(1.0)")
scala> sql("insert into d2 values(1.0)")
scala> sql("select * from d1, d2").show()
+----+---+
|   a|  a|
+----+---+
|1.00|1.0|
+----+---+

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2")
res5: org.apache.spark.sql.DataFrame = [array(a, a): array<decimal(3,2)>, array(a, a): array<decimal(3,2)> ... 2 more fields]

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2").show()
+------------+------------+----+---+
| array(a, a)| array(a, a)|   a|  a|
+------------+------------+----+---+
|[1.00, 1.00]|[1.00, 1.00]|1.00|1.0|
+------------+------------+----+---+

scala> sql("select array(d1.a,d2.a)[0],array(d2.a,d1.a)[0],* from d1, d2").show()
+--------------+--------------+----+---+
|array(a, a)[0]|array(a, a)[0]|   a|  a|
+--------------+--------------+----+---+
|          1.00|          1.00|1.00|1.0|
+--------------+--------------+----+---+

scala> sql("select array(d1.a,d2.a)[1],array(d2.a,d1.a)[1],* from d1, d2").show()
+--------------+--------------+----+---+
|array(a, a)[1]|array(a, a)[1]|   a|  a|
+--------------+--------------+----+---+
|          1.00|          1.00|1.00|1.0|
+--------------+--------------+----+---+

@dongjoon-hyun (Member, Author) commented:

And finally, the following is the codegen result. Please see line 29.

scala> sql("explain codegen select array(0.001, 0.02)[1]").collect().foreach(println)
[Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [0.02 AS array(0.001, 0.02)[1]#75]
+- Scan OneRowRelation[]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator inputadapter_input;
/* 008 */   private UnsafeRow project_result;
/* 009 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 011 */
/* 012 */   public GeneratedIterator(Object[] references) {
/* 013 */     this.references = references;
/* 014 */   }
/* 015 */
/* 016 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 017 */     partitionIndex = index;
/* 018 */     inputadapter_input = inputs[0];
/* 019 */     project_result = new UnsafeRow(1);
/* 020 */     this.project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 021 */     this.project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 022 */   }
/* 023 */
/* 024 */   protected void processNext() throws java.io.IOException {
/* 025 */     while (inputadapter_input.hasNext()) {
/* 026 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 027 */       Object project_obj = ((Expression) references[0]).eval(null);
/* 028 */       Decimal project_value = (Decimal) project_obj;
/* 029 */       project_rowWriter.write(0, project_value, 3, 3);
/* 030 */       append(project_result);
/* 031 */       if (shouldStop()) return;
/* 032 */     }
/* 033 */   }

@dongjoon-hyun (Member, Author) commented:

In short, the element types are resolved correctly in the Analyzed Logical Plan. As a result, the generated code writes them with the unified precision and scale.

== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)

@dongjoon-hyun (Member, Author) commented:

Is there anything more to check?

@dongjoon-hyun (Member, Author) commented:

Hi, @yhuai.
Could you give me some advice?

@dongjoon-hyun (Member, Author) commented:

Hi, @rxin.
Could you review this PR?

…ng different inferred precisions and scales
@dongjoon-hyun (Member, Author) commented:

Rebased.

@SparkQA commented Jul 28, 2016

Test build #62954 has finished for PR 14353 at commit a095389.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

The next review thread is attached to this hunk, which picks the loosest decimal type among the children (tail of the hunk reconstructed):

var elementType: DataType = children.headOption.map(_.dataType).getOrElse(NullType)
if (elementType.isInstanceOf[DecimalType]) {
  children.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
      elementType = child.dataType
    }
  }
}
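
For reference, isTighterThan asks whether one decimal type fits entirely inside another: both the integral digits (precision minus scale) and the scale must fit. A simplified standalone illustration over (precision, scale) pairs, not Spark's implementation (which also handles integral types):

def tighter(a: (Int, Int), b: (Int, Int)): Boolean =
  (a._1 - a._2) <= (b._1 - b._2) && a._2 <= b._2

tighter((2, 2), (3, 3)) // true: 0 integral digits and scale 2 fit in decimal(3,3)
tighter((3, 1), (4, 3)) // false: 2 integral digits do not fit in 1
tighter((4, 3), (3, 1)) // false: scale 3 does not fit in 1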
@rxin (Contributor) commented:

I think this suffers from the same issue as the map PR.
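
The loss shows up when neither element type is tighter than the other, e.g. decimal(3,1) and decimal(4,3): the loop above then keeps the head's type, and forcing the second element into decimal(3,1) truncates it, while a properly widened decimal(6,3) would hold both. An illustrative sketch with plain BigDecimal:

import scala.math.BigDecimal.RoundingMode

val v = BigDecimal("0.001")           // fits decimal(4,3)
v.setScale(1, RoundingMode.HALF_UP)   // 0.0   -- forced into scale 1, the value is lost
v.setScale(3, RoundingMode.HALF_UP)   // 0.001 -- a widened decimal(6,3) preserves it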

@dongjoon-hyun (Member, Author) commented:

Thank you, @rxin.
Yep. I've read your comment about the loss.
I'll check that and revise.

@petermaxlee (Contributor) commented:

@dongjoon-hyun I created a patch here: #14389

@dongjoon-hyun (Member, Author) commented:

Closing this in favor of the better PR #14439.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-16714 branch August 14, 2016 09:46