[SPARK-7119][SQL]Give script a default serde with the user specific types #6638
Conversation
Test build #34172 has finished for PR 6638 at commit

Test build #34174 has finished for PR 6638 at commit

@jameszhouyi Can you try this patch?

@chenghao-intel Is this a duplicate of #5688?

Test build #34234 has finished for PR 6638 at commit

@viirya I think this PR is just for fixing the bug when the user specifies the output schema, while #5688 will be more general, supporting a user-specified SerDe (as well as fixing the bug). Since the bug has been breaking our internal tests for some time, we'd like this PR to go in first; it would be greatly appreciated if you could comment on the fix.
Keep it unchanged, and let the operator decide how to get the default serde.
Or we should replace the output / input serde if it's not specified, rather than adding a new field.
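The defaulting the reviewer describes could look roughly like this sketch (Scala; `ScriptIOSchema` and `resolveSerde` are illustrative stand-ins, not Spark's actual internals — the only real name here is Hive's `LazySimpleSerDe`, the usual default delimited serde):

```scala
// Hypothetical sketch: if the user writes TRANSFORM ... AS (c1 int, c2 string)
// without a SERDE clause, fill in a default serde instead of carrying an
// empty Option through the operator.
case class ScriptIOSchema(
  serdeClass: Option[String],
  fieldDelimiter: String = "\t",
  lineDelimiter: String = "\n")

val defaultSerde = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

// Replace a missing serde with the default, leaving explicit choices alone.
def resolveSerde(schema: ScriptIOSchema): ScriptIOSchema =
  schema.serdeClass match {
    case Some(_) => schema
    case None    => schema.copy(serdeClass = Some(defaultSerde))
  }
```

With this shape, downstream code never needs a "serde present?" branch, which is the point of the comment above.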
Force-pushed de413d4 to 5c0724b
Test build #34764 has finished for PR 6638 at commit

Hi,

@jameszhouyi This might not be an accepted version, given the test failure. I will update it and get back to this shortly.

Thanks!
Force-pushed 5c0724b to 6b3278b
Test build #37331 has finished for PR 6638 at commit

Test build #37437 has finished for PR 6638 at commit

retest this please.

Test build #37451 has finished for PR 6638 at commit

Test build #25 has finished for PR 6638 at commit
Force-pushed 2ee0488 to 4ab11b7
Test build #37462 has finished for PR 6638 at commit
unwrap actually supports StructObjectInspector, so we don't need to extract every field here.
But I'd prefer to reuse the mutableRow, which means we don't need to create a new mutableRow for every call of next().
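The reuse the reviewer suggests could be sketched as follows (Scala; `MutableRow` and `ScriptOutputIterator` are simplified stand-ins for Spark's internal row and iterator types, not the real classes):

```scala
// Illustrative only: one mutable row is allocated once and its fields are
// overwritten on each next(), instead of building a fresh row per call.
class MutableRow(size: Int) {
  val values = new Array[Any](size)
  def update(i: Int, v: Any): Unit = values(i) = v
}

class ScriptOutputIterator(lines: Iterator[String], numFields: Int)
    extends Iterator[Array[Any]] {
  private val mutableRow = new MutableRow(numFields) // reused across calls

  def hasNext: Boolean = lines.hasNext

  def next(): Array[Any] = {
    val fields = lines.next().split("\t", numFields)
    var i = 0
    while (i < fields.length) { mutableRow.update(i, fields(i)); i += 1 }
    mutableRow.values // callers must copy if they retain rows
  }
}
```

The trade-off is the usual one for reused rows: allocation drops to one row per partition instead of one per record, but consumers that hold on to a row past the next `next()` call must copy it first.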
Test build #37554 has finished for PR 6638 at commit

cc @yhuai for this one ...

I applied this PR on top of commit c025c3d0a1fdfbc45b64db9c871176b40b4a7b9b, and the script transform test case now passes.

LGTM.

If it's important to get this in for 1.5.0, then we need to fix the conflicts and bring it up to date. This may be slightly non-trivial given the major cleanup / refactoring that I did in ScriptTransform in order to fix an error-handling bug / deadlock.

Essentially, not much code is added in this PR; it mainly deletes some code and always gives the script a default serde. I will rebase the code shortly.
Force-pushed a6a075e to 14b892e
Maybe I need to add this back, but it seems that if the child throws an exception, the actual result would be null, which would cause checkAnswer to throw a "not equal" exception first, instead of the intentional exception.
Test build #38825 has finished for PR 6638 at commit
Force-pushed 14b892e to f6968a4
@JoshRosen Could you please take a look at these changes? This PR simply gives the script a default serde if none of the

Test build #38957 has finished for PR 6638 at commit

retest this please.

Test build #152 has finished for PR 6638 at commit

retest this please.

Test build #154 has finished for PR 6638 at commit

Test build #38988 has finished for PR 6638 at commit

Test build #38997 has finished for PR 6638 at commit

LGTM, can you update the description?
style: val columnTypes = attrs.map(_.dataType)
Test build #39632 has finished for PR 6638 at commit
… types

This is to address the issue that there would be an incompatible-type exception when running this:

from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;

which failed with java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer.

Author: zhichao.li <zhichao.li@intel.com>

Closes #6638 from zhichao-li/transDataType2 and squashes the following commits:

a36cc7c [zhichao.li] style
b9252a8 [zhichao.li] delete cacheRow
f6968a4 [zhichao.li] give script a default serde

(cherry picked from commit 6f8f0e2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Thanks, merged to master and 1.5
This is to address the issue that there would be an incompatible-type exception when running this:

from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;

15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
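The trace above shows the root cause: the script's output is plain text, so without a serde the field bound to `thing1` is still a string when `thing1 + 2` tries to unbox it as an Int. A minimal sketch of the per-field conversion a default serde effectively performs (Scala; the `DataType` hierarchy and `parseLine` here are simplified stand-ins, not Spark's actual code):

```scala
// Simplified stand-in for Spark SQL's type tags.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

// Convert one raw text field to its declared column type.
def convert(raw: String, dt: DataType): Any = dt match {
  case IntegerType => raw.toInt // without this step, "5" stays a string
  case StringType  => raw
}

// Parse one tab-delimited output line of the script against the declared
// schema, e.g. AS (thing1 int, thing2 string).
def parseLine(line: String, types: Seq[DataType]): Seq[Any] =
  line.split("\t").toSeq.zip(types).map { case (f, t) => convert(f, t) }
```

Once the fields come back as their declared types, an expression like `thing1 + 2` can unbox the first column as an Int instead of hitting the ClassCastException above.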
@chenghao-intel @marmbrus