[SPARK-7119][SQL]Give script a default serde with the user specific types #6638
Conversation
Test build #34172 has finished for PR 6638 at commit

Test build #34174 has finished for PR 6638 at commit

@jameszhouyi Can you try this patch?

@chenghao-intel Is this a duplicate of #5688?

Test build #34234 has finished for PR 6638 at commit

@viirya I think this PR is just for fixing the bug when the user specifies the output schema, while #5688 will be more general, supporting a user-specified SerDe (as well as fixing the bug). Since the bug has been breaking our internal tests for some time, we'd like this PR to go in first; it would be greatly appreciated if you could comment on the fix.
Keep it unchanged, and let the operator decide how to get the default serde.
Or we should replace the output / input serde if it's not specified, rather than adding a new field.
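The defaulting the reviewer describes could look roughly like this sketch (Scala; `ScriptIOSchema` and `resolveSerde` are illustrative stand-ins, not Spark's actual internals — the only real name here is Hive's `LazySimpleSerDe`, the usual default delimited serde):

```scala
// Hypothetical sketch: if the user writes TRANSFORM ... AS (c1 int, c2 string)
// without a SERDE clause, fill in a default serde instead of carrying an
// empty Option through the operator.
case class ScriptIOSchema(
  serdeClass: Option[String],
  fieldDelimiter: String = "\t",
  lineDelimiter: String = "\n")

val defaultSerde = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

// Replace a missing serde with the default, leaving explicit choices alone.
def resolveSerde(schema: ScriptIOSchema): ScriptIOSchema =
  schema.serdeClass match {
    case Some(_) => schema
    case None    => schema.copy(serdeClass = Some(defaultSerde))
  }
```

With this shape, downstream code never needs a "serde present?" branch, which is the point of the comment above.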
Force-pushed de413d4 to 5c0724b
Test build #34764 has finished for PR 6638 at commit

Hi,

@jameszhouyi This might not be an accepted version, given the test failure. I will update it and get back to this shortly.

Thanks!
Force-pushed 5c0724b to 6b3278b
Test build #37331 has finished for PR 6638 at commit

Test build #37437 has finished for PR 6638 at commit

retest this please.

Test build #37451 has finished for PR 6638 at commit

Test build #25 has finished for PR 6638 at commit
Force-pushed 2ee0488 to 4ab11b7
Test build #37462 has finished for PR 6638 at commit
unwrap actually supports StructObjectInspector, so we don't need to extract every field here.
But I'd prefer to reuse the mutableRow, which means we don't need to create a new mutableRow for every call of next().
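The reuse the reviewer suggests could be sketched as follows (Scala; `MutableRow` and `ScriptOutputIterator` are simplified stand-ins for Spark's internal row and iterator types, not the real classes):

```scala
// Illustrative only: one mutable row is allocated once and its fields are
// overwritten on each next(), instead of building a fresh row per call.
class MutableRow(size: Int) {
  val values = new Array[Any](size)
  def update(i: Int, v: Any): Unit = values(i) = v
}

class ScriptOutputIterator(lines: Iterator[String], numFields: Int)
    extends Iterator[Array[Any]] {
  private val mutableRow = new MutableRow(numFields) // reused across calls

  def hasNext: Boolean = lines.hasNext

  def next(): Array[Any] = {
    val fields = lines.next().split("\t", numFields)
    var i = 0
    while (i < fields.length) { mutableRow.update(i, fields(i)); i += 1 }
    mutableRow.values // callers must copy if they retain rows
  }
}
```

The trade-off is the usual one for reused rows: allocation drops to one row per partition instead of one per record, but consumers that hold on to a row past the next `next()` call must copy it first.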
Test build #37554 has finished for PR 6638 at commit

cc @yhuai for this one ...

I applied this PR on top of commit c025c3d0a1fdfbc45b64db9c871176b40b4a7b9b, and the script transform test case now passes.

LGTM.

If it's important to get this in for 1.5.0, then we need to fix the conflicts and bring it up to date. This may be slightly non-trivial given the major cleanup / refactoring that I did in ScriptTransform in order to fix an error-handling bug / deadlock.

Essentially, not much code is added in this PR; it mainly deletes some code and always gives the script a default serde. I will rebase the code shortly.
Force-pushed a6a075e to 14b892e
Maybe I need to add this back, but it seems that if the child throws an exception, the actual result would be null, which would cause checkAnswer to throw a "not equal" exception first, instead of the intentional exception.
Test build #38825 has finished for PR 6638 at commit
Force-pushed 14b892e to f6968a4
@JoshRosen Could you please take a look at these changes? This PR simply gives the script a default serde if none of the

Test build #38957 has finished for PR 6638 at commit

retest this please.

Test build #152 has finished for PR 6638 at commit

retest this please.

Test build #154 has finished for PR 6638 at commit

Test build #38988 has finished for PR 6638 at commit

Test build #38997 has finished for PR 6638 at commit

LGTM, can you update the description?
style: val columnTypes = attrs.map(_.dataType)
Test build #39632 has finished for PR 6638 at commit
… types

This is to address the issue that there would be an incompatible-type exception when running this:

from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;

which failed with java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer.

Author: zhichao.li <zhichao.li@intel.com>

Closes #6638 from zhichao-li/transDataType2 and squashes the following commits:

a36cc7c [zhichao.li] style
b9252a8 [zhichao.li] delete cacheRow
f6968a4 [zhichao.li] give script a default serde

(cherry picked from commit 6f8f0e2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Thanks, merged to master and 1.5
This is to address the issue that there would be an incompatible-type exception when running this:

from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;

15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
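The trace above shows the root cause: the script's output is plain text, so without a serde the field bound to `thing1` is still a string when `thing1 + 2` tries to unbox it as an Int. A minimal sketch of the per-field conversion a default serde effectively performs (Scala; the `DataType` hierarchy and `parseLine` here are simplified stand-ins, not Spark's actual code):

```scala
// Simplified stand-in for Spark SQL's type tags.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

// Convert one raw text field to its declared column type.
def convert(raw: String, dt: DataType): Any = dt match {
  case IntegerType => raw.toInt // without this step, "5" stays a string
  case StringType  => raw
}

// Parse one tab-delimited output line of the script against the declared
// schema, e.g. AS (thing1 int, thing2 string).
def parseLine(line: String, types: Seq[DataType]): Seq[Any] =
  line.split("\t").toSeq.zip(types).map { case (f, t) => convert(f, t) }
```

Once the fields come back as their declared types, an expression like `thing1 + 2` can unbox the first column as an Int instead of hitting the ClassCastException above.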
@chenghao-intel @marmbrus