
Conversation

@zhichao-li
Contributor

This is to address an incompatible-type exception (a ClassCastException) thrown when running the following query:
`from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;`

15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
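
The root cause is that, without a SerDe, every field read from the script's stdout stays a raw string, so the column declared `int` still holds a `UTF8String` when `Add.eval` tries to unbox it. A minimal sketch of the conversion that is needed (hypothetical helper names, not Spark's actual code):

```scala
// Hypothetical sketch (not Spark's actual code) of why the conversion matters:
// without a SerDe, every field read from the script's stdout is a string, so
// fields declared with other types must be converted before downstream use.
object TransformCastSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType

  // Convert one raw text field to the type declared in the AS (...) clause.
  def convert(raw: String, dt: DataType): Any = dt match {
    case IntType    => raw.trim.toInt  // before the fix this stayed a string
    case StringType => raw
  }

  def main(args: Array[String]): Unit = {
    // as (thing1 int, thing2 string)
    val schema = Seq(IntType, StringType)
    val line   = "238\tval_238"  // one line of `cat` output
    val row    = line.split("\t").zip(schema).map { case (f, t) => convert(f, t) }
    // thing1 + 2 now works because thing1 is an Int, not a string
    println(row(0).asInstanceOf[Int] + 2)  // prints 240
  }
}
```

With the conversion in place, `thing1 + 2` evaluates on a real `Int` instead of failing in `BoxesRunTime.unboxToInt`.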

@chenghao-intel @marmbrus

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34172 has finished for PR 6638 at commit 31dec98.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34174 has finished for PR 6638 at commit 300c031.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor

@jameszhouyi Can you try this patch?
@viirya Can you give some comments for this?

@viirya
Member

viirya commented Jun 5, 2015

@chenghao-intel Is it duplicate to #5688?

@SparkQA

SparkQA commented Jun 5, 2015

Test build #34234 has finished for PR 6638 at commit de413d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor

@viirya I think this PR is just for fixing the bug when the user specifies the output schema, while #5688 is more general: it supports a user-specified SerDe (and also fixes the bug). Since this bug has been breaking our internal tests for some time, we'd like this PR to go in first; it would be greatly appreciated if you could comment on the fix.

Contributor

Keep it unchanged, and let the operator decide how to get the default serde.

Contributor

Or we should replace the output / input serde if it's not specified, rather than adding a new field.

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34764 has finished for PR 6638 at commit 5c0724b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SetInFilter[T <: Comparable[T]](

@jameszhouyi

Hi,
I saw 'Merged build finished. Test FAILed.' Is there a newer version with the fix?

@zhichao-li
Contributor Author

@jameszhouyi Given the test failure, this might not be an accepted version yet. I will update it and get back to this shortly.

@jameszhouyi

Thanks!

@SparkQA

SparkQA commented Jul 15, 2015

Test build #37331 has finished for PR 6638 at commit 6b3278b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2015

Test build #37437 has finished for PR 6638 at commit 2ee0488.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor Author

retest this please.

@SparkQA

SparkQA commented Jul 16, 2015

Test build #37451 has finished for PR 6638 at commit 2ee0488.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2015

Test build #25 has finished for PR 6638 at commit 2ee0488.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2015

Test build #37462 has finished for PR 6638 at commit 4ab11b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

unwrap actually supports StructObjectInspector, so we don't need to extract every field here.
But I'd prefer to reuse the mutableRow, which means we don't need to create a new mutableRow for every call of next().
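
The reuse suggestion above can be sketched like this (a hypothetical iterator, not the actual ScriptTransformation code):

```scala
// Hypothetical sketch of the reviewer's suggestion, not the actual
// ScriptTransformation code: allocate one mutable row up front and overwrite
// it on every next() instead of allocating a fresh row per output line.
class ReusedRowIterator(lines: Iterator[String], arity: Int)
    extends Iterator[Array[Any]] {
  // The single buffer reused across all next() calls.
  private val mutableRow = new Array[Any](arity)

  override def hasNext: Boolean = lines.hasNext

  override def next(): Array[Any] = {
    val fields = lines.next().split("\t")
    var i = 0
    while (i < arity) {
      mutableRow(i) = if (i < fields.length) fields(i) else null
      i += 1
    }
    mutableRow // callers must copy the row if they need to retain it
  }
}
```

Downstream operators that buffer rows must copy them before storing, which is the usual contract for reused mutable rows in Spark SQL.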

@SparkQA

SparkQA commented Jul 17, 2015

Test build #37554 has finished for PR 6638 at commit a6a075e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor Author

cc @rxin @davies

@rxin
Contributor

rxin commented Jul 17, 2015

cc @yhuai for this one ...

@jameszhouyi

Applied this PR on top of commit id 'c025c3d0a1fdfbc45b64db9c871176b40b4a7b9b', and the test case related to script transform passes now.

@chenghao-intel
Contributor

LGTM.
cc @yhuai

@JoshRosen
Contributor

If it's important to get this in for 1.5.0 then we need to fix the conflicts and bring it up to date. This may be slightly non-trivial given the major cleanup / refactorings that I did in ScriptTransform in order to fix an error-handling bug / deadlock.

@zhichao-li
Contributor Author

Essentially not much code is added in this PR; it mainly deletes some and always gives the script a default serde. I will rebase the code shortly.

Contributor Author

Maybe I need to add this back, but it seems that if the child throws an exception, the actual result would be null, which would cause checkAnswer to throw a not-equal exception first instead of the "intentional exception".

@SparkQA

SparkQA commented Jul 29, 2015

Test build #38825 has finished for PR 6638 at commit 14b892e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor Author

@JoshRosen Could you please take a look at these changes? This PR simply gives the script a default serde if neither a row format nor a serde is given. Previously I was thinking of removing the row format and using only the serde, but that seems to be a valid use case, so that part of the logic is untouched.
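
The fallback described above can be sketched as follows (the names are illustrative, not Spark's actual API; LazySimpleSerDe is Hive's usual text default):

```scala
// Hypothetical sketch of the default-serde fallback: when the query specifies
// neither a serde nor a row format, fill in a default serde so the script's
// text output is always deserialized into the declared column types.
case class ScriptIOSchema(
    serdeClass: Option[String],
    rowFormat: Option[Seq[(String, String)]])

object DefaultSerde {
  val DefaultSerdeClass = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

  def withDefault(schema: ScriptIOSchema): ScriptIOSchema =
    if (schema.serdeClass.isEmpty && schema.rowFormat.isEmpty)
      schema.copy(serdeClass = Some(DefaultSerdeClass))
    else schema
}
```

With this fallback, the no-serde code path that produced raw strings is never taken, which is what fixes the ClassCastException in the description.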

@SparkQA

SparkQA commented Jul 30, 2015

Test build #38957 has finished for PR 6638 at commit f6968a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor Author

retest this please.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #152 has finished for PR 6638 at commit b9252a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor Author

retest this please.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #154 has finished for PR 6638 at commit b9252a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #38988 has finished for PR 6638 at commit b9252a8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #38997 has finished for PR 6638 at commit b9252a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor

LGTM, can you update the description?

  • We don't support user-specified input/output formats yet.
  • The exception stack trace

Contributor

style: `val columnTypes = attrs.map(_.dataType)`

@SparkQA

SparkQA commented Aug 4, 2015

Test build #39632 has finished for PR 6638 at commit a36cc7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 5, 2015
… types

(The commit message repeats the PR description and stack trace above.)

chenghao-intel marmbrus

Author: zhichao.li <zhichao.li@intel.com>

Closes #6638 from zhichao-li/transDataType2 and squashes the following commits:

a36cc7c [zhichao.li] style
b9252a8 [zhichao.li] delete cacheRow
f6968a4 [zhichao.li] give script a default serde

(cherry picked from commit 6f8f0e2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus
Contributor

marmbrus commented Aug 5, 2015

Thanks, merged to master and 1.5

@asfgit asfgit closed this in 6f8f0e2 Aug 5, 2015