[SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions #9093

JoshRosen · 2015-10-13T07:38:46Z

In the current implementation of named expressions' ExprIds, we rely on a per-JVM AtomicLong to ensure that expression ids are unique within a JVM. However, these expression ids will not be globally unique. This opens the potential for id collisions if new expression ids happen to be created inside of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which happen to be safe because those expressions are never compared to expressions created on the driver. In order to guard against the introduction of invalid comparisons between driver-created and executor-created expression ids, this patch extends ExprId to incorporate a UUID to identify the JVM that created the id, which prevents collisions.
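
As a rough illustration of the idea, the sketch below pairs a per-JVM counter with a per-JVM UUID. The `ExprId(id: Long, jvmId: UUID)` shape matches the class listed in the test report for this PR; the `ExprIds` holder object and the demo are hypothetical names used only here, and are not the actual Spark code.

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong

// Sketch only: an id that pairs a per-JVM counter value with a per-JVM UUID.
case class ExprId(id: Long, jvmId: UUID)

// Hypothetical holder object; the real allocation site in Spark may differ.
object ExprIds {
  // Generated once, when this object is first initialized in a given JVM.
  private val jvmId: UUID = UUID.randomUUID()
  private val curId = new AtomicLong()

  def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
}

object ExprIdDemo {
  def main(args: Array[String]): Unit = {
    val a = ExprIds.newExprId
    val b = ExprIds.newExprId
    // Simulates an id with the same counter value allocated in another JVM.
    val foreign = ExprId(a.id, UUID.randomUUID())
    println(a != b)       // the counter differs within one JVM
    println(a != foreign) // the counter matches, but the JVM id differs
  }
}
```

Because `ExprId` is a case class, equality compares both fields, so two ids with the same counter value from different JVMs no longer compare equal.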

SparkQA · 2015-10-13T08:11:36Z

Test build #43632 has finished for PR 9093 at commit 48e3d1c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2015-10-13T18:42:04Z

Digging into the test failures here, it looks like a bunch of them are caused by the BindReferences call inside of AggregationIterator:

[info]   Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree: hyperloglogplusplus(a#4,0.04)
[info]      at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:315)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:280)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:232)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:217)
[info]      at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85)
[info]      at org.apache.spark.sql.execution.aggregate.AggregationIterator.<init>(AggregationIterator.scala:93)
[info]      at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.<init>(SortBasedAggregationIterator.scala:29)
[info]      at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1$$anonfun$2.apply(SortBasedAggregate.scala:86)
[info]      at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1$$anonfun$2.apply(SortBasedAggregate.scala:72)
[info]      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:700)
[info]      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:700)
[info]      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
[info]      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
[info]      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
[info]      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
[info]      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
[info]      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
[info]      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
[info]      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
[info]      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
[info]      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
[info]      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
[info]      at org.apache.spark.scheduler.Task.run(Task.scala:88)
[info]      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
[info]      ... 3 more
[info]   Caused by: java.lang.reflect.InvocationTargetException
[info]      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[info]      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
[info]      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[info]      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1$$anonfun$apply$10.apply(TreeNode.scala:326)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1$$anonfun$apply$10.apply(TreeNode.scala:325)
[info]      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:323)
[info]      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:315)
[info]      at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
[info]      ... 27 more
[info]   Caused by: java.lang.IllegalStateException: Expression ids should not be allocated inside of tasks
[info]      at org.apache.spark.sql.catalyst.expressions.NamedExpression$.newExprId(namedExpressions.scala:30)
[info]      at org.apache.spark.sql.catalyst.expressions.AttributeReference$.apply$default$5(namedExpressions.scala:184)
[info]      at org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus$$anonfun$2.apply(functions.scala:546)
[info]      at org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus$$anonfun$2.apply(functions.scala:545)
[info]      at scala.collection.generic.GenTraversableFactory.tabulate(GenTraversableFactory.scala:149)
[info]      at org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus.<init>(functions.scala:545)
[info]      ... 37 more (QueryTest.scala:78)

Here, it looks like copying HyperLogLogPlusPlus is causing problems, because the copy ends up changing the expression ids of its aggBufferAttributes.
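
A minimal sketch of that effect, assuming (as the trace suggests) that the buffer attributes are built in the constructor and each one takes a freshly allocated id by default. `Ids`, `Attr`, and `Agg` are hypothetical stand-ins for illustration, not Spark classes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical per-JVM id allocator.
object Ids {
  private val curId = new AtomicLong()
  def newId(): Long = curId.getAndIncrement()
}

// Stand-in for an attribute whose id defaults to a fresh allocation.
case class Attr(name: String, exprId: Long = Ids.newId())

// Stand-in for an aggregate that builds its buffer attributes
// inside the constructor body.
case class Agg(child: String) {
  val bufferAttrs: Seq[Attr] = Seq.tabulate(2)(i => Attr(s"MS[$i]"))
}

object CopyDemo {
  def main(args: Array[String]): Unit = {
    val original = Agg("a")
    // copy() invokes the constructor again, so the constructor body
    // re-runs and allocates new ids for the buffer attributes.
    val copied = original.copy()
    println(original.bufferAttrs.map(_.exprId))
    println(copied.bufferAttrs.map(_.exprId))
  }
}
```

The two printed id lists differ even though nothing about the aggregate itself changed, which is the kind of id churn a tree transformation running inside a task would trigger.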

cloud-fan · 2015-10-13T19:41:15Z

What this PR does no longer matches its title. Can you update the title and description?

JoshRosen · 2015-10-13T19:42:57Z

@cloud-fan, yep, planning to update shortly.

SparkQA · 2015-10-13T21:30:41Z

Test build #43662 has finished for PR 9093 at commit 955a1a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ChildProcAppHandle implements SparkAppHandle
- abstract class LauncherConnection implements Closeable, Runnable
- final class LauncherProtocol
- static class Message implements Serializable
- static class Hello extends Message
- static class SetAppId extends Message
- static class SetState extends Message
- static class Stop extends Message
- class LauncherServer implements Closeable
- class NamedThreadFactory implements ThreadFactory
- class OutputRedirector
- case class ExprId(id: Long, jvmId: UUID)

JoshRosen · 2015-10-13T21:30:53Z

Alright, updating now....

JoshRosen · 2015-10-13T21:35:19Z

Updated; PTAL @marmbrus.

marmbrus · 2015-10-13T21:50:50Z

LGTM pending tests

JoshRosen · 2015-10-13T21:52:00Z

It already passed tests as of the latest commit.

marmbrus · 2015-10-13T22:10:09Z

Merging to master.

davies · 2015-12-11T20:39:56Z

Cherry-picked into branch-1.5, to fix https://issues.apache.org/jira/browse/SPARK-11885

…afe cross-JVM comparisions

In the current implementation of named expressions' `ExprIds`, we rely on a per-JVM AtomicLong to ensure that expression ids are unique within a JVM. However, these expression ids will not be _globally_ unique. This opens the potential for id collisions if new expression ids happen to be created inside of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which happen to be safe because those expressions are never compared to expressions created on the driver. In order to guard against the introduction of invalid comparisons between driver-created and executor-created expression ids, this patch extends `ExprId` to incorporate a UUID to identify the JVM that created the id, which prevents collisions.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9093 from JoshRosen/SPARK-11080.

Throw exception when NamedExpression.newExprId is called from task.

48e3d1c

JoshRosen mentioned this pull request Oct 13, 2015

[SPARK-11017] [SQL] Support ImperativeAggregates in TungstenAggregate #9038

Closed

JoshRosen changed the title ~~[SPARK-11080] Throw exception when NamedExpression.newExprId is called inside tasks~~ [SPARK-11080] [SQL] Throw exception when NamedExpression.newExprId is called inside tasks Oct 13, 2015

viirya mentioned this pull request Oct 13, 2015

[SPARK-11036][SQL] AttributeReference should not be assigned new expression id inside tasks #9094

Closed

Add per-JVM UUID

955a1a8

JoshRosen changed the title ~~[SPARK-11080] [SQL] Throw exception when NamedExpression.newExprId is called inside tasks~~ [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions Oct 13, 2015

asfgit closed this in ef72673 Oct 13, 2015

JoshRosen deleted the SPARK-11080 branch December 15, 2015 17:33
