
Conversation

marmbrus
Contributor

@marmbrus marmbrus commented Jun 7, 2014

Basically there is a race condition (possibly a Scala bug?) when these values are recomputed on all of the slaves, which results in an incorrect projection being generated (possibly because the GUID uniqueness contract is broken?).

In general we should probably enforce that all expression planning occurs on the driver, as now happens here.
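To illustrate the failure mode, here is a minimal sketch (hypothetical names, not Catalyst's actual code): a `@transient lazy val` is re-evaluated after deserialization, so each executor JVM that forces it draws fresh IDs from its own local counter, and those IDs need not match the ones the driver used during planning.

```scala
import java.io._
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a per-JVM unique-ID generator, like the one
// backing Catalyst expression IDs.
object IdGen extends Serializable {
  private val cur = new AtomicLong(0)
  def newId(): Long = cur.getAndIncrement()
}

// Deferred: the ID is generated lazily. Because the backing field is
// transient, it is regenerated after deserialization -- a different value
// than the one the driver planned with.
class DeferredOp extends Serializable {
  @transient lazy val id: Long = IdGen.newId()
}

// Eager: the ID is generated once during planning and the value itself is
// serialized, so every copy of the operator agrees on it.
class EagerOp extends Serializable {
  val id: Long = IdGen.newId()
}

// Simulate shipping an operator to an executor via Java serialization.
def roundTrip[T <: Serializable](obj: T): T = {
  val bos = new ByteArrayOutputStream()
  new ObjectOutputStream(bos).writeObject(obj)
  new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    .readObject().asInstanceOf[T]
}
```

Round-tripping a `DeferredOp` yields a copy whose `id` is recomputed from the local counter, while an `EagerOp` keeps the value it was planned with; on a cluster the divergence is worse, since each executor has an independent counter.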

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15522/

@marmbrus
Contributor Author

marmbrus commented Jun 7, 2014

test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15524/

@marmbrus
Contributor Author

marmbrus commented Jun 7, 2014

@rxin I added https://issues.apache.org/jira/browse/SPARK-2068 to track other places where we need to fix this, but we should probably just merge this one right away.

@rxin
Contributor

rxin commented Jun 7, 2014

How big does the closure size increase by?

@marmbrus
Contributor Author

marmbrus commented Jun 7, 2014

Is there an easy way to measure that?

Either way it was wrong before, and I don't think making it possible to plan on the slaves is worth the effort. Doing so subtly breaks the expression GUID contract, independent of the concurrency issues.

@rxin
Contributor

rxin commented Jun 7, 2014

I'm going to merge this. You can test this easily by looking at the log: Spark reports the size of each task closure and how long it takes to serialize in the info log.
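Outside of Spark's own logging, a rough framework-independent way to compare closure sizes is to serialize an object the way plain Java serialization would (which is what Spark's default `JavaSerializer` wraps) and compare byte counts. A sketch:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Serialize an object with plain Java serialization and report the
// resulting byte count, as a rough proxy for task-closure size.
def serializedSize(obj: AnyRef): Int = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(obj)
  oos.close()
  bos.size()
}
```

Measuring the closure before and after moving the projection computation to the driver would show how many extra bytes each task now carries.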

@rxin
Contributor

rxin commented Jun 7, 2014

Merged in master & branch-1.0.

@asfgit asfgit closed this in a6c72ab Jun 7, 2014
@rxin
Contributor

rxin commented Jun 7, 2014

One reason we had to add @transient lazy val is the lack of an init method that runs on each partition for operators. I think there are benefits to adding one: it makes object initialization clear and explicit, and then you could probably avoid this problem.
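A hypothetical shape for such a hook (the names `PartitionInit` and `open` are invented here, not part of Spark's API): operators implement an explicit `open()` that the framework would call once per partition on the executor, so per-partition state is built at a well-defined time instead of on first access to a lazy val.

```scala
// Hypothetical per-partition init hook; not an actual Spark API.
trait PartitionInit {
  /** Called once per partition on the executor, before any rows are processed. */
  def open(): Unit
}

class ProjectOp(exprDesc: String) extends PartitionInit with Serializable {
  // Built in open() rather than captured in the serialized closure or
  // deferred to a @transient lazy val.
  @transient private var project: String => String = _

  def open(): Unit = { project = row => s"[$exprDesc] $row" }

  def apply(row: String): String = project(row)
}
```

The driver serializes only the lightweight description (`exprDesc`); each partition then builds its own `project` function explicitly in `open()`, so nothing is silently recomputed on first access.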

asfgit pushed a commit that referenced this pull request Jun 7, 2014
… data in HDFS

Basically there is a race condition (possibly a Scala bug?) when these values are recomputed on all of the slaves, which results in an incorrect projection being generated (possibly because the GUID uniqueness contract is broken?).

In general we should probably enforce that all expression planning occurs on the driver, as now happens here.

Author: Michael Armbrust <michael@databricks.com>

Closes #1004 from marmbrus/fixAggBug and squashes the following commits:

e0c116c [Michael Armbrust] Compute aggregate expression during planning instead of lazily on workers.

(cherry picked from commit a6c72ab)
Signed-off-by: Reynold Xin <rxin@apache.org>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
@marmbrus marmbrus deleted the fixAggBug branch July 8, 2014 22:50
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
wangyum pushed a commit that referenced this pull request May 26, 2023
…edException (#1004)

* [CARMEL-6072] Return more information in SchemaColumnConvertNotSupportedException

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025