SPARK 1084.1 (resubmitted) #31

srowen · 2014-02-27T11:51:06Z

(Ported from https://github.com/apache/incubator-spark/pull/637 )


AmplabJenkins · 2014-02-27T11:53:50Z

Merged build triggered.

AmplabJenkins · 2014-02-27T11:53:51Z

Merged build started.

AmplabJenkins · 2014-02-27T11:54:00Z

Merged build triggered.

AmplabJenkins · 2014-02-27T12:23:05Z

Merged build finished.

AmplabJenkins · 2014-02-27T12:23:05Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12917/

pwendell · 2014-02-27T19:12:08Z

Looks good, merging this into master. Thanks!


(Ported from https://github.com/apache/incubator-spark/pull/637 )

Author: Sean Owen <sowen@cloudera.com>

Closes apache#31 from srowen/SPARK-1084.1 and squashes the following commits:

6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it
f35b833 [Sean Owen] Fix two misc javadoc problems
254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit
5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates
007762b [Sean Owen] Remove dead scaladoc links
b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>

Conflicts:
bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala
core/src/main/scala/org/apache/spark/util/StatCounter.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala

## What changes were proposed in this pull request?

After the ExtractPythonUDF rule was moved into the physical plan, Python UDFs no longer work on top of an aggregate, because they cannot be evaluated before the aggregate and must be evaluated after it. This PR adds another rule that extracts this kind of Python UDF from the logical Aggregate and creates a Project on top of the Aggregate.

## How was this patch tested?

Added regression tests. The plan of the added test query looks like this:

```
== Parsed Logical Plan ==
'Project [<lambda>('k, 's) AS t#26]
+- Aggregate [<lambda>(key#5L)], [<lambda>(key#5L) AS k#17, sum(cast(<lambda>(value#6) as bigint)) AS s#22L]
   +- LogicalRDD [key#5L, value#6]

== Analyzed Logical Plan ==
t: int
Project [<lambda>(k#17, s#22L) AS t#26]
+- Aggregate [<lambda>(key#5L)], [<lambda>(key#5L) AS k#17, sum(cast(<lambda>(value#6) as bigint)) AS s#22L]
   +- LogicalRDD [key#5L, value#6]

== Optimized Logical Plan ==
Project [<lambda>(agg#29, agg#30L) AS t#26]
+- Aggregate [<lambda>(key#5L)], [<lambda>(key#5L) AS agg#29, sum(cast(<lambda>(value#6) as bigint)) AS agg#30L]
   +- LogicalRDD [key#5L, value#6]

== Physical Plan ==
*Project [pythonUDF0#37 AS t#26]
+- BatchEvalPython [<lambda>(agg#29, agg#30L)], [agg#29, agg#30L, pythonUDF0#37]
   +- *HashAggregate(key=[<lambda>(key#5L)#31], functions=[sum(cast(<lambda>(value#6) as bigint))], output=[agg#29,agg#30L])
      +- Exchange hashpartitioning(<lambda>(key#5L)#31, 200)
         +- *HashAggregate(key=[pythonUDF0#34 AS <lambda>(key#5L)#31], functions=[partial_sum(cast(pythonUDF1#35 as bigint))], output=[<lambda>(key#5L)#31,sum#33L])
            +- BatchEvalPython [<lambda>(key#5L), <lambda>(value#6)], [key#5L, value#6, pythonUDF0#34, pythonUDF1#35]
               +- Scan ExistingRDD[key#5L,value#6]
```

Author: Davies Liu <davies@databricks.com>

Closes #13682 from davies/fix_py_udf.

(cherry picked from commit 5389013)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
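The rewrite described in that commit message (pull Python UDFs that depend on aggregate results out of the Aggregate and into a Project above it) can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the classes `Col`, `Agg`, and `PyUDF`, the function `extract_udf_from_aggregate`, and the `aggN` alias scheme are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Col:
    """A plain column reference (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Agg:
    """An aggregate function call such as sum(v) (hypothetical toy type)."""
    func: str
    child: "Expr"


@dataclass(frozen=True)
class PyUDF:
    """A Python UDF call over some argument expressions (hypothetical toy type)."""
    name: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Agg, PyUDF]


def contains_agg(e: Expr) -> bool:
    """True if the expression is, or wraps, an aggregate function."""
    if isinstance(e, Agg):
        return True
    if isinstance(e, PyUDF):
        return any(contains_agg(a) for a in e.args)
    return False


def extract_udf_from_aggregate(
    agg_exprs: List[Expr],
) -> Tuple[List[Tuple[str, Expr]], List[Expr]]:
    """Split aggregate output expressions into (aggregate outputs, projection).

    UDFs that wrap an aggregate are moved to the projection; everything
    else stays in the aggregate and is referenced through an alias.
    """
    agg_outputs: List[Tuple[str, Expr]] = []

    def alias_for(expr: Expr) -> Col:
        # Reuse an existing alias when the same expression appears twice.
        for alias, existing in agg_outputs:
            if existing == expr:
                return Col(alias)
        alias = f"agg{len(agg_outputs)}"
        agg_outputs.append((alias, expr))
        return Col(alias)

    def rewrite(e: Expr) -> Expr:
        if isinstance(e, PyUDF) and contains_agg(e):
            # This UDF needs aggregate results, so it must run after the aggregate.
            return PyUDF(e.name, tuple(rewrite(a) for a in e.args))
        return alias_for(e)

    project = [rewrite(e) for e in agg_exprs]
    return agg_outputs, project


# Toy equivalent of: SELECT k, f(sum(v)) ... GROUP BY k
outputs, project = extract_udf_from_aggregate(
    [Col("k"), PyUDF("f", (Agg("sum", Col("v")),))]
)
```

In this toy model a UDF that does not wrap an aggregate (such as a UDF applied to the grouping key) is left inside the aggregate's outputs, which mirrors the lower `BatchEvalPython` node in the physical plan above.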



srowen added 6 commits February 27, 2014 08:01

Replace deprecated Ant <tasks> with <target>

b8ff8cb

Remove dead scaladoc links

007762b

Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates

5b2fce2

Fix one new style error introduced in scaladoc warning commit

254e8ef

Fix two misc javadoc problems

f35b833

Suppress warnings about legitimate unchecked array creations, or change code to avoid it

6c4a32c

asfgit closed this in 12bbca2 Feb 27, 2014

srowen deleted the SPARK-1084.1 branch March 2, 2014 23:20

jhartlaub referenced this pull request in jhartlaub/spark May 27, 2014

Merge pull request alteryx#31 from sundeepn/branch-0.8

023e3fd

Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies.

lins05 pushed a commit to lins05/spark that referenced this pull request Jan 22, 2017

Fix spacing for command highlighting (apache#31)

486bdbe

lins05 pushed a commit to lins05/spark that referenced this pull request Apr 23, 2017

Fix spacing for command highlighting (apache#31)

3e3c4d4

erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017

Fix spacing for command highlighting (apache#31)

a89b4b0

Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018

Merge pull request apache#31 from mesosphere/port-index

3723ee9

port index modification and cli 0.5.10

heary-cao mentioned this pull request Nov 5, 2018

[SPARK-24066][SQL]Add new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions #22945

Closed

bzhaoopenstack referenced this pull request in bzhaoopenstack/spark Sep 11, 2019

Merge pull request theopenlab#31 from liu-sheng/30

4691d11

Add OS_VPC_ID environment variable for Telefonica ACC tests

hn5092 pushed a commit to hn5092/spark that referenced this pull request Sep 29, 2019

apache#31 add read bytes metric in FileSourceScanExec and HiveTableScanExec

e8a5378

maropu mentioned this pull request Jun 3, 2020

[SPARK-31670][SQL] Trim unnecessary Struct field alias in Aggregate/GroupingSets #28490

Closed

redsanket pushed a commit to redsanket/spark that referenced this pull request Feb 16, 2021

Fix typos (apache#31)

2fbd75a

SirOibaf pushed a commit to SirOibaf/spark that referenced this pull request Nov 21, 2022

[HOPSWORKS-3233] Timestamp incompatibility Spark/Hive/Hudi - Hive fix - release 3.1.1.3 (apache#31)

6ac786b

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-3525] support spillable large result set (#31)

f42b52b
