[SPARK-14390][GraphX] Make initialization step in Pregel optional. #12159

adeandrade · 2016-04-04T23:08:18Z

What changes were proposed in this pull request?

Suppose a sendMsg function depends on the state of an edge's vertices to decide which messages to send, and those Pregel messages in turn update that state. In this scenario, initialMsg forces the same message onto every vertex at the start, effectively overwriting any custom per-vertex initialization.

To deal with this situation today, we must define a dummy initialMsg (e.g. None) and modify every Pregel function to handle that kind of message. A simpler and less cumbersome solution is to make initialMsg and the initialization step in Pregel optional.
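To make the problem concrete, here is a minimal, dependency-free sketch. It is not the GraphX API and not the code in this patch: `OptionalInitMsgSketch`, its simplified `pregel` loop, the `Option`-typed `initialMsg`, and the sample graph are all assumptions made for illustration. The vertex program replaces a vertex's state with the received message, and `sendMsg` only fires when the source has been reached and the destination has not.

```scala
// Single-machine sketch; this is NOT the GraphX API, only a stand-in for its control flow.
object OptionalInitMsgSketch {
  type VertexId = Long
  final case class Edge(src: VertexId, dst: VertexId)

  // Hypothetical simplified Pregel loop. `initialMsg` is an Option here purely
  // to illustrate an optional initialization step (an assumption of this sketch).
  def pregel[VD, A](
      vertices: Map[VertexId, VD],
      edges: Seq[Edge],
      initialMsg: Option[A],
      maxIterations: Int)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: (Edge, VD, VD) => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): Map[VertexId, VD] = {
    // Initialization step: applied to every vertex, but only if a message is given.
    var state: Map[VertexId, VD] = initialMsg match {
      case Some(msg) => vertices.map { case (id, attr) => id -> vprog(id, attr, msg) }
      case None      => vertices
    }
    var iteration = 0
    var hadMessages = true
    while (hadMessages && iteration < maxIterations) {
      // Collect messages along every edge and merge those addressed to the same vertex.
      val inbox: Map[VertexId, A] = edges
        .flatMap(e => sendMsg(e, state(e.src), state(e.dst)))
        .groupBy(_._1)
        .map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }
      hadMessages = inbox.nonEmpty
      // Only vertices that received a message run the vertex program.
      state = state.map { case (id, attr) =>
        id -> inbox.get(id).map(m => vprog(id, attr, m)).getOrElse(attr)
      }
      iteration += 1
    }
    state
  }

  // Vertex 1 carries a custom initial state (reached = true); the others start unreached.
  val sampleVertices: Map[VertexId, Boolean] =
    Map(1L -> true, 2L -> false, 3L -> false, 4L -> false)
  val sampleEdges: Seq[Edge] = Seq(Edge(1L, 2L), Edge(2L, 3L))

  def run(initialMsg: Option[Boolean]): Map[VertexId, Boolean] =
    pregel(sampleVertices, sampleEdges, initialMsg, maxIterations = 10)(
      vprog = (id, attr, msg) => msg, // the message becomes the new state
      sendMsg = (e, srcReached, dstReached) =>
        if (srcReached && !dstReached) Iterator(e.dst -> true) else Iterator.empty,
      mergeMsg = (a, b) => a || b)

  // Sorted ids of the vertices whose final state is `true`.
  def reached(state: Map[VertexId, Boolean]): List[VertexId] =
    state.collect { case (id, true) => id }.toList.sorted

  def main(args: Array[String]): Unit = {
    println(s"forced initial message:  ${reached(run(Some(false)))}") // List()
    println(s"no initial message:      ${reached(run(None))}")        // List(1, 2, 3)
  }
}
```

With a forced initial message, vertex 1's custom state is overwritten before any edge is examined, so nothing propagates. When the initialization step is skipped, the per-vertex state survives and the computation proceeds from it.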

How was this patch tested?

No new tests were added. Previous functionality was kept.

dbtsai · 2016-04-04T23:19:59Z

Jenkins, test this please.

SparkQA · 2016-04-04T23:25:02Z

Test build #54908 has finished for PR 12159 at commit 611fe6b.

This patch fails Scala style tests.
This patch does not merge cleanly.
This patch adds no public classes.

This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packaged in the examples module).

I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11796 from vanzin/SPARK-13579.

## What changes were proposed in this pull request?

Remove sbt-idea plugin as importing sbt project provides much better support.

Author: Luciano Resende <lresende@apache.org>

Closes apache#12151 from lresende/SPARK-14366.

Use PartitionerAwareUnionRDD when possible for optimizing shuffling and preserving the partitioner.

Author: Guillaume Poulin <poulin.guillaume@gmail.com>

Closes apache#10382 from gpoulin/dstream_union_optimisation.

With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will not make sense, e.g. `Dataset.count()`.

A simple API is required to check whether the data in a Dataset is bounded or unbounded. This will allow users to check whether their Dataset is in streaming mode or not. ML algorithms may check if the data is unbounded and throw an exception, for example.

The implementation of this method is simple; however, naming it is the challenge. Some possible names for this method are:

- isStreaming
- isContinuous
- isBounded
- isUnbounded

I've gone with `isStreaming` for now. We can change it before Spark 2.0 if we decide to come up with a different name. For that reason I've marked it as `Experimental`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#12080 from brkyvz/is-streaming.

…oncrete types

## What changes were proposed in this pull request?

In spark.ml, GBT and RandomForest expose the trait DecisionTreeModel in the trees method, but they should not since it is a private trait (and not ready to be made public). It will also be more useful to users if we return the concrete types.

This PR returns concrete types.

The MIMA checks appear to be OK with this change.

## How was this patch tested?

Existing unit tests

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#12158 from jkbradley/hide-dtm.

…case unit.

## What changes were proposed in this pull request?

This fix tries to address the issue in PySpark where `spark.python.worker.memory` could only be configured with a lower case unit (`k`, `m`, `g`, `t`). This fix allows the upper case unit (`K`, `M`, `G`, `T`) to be used as well. This is to conform to the JVM memory string as specified in the documentation.

## How was this patch tested?

This fix adds an additional test to cover the changes.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes apache#12163 from yongtang/SPARK-14368.

## What changes were proposed in this pull request?

This adds the corresponding Java static functions for built-in typed aggregates already exposed in Scala.

## How was this patch tested?

Unit tests.

rxin

Author: Eric Liang <ekl@databricks.com>

Closes apache#12168 from ericl/sc-2794.

…mand

## What changes were proposed in this pull request?

This PR adds native execution of the SHOW TBLPROPERTIES command.

Command Syntax:

``` SQL
SHOW TBLPROPERTIES table_name[(property_key_literal)]
```

## How was this patch tested?

Tests added in HiveCommandSuite and DDLCommandSuite

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes apache#12133 from dilipbiswal/dkb_show_tblproperties.

…/DDL in SQL Context.

#### What changes were proposed in this pull request?

Currently, weird error messages are issued if we use Hive Context-only operations in SQL Context. For example:

- When calling `Drop Table` in SQL Context, we got the following message:
  ```
  Expected exception org.apache.spark.sql.catalyst.parser.ParseException to be thrown, but java.lang.ClassCastException was thrown.
  ```
- When calling `Script Transform` in SQL Context, we got the message:
  ```
  assertion failed: No plan for ScriptTransformation [key#9,value#10], cat, [tKey#155,tValue#156], null
  +- LogicalRDD [key#9,value#10], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187
  ```

Updates: Based on the investigation from hvanhovell, the root cause is `visitChildren`, which is the default implementation. It always returns the result of the last defined context child. After merging the code changes from hvanhovell, it works! Thank you hvanhovell!

#### How was this patch tested?

A few test cases are added. Not sure if the same issue exists for the other operators/DDL/DML.

hvanhovell

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes apache#12134 from gatorsmile/hiveParserCommand.

dbtsai · 2016-04-08T22:31:07Z

Jenkins, test this please.

SparkQA · 2016-04-08T22:36:44Z

Test build #55401 has finished for PR 12159 at commit 4edccf3.

This patch fails R style tests.
This patch does not merge cleanly.
This patch adds no public classes.

Make initialization step in Pregel optional.

611fe6b

Marcelo Vanzin and others added 12 commits April 4, 2016 16:52

[SPARK-14366] Remove sbt-idea plugin

a172e11

[SPARK-12425][STREAMING] DStream union optimisation

7201f03

Style correction.

6eb1a79

Update GraphX lib.

44a26e2

Overload pregel method properly.

4edccf3

Anderson de Andrade added 5 commits April 8, 2016 19:06

Make initialization step in Pregel optional.

02275d2

Style correction.

0bbfab2

Update GraphX lib.

102c400

Overload pregel method properly.

2d5004c

Merge branch 'pregel-optional-initmessage' of https://github.com/adea…

ea5c15b

…ndrade/spark into pregel-optional-initmessage # Conflicts: # graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala # graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala

adeandrade closed this Apr 8, 2016

adeandrade deleted the pregel-optional-initmessage branch April 8, 2016 23:43

11 participants
