Conversation

@adeandrade

What changes were proposed in this pull request?

Suppose a sendMsg function depends on the state of an edge's vertices to decide what messages to send, and those Pregel messages in turn update that state. In this scenario, initialMsg forces the same first message onto every vertex, effectively clobbering any custom per-vertex initialization.

To deal with this situation, we must define a dummy initialMsg (e.g. None), and every Pregel function must be modified to handle that message type. A simpler, less cumbersome solution is to make initialMsg, and with it the initialization step in Pregel, optional.
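A minimal sketch of what the optional initialization could look like (names and structure are illustrative, not the final change; the iteration loop is elided):

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Sketch only: initialMsg becomes an Option, and the initial vprog superstep
// is skipped when it is None, preserving any per-vertex initialization.
def pregel[VD: ClassTag, ED: ClassTag, A: ClassTag](
    graph: Graph[VD, ED],
    initialMsg: Option[A] = None,
    maxIterations: Int = Int.MaxValue,
    activeDirection: EdgeDirection = EdgeDirection.Either)(
    vprog: (VertexId, VD, A) => VD,
    sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
    mergeMsg: (A, A) => A): Graph[VD, ED] = {
  // Apply vprog with the initial message only when one is supplied.
  val g = initialMsg match {
    case Some(msg) => graph.mapVertices((vid, vdata) => vprog(vid, vdata, msg))
    case None      => graph
  }
  // ... the usual Pregel loop (sendMsg / mergeMsg / vprog) follows here ...
  g
}
```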

How was this patch tested?

No new tests were added; existing behavior is preserved, so the existing tests still apply.

@dbtsai
Member

dbtsai commented Apr 4, 2016

Jenkins, test this please.

@SparkQA

SparkQA commented Apr 4, 2016

Test build #54908 has finished for PR 12159 at commit 611fe6b.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Marcelo Vanzin and others added 12 commits April 4, 2016 16:52
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packaged in the
examples module).

I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which are no longer processed by
the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11796 from vanzin/SPARK-13579.
## What changes were proposed in this pull request?

Remove the sbt-idea plugin, as importing the sbt project provides much better support.

Author: Luciano Resende <lresende@apache.org>

Closes apache#12151 from lresende/SPARK-14366.
Use PartitionerAwareUnionRDD when possible to optimize shuffling and
preserve the partitioner.
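For context, a sketch of the effect (assuming `sc` is a SparkContext; `sc.union` can choose the partitioner-aware union when all inputs share a partitioner):

```scala
import org.apache.spark.HashPartitioner

// Two RDDs co-partitioned by the same partitioner.
val p = new HashPartitioner(4)
val a = sc.parallelize(Seq(1 -> "a")).partitionBy(p)
val b = sc.parallelize(Seq(2 -> "b")).partitionBy(p)

// A partitioner-aware union keeps the shared partitioner, so later key-based
// operations need no re-shuffle; a plain UnionRDD would drop it.
val u = sc.union(a, b)
assert(u.partitioner.contains(p))
```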

Author: Guillaume Poulin <poulin.guillaume@gmail.com>

Closes apache#10382 from gpoulin/dstream_union_optimisation.
With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will not make sense, e.g. `Dataset.count()`.

A simple API is required to check whether the data in a Dataset is bounded or unbounded. This will allow users to check whether their Dataset is in streaming mode. ML algorithms, for example, may check whether the data is unbounded and throw an exception.

The implementation of this method is simple, however naming it is the challenge. Some possible names for this method are:
 - isStreaming
 - isContinuous
 - isBounded
 - isUnbounded

I've gone with `isStreaming` for now. We can change it before Spark 2.0 if we decide to come up with a different name. For that reason I've marked it as `Experimental`.
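A minimal usage sketch (assuming a SparkSession named `spark` and the Spark 2.0-style streaming reader; the source and options are only illustrative):

```scala
// A Dataset backed by a stream reports isStreaming = true.
val streamingDF = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Callers such as ML algorithms can reject unbounded input up front,
// since e.g. count() would never terminate on a stream.
if (streamingDF.isStreaming) {
  throw new IllegalArgumentException("streaming Datasets are not supported")
}
```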

Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#12080 from brkyvz/is-streaming.
…oncrete types

## What changes were proposed in this pull request?

In spark.ml, GBT and RandomForest expose the trait DecisionTreeModel via their `trees` methods, but they should not, since it is a private trait (and not ready to be made public). It is also more useful to users if we return the concrete types.

This PR returns the concrete types instead.

The MIMA checks appear to be OK with this change.
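A sketch of the user-facing difference, assuming a fitted `RandomForestClassificationModel` (names per the Spark 2.0-era spark.ml API):

```scala
import org.apache.spark.ml.classification.{
  DecisionTreeClassificationModel, RandomForestClassificationModel}

def describeTrees(model: RandomForestClassificationModel): Unit = {
  // trees now has a concrete element type rather than the private
  // DecisionTreeModel trait, so tree-specific members are reachable.
  val trees: Array[DecisionTreeClassificationModel] = model.trees
  trees.foreach(t => println(s"depth=${t.depth}, numNodes=${t.numNodes}"))
}
```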

## How was this patch tested?

Existing unit tests

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#12158 from jkbradley/hide-dtm.
…case unit.

## What changes were proposed in this pull request?

This fix addresses an issue in PySpark where `spark.python.worker.memory`
could only be configured with a lower-case unit (`k`, `m`, `g`, `t`). The fix
allows the upper-case units (`K`, `M`, `G`, `T`) to be used as well, to
conform to the JVM memory string format specified in the documentation.
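For illustration, both casings should now behave the same (the config key is real; the values are examples):

```scala
import org.apache.spark.SparkConf

// Both should now parse to the same 512 MiB limit; before the fix, only the
// lower-case form was accepted by the PySpark worker.
val lower = new SparkConf().set("spark.python.worker.memory", "512m")
val upper = new SparkConf().set("spark.python.worker.memory", "512M")
```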

## How was this patch tested?

This fix adds an additional test to cover the changes.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes apache#12163 from yongtang/SPARK-14368.
## What changes were proposed in this pull request?

This adds the corresponding Java static functions for built-in typed aggregates already exposed in Scala.
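For reference, a sketch of the Scala-side usage that these Java statics mirror (the SparkSession, Dataset, and `Record` fields here are assumed for illustration):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.expressions.scalalang.typed

case class Record(key: String, value: Double)

// The Java `typed` class adds static equivalents of these Scala functions.
def averages(spark: SparkSession, ds: Dataset[Record]) = {
  import spark.implicits._ // encoder for the String grouping key
  ds.groupByKey(_.key).agg(typed.avg(_.value), typed.count(_.value))
}
```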

## How was this patch tested?

Unit tests.

rxin

Author: Eric Liang <ekl@databricks.com>

Closes apache#12168 from ericl/sc-2794.
…mand

## What changes were proposed in this pull request?

This PR adds native execution of the SHOW TBLPROPERTIES command.

Command Syntax:
``` SQL
SHOW TBLPROPERTIES table_name[(property_key_literal)]
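
-- For a hypothetical table named my_table:
SHOW TBLPROPERTIES my_table               -- lists all properties
SHOW TBLPROPERTIES my_table('comment')    -- returns a single property's value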
```
## How was this patch tested?

Tests added in HiveCommandSuite and DDLCommandSuite.

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes apache#12133 from dilipbiswal/dkb_show_tblproperties.
…/DDL in SQL Context.

#### What changes were proposed in this pull request?

Currently, weird error messages are issued if we use HiveContext-only operations in SQLContext.

For example,
- When calling `Drop Table` in SQL Context, we got the following message:
```
Expected exception org.apache.spark.sql.catalyst.parser.ParseException to be thrown, but java.lang.ClassCastException was thrown.
```

- When calling `Script Transform` in SQL Context, we got the message:
```
assertion failed: No plan for ScriptTransformation [key#9,value#10], cat, [tKey#155,tValue#156], null
+- LogicalRDD [key#9,value#10], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187
```

Updates:
Based on the investigation by hvanhovell, the root cause is `visitChildren`, the default implementation, which always returns the result of the last defined context child. After merging the code changes from hvanhovell, it works. Thank you, hvanhovell!
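For context, a sketch of the kind of override that addresses this, written as a method inside the ANTLR-generated visitor subclass (an assumption about the shape of the fix, not the exact patch):

```scala
import org.antlr.v4.runtime.tree.RuleNode

// The generated default visitChildren returns the result of the last child
// visited, so an unsupported statement can silently resolve to one of its
// children instead of raising a ParseException. Delegating only in the
// unambiguous single-child case and returning null otherwise lets callers
// detect the unsupported construct.
override def visitChildren(node: RuleNode): AnyRef = {
  if (node.getChildCount == 1) {
    node.getChild(0).accept(this)
  } else {
    null
  }
}
```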

#### How was this patch tested?
A few test cases are added.

Not sure if the same issue exists for the other operators/DDL/DML. hvanhovell

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes apache#12134 from gatorsmile/hiveParserCommand.
@dbtsai
Member

dbtsai commented Apr 8, 2016

Jenkins, test this please.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55401 has finished for PR 12159 at commit 4edccf3.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Anderson de Andrade added 5 commits April 8, 2016 19:06
…ndrade/spark into pregel-optional-initmessage

# Conflicts:
#	graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
#	graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala
@adeandrade adeandrade closed this Apr 8, 2016
@adeandrade adeandrade deleted the pregel-optional-initmessage branch April 8, 2016 23:43