SPY-287 Merging Apache 0.8.2 changes #7

Merged
jhartlaub merged 19 commits into clearstorydata:master-csd on Feb 23, 2014


9 participants

@markhamstra

Should be all bug-fixes

pwendell and others added some commits Dec 10, 2013
pwendell [maven-release-plugin] prepare for next development iteration 8ce9bd8
pwendell Version updates not handled by maven release plug-in 8f56390
ewencp Force pseudo-tty allocation in spark-ec2 script.
ssh commands need the -t argument repeated twice if there is no local
tty, e.g. if the process running spark-ec2 uses nohup and the parent
process exits.
2e2ead4
pwendell Merge pull request #271 from ewencp/really-force-ssh-pseudo-tty-0.8
Force pseudo-tty allocation in spark-ec2 script.

ssh commands need the -t argument repeated twice if there is no local
tty, e.g. if the process running spark-ec2 uses nohup and the parent
process exits.

Without this change, if you run the script this way (e.g. using nohup from a cron job), it fails to set up the nodes, because some of the ssh commands complain about missing ttys and then fail.

(This version is for the 0.8 branch. I've filed a separate request for master since changes to the script caused the patches to be different.)
f898238
rxin Merge pull request #273 from rxin/top
Fixed a performance problem in RDD.top and BoundedPriorityQueue

BoundedPriorityQueue was traversing the entire queue to calculate its size, which made every insertion slow.

This should also cherry-pick cleanly into branch-0.8.

(cherry picked from commit f4effb3)
Signed-off-by: Reynold Xin <rxin@apache.org>
df5fada
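For illustration, a minimal Scala sketch of a bounded top-k queue whose size lookup is O(1). It assumes the queue wraps a java.util.PriorityQueue min-heap and reads the element count from that heap instead of walking every element; the class below is a hypothetical stand-alone version, not Spark's actual BoundedPriorityQueue.

```scala
import java.util.{PriorityQueue => JPriorityQueue}

// Hypothetical stand-alone bounded queue that keeps the `maxSize` largest
// elements seen so far (top-k). Not Spark's implementation.
class BoundedPriorityQueue[A](maxSize: Int)(implicit ord: Ordering[A]) {
  require(maxSize > 0, "maxSize must be positive")

  // Min-heap ordered by `ord`: the smallest retained element sits at the head.
  private val underlying = new JPriorityQueue[A](maxSize, ord)

  // O(1): delegate to the heap's own counter instead of traversing it.
  def size: Int = underlying.size

  // Insert one element, evicting the current minimum when the queue is full
  // and the new element is larger.
  def +=(elem: A): this.type = {
    if (size < maxSize) underlying.offer(elem)
    else if (ord.gt(elem, underlying.peek())) {
      underlying.poll()
      underlying.offer(elem)
    }
    this
  }

  // Retained elements, largest first.
  def toList: List[A] = {
    val buf = List.newBuilder[A]
    val it = underlying.iterator()
    while (it.hasNext) buf += it.next()
    buf.result().sorted(ord.reverse)
  }
}
```

Because the smallest retained element is always at the heap's head, each insertion costs one comparison against peek() plus, at most, an O(log k) poll/offer pair; no step depends on the number of elements already stored.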
kayousterhout Handle IndirectTaskResults in LocalScheduler 6183102
kayousterhout Fixed test failure by adding exception to abortion msg d7bf08c
mateiz Merge pull request #281 from kayousterhout/local_indirect_fix
Handle IndirectTaskResults in LocalScheduler

This fixes a bug where large results aren't correctly handled when running in local mode. This is not being done in master because the Local/Cluster scheduler consolidation, expected to go into 0.9, will fix this issue (see #127).
88c565d
rxin Merge pull request #320 from kayousterhout/erroneous_failed_msg
Remove erroneous FAILED state for killed tasks.

Currently, when tasks are killed, the Executor first sends a
status update for the task with a "KILLED" state, and then
sends a second status update with a "FAILED" state saying that
the task failed due to an exception. The second FAILED state is
misleading and unnecessary, and occurs because of a NonLocalReturnControl
Exception that gets thrown due to the way we kill tasks. This
commit eliminates that problem.

I'm not at all sure that this is the best way to fix this problem,
so alternate suggestions are welcome. @rxin, I'm guessing you're the right
person to look at this.

(cherry picked from commit 0475ca8)
Signed-off-by: Reynold Xin <rxin@apache.org>
5c443ad
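The spurious FAILED update hinges on how Scala implements `return` inside a closure. A minimal sketch of that mechanism (Scala 2 semantics; Scala 3 still compiles it but warns that non-local returns are deprecated), not the Executor code itself: the hypothetical `runAndReport` stands in for any wrapper whose catch-all treats every Throwable as a failure, so a non-local `return` in the body is reported as a failure instead of returning normally.

```scala
object NonLocalReturnDemo {
  // Hypothetical wrapper: runs `body` and reports how it ended. The
  // catch-all clause treats any Throwable, including Scala's control-flow
  // exceptions, as a failure.
  def runAndReport(body: () => Unit): String =
    try { body(); "FINISHED" }
    catch { case t: Throwable => "FAILED: " + t.getClass.getSimpleName }

  // The `return` below sits inside a closure, so it is a non-local return:
  // the compiler implements it by throwing scala.runtime.NonLocalReturnControl,
  // which runAndReport intercepts before killedTask can return "KILLED".
  def killedTask(): String = {
    val status = runAndReport { () => return "KILLED" }
    status
  }

  def main(args: Array[String]): Unit =
    println(killedTask())
}
```

One common guard in Scala, not necessarily the approach this commit takes, is to match only non-fatal errors with scala.util.control.NonFatal, which does not match ControlThrowable subclasses such as NonLocalReturnControl.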
shivaram Add collectPartition to JavaRDD interface.
Also remove takePartition from PythonRDD and use collectPartition in rdd.py.

Conflicts:
	core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
	python/pyspark/context.py
	python/pyspark/rdd.py
5092bae
shivaram Make collectPartitions take an array of partitions
Change the implementation to use runJob instead of PartitionPruningRDD.
Also update the unit tests and the python take implementation
to use the new interface.
91e6e5b
shivaram Add comment explaining collectPartitions's use 3ef68e4
shivaram Make broadcast id public for use in R frontend 691dfef
pwendell Merge pull request #496 from pwendell/master
Fix bug in worker clean-up in UI

Introduced in d5a96fe (/cc @aarondav).

This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to remain in the UI when a new one registers, instead of disappearing immediately.
(cherry picked from commit a1cd185)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
f3cc3a7
shivaram Restore takePartition to PythonRDD, context.py
This is to avoid removing functions in minor releases.
38bf786
rxin Merge pull request #453 from shivaram/branch-0.8-SparkR
Backport changes used in SparkR to 0.8 branch

Backports two changes from the master branch:

1. Adding collectPartition to JavaRDD and using it from Python as well
2. Making broadcast id public.
c89b71a
colorant Merge pull request #583 from colorant/zookeeper.
Minor fix for ZooKeeperPersistenceEngine to use configured working dir

Author: Raymond Liu <raymond.liu@intel.com>

Closes #583 and squashes the following commits:

91b0609 [Raymond Liu] Minor fix for ZooKeeperPersistenceEngine to use configured working dir

(cherry picked from commit 68b2c0d)
Signed-off-by: Aaron Davidson <aaron@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/master/ZooKeeperPersistenceEngine.scala
62b3158
markhamstra Merge branch 'branch-0.8' of https://github.com/apache/incubator-spark into master-csd

Conflicts:
	assembly/pom.xml
	bagel/pom.xml
	core/pom.xml
	examples/pom.xml
	mllib/pom.xml
	pom.xml
	repl-bin/pom.xml
	repl/pom.xml
	streaming/pom.xml
	tools/pom.xml
	yarn/pom.xml
2042fa0
markhamstra POM fixes a57cd14
jhartlaub merged commit c588470 into clearstorydata:master-csd on Feb 23, 2014