
[BRANCH-1.1][SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler #3502

Closed
wants to merge 527 commits

Conversation

roxchkplusony
Contributor

v1.1 backport for #3483

chutium and others added 30 commits August 27, 2014 13:13
[SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter

```if (!fs.getFileStatus(path).isDir) throw Exception``` makes no sense after this commit #1370.

Be careful if someone is working on SPARK-2551: make sure the new change passes the test case ```test("Read a parquet file instead of a directory")```.
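For illustration, a minimal usage sketch after this change (the app name and paths are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-single-file"))
val sqlContext = new SQLContext(sc)

// With this change, parquetFile accepts a single Parquet file...
val singleFile = sqlContext.parquetFile("hdfs:///data/users/part-00000.parquet")
// ...while reading a whole directory of part-files keeps working as before.
val directory = sqlContext.parquetFile("hdfs:///data/users")
```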

Author: chutium <teng.qiu@gmail.com>

Closes #2044 from chutium/parquet-singlefile and squashes the following commits:

4ae477f [chutium] [SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter

(cherry picked from commit 48f4278)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…th "Launch More like this"

... copy the spark_cluster_tag from a spot instance requests over to the instances.

Author: Vida Ha <vida@databricks.com>

Closes #2163 from vidaha/vida/spark-3213 and squashes the following commits:

5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests

(cherry picked from commit 7faf755)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
If we set both `spark.driver.extraClassPath` and `--driver-class-path`, then the latter correctly overrides the former. However, the value of the system property `spark.driver.extraClassPath` still uses the former, which is actually not added to the class path. This may cause some confusion...

Of course, this also affects other options (e.g. Java options, library path, memory...).
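A hedged illustration of the confusion this fixes (jar paths are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Suppose spark-defaults.conf sets spark.driver.extraClassPath=/opt/old.jar
// and the application is submitted with --driver-class-path /opt/new.jar.
val sc = new SparkContext(new SparkConf().setAppName("config-check"))

// /opt/new.jar is what actually lands on the driver's class path, but
// before this fix the property below still reported /opt/old.jar.
println(sc.getConf.getOption("spark.driver.extraClassPath"))
```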

Author: Andrew Or <andrewor14@gmail.com>

Closes #2154 from andrewor14/driver-submit-configs-fix and squashes the following commits:

17ec6fc [Andrew Or] Fix tests
0140836 [Andrew Or] Don't forget spark.driver.memory
e39d20f [Andrew Or] Also set spark.driver.extra* configs in client mode
(cherry picked from commit 63a053a)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
According to the message text, both relations should be tested, so add the missing condition.

Author: viirya <viirya@gmail.com>

Closes #2159 from viirya/fix_test and squashes the following commits:

b1c0f52 [viirya] add missing condition.

(cherry picked from commit 28d41d6)
Signed-off-by: Michael Armbrust <michael@databricks.com>
[SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location

Currently we do `relation.hiveQlTable.getDataLocation.getPath`, which returns the path-part of the URI (e.g., "s3n://my-bucket/my-path" => "/my-path"). We should do `relation.hiveQlTable.getDataLocation.toString` instead, as a URI's toString returns a faithful representation of the full URI, which can later be passed into a Hadoop Path.
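A short demonstration of why `toString` is the right call here:

```scala
import java.net.URI

val location = new URI("s3n://my-bucket/my-path")
println(location.getPath)   // "/my-path" -- the scheme and bucket are lost
println(location.toString)  // "s3n://my-bucket/my-path" -- round-trips safely into a Hadoop Path
```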

Author: Aaron Davidson <aaron@databricks.com>

Closes #2150 from aarondav/parquet-location and squashes the following commits:

459f72c [Aaron Davidson] [SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location

(cherry picked from commit cc275f4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…udf_unix_timestamp with format "yyyy MMM dd h:mm:ss a" fails when run in a TimeZone other than "America/Los_Angeles" in HiveCompatibilitySuite

Running the udf_unix_timestamp test case of org.apache.spark.sql.hive.execution.HiveCompatibilitySuite
in a TimeZone other than "America/Los_Angeles" throws an error. [https://issues.apache.org/jira/browse/SPARK-3065]
Add a Locale setting in the beforeAll and afterAll methods to fix this bug in the HiveCompatibilitySuite test case.
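A minimal sketch of the approach, assuming a ScalaTest suite with BeforeAndAfterAll (the class name and test body are hypothetical):

```scala
import java.util.{Locale, TimeZone}

import org.scalatest.{BeforeAndAfterAll, FunSuite}

class LocalePinnedSuite extends FunSuite with BeforeAndAfterAll {
  private val originalLocale = Locale.getDefault
  private val originalTimeZone = TimeZone.getDefault

  // Pin the JVM defaults so patterns like "yyyy MMM dd h:mm:ss a"
  // parse identically regardless of the machine's settings.
  override def beforeAll(): Unit = {
    Locale.setDefault(Locale.US)
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
  }

  // Restore the originals so other suites are unaffected.
  override def afterAll(): Unit = {
    Locale.setDefault(originalLocale)
    TimeZone.setDefault(originalTimeZone)
  }

  test("udf_unix_timestamp-style parsing is stable") {
    val fmt = new java.text.SimpleDateFormat("yyyy MMM dd h:mm:ss a")
    assert(fmt.parse("2014 Aug 27 1:13:00 PM") != null)
  }
}
```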

Author: luogankun <luogankun@gmail.com>

Closes #1968 from luogankun/SPARK-3065 and squashes the following commits:

c167832 [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite
0a25e3a [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite

(cherry picked from commit 6525350)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #2147 from marmbrus/inMemDefaultSize and squashes the following commits:

5390360 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into inMemDefaultSize
14204d3 [Michael Armbrust] Set the context before creating SparkLogicalPlans.
8da4414 [Michael Armbrust] Make sure we throw errors when leaf nodes fail to provide statistics
18ce029 [Michael Armbrust] Ensure in-memory tables don't always broadcast.

(cherry picked from commit 7d2a7a9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Error was:

```
$ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
  File "./dev/create-release/generate-changelist.py", line 128
    if day < SPARK_REPO_CHANGE_DATE1 or
                                      ^
SyntaxError: invalid syntax
```

Author: Matthew Farrellee <matt@redhat.com>

Closes #2139 from mattf/master-fix-generate-changelist.py-0 and squashes the following commits:

6b3a900 [Matthew Farrellee] Add line continuation for script to work w/ py2.7.5
(cherry picked from commit 64d8ecb)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However if we run a python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to timeout.

We only need to do this for the PySpark shell because Spark submit runs as a python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL.
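A hypothetical sketch of the guard (the environment flag is an assumption for illustration, not the actual mechanism in SparkSubmitDriverBootstrapper):

```scala
// Only the PySpark shell launches spark-submit as a python subprocess, so
// only in that case should the JVM block until the parent closes stdin.
val isPySparkShell = sys.env.contains("PYSPARK_SHELL") // assumed flag

if (isPySparkShell) {
  // Drain stdin; EOF (-1) means the parent python process has exited.
  while (System.in.read() != -1) {}
}
// Otherwise, fall through and let the application exit normally.
```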

Thanks davies for reporting this.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:

42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
(cherry picked from commit dafe343)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
It is not safe to run the closure cleaner on slaves. #2153 introduced this, which broke all UDF execution on slaves. Cleaning of UDF closures will be re-added in a follow-up PR.

Author: Michael Armbrust <michael@databricks.com>

Closes #2174 from marmbrus/fixUdfs and squashes the following commits:

55406de [Michael Armbrust] [HOTFIX] Remove cleaning of UDFs
(cherry picked from commit 024178c)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:

115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf

(cherry picked from commit 68f75dc)
Signed-off-by: Michael Armbrust <michael@databricks.com>
We need to convert the case classes into Rows.
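A hedged sketch using the Spark 1.1-era registration API (the UDF, case class, and table names are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Pair(first: Int, second: Int)
case class Num(n: Int)

val sc = new SparkContext(new SparkConf().setAppName("struct-udf"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

sc.parallelize(1 to 3).map(n => Num(n)).registerTempTable("nums")

// The UDF returns a case class; with this fix its result is converted
// into a Row so SQL can treat it as a struct.
sqlContext.registerFunction("makePair", (a: Int, b: Int) => Pair(a, b))
sqlContext.sql("SELECT makePair(n, n + 1) FROM nums").collect()
```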

Author: Michael Armbrust <michael@databricks.com>

Closes #2133 from marmbrus/structUdfs and squashes the following commits:

189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
8e29b1c [Michael Armbrust] Use existing function
d8d0b76 [Michael Armbrust] Fix udfs that return structs

(cherry picked from commit 76e3ba4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
andrewor14 and others added 29 commits November 12, 2014 19:01
This involves a few main changes:
- Log all output messages to the log file. Previously the log file
  was not useful because it did not indicate progress.
- Remove hive-site.xml in sbt_hive_app to avoid interference
- Add the appropriate repositories for new dependencies
andrewor14 This backports the bug fix in #3220. It would be good if we can get it into 1.1.1, but this is minor.

Author: Xiangrui Meng <meng@databricks.com>

Closes #3251 from mengxr/SPARK-4355-1.1 and squashes the following commits:

33886b6 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-1.1' into SPARK-4355-1.1
91fe1a3 [Xiangrui Meng] fix OnlineSummarizer.merge when other.mean is zero
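For context, a small illustration of the case the fix covers (expected values follow from the definition of the mean):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

// Merge a summarizer with one whose mean is zero; before the fix this
// edge case produced incorrect combined statistics.
val a = new MultivariateOnlineSummarizer().add(Vectors.dense(1.0, 2.0))
val b = new MultivariateOnlineSummarizer().add(Vectors.dense(0.0, 0.0))
val merged = a.merge(b)

println(merged.mean) // expected: [0.5, 1.0]
```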
This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

Author: Andrew Or <andrew@databricks.com>

Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:

486fc49 [Andrew Or] Reset `elementsRead`
Fix memory leak in ConnectionManager's ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)

This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs.

This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
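A hedged sketch of the pattern (Netty 3 API; names simplified, not a verbatim excerpt of ConnectionManager):

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit

import org.jboss.netty.util.{HashedWheelTimer, Timeout, TimerTask}

import scala.concurrent.Promise

val ackTimer = new HashedWheelTimer()

// The task captures only the message ID and a WeakReference to the
// promise, so a cancelled timeout can never pin the message in memory.
def scheduleAckTimeout(messageId: Int, promise: Promise[Unit], seconds: Long): Timeout = {
  val promiseRef = new WeakReference(promise)
  ackTimer.newTimeout(new TimerTask {
    override def run(timeout: Timeout): Unit = {
      // If the promise was already completed and collected, do nothing.
      val p = promiseRef.get
      if (p != null) {
        p.tryFailure(new java.io.IOException(s"No ack for message $messageId"))
      }
    }
  }, seconds, TimeUnit.SECONDS)
}
```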

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:

786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
Spark hangs with the following code:

~~~
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
~~~

This is because ZippedWithIndexRDD triggers a job in getPartitions, and this causes a deadlock in DAGScheduler.getPreferredLocs (which is synchronized). The fix is to compute `startIndices` during construction.

This should be applied to branch-1.0, branch-1.1, and branch-1.2.
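A hedged sketch of the idea (class name hypothetical; the real change lives inside ZippedWithIndexRDD):

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD

// Compute startIndices eagerly at construction time. The counting job runs
// immediately, when no DAGScheduler lock is held, instead of from inside
// getPartitions during getPreferredLocs.
class EagerZipSketch[T: ClassTag](prev: RDD[T]) extends Serializable {
  val startIndices: Array[Long] = {
    val counts = prev.mapPartitions(it => Iterator(it.size.toLong)).collect()
    counts.scanLeft(0L)(_ + _).init // start offset of each partition
  }
}
```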

pwendell

Author: Xiangrui Meng <meng@databricks.com>

Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:

c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex

(cherry picked from commit bb46046)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

Author: Cheng Lian <lian@databricks.com>

Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:

bd17512 [Cheng Lian] Backports #3334 to branch-1.1
This is the branch-1.1 version of #3243.

Author: Andrew Or <andrew@databricks.com>

Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:

36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

Author: Andrew Or <andrew@databricks.com>

Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:

f2e552c [Andrew Or] Fix tests
7012595 [Andrew Or] Avoid many small spills
Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles

Solves two JIRAs in one shot (see the usage sketch after this list):
- Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
- Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
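A hedged usage sketch (paths, host, and port are placeholders):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(new SparkConf().setAppName("save-files"), Seconds(10))
ssc.checkpoint("hdfs:///checkpoints") // exercises the serializability fix

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(w => (w, 1L))
  .reduceByKey(_ + _)
  .map { case (w, c) => (new Text(w), new LongWritable(c)) }

// The default configuration used here is now the SparkContext's Hadoop
// configuration rather than a fresh, empty one.
counts.saveAsNewAPIHadoopFiles(
  "hdfs:///outputs/words", "txt",
  classOf[Text], classOf[LongWritable],
  classOf[TextOutputFormat[Text, LongWritable]])

ssc.start()
ssc.awaitTermination()
```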

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #3457 from tdas/savefiles-fix and squashes the following commits:

bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.

(cherry picked from commit 8838ad7)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This commit provides a script that computes the contributors list
by linking the github commits with JIRA issues. Automatically
translating github usernames remains a TODO at this point.
[SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler

Author: roxchkplusony <roxchkplusony@gmail.com>

Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits:

aba9184 [roxchkplusony] replace warning message per review
5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
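A hedged, simplified model of the guard (all names hypothetical; the real change is in the scheduler backend's message handling):

```scala
import scala.collection.mutable

// Look the executor up before sending KillTask, instead of assuming it is
// still registered.
case class KillTask(taskId: Long, executorId: String, interruptThread: Boolean)

class SchedulerBackendSketch {
  private val executorEndpoints = mutable.HashMap[String, KillTask => Unit]()

  def registerExecutor(id: String, send: KillTask => Unit): Unit =
    executorEndpoints(id) = send

  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    executorEndpoints.get(executorId) match {
      case Some(send) => send(KillTask(taskId, executorId, interruptThread))
      case None =>
        // Executor already removed: warn instead of crashing on an
        // unguarded map lookup.
        println(s"WARN: Attempted to kill task $taskId for unknown executor $executorId.")
    }
}
```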