
[BRANCH-1.1][SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler #3502

Closed
wants to merge 527 commits

Conversation

roxchkplusony
Contributor

v1.1 backport for #3483

chutium and others added 30 commits August 27, 2014 13:13
[SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter

```if (!fs.getFileStatus(path).isDir) throw Exception``` makes no sense after this commit #1370.

Be careful if someone is working on SPARK-2551: make sure the new change passes the test case ```test("Read a parquet file instead of a directory")```.
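For illustration, a minimal usage sketch after this change (the app name and paths are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-single-file"))
val sqlContext = new SQLContext(sc)

// With this change, parquetFile accepts a single Parquet file...
val singleFile = sqlContext.parquetFile("hdfs:///data/users/part-00000.parquet")
// ...while reading a whole directory of part-files keeps working as before.
val directory = sqlContext.parquetFile("hdfs:///data/users")
```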

Author: chutium <teng.qiu@gmail.com>

Closes #2044 from chutium/parquet-singlefile and squashes the following commits:

4ae477f [chutium] [SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter

(cherry picked from commit 48f4278)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…th "Launch More like this"

... copy the spark_cluster_tag from a spot instance requests over to the instances.

Author: Vida Ha <vida@databricks.com>

Closes #2163 from vidaha/vida/spark-3213 and squashes the following commits:

5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests

(cherry picked from commit 7faf755)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
If we set both `spark.driver.extraClassPath` and `--driver-class-path`, then the latter correctly overrides the former. However, the value of the system property `spark.driver.extraClassPath` still uses the former, which is actually not added to the class path. This may cause some confusion...

Of course, this also affects other options (e.g. Java options, library path, memory...).
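A hedged illustration of the confusion this fixes (jar paths are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Suppose spark-defaults.conf sets spark.driver.extraClassPath=/opt/old.jar
// and the application is submitted with --driver-class-path /opt/new.jar.
val sc = new SparkContext(new SparkConf().setAppName("config-check"))

// /opt/new.jar is what actually lands on the driver's class path, but
// before this fix the property below still reported /opt/old.jar.
println(sc.getConf.getOption("spark.driver.extraClassPath"))
```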

Author: Andrew Or <andrewor14@gmail.com>

Closes #2154 from andrewor14/driver-submit-configs-fix and squashes the following commits:

17ec6fc [Andrew Or] Fix tests
0140836 [Andrew Or] Don't forget spark.driver.memory
e39d20f [Andrew Or] Also set spark.driver.extra* configs in client mode
(cherry picked from commit 63a053a)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
According to the message text, both relations should be tested, so add the missing condition.

Author: viirya <viirya@gmail.com>

Closes #2159 from viirya/fix_test and squashes the following commits:

b1c0f52 [viirya] add missing condition.

(cherry picked from commit 28d41d6)
Signed-off-by: Michael Armbrust <michael@databricks.com>
[SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location

Currently we do `relation.hiveQlTable.getDataLocation.getPath`, which returns the path-part of the URI (e.g., "s3n://my-bucket/my-path" => "/my-path"). We should do `relation.hiveQlTable.getDataLocation.toString` instead, as a URI's toString returns a faithful representation of the full URI, which can later be passed into a Hadoop Path.
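A short demonstration of why `toString` is the right call here:

```scala
import java.net.URI

val location = new URI("s3n://my-bucket/my-path")
println(location.getPath)   // "/my-path" -- the scheme and bucket are lost
println(location.toString)  // "s3n://my-bucket/my-path" -- round-trips safely into a Hadoop Path
```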

Author: Aaron Davidson <aaron@databricks.com>

Closes #2150 from aarondav/parquet-location and squashes the following commits:

459f72c [Aaron Davidson] [SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location

(cherry picked from commit cc275f4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…udf_unix_timestamp with format "yyyy MMM dd h:mm:ss a" fails when run in a TimeZone other than "America/Los_Angeles" in HiveCompatibilitySuite

Running the udf_unix_timestamp test case of org.apache.spark.sql.hive.execution.HiveCompatibilitySuite
in a TimeZone other than "America/Los_Angeles" throws an error. [https://issues.apache.org/jira/browse/SPARK-3065]
Add a Locale setting in the beforeAll and afterAll methods to fix this bug in the HiveCompatibilitySuite test case.
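A minimal sketch of the approach, assuming a ScalaTest suite with BeforeAndAfterAll (the class name and test body are hypothetical):

```scala
import java.util.{Locale, TimeZone}

import org.scalatest.{BeforeAndAfterAll, FunSuite}

class LocalePinnedSuite extends FunSuite with BeforeAndAfterAll {
  private val originalLocale = Locale.getDefault
  private val originalTimeZone = TimeZone.getDefault

  // Pin the JVM defaults so patterns like "yyyy MMM dd h:mm:ss a"
  // parse identically regardless of the machine's settings.
  override def beforeAll(): Unit = {
    Locale.setDefault(Locale.US)
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
  }

  // Restore the originals so other suites are unaffected.
  override def afterAll(): Unit = {
    Locale.setDefault(originalLocale)
    TimeZone.setDefault(originalTimeZone)
  }

  test("udf_unix_timestamp-style parsing is stable") {
    val fmt = new java.text.SimpleDateFormat("yyyy MMM dd h:mm:ss a")
    assert(fmt.parse("2014 Aug 27 1:13:00 PM") != null)
  }
}
```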

Author: luogankun <luogankun@gmail.com>

Closes #1968 from luogankun/SPARK-3065 and squashes the following commits:

c167832 [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite
0a25e3a [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite

(cherry picked from commit 6525350)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #2147 from marmbrus/inMemDefaultSize and squashes the following commits:

5390360 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into inMemDefaultSize
14204d3 [Michael Armbrust] Set the context before creating SparkLogicalPlans.
8da4414 [Michael Armbrust] Make sure we throw errors when leaf nodes fail to provide statistics
18ce029 [Michael Armbrust] Ensure in-memory tables don't always broadcast.

(cherry picked from commit 7d2a7a9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Error was:

```
$ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
  File "./dev/create-release/generate-changelist.py", line 128
    if day < SPARK_REPO_CHANGE_DATE1 or
                                      ^
SyntaxError: invalid syntax
```

Author: Matthew Farrellee <matt@redhat.com>

Closes #2139 from mattf/master-fix-generate-changelist.py-0 and squashes the following commits:

6b3a900 [Matthew Farrellee] Add line continuation for script to work w/ py2.7.5
(cherry picked from commit 64d8ecb)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However if we run a python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to timeout.

We only need to do this for the PySpark shell because Spark submit runs as a python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL.
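A hypothetical sketch of the guard (the environment flag is an assumption for illustration, not the actual mechanism in SparkSubmitDriverBootstrapper):

```scala
// Only the PySpark shell launches spark-submit as a python subprocess, so
// only in that case should the JVM block until the parent closes stdin.
val isPySparkShell = sys.env.contains("PYSPARK_SHELL") // assumed flag

if (isPySparkShell) {
  // Drain stdin; EOF (-1) means the parent python process has exited.
  while (System.in.read() != -1) {}
}
// Otherwise, fall through and let the application exit normally.
```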

Thanks davies for reporting this.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:

42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
(cherry picked from commit dafe343)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
It is not safe to run the closure cleaner on slaves. #2153 introduced this, which broke all UDF execution on slaves. Cleaning of UDF closures will be re-added in a follow-up PR.

Author: Michael Armbrust <michael@databricks.com>

Closes #2174 from marmbrus/fixUdfs and squashes the following commits:

55406de [Michael Armbrust] [HOTFIX] Remove cleaning of UDFs
(cherry picked from commit 024178c)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:

115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf

(cherry picked from commit 68f75dc)
Signed-off-by: Michael Armbrust <michael@databricks.com>
We need to convert the case classes into Rows.
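A hedged sketch using the Spark 1.1-era registration API (the UDF, case class, and table names are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Pair(first: Int, second: Int)
case class Num(n: Int)

val sc = new SparkContext(new SparkConf().setAppName("struct-udf"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

sc.parallelize(1 to 3).map(n => Num(n)).registerTempTable("nums")

// The UDF returns a case class; with this fix its result is converted
// into a Row so SQL can treat it as a struct.
sqlContext.registerFunction("makePair", (a: Int, b: Int) => Pair(a, b))
sqlContext.sql("SELECT makePair(n, n + 1) FROM nums").collect()
```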

Author: Michael Armbrust <michael@databricks.com>

Closes #2133 from marmbrus/structUdfs and squashes the following commits:

189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
8e29b1c [Michael Armbrust] Use existing function
d8d0b76 [Michael Armbrust] Fix udfs that return structs

(cherry picked from commit 76e3ba4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
andrewor14 and others added 29 commits November 12, 2014 19:01
This involves a few main changes:
- Log all output messages to the log file. Previously the log file
  was not useful because it did not indicate progress.
- Remove hive-site.xml in sbt_hive_app to avoid interference
- Add the appropriate repositories for new dependencies
andrewor14 This backports the bug fix in #3220. It would be good if we can get it into 1.1.1, but this is minor.

Author: Xiangrui Meng <meng@databricks.com>

Closes #3251 from mengxr/SPARK-4355-1.1 and squashes the following commits:

33886b6 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-1.1' into SPARK-4355-1.1
91fe1a3 [Xiangrui Meng] fix OnlineSummarizer.merge when other.mean is zero
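For context, a small illustration of the case the fix covers (expected values follow from the definition of the mean):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

// Merge a summarizer with one whose mean is zero; before the fix this
// edge case produced incorrect combined statistics.
val a = new MultivariateOnlineSummarizer().add(Vectors.dense(1.0, 2.0))
val b = new MultivariateOnlineSummarizer().add(Vectors.dense(0.0, 0.0))
val merged = a.merge(b)

println(merged.mean) // expected: [0.5, 1.0]
```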
This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

Author: Andrew Or <andrew@databricks.com>

Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:

486fc49 [Andrew Or] Reset `elementsRead`
Fix memory leak in ConnectionManager's ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)

This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs.

This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
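A hedged sketch of the pattern (Netty 3 API; names simplified, not a verbatim excerpt of ConnectionManager):

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit

import org.jboss.netty.util.{HashedWheelTimer, Timeout, TimerTask}

import scala.concurrent.Promise

val ackTimer = new HashedWheelTimer()

// The task captures only the message ID and a WeakReference to the
// promise, so a cancelled timeout can never pin the message in memory.
def scheduleAckTimeout(messageId: Int, promise: Promise[Unit], seconds: Long): Timeout = {
  val promiseRef = new WeakReference(promise)
  ackTimer.newTimeout(new TimerTask {
    override def run(timeout: Timeout): Unit = {
      // If the promise was already completed and collected, do nothing.
      val p = promiseRef.get
      if (p != null) {
        p.tryFailure(new java.io.IOException(s"No ack for message $messageId"))
      }
    }
  }, seconds, TimeUnit.SECONDS)
}
```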

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:

786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
Spark hangs with the following code:

~~~
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
~~~

This is because ZippedWithIndexRDD triggers a job in getPartitions, and this causes a deadlock in DAGScheduler.getPreferredLocs (which is synchronized). The fix is to compute `startIndices` during construction.

This should be applied to branch-1.0, branch-1.1, and branch-1.2.
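A hedged sketch of the idea (class name hypothetical; the real change lives inside ZippedWithIndexRDD):

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD

// Compute startIndices eagerly at construction time. The counting job runs
// immediately, when no DAGScheduler lock is held, instead of from inside
// getPartitions during getPreferredLocs.
class EagerZipSketch[T: ClassTag](prev: RDD[T]) extends Serializable {
  val startIndices: Array[Long] = {
    val counts = prev.mapPartitions(it => Iterator(it.size.toLong)).collect()
    counts.scanLeft(0L)(_ + _).init // start offset of each partition
  }
}
```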

pwendell

Author: Xiangrui Meng <meng@databricks.com>

Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:

c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex

(cherry picked from commit bb46046)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

Author: Cheng Lian <lian@databricks.com>

Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:

bd17512 [Cheng Lian] Backports #3334 to branch-1.1
This is the branch-1.1 version of #3243.

Author: Andrew Or <andrew@databricks.com>

Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:

36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

Author: Andrew Or <andrew@databricks.com>

Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:

f2e552c [Andrew Or] Fix tests
7012595 [Andrew Or] Avoid many small spills
Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles

Solves two JIRAs in one shot (see the usage sketch after this list):
- Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
- Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
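A hedged usage sketch (paths, host, and port are placeholders):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(new SparkConf().setAppName("save-files"), Seconds(10))
ssc.checkpoint("hdfs:///checkpoints") // exercises the serializability fix

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(w => (w, 1L))
  .reduceByKey(_ + _)
  .map { case (w, c) => (new Text(w), new LongWritable(c)) }

// The default configuration used here is now the SparkContext's Hadoop
// configuration rather than a fresh, empty one.
counts.saveAsNewAPIHadoopFiles(
  "hdfs:///outputs/words", "txt",
  classOf[Text], classOf[LongWritable],
  classOf[TextOutputFormat[Text, LongWritable]])

ssc.start()
ssc.awaitTermination()
```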

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #3457 from tdas/savefiles-fix and squashes the following commits:

bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.

(cherry picked from commit 8838ad7)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This commit provides a script that computes the contributors list
by linking the github commits with JIRA issues. Automatically
translating github usernames remains a TODO at this point.
[SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler

Author: roxchkplusony <roxchkplusony@gmail.com>

Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits:

aba9184 [roxchkplusony] replace warning message per review
5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
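A hedged, simplified model of the guard (all names hypothetical; the real change is in the scheduler backend's message handling):

```scala
import scala.collection.mutable

// Look the executor up before sending KillTask, instead of assuming it is
// still registered.
case class KillTask(taskId: Long, executorId: String, interruptThread: Boolean)

class SchedulerBackendSketch {
  private val executorEndpoints = mutable.HashMap[String, KillTask => Unit]()

  def registerExecutor(id: String, send: KillTask => Unit): Unit =
    executorEndpoints(id) = send

  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    executorEndpoints.get(executorId) match {
      case Some(send) => send(KillTask(taskId, executorId, interruptThread))
      case None =>
        // Executor already removed: warn instead of crashing on an
        // unguarded map lookup.
        println(s"WARN: Attempted to kill task $taskId for unknown executor $executorId.")
    }
}
```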