update from orign #13118
Closed
zhaorongsheng wants to merge 815 commits into apache:master from zhaorongsheng:master
Conversation
…ling setConf This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala. andrewor14 FYI Author: tedyu <yuzhihong@gmail.com> Closes #10164 from tedyu/master. (cherry picked from commit f725b2e) Signed-off-by: Andrew Or <andrew@databricks.com>
ExternalBlockStore.scala Author: Naveen <naveenminchu@gmail.com> Closes #10313 from naveenminchu/branch-fix-SPARK-9886. (cherry picked from commit 8a215d2) Signed-off-by: Andrew Or <andrew@databricks.com>
… completes This change builds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can still register/be removed even when the event log history is very large and takes a long time to rebuild. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062. (cherry picked from commit c5b6b39) Signed-off-by: Andrew Or <andrew@databricks.com>
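As a rough illustration of the approach (a minimal sketch, not the actual Master code; all names below are made up):
```scala
import java.util.concurrent.Executors

// Sketch: replay the (possibly huge) event log on a dedicated thread so the
// RPC thread stays free to register and remove workers in the meantime.
object AsyncHistoryRebuild {
  private val rebuildExecutor = Executors.newSingleThreadExecutor()

  def asyncRebuildSparkUI(appId: String): Unit = {
    rebuildExecutor.submit(new Runnable {
      override def run(): Unit = {
        // replay event log files for `appId` and attach the completed app's UI
      }
    })
  }
}
```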
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent. Default view: <img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png"> When collapsed manually by the user: <img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png"> Disappears when column is too narrow: <img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png"> Can still be opened by the user if necessary: <img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png"> Author: Timothy Hunter <timhunter@databricks.com> Closes #10297 from thunterdb/12324. (cherry picked from commit a6325fc) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310. (cherry picked from commit 22f6cd8) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215. (cherry picked from commit 26d70bd) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318. (cherry picked from commit 2eb5af5) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…h Mesos cluster mode. SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script has recently been changed to give SPARK_HOME precedence, when it is defined, while running spark-class scripts. We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration and should use spark.executor.home instead. Author: Timothy Chen <tnachen@gmail.com> Closes #10332 from tnachen/scheduler_ui. (cherry picked from commit ad8c1f0) Signed-off-by: Andrew Or <andrew@databricks.com>
… bisecting k-means This PR includes only an example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518. (cherry picked from commit 7b6dc29) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
We have DataFrame examples for SparkR; we also need to add ML examples under ```examples/src/main/r```. cc mengxr jkbradley shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10324 from yanboliang/spark-12364. (cherry picked from commit 1a8b2a1) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
(cherry picked from commit 861549a)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
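For context, this is the usual shape of such a typo in Scala (a hedged illustration of the bug class, not the actual patch): without the `s` prefix the `${...}` placeholder is emitted verbatim, exactly as in the message above.
```scala
import scala.concurrent.duration._

val duration = 120.seconds
val broken = "Cannot receive any reply in ${duration}"  // no interpolator: literal placeholder
val fixed  = s"Cannot receive any reply in ${duration}" // interpolated value
println(broken) // Cannot receive any reply in ${duration}
println(fixed)  // Cannot receive any reply in 120 seconds
```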
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on #7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <irashid@cloudera.com> Closes #8466 from squito/SPARK-10248. (cherry picked from commit 38d9795) Signed-off-by: Andrew Or <andrew@databricks.com>
…addShutdownHook() is called SPARK-9886 fixed ExternalBlockStore.scala This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook() Author: tedyu <yuzhihong@gmail.com> Closes #10325 from ted-yu/master. (cherry picked from commit f590178) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
…ry string when redirecting. Author: Rohit Agarwal <rohita@qubole.com> Closes #10180 from mindprince/SPARK-12186. (cherry picked from commit fdb3822) Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10339 from vanzin/SPARK-12386. (cherry picked from commit d1508dd) Signed-off-by: Andrew Or <andrew@databricks.com>
This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
originally we would get a DF without any column. After this change, we get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`. When merging this PR, please make sure that the author is simplyianm. JIRA: https://issues.apache.org/jira/browse/SPARK-12057 Closes #10043 Author: Ian Macalinao <me@ian.pw> Author: Yin Huai <yhuai@databricks.com> Closes #10288 from yhuai/handleCorruptJson. (cherry picked from commit 9d66c42) Signed-off-by: Reynold Xin <rxin@databricks.com>
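A minimal sketch of the described behavior (Spark 1.6 API; the input path is hypothetical):
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("corrupt-json").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Given a file whose first line is {"f1":1} and whose second line is [1,2,3]:
val df = sqlContext.read.json("/tmp/mixed.json")
df.printSchema()
// After this change the schema contains both `f1` and `_corrupt_record`, and the
// row `[1,2,3]` appears as the string value of `_corrupt_record`.
```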
This commit is to resolve SPARK-12396. Author: echo2mei <534384876@qq.com> Closes #10354 from echoTomei/master. (cherry picked from commit 5a514b6) Signed-off-by: Davies Liu <davies.liu@gmail.com>
…er." This reverts commit da7542f.
For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (they will be null). The order of columns has been changed to match MySQL and PostgreSQL [1]. This PR also fixes the nullability of the output for outer joins. [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html Author: Davies Liu <davies@databricks.com> Closes #10353 from davies/fix_join. (cherry picked from commit a170d34) Signed-off-by: Davies Liu <davies.liu@gmail.com>
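A small sketch of the affected call (runs in the Spark 1.6 shell, where `sqlContext` is predefined; the data is made up):
```scala
import sqlContext.implicits._

val left  = Seq((1, "a"), (2, "b")).toDF("k", "l")
val right = Seq((2, "x"), (3, "y")).toDF("k", "r")
// Before the fix the join column `k` could come back null for right_outer and
// full_outer joins; after it, the key survives, e.g. in the row for k = 3.
left.join(right, Seq("k"), "full_outer").show()
```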
Since we renamed the column from ```text``` to ```value``` for DataFrames loaded by ```SQLContext.read.text```, we need to update the docs. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10349 from yanboliang/text-value. (cherry picked from commit 6e07716) Signed-off-by: Reynold Xin <rxin@databricks.com>
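For reference, a hedged sketch of the renamed column in use (assumes the Spark 1.6 shell's predefined `sqlContext`; the path is hypothetical):
```scala
val lines = sqlContext.read.text("/tmp/people.txt")
lines.select("value").show() // the single string column is now named `value`, not `text`
```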
… server Fixes a problem with #10332; this one should fix cluster mode on Mesos. Author: Iulian Dragos <jaguarul@gmail.com> Closes #10359 from dragos/issue/fix-spark-12345-one-more-time. (cherry picked from commit 8184568) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around. Supersedes #11524. Tested with Jenkins. Author: Sean Owen <sowen@cloudera.com> Closes #11631 from srowen/SPARK-13663. (cherry picked from commit 927e22e) Signed-off-by: Sean Owen <sowen@cloudera.com>
## What changes were proposed in this pull request? Today, Spark 1.6.1 and updated docs were released. Unfortunately, there is obsolete Hive version information in the docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
```
-By default Spark will build with Hive 0.13.1 bindings.
+By default Spark will build with Hive 1.2.1 bindings.
-# Apache Hadoop 2.4.X with Hive 13 support
+# Apache Hadoop 2.4.X with Hive 1.2.1 support
```
The `sql/README.md` file also describes this. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11639 from dongjoon-hyun/fix_doc_hive_version. (cherry picked from commit 88fa866) Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Closes #11220 from olarayej/SPARK-13312-3. (cherry picked from commit 416e71a) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…ions ## What changes were proposed in this pull request? Currently, when a java.net.BindException is thrown, it displays the following message: java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries! This change adds port configuration suggestions to the BindException, for example, for the UI, it now displays java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries. ## How was this patch tested? Manual tests Author: Bjorn Jonsson <bjornjon@gmail.com> Closes #11644 from bjornjon/master. (cherry picked from commit 515e4af) Signed-off-by: Sean Owen <sowen@cloudera.com>
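A hedged sketch of acting on the new message's suggestion (these are real config keys; the values are arbitrary examples):
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("port-config-example")
  .set("spark.ui.port", "4050")       // pin the SparkUI service to an available port
  .set("spark.port.maxRetries", "32") // or widen the retry budget instead
```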
## What changes were proposed in this pull request? fix typo in DataSourceRegister ## How was this patch tested? found when going through latest code Author: Jacky Li <jacky.likun@huawei.com> Closes #11686 from jackylk/patch-12. (cherry picked from commit f3daa09) Signed-off-by: Reynold Xin <rxin@databricks.com>
## What changes were proposed in this pull request? When studying Spark, many users just copy examples from the documentation and paste them into their terminals, and the missing backslashes lead them to run into shell errors. The added backslashes avoid that problem for users with that behavior. ## How was this patch tested? I generated the documentation locally using jekyll and checked the generated pages. Author: Daniel Santana <mestresan@gmail.com> Closes #11699 from danielsan/master. (cherry picked from commit 9f13f0f) Signed-off-by: Andrew Or <andrew@databricks.com>
## What changes were proposed in this pull request? JavaUtils.java has methods to convert time and byte strings for internal use; this change renames a variable used in byteStringAs() from timeError to byteError. Author: Bjorn Jonsson <bjornjon@gmail.com> Closes #11695 from bjornjon/master. (cherry picked from commit e06493c) Signed-off-by: Andrew Or <andrew@databricks.com>
…CCESS files. If a _SUCCESS file appears in an inner partitioning dir, partition discovery will treat that _SUCCESS file as a data file. Then, partition discovery will fail because it finds that the dir structure is not valid. We should ignore those `_SUCCESS` files. In the future, it would be better to ignore all files/dirs starting with `_` or `.`. This PR does not make that change; I am keeping this change simple, so we can consider getting it into branch 1.6. To ignore all files/dirs starting with `_` or `.`, the main change is to give ParquetRelation another way to get metadata files. Right now, it relies on FileStatusCache's cachedLeafStatuses, which returns file statuses of both metadata files (e.g. metadata files used by parquet) and data files, and that requires more changes. https://issues.apache.org/jira/browse/SPARK-13207 Author: Yin Huai <yhuai@databricks.com> Closes #11697 from yhuai/SPARK13207_branch16.
## What changes were proposed in this pull request? This patch contains the functionality to balance the load of cluster-mode drivers among workers. It restores the changes from #1106, which were erased due to the merging of #731. ## How was this patch tested? Tested with existing test cases. Author: CodingCat <zhunansjtu@gmail.com> Closes #11702 from CodingCat/SPARK-13803. (cherry picked from commit bd5365b) Signed-off-by: Sean Owen <sowen@cloudera.com>
… next locality level JIRA Issue: https://issues.apache.org/jira/browse/SPARK-13901 In the getAllowedLocalityLevel method of TaskSetManager, we log the wrong logDebug information when jumping to the next locality level, so this fixes it. Author: trueyao <501663994@qq.com> Closes #11719 from trueyao/logDebug-localityWait. (cherry picked from commit ea9ca6f) Signed-off-by: Sean Owen <sowen@cloudera.com>
## What changes were proposed in this pull request? This change fixes the executor OOM that was recently introduced in PR #11095. ## How was this patch tested? Tested by running a Spark job on the cluster. … Sorter Author: Sital Kedia <skedia@fb.com> Closes #11794 from sitalkedia/SPARK-13958. (cherry picked from commit 2e0c528) Signed-off-by: Davies Liu <davies.liu@gmail.com>
## What changes were proposed in this pull request?
Replaces current docstring ("Creates a :class:`WindowSpec` with the partitioning defined.") with "Creates a :class:`WindowSpec` with the ordering defined."
## How was this patch tested?
PySpark unit tests (no regression introduced). No changes to the code.
Author: zero323 <matthew.szymkiewicz@gmail.com>
Closes #11877 from zero323/order-by-description.
(cherry picked from commit 8193a26)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Fix data type mismatch for decimal, patch for branch-1.6. Author: cenyuhai <cenyuhai@didichuxing.com> Closes #11605 from cenyuhai/SPARK-13772.
## What changes were proposed in this pull request? A backport of #11652 for branch 1.6. It fixes all newly captured SparkR lint-r errors after the lintr package is updated from github. ## How was this patch tested? dev/lint-r SparkR unit tests Author: Sun Rui <rui.sun@intel.com> Closes #11884 from sun-rui/SPARK-14006.
Round() in databases usually rounds the number half away from zero, which differs from Math.round() in Java. For example:
```
scala> java.lang.Math.round(-3.5)
res3: Long = -3
```
In a database, we should return -4.0 in this case. This PR removes the buggy special case for scale=0 and adds tests for negative values with ties. Author: Davies Liu <davies@databricks.com> Closes #11894 from davies/fix_round. (cherry picked from commit 4700adb) Signed-off-by: Davies Liu <davies.liu@gmail.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
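A hedged illustration of the corrected behavior (assumes the Spark 1.6 shell's predefined `sqlContext`):
```scala
println(java.lang.Math.round(-3.5))         // -3: Java rounds half toward positive infinity
sqlContext.sql("SELECT round(-3.5)").show() // expected -4.0 after this change
sqlContext.sql("SELECT round(3.5)").show()  // 4.0 either way
```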
…icationMaster ## What changes were proposed in this pull request? This patch fixes the race condition in ApplicationMaster when receiving a signal. In the current implementation, if a signal is received and no exception is thrown, the application finishes with a successful state in YARN, and there is no further attempt. In reality the application was killed by a signal at runtime, so another attempt is expected. This patch adds a signal handler; if a signal is received, the application is marked as finished with failure rather than success. ## How was this patch tested? This patch was tested in the following situations: the application finishes normally; the application finishes by calling System.exit(n); the application is killed by a yarn command; the ApplicationMaster is killed by a SIGTERM sent by the kill pid command; the ApplicationMaster is killed by the NM with SIGTERM in case of NM failure. Author: jerryshao <sshao@hortonworks.com> Closes #11690 from jerryshao/SPARK-13642-1.6-backport.
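A hedged sketch of the approach (not the actual ApplicationMaster code; `markFailed` is an illustrative placeholder):
```scala
import sun.misc.{Signal, SignalHandler}

// Register handlers so a SIGTERM/SIGINT marks the attempt as failed
// instead of letting it default to a successful final status.
def registerSignalHandlers(markFailed: String => Unit): Unit = {
  val handler = new SignalHandler {
    override def handle(sig: Signal): Unit =
      markFailed(s"Application killed by signal ${sig.getName}")
  }
  Signal.handle(new Signal("TERM"), handler)
  Signal.handle(new Signal("INT"), handler)
}
```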
…b to install intr package. ## What changes were proposed in this pull request? In dev/lint-r.R, `install_github` makes our builds depend on an unstable source, which may cause unexpected test failures and build breakage. This PR adds a specific commit sha1 ID to `install_github` to get a stable source. ## How was this patch tested? dev/lint-r SparkR unit tests Author: Sun Rui <rui.sun@intel.com> Closes #11913 from sun-rui/SPARK-14074. (cherry picked from commit 7d11750) Signed-off-by: Xiangrui Meng <meng@databricks.com>
## What changes were proposed in this pull request? GBTs in pyspark previously had seed parameters, but they could not be passed as keyword arguments through the class constructor. This patch adds seed as a keyword argument and also sets default value. ## How was this patch tested? Doc tests were updated to pass a random seed through the GBTClassifier and GBTRegressor constructors. Author: sethah <seth.hendrickson16@gmail.com> Closes #11944 from sethah/SPARK-14107. (cherry picked from commit 5850977) Signed-off-by: Xiangrui Meng <meng@databricks.com>
## What changes were proposed in this pull request? We ran into a problem today debugging a class loading problem during deserialization, and the JVM was masking the underlying exception, which made it very difficult to debug. We can, however, log the exceptions ourselves using try/catch in serialization/deserialization. The good thing is that all these methods are already using Utils.tryOrIOException, so we can just put the try/catch and logging in a single place. ## How was this patch tested? A logging change with a manual test. Author: Reynold Xin <rxin@databricks.com> Closes #11951 from rxin/SPARK-14149. (cherry picked from commit 70a6f0b) Signed-off-by: Reynold Xin <rxin@databricks.com>
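A hedged sketch of the pattern described (the class is illustrative; a real implementation would use the project's logger):
```scala
import java.io.{IOException, ObjectInputStream}

class Example extends Serializable {
  // Java serialization hook: catch, log, and rethrow so the underlying
  // class-loading failure is visible instead of being masked by the JVM.
  private def readObject(in: ObjectInputStream): Unit = {
    try {
      in.defaultReadObject()
    } catch {
      case e: Exception =>
        System.err.println(s"Exception encountered during deserialization: $e")
        throw new IOException(e)
    }
  }
}
```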
## What changes were proposed in this pull request? Fix incorrect use of binarySearch in SparseMatrix ## How was this patch tested? Unit test added. Author: Chenliang Xu <chexu@groupon.com> Closes #11992 from luckyrandom/SPARK-14187. (cherry picked from commit c838829) Signed-off-by: Xiangrui Meng <meng@databricks.com>
## What changes were proposed in this pull request? This patch ensures that we trim all paths set in yarn.nodemanager.local-dirs and that the scheme is properly removed so the LevelDB can be created. ## How was this patch tested? Manual tests. Author: nfraison <nfraison@yahoo.fr> Closes #11475 from ashangit/level_db_creation_issue. (cherry picked from commit ff3bea3)
## What changes were proposed in this pull request?
Currently, `GraphOps.pickRandomVertex()` falls into an infinite loop for graphs with only one vertex. This PR fixes it by modifying the following termination-checking condition:
```scala
- if (selectedVertices.count > 1) {
+ if (selectedVertices.count > 0) {
```
## How was this patch tested?
Pass the Jenkins tests (including new test case).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12021 from dongjoon-hyun/SPARK-14219-2.
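A hedged sketch of the failing case (assumes the Spark shell's predefined `sc`):
```scala
import org.apache.spark.graphx._

// A graph with exactly one vertex and no edges; before the fix,
// pickRandomVertex() on this graph never terminated.
val oneVertex = Graph(
  sc.parallelize(Seq((1L, "only-vertex"))),
  sc.parallelize(Seq.empty[Edge[Int]]))
println(oneVertex.pickRandomVertex()) // returns 1 after the fix
```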
…askEnd avoiding driver OOM ## What changes were proposed in this pull request? We have a streaming job using `FlumePollInputStream` that always drives the driver to OOM after a few days; here is a driver heap dump taken before the OOM:
```
num     #instances  #bytes      class name
----------------------------------------------
1:      13845916    553836640   org.apache.spark.storage.BlockStatus
2:      14020324    336487776   org.apache.spark.storage.StreamBlockId
3:      13883881    333213144   scala.collection.mutable.DefaultEntry
4:      8907        89043952    [Lscala.collection.mutable.HashEntry;
5:      62360       65107352    [B
6:      163368      24453904    [Ljava.lang.Object;
7:      293651      20342664    [C
...
```
`BlockStatus` and `StreamBlockId` keep growing, and the driver OOMs in the end. After investigating, I found that `executorIdToStorageStatus` in `StorageStatusListener` seems to never remove blocks from `StorageStatus`. To fix the issue, I use `onBlockUpdated` in place of `onTaskEnd`, so we can update the block information (add blocks, drop blocks from memory to disk, and delete blocks) in time. ## How was this patch tested? Existing unit tests and manual tests Author: jeanlyn <jeanlyn92@gmail.com> Closes #12028 from jeanlyn/fixoom1.6.
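A hedged sketch of the idea (not the actual StorageStatusListener patch):
```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

// React to block-level events as they happen, so blocks dropped from memory
// or deleted outright are pruned promptly instead of waiting for onTaskEnd.
class BlockTrackingListener extends SparkListener {
  override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
    val info = event.blockUpdatedInfo
    // add, update, or remove the tracked entry for info.blockId based on
    // info.storageLevel, info.memSize, and info.diskSize
  }
}
```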
…r is removed with a multiple line reason. ## What changes were proposed in this pull request? The event timeline doesn't show on job page if an executor is removed with a multiple line reason. This PR replaces all new line characters in the reason string with spaces.  ## How was this patch tested? Verified on the Web UI. Author: Carson Wang <carson.wang@intel.com> Closes #12029 from carsonwang/eventTimeline. (cherry picked from commit 15c0b00) Signed-off-by: Andrew Or <andrew@databricks.com>
jira: https://issues.apache.org/jira/browse/SPARK-11507 "In certain situations when adding two block matrices, I get an error regarding colPtr and the operation fails. External issue URL includes full error and code for reproducing the problem." Root cause: colPtr.last does NOT always equal values.length in a breeze CSCMatrix, which fails the require in SparseMatrix. Easy steps to reproduce:
```
val m1: BM[Double] = new CSCMatrix[Double](Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
val m2: BM[Double] = new CSCMatrix[Double](Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
val sum = m1 + m2
Matrices.fromBreeze(sum)
```
Solution: checking the code in [CSCMatrix](https://github.com/scalanlp/breeze/blob/28000a7b901bc3cfbbbf5c0bce1d0a5dda8281b0/math/src/main/scala/breeze/linalg/CSCMatrix.scala) shows that a CSCMatrix in breeze can have extra zeros at the end of its data array. Invoking compact makes sure it aligns with the require of SparseMatrix. This should add limited overhead, as the actual compact operation is only performed when necessary. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9520 from hhbyyh/matricesFromBreeze. (cherry picked from commit ca45861) Signed-off-by: Joseph K. Bradley <joseph@databricks.com> Conflicts: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
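A hedged sketch of the fix's idea, reusing the repro above (note that `Matrices.fromBreeze` is `private[mllib]`, so like the repro this only runs within that scope):
```scala
import breeze.linalg.CSCMatrix
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Compacting drops trailing zeros from the breeze data array, so that
// colPtr.last == values.length holds before the SparseMatrix require() runs.
def fromBreezeCompacted(m: CSCMatrix[Double]): Matrix = {
  m.compact()
  Matrices.fromBreeze(m)
}
```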
Branch 1.6
Member: @zhaorongsheng it seems this was opened mistakenly. I guess this might have to be closed.
Can one of the admins verify this patch?
Contributor (Author): Sorry, it was an unintentional mistake.