Branch 1.6 #11668
Commits on Dec 11, 2015
Commit: 3e39925
Commit: 250249e
Commit: eec3660
Commit: 23f8dfd
Commit: 2e45231
[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files

* `jsonFile` should support multiple input files, such as:

  ```R
  jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
  jsonFile(sqlContext, "path1,path2")
  ```
* Meanwhile, `jsonFile` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark `jsonFile` deprecated and use `read.json` on the SparkR side.
* Replace all `jsonFile` with `read.json` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should also make almost the same change for `parquetFile`.

cc felixcheung sun-rui shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10145 from yanboliang/spark-12146.
(cherry picked from commit 0fb9825)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: f05bae4
[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation
Commit: 2ddd104
[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure Issue
As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor. As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark. Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`. As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type. `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types. This PR contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441, so once this is merged, the other can be rebased. cc holdenk
Author: Mike Dusenberry <mwdusenb@us.ibm.com>
Closes #9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.
(cherry picked from commit 1b82203)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: bfcc8cf
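The retag workaround described above, as a hedged sketch (the method name is assumed; the actual change lives in PythonMLLibAPI's createRowMatrix helper):

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// An RDD arriving from Java/Python has its element type erased to Object;
// retag restores the Vector ClassTag so later casts (e.g. in tallSkinnyQR)
// succeed instead of throwing ClassCastException.
def createRowMatrixSketch(rows: RDD[Vector]): RowMatrix =
  new RowMatrix(rows.retag(classOf[Vector]))
```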
[SPARK-12217][ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example, input welcome. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10257 from BenFradet/SPARK-12217. (cherry picked from commit aea676c) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 75531c7
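For reference, a minimal sketch of the option being documented (column names assumed):

```scala
import org.apache.spark.ml.feature.StringIndexer

// "skip" silently drops rows whose label was unseen during fit;
// the default ("error") throws when such a row is transformed.
val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .setHandleInvalid("skip")
```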
Commits on Dec 12, 2015
[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py
Since `Dataset` has a new meaning in Spark 1.6, we should rename it to avoid confusion. #9873 finished the work for the Scala example; here we focus on the Python one: move dataset_example.py to `examples/ml` and rename it to `dataframe_example.py`. BTW, fix minor missing issues of #9873. cc mengxr
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9957 from yanboliang/SPARK-11978.
(cherry picked from commit a0ff6d1)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: c2f2046
[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions
Modifies the String overload to call the Column overload and ensures this is called in a test. Author: Ankur Dave <ankurdave@gmail.com> Closes #10271 from ankurdave/SPARK-12298. (cherry picked from commit 1e799d6) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 03d8015
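The shape of the fix, sketched as a standalone function rather than the actual DataFrame method: the String overload must delegate to the Column overload instead of recursing into itself.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical stand-in for DataFrame.sortWithinPartitions(String, String*):
// converting the names and forwarding to the Column-based overload breaks
// the infinite recursion described above.
def sortWithinPartitions(df: DataFrame, sortCol: String, sortCols: String*): DataFrame =
  df.sortWithinPartitions((sortCol +: sortCols).map(new Column(_)): _*)
```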
[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases
The existing sample functions miss the parameter `seed`, although the corresponding function interface in `generics` has such a parameter. Thus, even when the caller passes a 'seed', the value is not used. This can cause SparkR unit tests to fail; for example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10160 from gatorsmile/sampleR.
(cherry picked from commit 1e3526c)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: 47461fe
[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver
Author: Jean-Baptiste Onofré <jbonofre@apache.org>
Closes #10203 from jbonofre/SPARK-11193.
(cherry picked from commit 03138b6)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 2679fce
Commits on Dec 13, 2015
[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md
https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199. (cherry picked from commit 98b212d) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: e05364b
[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnection message
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10261 from zsxwing/SPARK-12267.
(cherry picked from commit 8af2f8c)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: d7e3bfd
Commits on Dec 14, 2015
[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

This should fix the potential exceptions when exiting a local cluster:

```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10269 from zsxwing/executor-state.
(cherry picked from commit 2aecda2)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: fbf16da
[SPARK-12275][SQL] No plan for BroadcastHint in some condition
When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". This leaves the other strategies no chance to process the plan, which can lead to a "No plan" issue, so we use planLater to go through all strategies. https://issues.apache.org/jira/browse/SPARK-12275
Author: yucai <yucai.yu@intel.com>
Closes #10265 from yucai/broadcast_hint.
(cherry picked from commit ed87f6d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 94ce502
[MINOR][DOC] Fix broken word2vec link
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10282 from BenFradet/SPARK-12199. (cherry picked from commit e25f1fe) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: c0f0f6c
Commits on Dec 15, 2015
Commit: 352a0c8
Commit: 23c8846
Update branch-1.6 for 1.6.0 release
Author: Michael Armbrust <michael@databricks.com> Closes #10317 from marmbrus/versions.
Commit: 80d2617
Commit: 00a39d9
Commit: 08aa3b4
Commits on Dec 16, 2015
[SPARK-12056][CORE] Part 2: Create a TaskAttemptContext only after calling setConf
This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala. andrewor14 FYI
Author: tedyu <yuzhihong@gmail.com>
Closes #10164 from tedyu/master.
(cherry picked from commit f725b2e)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 9e4ac56
[SPARK-12351][MESOS] Add documentation about submitting Spark with me…
Commit: 2c324d3
[SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
(cherry picked from commit 8a215d2)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 8e9a600
[SPARK-12062][CORE] Change Master to async rebuild UI when application completes
This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked, allowing new workers to register/remove even if the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
(cherry picked from commit c5b6b39)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 93095eb
[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
Commit: fb08f7b
[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent. Screenshots in the original PR show the default view, the sidebar collapsed manually by the user, the sidebar disappearing when the column is too narrow, and the sidebar reopened by the user when necessary.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10297 from thunterdb/12324.
(cherry picked from commit a6325fc)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: a2d584e
[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310. (cherry picked from commit 22f6cd8) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: ac0e2ea
[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml
cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215. (cherry picked from commit 26d70bd) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 16edd93
[SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318. (cherry picked from commit 2eb5af5) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: f815127
[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with Mesos cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script has recently been changed to take precedence when running spark-class scripts to look in SPARK_HOME if it's defined. We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
(cherry picked from commit ad8c1f0)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: e5b8571
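A hedged one-line sketch of the filtering idea (not the actual submit code):

```scala
// Drop SPARK_HOME from the environment forwarded to Mesos; executors
// should rely on spark.executor.home instead.
val envForMesos: Map[String, String] = sys.env - "SPARK_HOME"
```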
[SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means
This PR includes only an example code in order to finish it quickly. I'll send another PR for the docs soon.
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #9952 from yu-iskw/SPARK-6518.
(cherry picked from commit 7b6dc29)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: e1adf6d
Commit: 168c89e
Commit: aee88eb
[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6
Commit: dffa610
[SPARK-12364][ML][SPARKR] Add ML example for SparkR
We have a DataFrame example for SparkR; we also need to add an ML example under `examples/src/main/r`. cc mengxr jkbradley shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10324 from yanboliang/spark-12364.
(cherry picked from commit 1a8b2a1)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 04e868b
Commit: 552b38f
Commits on Dec 17, 2015
[MINOR] Add missing interpolation in NettyRPCEnv
Without the `s` interpolator, the timeout message was emitted literally:

```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```

Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
(cherry picked from commit 861549a)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: 638b89b
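The bug class in miniature (types are stand-ins, not the real RpcTimeout): without the leading `s`, Scala ships the placeholder text verbatim, which is exactly the `${timeout.duration}` seen in the log above.

```scala
case class RpcTimeout(duration: String)  // stand-in for the real type
val timeout = RpcTimeout("120 seconds")

val broken = "Cannot receive any reply in ${timeout.duration}"   // literal text
val fixed  = s"Cannot receive any reply in ${timeout.duration}"  // interpolated
```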
[SPARK-10248][CORE] track exceptions in dagscheduler event loop in tests
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on #7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <irashid@cloudera.com> Closes #8466 from squito/SPARK-10248. (cherry picked from commit 38d9795) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: fb02e4e
[SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala. This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
Author: tedyu <yuzhihong@gmail.com>
Closes #10325 from ted-yu/master.
(cherry picked from commit f590178)
Signed-off-by: Andrew Or <andrew@databricks.com>
Conflicts: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
Commit: 4af6438
[SPARK-12186][WEB UI] Send the complete request URI including the query string when redirecting.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #10180 from mindprince/SPARK-12186.
(cherry picked from commit fdb3822)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 154567d (committed by Andrew Or, Dec 17, 2015)
[SPARK-12386][CORE] Fix NPE when spark.executor.port is set.
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10339 from vanzin/SPARK-12386. (cherry picked from commit d1508dd) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 4ad0803 (committed by Andrew Or, Dec 17, 2015)
[SPARK-12057][SQL] Prevent failure on corrupt JSON records
This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like

```
{"f1":1}
[1,2,3]
```

originally we would get a DF without any column. After this change, we get a DF with columns `f1` and `_corrupt_record`; for the second row, `[1,2,3]` becomes the value of `_corrupt_record`. When merging this PR, please make sure that the author is simplyianm. JIRA: https://issues.apache.org/jira/browse/SPARK-12057 Closes #10043
Author: Ian Macalinao <me@ian.pw>
Author: Yin Huai <yhuai@databricks.com>
Closes #10288 from yhuai/handleCorruptJson.
(cherry picked from commit 9d66c42)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: d509194
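Illustratively, a sketch against the 1.6-era API (assuming a live `sc` and `sqlContext`; this is not code from the patch): reading the two-line input above now yields both columns.

```scala
// The non-object second line lands in _corrupt_record,
// while the first line populates f1.
val input = sc.parallelize(Seq("""{"f1":1}""", "[1,2,3]"))
val df = sqlContext.read.json(input)
df.printSchema()  // expect _corrupt_record (string) and f1 (long)
```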
Once driver register successfully, stop it to connect to master.
This commit is to resolve SPARK-12396. Author: echo2mei <534384876@qq.com> Closes #10354 from echoTomei/master. (cherry picked from commit 5a514b6) Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit: da7542f
Revert "Once driver register successfully, stop it to connect to master."
This reverts commit da7542f.
Commit: a846648
[SPARK-12395] [SQL] fix resulting columns of outer join
For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (they will be null). The order of columns has been changed to match that of MySQL and PostgreSQL [1]. This PR also fixes the nullability of the output for outer joins. [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
Author: Davies Liu <davies@databricks.com>
Closes #10353 from davies/fix_join.
(cherry picked from commit a170d34)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit: 1ebedb2
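A usage sketch of the API in question (data and column names assumed, `sqlContext` assumed live):

```scala
val left  = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sqlContext.createDataFrame(Seq((2, "x"), (3, "y"))).toDF("id", "r")

// After the fix, the "id" column is taken from whichever side is non-null,
// so ids present on only one side no longer come back null.
val joined = left.join(right, Seq("id"), "full_outer")
joined.show()
```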
[SQL] Update SQLContext.read.text doc
Since we renamed the column from `text` to `value` for DataFrames loaded by `SQLContext.read.text`, we need to update the doc.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10349 from yanboliang/text-value.
(cherry picked from commit 6e07716)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 41ad8ac
[SPARK-12220][CORE] Make Utils.fetchFile support files that contain s…
Commit: 1fbca41
[SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos REST server
Fixes a problem with #10332; this one should fix cluster mode on Mesos.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
(cherry picked from commit 8184568)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: 881f254
[SPARK-12390] Clean up unused serializer parameter in BlockManager
No change in functionality is intended. This only changes internal API. Author: Andrew Or <andrew@databricks.com> Closes #10343 from andrewor14/clean-bm-serializer. Conflicts: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
Commit: 88bbb54 (committed by Andrew Or, Dec 17, 2015)
[SPARK-12410][STREAMING] Fix places that use '.' and '|' directly in …
Commit: c0ab14f
[SPARK-12397][SQL] Improve error messages for data sources when they are not found
Point users to spark-packages.org to find them.
Author: Reynold Xin <rxin@databricks.com>
Closes #10351 from rxin/SPARK-12397.
(cherry picked from commit e096a65)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 48dcee4
[SPARK-12376][TESTS] Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method
org.apache.spark.streaming.Java8APISuite.java is failing due to trying to sort an immutable list in the assertOrderInvariantEquals method.
Author: Evan Chen <chene@us.ibm.com>
Closes #10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
Commit: 4df1dd4
Commits on Dec 18, 2015
[SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when recovering from checkpoint data
Add a transient flag `DStream.restoredFromCheckpointData` to control the restore processing in DStream and avoid duplicate work: this flag is checked first in `DStream.restoreCheckpointData`, and the restore process runs only when it is `false`.
Author: jhu-chang <gt.hu.chang@gmail.com>
Closes #9765 from jhu-chang/SPARK-11749.
(cherry picked from commit f4346f6)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: 9177ea3
[SPARK-12413] Fix Mesos ZK persistence
I believe this fixes SPARK-12413. I'm currently running an integration test to verify. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #10366 from mgummelt/fix-zk-mesos. (cherry picked from commit 2bebaa3) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: df02319
[SPARK-12218][SQL] Invalid splitting of nested AND expressions in Data Source filter API
JIRA: https://issues.apache.org/jira/browse/SPARK-12218 When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
Author: Yin Huai <yhuai@databricks.com>
Closes #10362 from yhuai/SPARK-12218.
(cherry picked from commit 41ee7c5)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 1dc71ec
Revert "[SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called"
This reverts commit 4af6438.
Commit: 3b903e4 (committed by Andrew Or, Dec 18, 2015)
[SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable
Now `StaticInvoke` receives `Any` as an object. `StaticInvoke` can be serialized, but sometimes the object passed is not serializable. For example, the following code raises an exception because `RowEncoder#extractorsFor`, invoked indirectly, creates a `StaticInvoke`:

```
case class TimestampContainer(timestamp: java.sql.Timestamp)
val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(System.currentTimeMillis))
val df = rdd.toDF
val ds = df.as[TimestampContainer]
val rdd2 = ds.rdd  // <----------------- invokes extractorsFor indirectly
```

I'll add test cases.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Michael Armbrust <michael@databricks.com>
Closes #10357 from sarutak/SPARK-12404.
(cherry picked from commit 6eba655)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: bd33d4e
Commit: eca401e
Commits on Dec 19, 2015
Commit: d6a519f
Commits on Dec 21, 2015
Commit: c754a08
[SPARK-12466] Fix harmless NPE in tests
```
[info] ReplayListenerSuite:
[info] - Simple replay (58 milliseconds)
java.lang.NullPointerException
	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
```

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but doesn't actually fail the tests). Tested locally to verify that the NPE is gone.
Author: Andrew Or <andrew@databricks.com>
Closes #10417 from andrewor14/fix-harmless-npe.
(cherry picked from commit d655d37)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: ca39985 (committed by Andrew Or, Dec 21, 2015)
Commits on Dec 22, 2015
Commit: 4062cda
Commit: 5b19e7c
Commit: 309ef35
[SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite
This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition that causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out. For more background, see my comments on #6207 (the PR which introduced this test).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10425 from JoshRosen/SPARK-11823.
(cherry picked from commit 2235cd4)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 0f905d7
[SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler
Commit: 94fb5e8
Commits on Dec 23, 2015
[SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming
This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10385 from zsxwing/accumulator-broadcast-example.
(cherry picked from commit 20591af)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 942c057
[SPARK-12477][SQL] Tungsten projection fails for null values in array fields
Accessing null elements in an array field fails when Tungsten is enabled. It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled. This PR solves this by checking, in the generated code, whether the accessed element in the array field is null. Example:

```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(",")) }
```

With Tungsten disabled:

```
0 = [a]
1 = [null]
2 = [b]
```

With Tungsten enabled:

```
0 = [a]
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```

Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
(cherry picked from commit 43b2a63)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: c6c9bf9
Commits on Dec 24, 2015
Commit: 5987b16
[SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval
Previously, the RPC timeout was the default network timeout, which is the same value the driver uses to determine dead executors. This means that if there is a network issue, the executor is declared dead after one heartbeat attempt. There is a separate config for the heartbeat interval, which is a better value to use for the heartbeat RPC. With this change, the executor will make multiple heartbeat attempts even with RPC issues.
Author: Nong Li <nong@databricks.com>
Closes #10365 from nongli/spark-12411.
Commit: b49856a
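The two settings the change distinguishes, as a configuration sketch (values are illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // cadence of executor heartbeats, now also used as the heartbeat RPC timeout
  .set("spark.executor.heartbeatInterval", "10s")
  // how long the driver waits before declaring an executor dead
  .set("spark.network.timeout", "120s")
```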
[SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is used
Fix an exception with the IBM JDK by removing the update field from a JavaVersion tuple. This is because the IBM JDK does not have information on update '_xx'.
Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Closes #10463 from kiszk/SPARK-12502.
(cherry picked from commit 9e85bb7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: 4dd8712
[SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax
In the past, Spark JDBC write only worked with technologies that support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):

INSERT INTO $table VALUES ( ?, ?, ..., ? )

But some technologies require a list of column names:

INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )

This was blocking the use of e.g. the Progress JDBC Driver for Cassandra. Another limitation is that syntax 1 relies on the DataFrame field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc(). If the target table contains more columns (not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types. This PR switches to the recommended second INSERT syntax. Column names are taken from the DataFrame field names.
Author: CK50 <christian.kurz@oracle.com>
Closes #10380 from CK50/master-SPARK-12010-2.
(cherry picked from commit 502476e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 865dd8b
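A minimal sketch of the column-qualified statement builder described above (the helper name is assumed; this is not the actual JdbcUtils code):

```scala
// Builds INSERT INTO t (c1, c2, ...) VALUES (?, ?, ...), decoupling the
// statement from the physical column order of the target table.
def insertStatement(table: String, columns: Seq[String]): String = {
  val cols  = columns.mkString(", ")
  val marks = Seq.fill(columns.size)("?").mkString(", ")
  s"INSERT INTO $table ($cols) VALUES ($marks)"
}

// insertStatement("people", Seq("name", "age"))
// => INSERT INTO people (name, age) VALUES (?, ?)
```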
Commits on Dec 28, 2015
[SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join
After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double-checked the code. For example, users can do an equi-join like ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```

- There exists a bug in 1.5 and 1.4: the code just ignores the third parameter (join type) users pass. The join type actually used is `Inner`, even if the user-specified type is something else (e.g., `Outer`).
- After PR #8600, 1.6 does not have this issue, but the description had not been updated. I plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using an equi-join.

Author: gatorsmile <gatorsmile@gmail.com>
Closes #10477 from gatorsmile/pyOuterJoin.
Commit: b8da77e
[SPARK-12517] add default RDD name for one created via sc.textFile
The feature was first added in commit 7b877b2 but was later removed (probably by mistake) in commit fc8b581. This change sets the default name of RDDs created via sc.textFile(...) to the path argument. Here is the symptom:

Using spark-1.5.2-bin-hadoop2.6:
scala> sc.textFile("/home/root/.bashrc").name
res5: String = null
scala> sc.binaryFiles("/home/root/.bashrc").name
res6: String = /home/root/.bashrc

While using Spark 1.3.1:
scala> sc.textFile("/home/root/.bashrc").name
res0: String = /home/root/.bashrc
scala> sc.binaryFiles("/home/root/.bashrc").name
res1: String = /home/root/.bashrc

Author: Yaron Weinsberg <wyaron@gmail.com>
Author: yaron <yaron@il.ibm.com>
Closes #10456 from wyaron/master.
(cherry picked from commit 73b70f0)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: 1fbcb6e
[SPARK-12424][ML] The implementation of ParamMap#filter is wrong.
ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`. Also, the return type of Map#filterKeys is not Serializable; that is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10381 from sarutak/SPARK-12424.
(cherry picked from commit 07165ca)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: 7c7d76f
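The underlying Scala pitfall, reproduced in miniature (unrelated to the ML types):

```scala
import scala.collection.mutable

val m = mutable.Map(1 -> "a", 2 -> "b")
// filterKeys returns a lazy view whose static and runtime type is
// scala.collection.Map, not mutable.Map.
val view = m.filterKeys(_ > 1)
// view.asInstanceOf[mutable.Map[Int, String]]  // would throw ClassCastException
```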
[SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
Since we only need to implement `def skipBytes(n: Int)`, code in #10213 could be simplified. davies scwf
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #10253 from adrian-wang/kryo.
(cherry picked from commit a6d3853)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: a9c52d4
[SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs
Include the following changes:
1. Close `java.sql.Statement`.
2. Fix incorrect `asInstanceOf`.
3. Remove unnecessary `synchronized` and `ReentrantLock`.

Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10440 from zsxwing/findbugs.
(cherry picked from commit 710b411)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: fd20248
Commits on Dec 29, 2015
[SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql
If a DataFrame has BYTE types, an exception is thrown: org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #9350 from maropu/FixBugInPostgreJdbc.
(cherry picked from commit 73862a1)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 85a8718
[SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value
`ifelse`, `when`, `otherwise` are unable to take a `Column`-typed S4 object as values. For example:

```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```

will both fail with

```r
attempt to replicate an object of type 'environment'
```

The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenarios where these functions would need to be vectorized in SparkR. For reference, added test cases which trigger failures:

```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```

Author: Forest Fang <forest.fang@outlook.com>
Closes #10481 from saurfang/spark-12526.
(cherry picked from commit d80cc90)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: c069ffc
Commits on Dec 30, 2015
[SPARK-12300] [SQL] [PYSPARK] fix schema inference on local collections
Current schema inference for local python collections halts as soon as there are no NullTypes. This is different than when we specify a sampling ratio of 1.0 on a distributed collection. This could result in incomplete schema information. Author: Holden Karau <holden@us.ibm.com> Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections. (cherry picked from commit d1ca634) Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit: 8dc6549
[SPARK-12399] Display correct error message when accessing REST API with an unknown app Id
I got an exception when accessing the below REST API with an unknown application Id. `http://<server-url>:18080/api/v1/applications/xxx/jobs` Instead of an exception, I expect an error message "no such app: xxx", similar to the error message returned when I access `/api/v1/applications/xxx`.

```
org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
	at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
	at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
	at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
	at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
	at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
	at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
```

Author: Carson Wang <carson.wang@intel.com>
Closes #10352 from carsonwang/unknownAppFix.
(cherry picked from commit b244297)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Commit: cd86075
Commits on Jan 3, 2016
[SPARK-12327][SPARKR] fix code for lintr warning for commented code
Commit: 4e9dd16
Commits on Jan 4, 2016
[SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to be called value
Author: Xiu Guo <xguo27@gmail.com>
Closes #10515 from xguo27/SPARK-12562.
(cherry picked from commit 84f8492)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: f7a3223
[SPARK-12486] Worker should kill the executors more forcefully if possible.
This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. If the executor is unhealthy, it would previously ignore the destroy() call. Presumably, the new Java API was added to handle cases like this. We could update the termination path in the future to use OS-specific commands for older Java versions.
Author: Nong Li <nong@databricks.com>
Closes #10438 from nongli/spark-12486-executors.
(cherry picked from commit 8f65939)
Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: cd02038
[SPARK-12470] [SQL] Fix size reduction calculation
also only allocate required buffer size Author: Pete Robbins <robbinspg@gmail.com> Closes #10421 from robbinspg/master. (cherry picked from commit b504b6a) Signed-off-by: Davies Liu <davies.liu@gmail.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala
Commit: b5a1f56
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection. This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different). This patch is inspired by a similar patch that I made to the `spark-redshift` library (databricks/spark-redshift#143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10519 from JoshRosen/jdbc-driver-precedence.
(cherry picked from commit 6c83d93)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 7f37c1e
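A simplified sketch of the lookup idea (the helper name is assumed, and the real code also handles wrapped drivers):

```scala
import java.sql.{Driver, DriverManager}
import scala.collection.JavaConverters._

// Scan DriverManager's registered drivers for the user-specified class,
// rather than letting getConnection() pick any driver claiming the URL.
def findDriver(userSpecifiedClass: String): Driver =
  DriverManager.getDrivers.asScala
    .find(_.getClass.getName == userSpecifiedClass)
    .getOrElse(throw new IllegalArgumentException(
      s"Driver $userSpecifiedClass not registered with DriverManager"))
```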
[DOC] Adjust coverage for partitionBy()
This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02&subj=Re+partitioning+json+data+in+spark Michael suggested fixing the doc. Please review. Author: tedyu <yuzhihong@gmail.com> Closes #10499 from ted-yu/master. (cherry picked from commit 40d0396) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 1005ee3
[SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set the row length
The reader was previously not setting the row length, meaning it was wrong if there were variable-length columns. This problem does not usually manifest, since the value in the column is correct and projecting the row fixes the issue.
Author: Nong Li <nong@databricks.com>
Closes #10576 from nongli/spark-12589.
(cherry picked from commit 34de24a)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Conflicts: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
Commit: 8ac9198
Commits on Jan 5, 2016
[SPARKR][DOC] minor doc update for version in migration guide
Commit: 8950482
Commit: d9e4438
[SPARK-12647][SQL] Fix o.a.s.sql.execution.ExchangeCoordinatorSuite "determining the number of reducers: aggregate operator": change expected partition sizes
Author: Pete Robbins <robbinspg@gmail.com>
Closes #10599 from robbinspg/branch-1.6.
Commit: 5afa62b
[SPARK-12617] [PYSPARK] Clean up the leak sockets of Py4J
This patch adds Py4jCallbackConnectionCleaner to clean the leaked sockets of Py4J every 30 seconds. This is a workaround until Py4J fixes the leak issue py4j/py4j#187.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10579 from zsxwing/SPARK-12617.
(cherry picked from commit 047a31b)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit: f31d0fd
[SPARK-12511] [PYSPARK] [STREAMING] Make sure PythonDStream.registerSerializer is called only once
There is an issue where Py4J's PythonProxyHandler.finalize blocks forever (py4j/py4j#184). Py4J will create a PythonProxyHandler in Java for "transformer_serializer" when calling "registerSerializer". If we call "registerSerializer" twice, the second PythonProxyHandler will override the first one; the first one will then be GCed and trigger "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the "PythonProxyHandler" on the Java side won't be GCed.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10514 from zsxwing/SPARK-12511.
(cherry picked from commit 6cfe341)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit: 83fe5cf
[SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans
SPARK-12450 . Un-persist broadcasted variables in KMeans. Author: RJ Nowling <rnowling@gmail.com> Closes #10415 from rnowling/spark-12450. (cherry picked from commit 78015a8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 0afad66
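The pattern being applied, sketched with assumed names (this is not MLlib's internal code):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Broadcast the centers for one pass over the data, then release the
// executor-side copies instead of leaving them to accumulate.
def costSketch(sc: SparkContext, points: RDD[Array[Double]],
               centers: Array[Array[Double]]): Double = {
  val bc = sc.broadcast(centers)
  val cost = points.map { p =>
    bc.value.map(c => c.zip(p).map { case (x, y) => (x - y) * (x - y) }.sum).min
  }.sum()
  bc.unpersist(blocking = false)  // the cleanup this commit adds
  cost
}
```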
[SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk
Successfully ran kinesis demo on a live, aws hosted kinesis stream against master and 1.6 branches. For reasons I don't entirely understand it required a manual merge to 1.5 which I did as shown here: BrianLondon@075c22e The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the kinesis regression in 1.5.2. Author: BrianLondon <brian@seatgeek.com> Closes #10492 from BrianLondon/remove-only. (cherry picked from commit ff89975) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: bf3dca2
Commits on Jan 6, 2016
-
[SPARK-12393][SPARKR] Add read.text and write.text for SparkR
Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393. (cherry picked from commit d1fea41) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: c3135d0
-
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`. Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9986 from zero323/SPARK-12006. (cherry picked from commit fcd013c) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 1756819
-
[SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming
Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10621 from zsxwing/SPARK-12617-2. (cherry picked from commit 1e6648d) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: d821fae
-
[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of defau…
…lt root path to gain the streaming batch url. Author: huangzhaowei <carlmartinmax@gmail.com> Closes #10617 from SaintBacchus/SPARK-12672.
Commit: 8f0ead3
-
Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead …
Commit: 39b0a34
Commits on Jan 7, 2016
-
[SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in…
… pyspark JIRA: https://issues.apache.org/jira/browse/SPARK-12016 We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #10100 from viirya/fix-load-py-wordvecmodel. (cherry picked from commit b51a4cd) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 11b901b
-
[SPARK-12673][UI] Add missing uri prepending for job description
Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot: ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png) Author: jerryshao <sshao@hortonworks.com> Closes #10618 from jerryshao/SPARK-12673. (cherry picked from commit 174e72c) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: 94af69c
-
[SPARK-12678][CORE] MapPartitionsRDD clearDependencies
MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak. Author: Guillaume Poulin <poulin.guillaume@gmail.com> Closes #10623 from gpoulin/map_partition_deps. (cherry picked from commit b673852) Signed-off-by: Reynold Xin <rxin@databricks.com>
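A hedged sketch of the shape of the fix (a simplified custom RDD with assumed names, not the real MapPartitionsRDD): keep the parent in a var and null it out in clearDependencies so a checkpoint can actually release the lineage.
```scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// `prev` is a var so clearDependencies can drop the reference and let the
// parent RDD be garbage-collected after checkpointing truncates the lineage.
class MappedSketchRDD[U: ClassTag, T: ClassTag](
    var prev: RDD[T],
    f: Iterator[T] => Iterator[U])
  extends RDD[U](prev) {

  override def getPartitions: Array[Partition] = prev.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] =
    f(prev.iterator(split, context))

  override def clearDependencies(): Unit = {
    super.clearDependencies()
    prev = null // the step that was missing and kept the parent alive
  }
}
```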
Commit: d061b85
-
Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is …
Commit: 34effc4
-
[DOC] fix 'spark.memory.offHeap.enabled' default value to false
Commit: 47a58c7
-
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
If the initial model passed to GMM is not empty, it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to list. Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #10644 from zero323/SPARK-12006. (cherry picked from commit 592f649) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 69a885a
-
[SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overla…
…pping splits https://issues.apache.org/jira/browse/SPARK-12662 cc yhuai Author: Sameer Agarwal <sameer@databricks.com> Closes #10626 from sameeragarwal/randomsplit. (cherry picked from commit f194d99) Signed-off-by: Reynold Xin <rxin@databricks.com>
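A quick user-facing check of what the fix guarantees, as a spark-shell style sketch (assuming `sqlContext` is in scope): splits drawn from one DataFrame should be disjoint and cover the input.
```scala
val df = sqlContext.range(0, 1000)
val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
assert(train.intersect(test).count() == 0)    // no row lands in both splits
assert(train.count() + test.count() == 1000)  // together they cover the input
```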
Commit: 017b73e
-
[SPARK-12598][CORE] bug in setMinPartitions
There is a bug in the calculation of ```maxSplitSize```. The ```totalLen``` should be divided by ```minPartitions``` and not by ```files.size```. Author: Darek Blasiak <darek.blasiak@640labs.com> Closes #10546 from datafarmer/setminpartitionsbug. (cherry picked from commit 8346518) Signed-off-by: Sean Owen <sowen@cloudera.com>
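The corrected arithmetic as a small self-contained sketch (values are made up; only the division matters): the target split size must come from the requested partition count, not from the number of files.
```scala
val fileLengths = Seq(512L, 2048L, 1024L)  // three files, arbitrary byte counts
val minPartitions = 6
val totalLen = fileLengths.sum

// buggy: dividing by files.size yields ~one split per file, ignoring minPartitions
val buggyMaxSplitSize = math.ceil(totalLen.toDouble / fileLengths.size).toLong
// fixed: dividing by minPartitions allows at least that many splits
val fixedMaxSplitSize = math.ceil(totalLen.toDouble / math.max(minPartitions, 1)).toLong

println(s"buggy=$buggyMaxSplitSize fixed=$fixedMaxSplitSize")  // buggy=1195 fixed=598
```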
Commit: 6ef8235
Commits on Jan 8, 2016
-
[SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and all…
Commit: a7c3636
-
[SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (…
…branch 1.6) backport #10609 to branch 1.6 Author: Shixiong Zhu <shixiong@databricks.com> Closes #10656 from zsxwing/SPARK-12591-branch-1.6.
Commit: 0d96c54
-
Commit: fe2cf34
-
Commit: e4227cb
-
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail…
…s on secure Hadoop https://issues.apache.org/jira/browse/SPARK-12654 So the bug here is that WholeTextFileRDD.getPartitions has `val conf = getConf`; in getConf, if cloneConf=true, it creates a new Hadoop Configuration. Then it uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So basically, when it is cloning the hadoop configuration it's changing it from a JobConf to a Configuration and dropping the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #10651 from tgravescs/SPARK-12654. (cherry picked from commit 553fd7b) Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
Commit: faf094c
-
[SPARK-12696] Backport Dataset Bug fixes to 1.6
We've fixed a lot of bugs in master, and since this is experimental in 1.6 we should consider backporting the fixes. The only thing that is obviously risky to me is 0e07ed3; we might try to remove that. Author: Wenchen Fan <wenchen@databricks.com> Author: gatorsmile <gatorsmile@gmail.com> Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Cheng Lian <lian@databricks.com> Author: Nong Li <nong@databricks.com> Closes #10650 from marmbrus/dataset-backports.
Commit: a619050
Commits on Jan 9, 2016
-
[SPARK-12645][SPARKR] SparkR support hash function
Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645. (cherry picked from commit 3d77cff) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: 8b5f230
Commits on Jan 10, 2016
-
[SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to…
Commit: 7903b06
Commits on Jan 11, 2016
-
[SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes …
Commit: 43b72d8
-
removed lambda from sortByKey()
According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort. Author: Udo Klein <git@blinkenlight.net> Closes #10640 from udoklein/patch-1. (cherry picked from commit bd723bd) Signed-off-by: Sean Owen <sowen@cloudera.com>
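For reference, a hedged sketch of the corrected usage (spark-shell style): sortByKey takes an optional ascending flag, not a comparator function.
```scala
val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
pairs.sortByKey().collect()                  // ascending by default: a, b, c
pairs.sortByKey(ascending = false).collect() // descending: c, b, a
```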
Commit: d4cfd2a
-
Commit: ce906b3
-
[SPARK-12734][HOTFIX] Build changes must trigger all tests; clean aft…
…er install in dep tests This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script. First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed. I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests. /cc zsxwing Author: Josh Rosen <joshrosen@databricks.com> Closes #10704 from JoshRosen/fix-build-test-problems. (cherry picked from commit a449914) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 3b32aa9
-
[SPARK-12758][SQL] add note to Spark SQL Migration guide about Timest…
…ampType casting Warning users about casting changes. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10708 from blbradley/spark-12758. (cherry picked from commit a767ee8) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: dd2cf64
Commits on Jan 12, 2016
-
[SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel
https://issues.apache.org/jira/browse/SPARK-11823 This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test. Author: Yin Huai <yhuai@databricks.com> Closes #10715 from yhuai/SPARK-11823-ignore. (cherry picked from commit aaa2c3b) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: a6c9c68
-
[SPARK-12638][API DOC] Parameter explanation not very accurate for rd…
…d function "aggregate" Currently, RDD function aggregate's parameter doesn't explain well, especially parameter "zeroValue". It's helpful to let junior scala user know that "zeroValue" attend both "seqOp" and "combOp" phase. Author: Tommy YU <tummyyu@163.com> Closes #10587 from Wenpei/rdd_aggregate_doc. (cherry picked from commit 9f0995b) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 46fc7a1
-
[SPARK-12582][TEST] IndexShuffleBlockResolverSuite fails in windows
[SPARK-12582][Test] IndexShuffleBlockResolverSuite fails in Windows * IndexShuffleBlockResolverSuite fails in Windows because a file is not closed. * Move IndexShuffleBlockResolverSuite.scala from "test/java" to "test/scala". https://issues.apache.org/jira/browse/SPARK-12582 Author: Yucai Yu <yucai.yu@intel.com> Closes #10526 from yucai/master. (cherry picked from commit 7e15044) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 3221a7d
-
[SPARK-5273][MLLIB][DOCS] Improve documentation examples for LinearRe…
…gression Use a much smaller step size in LinearRegressionWithSGD MLlib examples to achieve a reasonable RMSE. Our training folks hit this exact same issue when concocting an example and had the same solution. Author: Sean Owen <sowen@cloudera.com> Closes #10675 from srowen/SPARK-5273. (cherry picked from commit 9c7f34a) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 4c67d55
-
[SPARK-7615][MLLIB] MLLIB Word2Vec wordVectors divided by Euclidean N…
…orm equals to zero Cosine similarity with 0 vector should be 0 Related to #10152 Author: Sean Owen <sowen@cloudera.com> Closes #10696 from srowen/SPARK-7615. (cherry picked from commit c48f2a3) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 94b39f7
-
Revert "[SPARK-12645][SPARKR] SparkR support hash function"
This reverts commit 8b5f230.
Commit: 03e523e
Commits on Jan 13, 2016
-
[HOT-FIX] bypass hive test when parse logical plan to json
#10311 introduces some rare, non-deterministic flakiness for hive udf tests, see #10311 (comment) I can't reproduce it locally and may need more time to investigate; a quick solution is to bypass hive tests for json serialization. Author: Wenchen Fan <wenchen@databricks.com> Closes #10430 from cloud-fan/hot-fix. (cherry picked from commit 8543997) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: f71e5cc
-
[SPARK-12558][SQL] AnalysisException when multiple functions applied …
…in GROUP BY clause cloud-fan Can you please take a look? In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue. Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #10520 from dilipbiswal/spark-12558. (cherry picked from commit dc7b387) Signed-off-by: Yin Huai <yhuai@databricks.com> Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
Commit: dcdc864
-
Commit: f9ecd3a
-
[SPARK-12685][MLLIB][BACKPORT TO 1.4] word2vec trainWordsCount gets o…
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: #10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <hhbyyh@gmail.com> Closes #10721 from hhbyyh/branch-1.4. (cherry picked from commit 7bd2564) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 364f799
-
[SPARK-12268][PYSPARK] Make pyspark shell pythonstartup work under py…
…thon3 This replaces the `execfile` used for running custom python shell scripts with explicit open, compile and exec (as recommended by 2to3). The reason for this change is to make the pythonstartup option compatible with python3. Author: Erik Selin <erik.selin@gmail.com> Closes #10255 from tyro89/pythonstartup-python3. (cherry picked from commit e4e0b3f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: cf6d506
-
[SPARK-12690][CORE] Fix NPE in UnsafeInMemorySorter.free()
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
	at org.apache.spark.scheduler.Task.run(Task.scala:91)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
```
Author: Carson Wang <carson.wang@intel.com> Closes #10637 from carsonwang/FixNPE. (cherry picked from commit eabc7b8) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 26f13fa
Commits on Jan 14, 2016
-
[SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when …
…number of features is large jira: https://issues.apache.org/jira/browse/SPARK-12026 The issue is valid, as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger. I tested locally and the change improves the performance; the running time was stable. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #10146 from hhbyyh/chiSq. (cherry picked from commit 021dafc) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
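A hedged sketch of why the view-based access degrades (illustrative code, not the actual ChiSqTest internals): slicing a lazy zipped view re-walks the discarded prefix on every call, while slicing the array first copies only the needed columns.
```scala
val features = Array.tabulate(2000000)(_.toDouble)

// pays O(start) per call: the view's iterator steps past `start` elements
def viaView(start: Int, end: Int) =
  features.view.zipWithIndex.slice(start, end).force

// pays O(end - start) per call: copy the slice, then re-attach global indices
def viaSlice(start: Int, end: Int) =
  features.slice(start, end).zipWithIndex.map { case (v, i) => (v, i + start) }

viaView(1900000, 1900010)   // slow, and slower the larger `start` gets
viaSlice(1900000, 1900010)  // fast regardless of `start`
```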
Commit: a490787
-
[SPARK-9844][CORE] File appender race condition during shutdown
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify an IOException is thrown and logged if FileAppender is not flagged to stop, and that no IOException is thrown when the flag is set. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844. (cherry picked from commit 56cdbd6) Signed-off-by: Sean Owen <sowen@cloudera.com>
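A compact sketch of the guarded read loop described above (class and field names are assumptions, simplified from the description): an IOException is swallowed only when the appender was already told to stop; otherwise it propagates.
```scala
import java.io.{IOException, InputStream, OutputStream}

class AppenderSketch {
  @volatile private var markedForStop = false
  def stop(): Unit = { markedForStop = true }

  def appendStream(in: InputStream, out: OutputStream): Unit = {
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
    } catch {
      // Stream closed during shutdown is expected once stop() was requested;
      // any IOException before that falls through the guard and propagates.
      case _: IOException if markedForStop => ()
    }
  }
}
```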
Commit: 0c67993
-
[SPARK-12784][UI] Fix Spark UI IndexOutOfBoundsException with dynamic…
… allocation Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10728 from zsxwing/SPARK-12784. (cherry picked from commit 501e99e) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: d1855ad
Commits on Jan 15, 2016
-
[SPARK-12708][UI] Sorting task error in Stages Page when yarn mode.
If the sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with the following message. ![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png) It's similar to SPARK-4313. Author: root <root@R520T1.(none)> Author: Koyo Yoshida <koyo0615@gmail.com> Closes #10663 from yoshidakuy/SPARK-12708. (cherry picked from commit 32cca93) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Commit: d23e57d
-
[SPARK-11031][SPARKR] Method str() on a DataFrame
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031. (cherry picked from commit ba4a641) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: 5a00528
-
[SPARK-12701][CORE] FileAppender should use join to ensure writing th…
…read completion Changed Logging FileAppender to use join in `awaitTermination` to ensure that the thread is properly finished before returning. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701. (cherry picked from commit ea104b8) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 7733668
Commits on Jan 16, 2016
-
[SPARK-12722][DOCS] Fixed typo in Pipeline example
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
```
val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
```
should be
```
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
```
cc: jkbradley Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu> Closes #10769 from Agent007/SPARK-12722. (cherry picked from commit 86972fa) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 5803fce
Commits on Jan 18, 2016
-
[SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions ap…
…plied in GROUP BY clause Addresses the comments from Yin. #10520 Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #10758 from dilipbiswal/spark-12558-followup. (cherry picked from commit db9a860) Signed-off-by: Yin Huai <yhuai@databricks.com> Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
Commit: 53184ce
-
[SPARK-12346][ML] Missing attribute names in GLM for vector-type feat…
…ures Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names. cc mengxr Author: Eric Liang <ekl@databricks.com> Closes #10323 from ericl/spark-12346. (cherry picked from commit 5e492e9) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 8c2b67f
-
[SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume i…
…ntegration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10746 from zsxwing/flume-doc. (cherry picked from commit a973f48) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 7482c7b
Commits on Jan 19, 2016
-
[SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis…
… integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10822 from zsxwing/kinesis-doc. (cherry picked from commit 721845c) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: d43704d
-
[SPARK-12841][SQL][BRANCH-1.6] fix cast in filter
In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be resolved, and actually we don't need a better alias for this case. This PR moves the cast wrapping logic to `Column.named` so that we will only do it when we need an alias name. backport #10781 to 1.6 Author: Wenchen Fan <wenchen@databricks.com> Closes #10819 from cloud-fan/bug.
Commit: 68265ac
-
[SQL][MINOR] Fix one little mismatched comment according to the codes…
Commit: 30f55e5
-
[MLLIB] Fix CholeskyDecomposition assertion's message
Change the assertion's message so it's consistent with the code. The old message says that the invoked method was lapack.dports, whereas in fact it was the lapack.dppsv method. Author: Wojciech Jurczyk <wojtek.jurczyk@gmail.com> Closes #10818 from wjur/wjur/rename_error_message. (cherry picked from commit ebd9ce0) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 962e618
Commits on Jan 21, 2016
-
[SPARK-12921] Use SparkHadoopUtil reflection in SpecificParquetRecord…
…ReaderBase It looks like there's one place left in the codebase, SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's reflective accesses of TaskAttemptContext methods, which could create problems when using a single Spark artifact with both Hadoop 1.x and 2.x. Author: Josh Rosen <joshrosen@databricks.com> Closes #10843 from JoshRosen/SPARK-12921.
Commit: 40fa218
Commits on Jan 22, 2016
-
[SPARK-12747][SQL] Use correct type name for Postgres JDBC's real array
https://issues.apache.org/jira/browse/SPARK-12747 Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real". Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10695 from viirya/fix-postgres-jdbc. (cherry picked from commit 55c7dd0) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: b5d7dbe
Commits on Jan 23, 2016
-
[SPARK-12859][STREAMING][WEB UI] Names of input streams with receiver…
Commit: dca238a
-
[SPARK-12760][DOCS] invalid lambda expression in python example for …
…local vs cluster srowen thanks for the PR at #10866! sorry it took me a while. This is related to #10866, basically the assignment in the lambda expression in the python example is actually invalid
```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "<ipython-input-4-fcb86c182bad>", line 1
    rdd.foreach(lambda x: counter += x)
                                  ^
SyntaxError: invalid syntax
```
Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #10867 from mortada/doc_python_fix. (cherry picked from commit 56f57f8) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: e8ae242
-
[SPARK-12760][DOCS] inaccurate description for difference between loc…
…al vs cluster mode in closure handling Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode Author: Sean Owen <sowen@cloudera.com> Closes #10866 from srowen/SPARK-12760. (cherry picked from commit aca2a01) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: f13a3d1
Commits on Jan 24, 2016
-
[SPARK-12120][PYSPARK] Improve exception message when failing to init…
…ialize HiveContext in PySpark davies Mind to review ? This is the error message after this PR
```
15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
/Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
  warnings.warn("You must build Spark with Hive. "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
    return DataFrameReader(self)
  File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
    raise e
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
```
Author: Jeff Zhang <zjffdu@apache.org> Closes #10126 from zjffdu/SPARK-12120. (cherry picked from commit e789b1d) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f913f7e
Commits on Jan 25, 2016
-
[SPARK-12624][PYSPARK] Checks row length when converting Java arrays …
…to Python rows When actual row length doesn't conform to specified schema field length, we should give a better error message instead of throwing an unintuitive `ArrayOutOfBoundsException`. Author: Cheng Lian <lian@databricks.com> Closes #10886 from liancheng/spark-12624. (cherry picked from commit 3327fd2) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 88614dd
-
[SPARK-12932][JAVA API] improved error message for java type inferenc…
…e failure Author: Andy Grove <andygrove73@gmail.com> Closes #10865 from andygrove/SPARK-12932. (cherry picked from commit d8e4805) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 88114d3
-
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped. This contribution is my original work, and I license this work to the Spark project under the project's open source license. Author: Michael Allman <michael@videoamp.com> Closes #10700 from mallman/stop_event_logger_first. (cherry picked from commit 4ee8191) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: b40e58c
Commits on Jan 26, 2016
-
[SPARK-12961][CORE] Prevent snappy-java memory leak
JIRA: https://issues.apache.org/jira/browse/SPARK-12961 To prevent a memory leak in snappy-java, just call the method once and cache the result. After the library releases a new version, we can remove this object. JoshRosen Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10875 from viirya/prevent-snappy-memory-leak. (cherry picked from commit 5936bf9) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit: 572bc39
-
[SPARK-12682][SQL] Add support for (optionally) not storing tables in…
… hive metadata format This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table metadata in hive metadata format. While this could be useful in general, the specific use-case for this change is that Hive doesn't handle wide schemas well (see https://issues.apache.org/jira/browse/SPARK-12682 and https://issues.apache.org/jira/browse/SPARK-6024) which in turn prevents such tables from being queried in SparkSQL. Author: Sameer Agarwal <sameer@databricks.com> Closes #10826 from sameeragarwal/skip-hive-metadata. (cherry picked from commit 08c781c) Signed-off-by: Yin Huai <yhuai@databricks.com>
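A hedged sketch of how such a data source option is typically passed when persisting a table (the option key comes from the commit message; the surrounding write pattern is an assumption, not taken from the PR):
```scala
// Sketch: pass the option through the data source options when saving a
// very wide table, so its schema is not stored in Hive-compatible form.
df.write
  .format("parquet")
  .option("skip_hive_metadata", "true")  // option key from the commit message
  .saveAsTable("wide_table")
```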
Commit: f0c98a6
-
[SPARK-12682][SQL][HOT-FIX] Fix test compilation
Author: Yin Huai <yhuai@databricks.com> Closes #10925 from yhuai/branch-1.6-hot-fix.
Commit: 6ce3dd9
-
[SPARK-12611][SQL][PYSPARK][TESTS] Fix test_infer_schema_to_local
Previously (when the PR was first created) not specifying b= explicitly was fine (and treated as default null) - instead be explicit about b being None in the test. Author: Holden Karau <holden@us.ibm.com> Closes #10564 from holdenk/SPARK-12611-fix-test-infer-schema-local. (cherry picked from commit 13dab9c) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 85518ed
Commits on Jan 27, 2016
-
[SPARK-12834][ML][PYTHON][BACKPORT] Change ser/de of JavaArray and Ja…
…vaList Backport of SPARK-12834 for branch-1.6 Original PR: #10772 Original commit message: We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in `PythonMLLibAPI`, then deserialize them with `PickleSerializer` on the Python side. However, there is no need to transform them in such an inefficient way. Instead, we can use type conversion to convert them, e.g. `list(JavaArray)` or `list(JavaList)`. What's more, there is an issue with Ser/De of Scala Array, as I said in https://issues.apache.org/jira/browse/SPARK-12780 Author: Xusen Yin <yinxusen@gmail.com> Closes #10941 from jkbradley/yinxusen-SPARK-12834-1.6.
Commit: 17d1071
-
[SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata w…
…ith `None` triggers cryptic failure The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type class org.json4s.JsonAST$JNull$" to be more informative about what is not supported. Also, StructType metadata now handles JNull correctly, i.e., {'a': None}. test_metadata_null is added to tests.py to show the fix works. Author: Jason Lee <cjlee@us.ibm.com> Closes #8969 from jasoncl/SPARK-10847. (cherry picked from commit edd4737) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit: 96e32db
Commits on Jan 29, 2016
-
[SPARK-13082][PYSPARK] Backport the fix of 'read.json(rdd)' in #10559 …
…to branch-1.6 SPARK-13082 was actually fixed by #10559. However, it's a big PR and not backported to 1.6. This PR just backports the fix of 'read.json(rdd)' to branch-1.6. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10988 from zsxwing/json-rdd.
Commit: 84dab72
Commits on Jan 30, 2016
-
[SPARK-13088] Fix DAG viz in latest version of chrome
Apparently chrome removed `SVGElement.prototype.getTransformToElement`, which is used by our JS library dagre-d3 when creating edges. The real diff can be found here: andrewor14/dagre-d3@7d6c000, which is taken from the fix in the main repo: dagrejs/dagre-d3@1ef067f Upstream issue: dagrejs/dagre-d3#202 Author: Andrew Or <andrew@databricks.com> Closes #10986 from andrewor14/fix-dag-viz. (cherry picked from commit 70e69fc) Signed-off-by: Andrew Or <andrew@databricks.com>
Andrew Or committed on Jan 30, 2016
Commit: bb01cbe
Commits on Feb 1, 2016
-
[SPARK-12231][SQL] create a combineFilters' projection when we call b…
…uildPartitionedTableScan Hello Michael & All: We have some issues to submit the new code in the other PR (#10299), so we closed that PR and opened this one with the fix. The reason for the previous failure is that the projection for the scan, when there is a filter that is not pushed down (the "left-over" filter), could be different, in elements or ordering, from the original projection. With this new code, the approach to solve this problem is: insert a new Project if the "left-over" filter is nonempty and (the original projection is not empty and the projection for the scan has more than one element, which could otherwise cause a different ordering in the projection). We create 3 test cases to cover the otherwise failing cases. Author: Kevin Yu <qyu@us.ibm.com> Closes #10388 from kevinyu98/spark-12231. (cherry picked from commit fd50df4) Signed-off-by: Cheng Lian <lian@databricks.com>
Commit: ddb9633
-
[SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions
JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace an alias by the corresponding attribute. However, this will cause an issue exposed by the following case:
```scala
val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")
val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
```
In this case, both `Data.A` and `Data.B` are `alias` in `WindowSpecDefinition`. If we replace these alias expressions by their alias names, we are unable to know what they are since they will not be put in `missingExpr` too. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #10963 from gatorsmile/seletStarAfterColDrop. (cherry picked from commit 33c8a49) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 9a5b25d
-
[DOCS] Fix the jar location of datanucleus in sql-programming-guid.md
Commit: 215d5d8
-
[SPARK-11780][SQL] Add catalyst type aliases backwards compatibility
Commit: 70fcbf6
Commits on Feb 2, 2016
-
[SPARK-13087][SQL] Fix group by function for sort based aggregation
It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`. The current code worked fine when the grouping expressions were simple, but when they were a derived value this blew up at execution time. Author: Michael Armbrust <michael@databricks.com> Closes #11011 from marmbrus/groupByFunction.
Commit: bd8efba
-
Commit: 99594b2
-
[SPARK-12780][ML][PYTHON][BACKPORT] Inconsistency returning value of …
…ML python models' properties Backport of [SPARK-12780] for branch-1.6 Original PR for master: #10724 This fixes StringIndexerModel.labels in pyspark. Author: Xusen Yin <yinxusen@gmail.com> Closes #10950 from jkbradley/yinxusen-spark-12780-backport.
Commit: 9a3d1bd
-
[SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method
I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629 Please let me know what you think. Thanks! Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #10580 from NarineK/sparkrSavaAsRable. (cherry picked from commit 8a88e12) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit: 53f518a
-
[SPARK-13121][STREAMING] java mapWithState mishandles scala Option
Java mapWithState with Function3 had a wrong conversion of Java `Optional` to Scala `Option`; the fixed code uses the same conversion used in the mapWithState call that takes a Function4 as input. `Optional.fromNullable(v.get)` fails if v is `None`; better to use `JavaUtils.optionToOptional(v)` instead. Author: Gabriele Nizzoli <mail@nizzoli.net> Closes #11007 from gabrielenizzoli/branch-1.6.
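A tiny sketch of the failure mode in plain Scala (hedged; only `JavaUtils.optionToOptional` is named by the commit, the rest is illustrative): calling .get on a None throws, so the conversion has to branch on the Option itself.
```scala
import com.google.common.base.Optional  // the Guava Optional used by the Java API

val some: Option[Int] = Some(1)
val none: Option[Int] = None

Optional.fromNullable(some.get)    // fine: Optional.of(1)
// Optional.fromNullable(none.get) // throws NoSuchElementException: None.get

// None-safe shape of the conversion, what JavaUtils.optionToOptional effectively does:
def toOptional[T](v: Option[T]): Optional[T] = v match {
  case Some(x) => Optional.of(x)
  case None    => Optional.absent[T]()
}
```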
Commit: 4c28b4c
-
[SPARK-12711][ML] ML StopWordsRemover does not protect itself from co…
…lumn name duplication Fixes the problem and verifies the fix with a test suite. Also adds an optional parameter, nullable (Boolean), to SchemaUtils.appendColumn and deduplicates the SchemaUtils.appendColumn functions. Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com> Closes #10741 from grzegorz-chilkiewicz/master. (cherry picked from commit b1835d7) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 9c0cf22
-
[SPARK-13056][SQL] map column would throw NPE if value is null
Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like `{"a": "somestring", "b": null}` and query like `SELECT col["b"] FROM t1`; an NPE would be thrown. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #10964 from adrian-wang/npewriter. (cherry picked from commit 358300c) Signed-off-by: Michael Armbrust <michael@databricks.com> Conflicts: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Commit: 3c92333
-
[DOCS] Update StructType.scala
The example will throw an error like `<console>:20: error: not found: value StructType`. Need to add this line: `import org.apache.spark.sql.types._` Author: Kevin (Sangwoo) Kim <sangwookim.me@gmail.com> Closes #10141 from swkimme/patch-1. (cherry picked from commit b377b03) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: e81333b
Commits on Feb 3, 2016
-
[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()
https://issues.apache.org/jira/browse/SPARK-13122 A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition. The initially proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID. Author: Adam Budde <budde@amazon.com> Closes #11012 from budde/master. (cherry picked from commit ff71261) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
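A self-contained toy of the race and the lock that the fix introduces (names are illustrative, not the real MemoryStore): the read of the current unroll memory and the update of the map must happen under one lock, or two threads sharing a task attempt id can both act on the same stale initial value.
```scala
import scala.collection.mutable

object UnrollAccountingSketch {
  private val unrollMemoryByTask = mutable.HashMap[Long, Long]().withDefaultValue(0L)
  private val memoryLock = new Object

  // Read + update as one atomic step, mirroring the shape of the fix.
  def reserveUnrollMemory(taskAttemptId: Long, bytes: Long): Long =
    memoryLock.synchronized {
      val before = unrollMemoryByTask(taskAttemptId)      // read initial amount
      unrollMemoryByTask(taskAttemptId) = before + bytes  // update under the same lock
      before + bytes
    }
}
```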
Adam Budde authored and Andrew Or committed on Feb 3, 2016
Commit: 2f8abb4
-
[SPARK-12739][STREAMING] Details of batch in Streaming tab uses two D…
…uration columns I have clearly prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'. Author: Mario Briggs <mario.briggs@in.ibm.com> Author: mariobriggs <mariobriggs@in.ibm.com> Closes #11022 from mariobriggs/spark-12739. (cherry picked from commit e9eb248) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: 5fe8796
Commits on Feb 4, 2016
-
[SPARK-13101][SQL][BRANCH-1.6] nullability of array type element shou…
…ld not fail analysis of encoder Nullability should only be considered an optimization rather than part of the type system, so instead of failing analysis for mismatched nullability, we should pass analysis and add a runtime null check. backport #11035 to 1.6 Author: Wenchen Fan <wenchen@databricks.com> Closes #11042 from cloud-fan/branch-1.6.
Commit: cdfb2a1
-
Commit: 2f390d3
-
[SPARK-13195][STREAMING] Fix NoSuchElementException when a state is n…
…ot set but timeoutThreshold is defined Check the state's existence before calling get. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11081 from zsxwing/SPARK-13195. (cherry picked from commit 8e2f296) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
Commit: a907c7c
Commits on Feb 5, 2016
-
[SPARK-13214][DOCS] update dynamicAllocation documentation
Author: Bill Chambers <bill@databricks.com> Closes #11094 from anabranch/dynamic-docs. (cherry picked from commit 66e1383) Signed-off-by: Andrew Or <andrew@databricks.com>
Bill Chambers authored and Andrew Or committed on Feb 5, 2016
Commit: 3ca5dc3
Commits on Feb 8, 2016
-
[SPARK-13210][SQL] catch OOM when allocate memory and expand array
There is a bug when we try to grow the buffer: an OOM is wrongly ignored (the assert is also skipped by the JVM), and then we try to grow the array again; this triggers spilling that frees the current page, so the record we just inserted becomes invalid. The root cause is that the JVM has less free memory than the MemoryManager thought, so it will OOM when allocating a page without triggering spilling. We should catch the OOM and acquire memory again to trigger spilling. Also, we should not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing). Author: Davies Liu <davies@databricks.com> Closes #11095 from davies/fix_expand.
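A hedged sketch of the catch-and-retry shape described above (the allocate/spill helpers are hypothetical stand-ins, not the real sorter code): treat an OutOfMemoryError from page allocation as a signal to spill and acquire again.
```scala
// Sketch: if allocation OOMs because the JVM has less memory than the
// manager believed, spill to release pages, then retry the allocation.
def allocateWithSpill[T](required: Long,
                         allocatePage: Long => T,
                         spill: () => Unit): T =
  try {
    allocatePage(required)
  } catch {
    case _: OutOfMemoryError =>
      spill()                // free memory the manager thought was available
      allocatePage(required) // retry now that pages were released
  }
```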
Commit: 9b30096
Commits on Feb 9, 2016
-
[SPARK-12807][YARN] Spark External Shuffle not working in Hadoop clus…
…ters with Jackson 2.2.3 Patch to:
1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation
2. Use maven antrun to verify the JAR has the renamed classes
Being Maven-based, I don't know if the verification phase kicks in on an SBT/jenkins build. It will on a `mvn install` Author: Steve Loughran <stevel@hortonworks.com> Closes #10780 from steveloughran/stevel/patches/SPARK-12807-master-shuffle. (cherry picked from commit 34d0b70) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Commit: 82fa864
Commits on Feb 10, 2016
-
[SPARK-10524][ML] Use the soft prediction to order categories' bins
JIRA: https://issues.apache.org/jira/browse/SPARK-10524 Currently we use the hard prediction (`ImpurityCalculator.predict`) to order categories' bins. But we should use the soft prediction. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Liang-Chi Hsieh <viirya@appier.com> Author: Joseph K. Bradley <joseph@databricks.com> Closes #8734 from viirya/dt-soft-centroids. (cherry picked from commit 9267bc6) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Commit: 89818cb
-
[SPARK-12921] Fix another non-reflective TaskAttemptContext access in…
… SpecificParquetRecordReaderBase This is a minor followup to #10843 to fix one remaining place where we forgot to use reflective access of TaskAttemptContext methods. Author: Josh Rosen <joshrosen@databricks.com> Closes #11131 from JoshRosen/SPARK-12921-take-2.
Commit: 93f1d91
Commits on Feb 11, 2016
-
[SPARK-13274] Fix Aggregator Links on GroupedDataset Scala API
Commit: b57fac5
-
[SPARK-13265][ML] Refactoring of basic ML import/export for other fil…
…e system besides HDFS jkbradley I tried to improve the function to export a model. When I tried to export a model to S3 under Spark 1.6, it didn't work. So it should support S3 besides HDFS. Can you review it when you have time? Thanks! Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #11151 from yu-iskw/SPARK-13265. (cherry picked from commit efb65e0) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 91a5ca5
Commits on Feb 12, 2016
-
[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw a…
…n error Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality. In Python:
```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes()
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```
produces:
> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:
```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```
produces:
> true
> false

cc holdenk Author: sethah <seth.hendrickson16@gmail.com> Closes #10962 from sethah/SPARK-13047. (cherry picked from commit b354673) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 9d45ec4
-
[SPARK-13153][PYSPARK] ML persistence failed when handle no default v…
…alue parameter Fix this defect by checking whether a default value exists or not. yanboliang Please help to review. Author: Tommy YU <tummyyu@163.com> Closes #11043 from Wenpei/spark-13153-handle-param-withnodefaultvalue. (cherry picked from commit d3e2e20) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit 18661a2
Commits on Feb 13, 2016
-
[SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft…
… Windows Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)? I confirm this is my own original work and license it to the Spark project under its open source license. Author: markpavey <mark.pavey@thefilter.com> Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix. (cherry picked from commit 374c4b2) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 93a55f3
-
[SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering f…
…ailed test JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue was pointed out by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests fails. I found that some `dstAttr`s of the normalized graph are not the correct values but 0.0. Setting `TripletFields.All` in `mapTriplets` makes it work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter. (cherry picked from commit e3441e3) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit 107290c
Commits on Feb 14, 2016
-
[SPARK-13300][DOCUMENTATION] Added pygments.rb dependency
Looks like the pygments.rb gem is also required for the jekyll build to work. At least on Ubuntu/RHEL I could not build without this dependency, so I added it to the steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master. (cherry picked from commit 331293c) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit ec40c5a
Commits on Feb 15, 2016
-
[SPARK-13312][MLLIB] Update java train-validation-split example in ml…
…-guide Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312. This contribution is my original work and I license the work to this project. Author: JeremyNixon <jnixon2@gmail.com> Closes #11199 from JeremyNixon/update_train_val_split_example. (cherry picked from commit adb5483) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 71f53ed
Commits on Feb 16, 2016
-
Correct SparseVector.parse documentation
There's a small typo in the SparseVector.parse docstring: it says that the method returns a DenseVector rather than a SparseVector. Author: Miles Yucht <miles@databricks.com> Closes #11213 from mgyucht/fix-sparsevector-docs. (cherry picked from commit 827ed1c) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit d950891
Commits on Feb 17, 2016
-
[SPARK-13279] Remove O(n^2) operation from scheduler.
This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11175 from sitalkedia/fix_stuck_driver. (cherry picked from commit 1e1e31e) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
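For intuition, a minimal sketch of the complexity issue (names are illustrative, not from the patch): scanning a growing buffer for membership on every append turns n insertions into O(n^2) work, while a set-based membership check keeps each insertion O(1).

```scala
import scala.collection.mutable.{ArrayBuffer, HashSet}

// Illustrative only: pendingTasks.contains(index) is O(n), so adding n
// tasks that way costs O(n^2); the HashSet check makes each add O(1).
val pendingTasks = new ArrayBuffer[Int]
val pendingSet = new HashSet[Int]

def addPendingTask(index: Int): Unit = {
  if (pendingSet.add(index)) { // O(1) dedup instead of an O(n) scan
    pendingTasks += index
  }
}
```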
Commit 98354ca
-
[SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's…
Commit 66106a6
Commits on Feb 18, 2016
-
[SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask com…
…pares Option and String directly. ## What changes were proposed in this pull request? Fix some comparisons between unequal types that cause IJ warnings and, in at least one case, a likely bug (TaskSetManager). ## How was this patch tested? Running Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11253 from srowen/SPARK-13371. (cherry picked from commit 7856253) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit 16f35c4
Commits on Feb 22, 2016
-
[SPARK-12546][SQL] Change default number of open parquet files
A common problem that users encounter with Spark 1.6.0 is that writing to a partitioned parquet table OOMs. The root cause is that parquet allocates a significant amount of memory that is not accounted for by our own mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more. Author: Michael Armbrust <michael@databricks.com> Closes #11308 from marmbrus/parquetWriteOOM. (cherry picked from commit 173aa94) Signed-off-by: Michael Armbrust <michael@databricks.com>
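For context, a sketch of the kind of write that hit this OOM (paths, columns, and the `sqlContext` variable are illustrative, assuming a Spark 1.6 shell): each distinct partition value corresponds to an open parquet writer inside a task, and each writer buffers a significant amount of memory.

```scala
// Hypothetical job: with many distinct (year, month) pairs, a single
// task could hold many open parquet writers at once and exhaust memory.
val events = sqlContext.read.json("/data/events.json") // illustrative input
events.write
  .partitionBy("year", "month")
  .parquet("/data/events_table")
```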
Commit 699644c
Commits on Feb 23, 2016
-
[SPARK-13298][CORE][UI] Escape "label" to avoid DAG being broken by s…
…ome special character ## What changes were proposed in this pull request? When there are some special characters (e.g., `"`, `\`) in `label`, the DAG will be broken. This patch just escapes `label` to avoid the DAG being broken by special characters. ## How was this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11309 from zsxwing/SPARK-13298. (cherry picked from commit a11b399) Signed-off-by: Andrew Or <andrew@databricks.com>
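A minimal sketch of the escaping idea (not the exact patch): inside a Graphviz DOT double-quoted label, backslashes and double quotes must themselves be escaped, with backslash handled first so nothing is double-escaped.

```scala
// Escape characters that would terminate or corrupt a DOT label.
def escapeDotLabel(label: String): String =
  label.replace("\\", "\\\\").replace("\"", "\\\"")

escapeDotLabel("""stage "shuffle" \ map""") // => stage \"shuffle\" \\ map
```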
Commit 85e6a22
-
[SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec
In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which starts another `SessionState`. This leads to an exception because `processCmd` needs to get the `CliSessionState` instance by calling `SessionState.get()`, but the return value would be an instance of `SessionState`. See the exception below.
spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #9589 from adrian-wang/clicommand. (cherry picked from commit 5d80fac) Signed-off-by: Michael Armbrust <michael@databricks.com> Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala
Commit f7898f9
-
Commit 40d11d0
-
Commit 152252f
-
Commit 2902798
-
[SPARK-12746][ML] ArrayType(_, true) should also accept ArrayType(_, …
…false) fix for branch-1.6 https://issues.apache.org/jira/browse/SPARK-13359 Author: Earthson Lu <Earthson.Lu@gmail.com> Closes #11237 from Earthson/SPARK-13359.
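A hedged sketch of the acceptance rule being fixed (helper name is made up): an array schema that forbids nulls is a safe substitute wherever one that allows nulls is expected, but not the other way around.

```scala
import org.apache.spark.sql.types.{ArrayType, DoubleType}

// Sketch: actual data with containsNull = false satisfies an expected
// type with containsNull = true; the reverse is not safe.
def accepts(expected: ArrayType, actual: ArrayType): Boolean =
  expected.elementType == actual.elementType &&
    (expected.containsNull || !actual.containsNull)

accepts(ArrayType(DoubleType, containsNull = true),
        ArrayType(DoubleType, containsNull = false)) // true
```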
Commit d31854d
-
[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply
`GraphImpl.fromExistingRDDs` expects preprocessed vertex RDD as input. We call it in LDA without validating this requirement. So it might introduce errors. Replacing it by `Graph.apply` would be safer and more proper because it is a public API. The tests still pass. So maybe it is safe to use `fromExistingRDDs` here (though it doesn't seem so based on the implementation) or the test cases are special. jkbradley ankurdave Author: Xiangrui Meng <meng@databricks.com> Closes #11226 from mengxr/SPARK-13355. (cherry picked from commit 764ca18) Signed-off-by: Xiangrui Meng <meng@databricks.com>
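A usage sketch of the public constructor the patch switches to (toy data, assuming a live SparkContext `sc`): `Graph.apply` prepares the vertex and edge RDDs itself, so it does not require the preprocessed input that `GraphImpl.fromExistingRDDs` expects.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Graph.apply does its own vertex/edge preparation.
val vertices: RDD[(Long, Double)] = sc.parallelize(Seq((1L, 1.0), (2L, 2.0)))
val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 1)))
val graph: Graph[Double, Int] = Graph(vertices, edges)
```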
Commit 0784e02
-
[SPARK-13410][SQL] Support unionAll for DataFrames with UDT columns.
## What changes were proposed in this pull request? This PR adds equality operators to UDT classes so that they can be correctly tested for dataType equality during union operations. This was previously causing `AnalysisException: u"unresolved operator 'Union;"` when trying to unionAll two dataframes with UDT columns as below.
```python
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql import types

schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])
a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
c = a.unionAll(b)
```
## How was this patch tested? Tested using two unit tests in sql/test.py and the DataFrameSuite. Additional information here: https://issues.apache.org/jira/browse/SPARK-13410 rxin Author: Franklyn D'souza <franklynd@gmail.com> Closes #11333 from damnMeddlingKid/udt-union-patch.
Commit 573a2c9
Commits on Feb 24, 2016
-
[SPARK-13390][SQL][BRANCH-1.6] Fix the issue that Iterator.map().toSe…
…q is not Serializable ## What changes were proposed in this pull request? `scala.collection.Iterator`'s methods (e.g., map, filter) will return an `AbstractIterator` which is not Serializable. E.g.,
```scala
scala> val iter = Array(1, 2, 3).iterator.map(_ + 1)
iter: Iterator[Int] = non-empty iterator
scala> println(iter.isInstanceOf[Serializable])
false
```
If we call something like `Iterator.map(...).toSeq`, it will create a `Stream` that contains a non-serializable `AbstractIterator` field and make the `Stream` be non-serializable. This PR uses `toArray` instead of `toSeq` to fix this issue in `def createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame`. ## How was this patch tested? Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11334 from zsxwing/SPARK-13390.
Commit 06f4fce
-
[SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in …
…PR builder even if a PR only changes sql/core ## What changes were proposed in this pull request? `HiveCompatibilitySuite` should still run in PR build even if a PR only changes sql/core. So, I am going to remove `ExtendedHiveTest` annotation from `HiveCompatibilitySuite`. https://issues.apache.org/jira/browse/SPARK-13475 Author: Yin Huai <yhuai@databricks.com> Closes #11351 from yhuai/SPARK-13475. (cherry picked from commit bc35380) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit fe71cab
Commits on Feb 25, 2016
-
[SPARK-13482][MINOR][CONFIGURATION] Make the naming of the configura…
…tion in TransportConf consistent. `spark.storage.memoryMapThreshold` has two kinds of values: one is 2*1024*1024 as an integer, and the other is '2m' as a string. "2m" is recommended in the documentation, but it goes wrong when the code reaches `TransportConf#memoryMapBytes`. [Jira](https://issues.apache.org/jira/browse/SPARK-13482) Author: huangzhaowei <carlmartinmax@gmail.com> Closes #11360 from SaintBacchus/SPARK-13482. (cherry picked from commit 264533b) Signed-off-by: Reynold Xin <rxin@databricks.com>
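To illustrate the mismatch, a small sketch with a plain `SparkConf`: byte-string values like "2m" need a size-aware getter, while a plain integer getter chokes on the suffix.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.storage.memoryMapThreshold", "2m")

// Size-aware getter parses the suffix: "2m" => 2097152 bytes.
conf.getSizeAsBytes("spark.storage.memoryMapThreshold")

// A plain integer getter would throw NumberFormatException on "2m":
// conf.getInt("spark.storage.memoryMapThreshold", 2097152)
```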
Commit 8975996
-
[SPARK-13473][SQL] Don't push predicate through project with nondeter…
…ministic field(s) ## What changes were proposed in this pull request? Predicates shouldn't be pushed through project with nondeterministic field(s). See graphframes/graphframes#23 and SPARK-13473 for more details. This PR targets master, branch-1.6, and branch-1.5. ## How was this patch tested? A test case is added in `FilterPushdownSuite`. It constructs a query plan where a filter is over a project with a nondeterministic field. Optimized query plan shouldn't change in this case. Author: Cheng Lian <lian@databricks.com> Closes #11348 from liancheng/spark-13473-no-ppd-through-nondeterministic-project-field. (cherry picked from commit 3fa6491) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
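A small illustration of why the pushdown is unsafe (a toy DataFrame `df` is assumed to be in scope): the filter must see the same random draw that the projection emits.

```scala
import org.apache.spark.sql.functions.{col, rand}

// If the filter were pushed below the projection, the predicate would be
// rewritten in terms of rand() and evaluate a second, independent draw,
// silently keeping the wrong rows.
val projected = df.select(rand(42).as("r"), col("a"))
val filtered = projected.filter(col("r") > 0.5) // must run after the projection
```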
Commit 3cc938a
-
[SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits on large …
…DataFrames Change line 113 of QuantileDiscretizer.scala to `val requiredSamples = math.max(numBins * numBins, 10000.0)` so that `requiredSamples` is a `Double`. This will fix the division in line 114, which currently results in zero if `requiredSamples < dataset.count`. Manual tests. I was having problems using QuantileDiscretizer with a dataset, and after making this change QuantileDiscretizer behaves as expected. Author: Oliver Pierson <ocp@gatech.edu> Author: Oliver Pierson <opierson@umd.edu> Closes #11319 from oliverpierson/SPARK-13444. (cherry picked from commit 6f8e835) Signed-off-by: Sean Owen <sowen@cloudera.com>
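The arithmetic in miniature (values illustrative):

```scala
val numBins = 10
val count = 1000000L // stands in for dataset.count()

// Int math: max() returns an Int, and Int / Long truncates to 0
// whenever requiredSamples < count, so no rows get sampled.
val intSamples = math.max(numBins * numBins, 10000)
val badFraction = intSamples / count // 0

// Double math, as in the fix: the fraction survives.
val dblSamples = math.max(numBins * numBins, 10000.0)
val goodFraction = dblSamples / count // 0.01
```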
Commit cb869a1
-
[SPARK-13441][YARN] Fix NPE in yarn Client.createConfArchive method
## What changes were proposed in this pull request? Instead of using the result of File.listFiles() directly, which may throw an NPE, check for null first. If it is null, log a warning instead. ## How was this patch tested? Ran ./dev/run-tests locally. Tested manually on a cluster. Author: Terence Yim <terence@cask.co> Closes #11337 from chtyim/fixes/SPARK-13441-null-check. (cherry picked from commit fae88af) Signed-off-by: Sean Owen <sowen@cloudera.com>
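A sketch of the guard (path illustrative): `File.listFiles()` returns null, not an empty array, on an I/O error or when the path is not a directory.

```scala
import java.io.File

val dir = new File("/etc/hadoop/conf") // illustrative path
val files = dir.listFiles()
if (files == null) {
  // Log a warning and skip, instead of iterating and hitting an NPE.
  println(s"Failed to list files under $dir")
} else {
  files.foreach(f => println(f.getName))
}
```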
Commit 1f03163
-
[SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated
Commit e3802a7
-
[SPARK-12316] Wait a minute to avoid cyclic calling.
When the application ends, the AM cleans up the staging dir, but if the driver triggers a delegation token update it can't find the right token file, and it then endlessly cycles through the method 'updateCredentialsIfRequired'. This leads to a driver StackOverflowError. https://issues.apache.org/jira/browse/SPARK-12316 Author: huangzhaowei <carlmartinmax@gmail.com> Closes #10475 from SaintBacchus/SPARK-12316. (cherry picked from commit 5fcf4c2) Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
Commit 5f7440b
-
Revert "[SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits o…
…n large DataFrames" This reverts commit cb869a1.
Commit d59a08f
-
[SPARK-12874][ML] ML StringIndexer does not protect itself from colum…
…n name duplication ## What changes were proposed in this pull request? ML StringIndexer does not protect itself from column name duplication. We should still improve the way we validate the schemas of `StringIndexer` and `StringIndexerModel`; however, it would be better to address that in another issue. ## How was this patch tested? Unit test Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #11370 from yu-iskw/SPARK-12874. (cherry picked from commit 14e2700) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit abe8f99
Commits on Feb 26, 2016
-
[SPARK-13454][SQL] Allow users to drop a table with a name starting w…
…ith an underscore. ## What changes were proposed in this pull request? This change adds a workaround to allow users to drop a table with a name starting with an underscore. Without this patch, we can create such a table, but we cannot drop it. The reason is that Hive's parser unquotes a quoted identifier (see https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L453). So, when we issue a drop table command to Hive, a table name starting with an underscore is actually not quoted. Then, Hive complains about it because it does not support a table name starting with an underscore unless backticks are used (underscores are allowed as long as one is not the first char, though). ## How was this patch tested? Add a test to make sure we can drop a table with a name starting with an underscore. https://issues.apache.org/jira/browse/SPARK-13454 Author: Yin Huai <yhuai@databricks.com> Closes #11349 from yhuai/fixDropTable.
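The scenario in user terms, as a sketch (table name made up, `sqlContext` assumed in scope): with the fix, a backtick-quoted name with a leading underscore can be dropped as well as created.

```scala
// Before the patch this pair succeeded on CREATE but failed on DROP,
// because the name reached Hive's parser unquoted.
sqlContext.sql("CREATE TABLE `_tmp_events` (id INT)")
sqlContext.sql("DROP TABLE `_tmp_events`")
```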
Commit a57f87e
Commits on Feb 27, 2016
-
[SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifac…
…ts to home.apache.org Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync. Author: Josh Rosen <joshrosen@databricks.com> Closes #11350 from JoshRosen/update-release-scripts-for-apache-home. (cherry picked from commit f77dc4e) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit 8a43c3b
-
Update CHANGES.txt and spark-ec2 and R package versions for 1.6.1
This patch updates a few more 1.6.0 version numbers for the 1.6.1 release candidate. Verified this by running
```
git grep "1\.6\.0" | grep -v since | grep -v deprecated | grep -v Since | grep -v versionadded | grep 1.6.0
```
and inspecting the output. Author: Josh Rosen <joshrosen@databricks.com> Closes #11407 from JoshRosen/version-number-updates.
Commit eb6f6da
-
Commit 15de51c
-
Commit dcf60d7
Commits on Feb 29, 2016
-
[SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map…
… string datatypes to Oracle VARCHAR datatype ## What changes were proposed in this pull request? This pull request fixes SPARK-12941 by creating a data type mapping to Oracle for the DataFrame data type "StringType". This PR is for the master branch fix, whereas another PR has already been tested with branch 1.4. ## How was this patch tested? This patch was tested using the Oracle docker. A new integration suite was created for it. The oracle.jdbc jar had to be downloaded from the Maven repository. Since no jdbc jar was available in the Maven repository, the jar was downloaded manually from the Oracle site and installed locally; it was tested that way. So, for a SparkQA test run, the ojdbc jar might need to be placed manually in the local Maven repository (com/oracle/ojdbc6/11.2.0.2.0). Author: thomastechs <thomas.sebastian@tcs.com> Closes #11306 from thomastechs/master. (cherry picked from commit 8afe491) Signed-off-by: Yin Huai <yhuai@databricks.com>
Commit fedb813
Commits on Mar 3, 2016
-
[SPARK-13465] Add a task failure listener to TaskContext
## What changes were proposed in this pull request? TaskContext supports task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there is an error. This patch adds a new listener that gets called when a task fails. ## How was this patch tested? New unit test case and integration test case covering the code path Author: Davies Liu <davies@databricks.com> Closes #11478 from davies/add_failure_1.6.
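A usage sketch of the new hook (assuming this branch's API matches the patch; `sc` is a live SparkContext): unlike the completion callback, the failure listener receives the error that failed the task.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskFailureListener

sc.parallelize(1 to 10).foreach { i =>
  // Registered per task; invoked only when the task fails, with the cause.
  TaskContext.get().addTaskFailureListener(new TaskFailureListener {
    override def onTaskFailure(context: TaskContext, error: Throwable): Unit =
      System.err.println(s"task ${context.taskAttemptId} failed: ${error.getMessage}")
  })
  if (i == 7) throw new RuntimeException("boom")
}
```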
Commit 1ce2c12
-
[SPARK-13601] call failure callbacks before writer.close()
In order to tell the OutputStream whether the task has failed, we should call the failure callbacks BEFORE calling writer.close(). Added new unit tests. Author: Davies Liu <davies@databricks.com> Closes #11450 from davies/callback.
Commit fa86dc4
Commits on Mar 4, 2016
-
[SPARK-13601] [TESTS] use 1 partition in tests to avoid race conditions
Fix race conditions when cleaning up files. Tested with existing tests. Author: Davies Liu <davies@databricks.com> Closes #11507 from davies/flaky. (cherry picked from commit d062587) Signed-off-by: Davies Liu <davies.liu@gmail.com> Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/sources/CommitFailureTestRelationSuite.scala
Commit b3a5129
-
[SPARK-13652][CORE] Copy ByteBuffer in sendRpcSync as it will be recy…
…cled ## What changes were proposed in this pull request? `sendRpcSync` should copy the response content because the underlying buffer will be recycled and reused. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11499 from zsxwing/SPARK-13652. (cherry picked from commit 465c665) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
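The underlying pattern, as a sketch: take a private copy of the response bytes before the transport layer recycles the backing buffer.

```scala
import java.nio.ByteBuffer

// Copy the readable bytes of src into a fresh buffer; duplicate() keeps
// the source's position untouched for other readers.
def copyByteBuffer(src: ByteBuffer): ByteBuffer = {
  val copy = ByteBuffer.allocate(src.remaining())
  copy.put(src.duplicate())
  copy.flip()
  copy
}
```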
Commit 51c676e
-
[SPARK-11515][ML] QuantileDiscretizer should take random seed
cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9535 from yu-iskw/SPARK-11515. (cherry picked from commit 574571c) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 5a27129
-
[SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map…
… string datatypes to Oracle VARCHAR datatype mapping A test suite was added for the bug fix SPARK-12941, covering the mapping of StringType to the corresponding Oracle type; manual tests were done. Author: thomastechs <thomas.sebastian@tcs.com> Author: THOMAS SEBASTIAN <thomas.sebastian@tcs.com> Closes #11489 from thomastechs/thomastechs-12941-master-new. (cherry picked from commit f6ac7c3) Signed-off-by: Yin Huai <yhuai@databricks.com> Conflicts: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
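A hedged sketch of the dialect shape (the exact type and length in the patch may differ): Oracle has no TEXT type, so StringType must map to a concrete VARCHAR2 on write.

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

// Illustrative dialect, not the patch itself.
object OracleDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
  // Map Spark's StringType to a concrete Oracle column type on write.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR2(255)", Types.VARCHAR))
    case _ => None
  }
}

JdbcDialects.registerDialect(OracleDialectSketch)
```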
Commit 528e373
-
[SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits on large …
…DataFrames ## What changes were proposed in this pull request? Change line 113 of QuantileDiscretizer.scala to `val requiredSamples = math.max(numBins * numBins, 10000.0)` so that `requiredSamples` is a `Double`. This will fix the division in line 114, which currently results in zero if `requiredSamples < dataset.count`. ## How was this patch tested? Manual tests. I was having problems using QuantileDiscretizer with a dataset, and after making this change QuantileDiscretizer behaves as expected. Author: Oliver Pierson <ocp@gatech.edu> Author: Oliver Pierson <opierson@umd.edu> Closes #11319 from oliverpierson/SPARK-13444.
Commit f0cc511
-
Commit ffaf7c0
Commits on Mar 6, 2016
-
[SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunct…
…ionSerializer.loads ## What changes were proposed in this pull request? Set the function's module name to `__main__` if it's missing in `TransformFunctionSerializer.loads`. ## How was this patch tested? Manually test in the shell. Before this patch:
```
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.util import TransformFunction
>>> ssc = StreamingContext(sc, 1)
>>> func = TransformFunction(sc, lambda x: x, sc.serializer)
>>> func.rdd_wrapper(lambda x: x)
TransformFunction(<function <lambda> at 0x106ac8b18>)
>>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
>>> func2 = ssc._transformerSerializer.loads(bytes)
>>> print(func2.func.__module__)
None
>>> print(func2.rdd_wrap_func.__module__)
None
>>>
```
After this patch:
```
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.util import TransformFunction
>>> ssc = StreamingContext(sc, 1)
>>> func = TransformFunction(sc, lambda x: x, sc.serializer)
>>> func.rdd_wrapper(lambda x: x)
TransformFunction(<function <lambda> at 0x108bf1b90>)
>>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
>>> func2 = ssc._transformerSerializer.loads(bytes)
>>> print(func2.func.__module__)
__main__
>>> print(func2.rdd_wrap_func.__module__)
__main__
>>>
```
Author: Shixiong Zhu <shixiong@databricks.com> Closes #11535 from zsxwing/loads-module. (cherry picked from commit ee913e6) Signed-off-by: Davies Liu <davies.liu@gmail.com>
Commit 704a54c
Commits on Mar 7, 2016
-
[SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrec…
…tly refers to StatefulNetworkWordCount ## What changes were proposed in this pull request? The reference to StatefulNetworkWordCount.scala in the updateStateByKey documentation should be removed until there is an example for updateStateByKey. ## How was this patch tested? Tested the new documentation with a jekyll build. Author: rmishra <rmishra@pivotal.io> Closes #11545 from rishitesh/SPARK-13705. (cherry picked from commit 4b13896) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 18ef2f2
-
[SPARK-13599][BUILD] remove transitive groovy dependencies from spark…
…-hive and spark-hiveserver (branch 1.6) ## What changes were proposed in this pull request? This is just the patch of #11449 cherry picked to branch-1.6; the enforcer and dep/ diffs are cut. Modifies the dependency declarations of all the hive artifacts, to explicitly exclude the groovy-all JAR. This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR. ## How was this patch tested?
1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
3. A maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
4. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
5. Patch applied
6. Repeated step 1: clean build of project with `-Pyarn,hive,hive-thriftserver` set
7. Examined created spark-assembly, verified no org.codehaus packages
8. Verified that the maven dependency tree no longer references groovy
The `master` version updates the dependency files and an enforcer rule to keep groovy out; this patch strips it out. Author: Steve Loughran <stevel@hortonworks.com> Closes #11473 from steveloughran/fixes/SPARK-13599-groovy+branch-1.6.
Commit 2434f16
-
[MINOR][DOC] improve the doc for "spark.memory.offHeap.size"
The description of "spark.memory.offHeap.size" in the current document does not clearly state that the memory is counted in bytes. This PR contains a small fix for this tiny documentation issue. Author: CodingCat <zhunansjtu@gmail.com> Closes #11561 from CodingCat/master. (cherry picked from commit a3ec50a) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
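Concretely, a sketch: the value is a raw byte count, so 2 GiB is spelled out in bytes.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  // Counted in bytes: 2 GiB of off-heap memory.
  .set("spark.memory.offHeap.size", (2L * 1024 * 1024 * 1024).toString)
```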
Commit cf4e62e
-
[SPARK-13648] Add Hive Cli to classes for isolated classloader
## What changes were proposed in this pull request? Adding the hive-cli classes to the classloader ## How was this patch tested? The Hive VersionsSuite tests were run. This is my original work and I license the work to the project under the project's open source license. Author: Tim Preece <tim.preece.in.oz@gmail.com> Closes #11495 from preecet/master. (cherry picked from commit 46f25c2) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit 695c8a2
Commits on Mar 8, 2016
-
[SPARK-13711][CORE] Don't call SparkUncaughtExceptionHandler in AppCl…
…ient as it's in driver ## What changes were proposed in this pull request? AppClient runs on the driver side. It should not call `Utils.tryOrExit`, as that would send the exception to SparkUncaughtExceptionHandler and call `System.exit`. This PR just removes `Utils.tryOrExit`. ## How was this patch tested? Manual tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11566 from zsxwing/SPARK-13711.
Commit bace137
Commits on Mar 9, 2016
-
[SPARK-13755] Escape quotes in SQL plan visualization node labels
When generating Graphviz DOT files in the SQL query visualization we need to escape double-quotes inside node labels. This is a followup to #11309, which fixed a similar graph in Spark Core's DAG visualization. Author: Josh Rosen <joshrosen@databricks.com> Closes #11587 from JoshRosen/graphviz-escaping. (cherry picked from commit 81f54ac) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit 8ec4f15
-
[SPARK-13631][CORE] Thread-safe getLocationsWithLargestOutputs
## What changes were proposed in this pull request? If a job is being scheduled in one thread which has a dependency on an RDD currently executing a shuffle in another thread, Spark would throw a NullPointerException. This patch synchronizes access to `mapStatuses` and skips null status entries (which are in-progress shuffle tasks). ## How was this patch tested? Our client code unit test suite, which was reliably reproducing the race condition with 10 threads, shows that this fixes it. I have not found a minimal test case to add to Spark, but I will attempt to do so if desired. The same test case was tripping up on SPARK-4454, which was fixed by making other DAGScheduler code thread-safe. shivaram srowen Author: Andy Sloane <asloane@tetrationanalytics.com> Closes #11505 from a1k0n/SPARK-13631. (cherry picked from commit cbff280) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 95105b0
-
[SPARK-13242] [SQL] codegen fallback in case-when if there are many branches
## What changes were proposed in this pull request? If there are many branches in a CaseWhen expression, the generated code could go above the 64K limit for a single Java method and will fail to compile. This PR changes it to fall back to interpreted mode if there are more than 20 branches. ## How was this patch tested? Add tests Author: Davies Liu <davies@databricks.com> Closes #11606 from davies/fix_when_16.
Commit bea91a9
Commits on Mar 10, 2016
-
[SPARK-13760][SQL] Fix BigDecimal constructor for FloatType
## What changes were proposed in this pull request? A very minor change for using `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: float)`. The latter is deprecated and can result in inconsistencies due to an implicit conversion to `Double`. ## How was this patch tested? N/A cc yhuai Author: Sameer Agarwal <sameer@databricks.com> Closes #11597 from sameeragarwal/bigdecimal. (cherry picked from commit 926e9c4) Signed-off-by: Yin Huai <yhuai@databricks.com>
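In a REPL (Scala 2.11+, where `BigDecimal.decimal` exists), the difference looks like this:

```scala
// The Double-based constructor widens the Float to Double first,
// exposing the binary representation error:
BigDecimal(0.1f)          // 0.10000000149011612

// BigDecimal.decimal goes through the Float's decimal string instead:
BigDecimal.decimal(0.1f)  // 0.1
```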
Commit 8a1bd58
-
Revert "[SPARK-13760][SQL] Fix BigDecimal constructor for FloatType"
This reverts commit 926e9c4.
Commit 60cb270
-
[SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1
Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around. Supersedes #11524 Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #11631 from srowen/SPARK-13663. (cherry picked from commit 927e22e) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 07ace27
Commits on Mar 11, 2016
-
[MINOR][DOC] Fix supported hive version in doc
## What changes were proposed in this pull request? Today, Spark 1.6.1 and its updated docs were released. Unfortunately, there is obsolete Hive version information in the docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
```
-By default Spark will build with Hive 0.13.1 bindings.
+By default Spark will build with Hive 1.2.1 bindings.
-# Apache Hadoop 2.4.X with Hive 13 support
+# Apache Hadoop 2.4.X with Hive 1.2.1 support
```
The `sql/README.md` file also describes this. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11639 from dongjoon-hyun/fix_doc_hive_version. (cherry picked from commit 88fa866) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit 078c714
-
[SPARK-13327][SPARKR] Added parameter validations for colnames<-
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Closes #11220 from olarayej/SPARK-13312-3. (cherry picked from commit 416e71a) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Commit db4795a
Commits on Mar 13, 2016
-
[SPARK-13810][CORE] Add Port Configuration Suggestions on Bind Except…
…ions ## What changes were proposed in this pull request? Currently, when a java.net.BindException is thrown, it displays the following message: java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries! This change adds port configuration suggestions to the BindException, for example, for the UI, it now displays java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries. ## How was this patch tested? Manual tests Author: Bjorn Jonsson <bjornjon@gmail.com> Closes #11644 from bjornjon/master. (cherry picked from commit 515e4af) Signed-off-by: Sean Owen <sowen@cloudera.com>
Commit 5e08db3
Commits on Mar 14, 2016
-
[SQL] fix typo in DataSourceRegister
## What changes were proposed in this pull request? fix typo in DataSourceRegister ## How was this patch tested? Found when going through the latest code. Author: Jacky Li <jacky.likun@huawei.com> Closes #11686 from jackylk/patch-12. (cherry picked from commit f3daa09) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit 3519ce9