Branch 1.6 #11668 (Closed)

wants to merge 794 commits into from

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Dec 11, 2015

  1. Commit 3e39925
  2. Commit 250249e
  3. [SPARK-12258] [SQL] passing null into ScalaUDF (follow-up)

    This is a follow-up PR for #10259
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #10266 from davies/null_udf2.
    
    (cherry picked from commit c119a34)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    Davies Liu authored and davies committed Dec 11, 2015
    Commit eec3660
  4. Commit 23f8dfd
  5. Commit 2e45231
  6. [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files
    
    * ```jsonFile``` should support multiple input files, such as:
    ```R
    jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
    jsonFile(sqlContext, "path1,path2")
    ```
    * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
    * Replace all ```jsonFile``` with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
    * If this PR is accepted, we should also make almost the same change for ```parquetFile```.
    
    cc felixcheung sun-rui shivaram
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10145 from yanboliang/spark-12146.
    
    (cherry picked from commit 0fb9825)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    yanboliang authored and shivaram committed Dec 11, 2015
    Commit f05bae4
  7. [SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation

    Adding in Pipeline Import and Export Documentation.
    
    Author: anabranch <wac.chambers@gmail.com>
    Author: Bill Chambers <wchambers@ischool.berkeley.edu>
    
    Closes #10179 from anabranch/master.
    
    (cherry picked from commit aa305dc)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    bllchmbrs authored and jkbradley committed Dec 11, 2015
    Commit 2ddd104
  8. [SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure Issue
    
    As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.  As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue likely due to their related helper functions in `PythonMLlibAPI` creating the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.
    
    This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  This PR blocks #9441, so once this is merged, the other can be rebased.
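    As a hedged illustration of the retag idea (`retagAs` is a hypothetical helper, not Spark's actual private method), re-applying a ClassTag to an erased RDD can look like this:

    ```scala
    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    // Rebuild an RDD whose element type was erased to Object so that it carries
    // an explicit ClassTag[T]; the map(identity) re-captures the implicit tag.
    def retagAs[T: ClassTag](rdd: RDD[_]): RDD[T] =
      rdd.asInstanceOf[RDD[T]].map(identity)
    ```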
    
    cc holdenk
    
    Author: Mike Dusenberry <mwdusenb@us.ibm.com>
    
    Closes #9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.
    
    (cherry picked from commit 1b82203)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    dusenberrymw authored and jkbradley committed Dec 11, 2015
    Commit bfcc8cf
  9. [SPARK-12217][ML] Document invalid handling for StringIndexer

    Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.
    
    I wonder if I should also add a snippet to the code example; input is welcome.
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes #10257 from BenFradet/SPARK-12217.
    
    (cherry picked from commit aea676c)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    BenFradet authored and jkbradley committed Dec 11, 2015
    Commit 75531c7

Commits on Dec 12, 2015

  1. [SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py
    
    Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion.
    #9873 finished the work for the Scala example; here we focus on the Python one.
    Move dataset_example.py to ```examples/ml``` and rename to ```dataframe_example.py```.
    BTW, fix minor missing issues of #9873.
    cc mengxr
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #9957 from yanboliang/SPARK-11978.
    
    (cherry picked from commit a0ff6d1)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    yanboliang authored and jkbradley committed Dec 12, 2015
    Commit c2f2046
  2. [SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions

    Modifies the String overload to call the Column overload and ensures this is called in a test.
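    A hedged sketch of that delegation pattern (illustrative types, not the actual DataFrame source): the String overload forwards to the Column overload, so there is a single implementation and no accidental self-recursion.

    ```scala
    case class Column(name: String)

    class Frame {
      // The String overload delegates to the Column overload instead of calling itself.
      def sortWithinPartitions(sortCol: String, sortCols: String*): Frame =
        sortWithinPartitions((sortCol +: sortCols).map(Column(_)): _*)

      // The Column overload does the real work.
      def sortWithinPartitions(sortExprs: Column*): Frame = this
    }
    ```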
    
    Author: Ankur Dave <ankurdave@gmail.com>
    
    Closes #10271 from ankurdave/SPARK-12298.
    
    (cherry picked from commit 1e799d6)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    ankurdave authored and yhuai committed Dec 12, 2015
    Commit 03d8015
  3. [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases
    
    The existing sample functions are missing the parameter `seed`, even though the corresponding function interface in `generics` has such a parameter. Thus, although the caller can pass a 'seed' to the function, we are not using the value.

    This could cause SparkR unit tests to fail. For example, I hit it in another PR:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
    
    Author: gatorsmile <gatorsmile@gmail.com>
    
    Closes #10160 from gatorsmile/sampleR.
    
    (cherry picked from commit 1e3526c)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    gatorsmile authored and shivaram committed Dec 12, 2015
    Commit 47461fe
  4. [SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver
    
    Author: Jean-Baptiste Onofré <jbonofre@apache.org>
    
    Closes #10203 from jbonofre/SPARK-11193.
    
    (cherry picked from commit 03138b6)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    jbonofre authored and srowen committed Dec 12, 2015
    Commit 2679fce

Commits on Dec 13, 2015

  1. [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md

    https://issues.apache.org/jira/browse/SPARK-12199
    
    Follow-up PR of SPARK-11551. Fix some errors in ml-features.md
    
    mengxr
    
    Author: Xusen Yin <yinxusen@gmail.com>
    
    Closes #10193 from yinxusen/SPARK-12199.
    
    (cherry picked from commit 98b212d)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    yinxusen authored and jkbradley committed Dec 13, 2015
    Commit e05364b
  2. [SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnection message
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10261 from zsxwing/SPARK-12267.
    
    (cherry picked from commit 8af2f8c)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 13, 2015
    Commit d7e3bfd

Commits on Dec 14, 2015

  1. [SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook
    
    1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
    2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
    
    This should fix the potential exceptions when exiting a local cluster
    ```
    java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
    	at scala.Predef$.assert(Predef.scala:179)
    	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
    	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    
    java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
    	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
    	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
    	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
    	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
    	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10269 from zsxwing/executor-state.
    
    (cherry picked from commit 2aecda2)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 14, 2015
    Commit fbf16da
  2. [SPARK-12275][SQL] No plan for BroadcastHint in some condition

    When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". Many strategies therefore get no chance to process this plan, which probably leads to the "No plan" issue, so we use planLater to go through all strategies.
    
    https://issues.apache.org/jira/browse/SPARK-12275
    
    Author: yucai <yucai.yu@intel.com>
    
    Closes #10265 from yucai/broadcast_hint.
    
    (cherry picked from commit ed87f6d)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    yucai authored and yhuai committed Dec 14, 2015
    Commit 94ce502
  3. [MINOR][DOC] Fix broken word2vec link

    Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes #10282 from BenFradet/SPARK-12199.
    
    (cherry picked from commit e25f1fe)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    BenFradet authored and srowen committed Dec 14, 2015
    Commit c0f0f6c

Commits on Dec 15, 2015

  1. [SPARK-12327] Disable commented code lintr temporarily

    cc yhuai felixcheung shaneknapp
    
    Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    
    Closes #10300 from shivaram/comment-lintr-disable.
    
    (cherry picked from commit fb3778d)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    shivaram committed Dec 15, 2015
    Commit 352a0c8
  2. [STREAMING][MINOR] Fix typo in function name of StateImpl

    cc tdas zsxwing, please review. Thanks a lot.
    
    Author: jerryshao <sshao@hortonworks.com>
    
    Closes #10305 from jerryshao/fix-typo-state-impl.
    
    (cherry picked from commit bc1ff9f)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    jerryshao authored and zsxwing committed Dec 15, 2015
    Commit 23c8846
  3. Update branch-1.6 for 1.6.0 release

    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #10317 from marmbrus/versions.
    marmbrus committed Dec 15, 2015
    Commit 80d2617
  4. Commit 00a39d9
  5. Commit 08aa3b4

Commits on Dec 16, 2015

  1. [SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf
    
    This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala.
    
    andrewor14
    FYI
    
    Author: tedyu <yuzhihong@gmail.com>
    
    Closes #10164 from tedyu/master.
    
    (cherry picked from commit f725b2e)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tedyu authored and Andrew Or committed Dec 16, 2015
    Commit 9e4ac56
  2. [SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.
    
    Adding more documentation about submitting jobs with mesos cluster mode.
    
    Author: Timothy Chen <tnachen@gmail.com>
    
    Closes #10086 from tnachen/mesos_supervise_docs.
    
    (cherry picked from commit c2de99a)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tnachen authored and Andrew Or committed Dec 16, 2015
    Commit 2c324d3
  3. [SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala
    
    Author: Naveen <naveenminchu@gmail.com>
    
    Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
    
    (cherry picked from commit 8a215d2)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    naveenminchu authored and Andrew Or committed Dec 16, 2015
    Commit 8e9a600
  4. [SPARK-12062][CORE] Change Master to async rebuild UI when application completes
    
    This change builds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can still register or be removed, even if the event log history is very large and takes a long time to rebuild.
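    A hedged sketch of the general approach (illustrative names, not the actual Master code): hand the expensive rebuild to a dedicated executor so the calling RPC thread returns immediately.

    ```scala
    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    // Single-threaded pool: rebuilds are serialized but never block the caller.
    implicit val rebuildEc: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

    def asyncRebuildUI(appId: String): Future[Unit] = Future {
      // ... replay the (possibly huge) event log and rebuild the UI here ...
      println(s"rebuilt UI for $appId")
    }
    ```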
    
    Author: Bryan Cutler <bjcutler@us.ibm.com>
    
    Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
    
    (cherry picked from commit c5b6b39)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    BryanCutler authored and Andrew Or committed Dec 16, 2015
    Commit 93095eb
  5. [SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes #8645 from cloud-fan/test.
    
    (cherry picked from commit a89e8b6)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    cloud-fan authored and Andrew Or committed Dec 16, 2015
    Commit fb08f7b
  6. [SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation

    This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
    Credit goes to the original author Titan-C (mentioned in the NOTICE).
    
    Note that I am not a CSS expert, so I can only address comments up to some extent.
    
    Default view:
    <img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
    
    When collapsed manually by the user:
    <img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
    
    Disappears when column is too narrow:
    <img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
    
    Can still be opened by the user if necessary:
    <img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
    
    Author: Timothy Hunter <timhunter@databricks.com>
    
    Closes #10297 from thunterdb/12324.
    
    (cherry picked from commit a6325fc)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    thunterdb authored and jkbradley committed Dec 16, 2015
    Commit a2d584e
  7. [SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR

    Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10281 from yanboliang/spark-12310.
    
    (cherry picked from commit 22f6cd8)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    yanboliang authored and shivaram committed Dec 16, 2015
    Commit ac0e2ea
  8. [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml

    cc jkbradley
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #10244 from yu-iskw/SPARK-12215.
    
    (cherry picked from commit 26d70bd)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    yu-iskw authored and jkbradley committed Dec 16, 2015
    Commit 16edd93
  9. [SPARK-12318][SPARKR] Save mode in SparkR should be error by default

    shivaram  Please help review.
    
    Author: Jeff Zhang <zjffdu@apache.org>
    
    Closes #10290 from zjffdu/SPARK-12318.
    
    (cherry picked from commit 2eb5af5)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    zjffdu authored and shivaram committed Dec 16, 2015
    Commit f815127
  10. [SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with Mesos cluster mode.
    
    SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script has recently been changed so that spark-class scripts give precedence to SPARK_HOME if it's defined.
    
    We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
    
    Author: Timothy Chen <tnachen@gmail.com>
    
    Closes #10332 from tnachen/scheduler_ui.
    
    (cherry picked from commit ad8c1f0)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tnachen authored and Andrew Or committed Dec 16, 2015
    Commit e5b8571
  11. [SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means
    
    This PR includes only the example code in order to finish it quickly.
    I'll send another PR for the docs soon.
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #9952 from yu-iskw/SPARK-6518.
    
    (cherry picked from commit 7b6dc29)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    yu-iskw authored and jkbradley committed Dec 16, 2015
    Commit e1adf6d
  12. Commit 168c89e
  13. Commit aee88eb
  14. [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6

    No known breaking changes, but some deprecations and changes of behavior.
    
    CC: mengxr
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #10235 from jkbradley/mllib-guide-update-1.6.
    
    (cherry picked from commit 8148cc7)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    jkbradley committed Dec 16, 2015
    Commit dffa610
  15. [SPARK-12364][ML][SPARKR] Add ML example for SparkR

    We have a DataFrame example for SparkR; we also need to add an ML example under ```examples/src/main/r```.
    
    cc mengxr jkbradley shivaram
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10324 from yanboliang/spark-12364.
    
    (cherry picked from commit 1a8b2a1)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    yanboliang authored and jkbradley committed Dec 16, 2015
    Commit 04e868b
  16. [SPARK-12380] [PYSPARK] use SQLContext.getOrCreate in mllib

    MLlib should use SQLContext.getOrCreate() instead of creating a new SQLContext.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #10338 from davies/create_context.
    
    (cherry picked from commit 27b98e9)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    Davies Liu authored and davies committed Dec 16, 2015
    Commit 552b38f

Commits on Dec 17, 2015

  1. [MINOR] Add missing interpolation in NettyRPCEnv

    ```
    Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
    Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
    	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    ```
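    As a hedged reminder of the class of fix (illustrative snippet, not the actual NettyRPCEnv source): the placeholder is only expanded when the literal is an s-interpolated string.

    ```scala
    case class Timeout(duration: String)
    val timeout = Timeout("120 seconds")

    // Missing interpolation: the text "${timeout.duration}" is emitted literally.
    val bad = "Cannot receive any reply in ${timeout.duration}."
    // With the s prefix the expression is actually substituted.
    val good = s"Cannot receive any reply in ${timeout.duration}."
    ```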
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #10334 from andrewor14/rpc-typo.
    
    (cherry picked from commit 861549a)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    Andrew Or authored and zsxwing committed Dec 17, 2015
    Commit 638b89b
  2. [SPARK-10248][CORE] track exceptions in dagscheduler event loop in tests

    `DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs).  However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
    
    This was suggested by mateiz on #7699.  It may have already turned up an issue in "zero split job".
    
    Author: Imran Rashid <irashid@cloudera.com>
    
    Closes #8466 from squito/SPARK-10248.
    
    (cherry picked from commit 38d9795)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    squito authored and Andrew Or committed Dec 17, 2015
    Commit fb02e4e
  3. [SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
    
    SPARK-9886 fixed ExternalBlockStore.scala
    
    This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook()
    
    Author: tedyu <yuzhihong@gmail.com>
    
    Closes #10325 from ted-yu/master.
    
    (cherry picked from commit f590178)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    
    Conflicts:
    	sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
    tedyu authored and Andrew Or committed Dec 17, 2015
    Commit 4af6438
  4. [SPARK-12186][WEB UI] Send the complete request URI including the query string when redirecting.
    
    Author: Rohit Agarwal <rohita@qubole.com>
    
    Closes #10180 from mindprince/SPARK-12186.
    
    (cherry picked from commit fdb3822)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Rohit Agarwal authored and Andrew Or committed Dec 17, 2015
    Commit 154567d
  5. [SPARK-12386][CORE] Fix NPE when spark.executor.port is set.

    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes #10339 from vanzin/SPARK-12386.
    
    (cherry picked from commit d1508dd)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Marcelo Vanzin authored and Andrew Or committed Dec 17, 2015
    Commit 4ad0803
  6. [SPARK-12057][SQL] Prevent failure on corrupt JSON records

    This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failing test and updates the logic of schema inference.
    
    Regarding the schema inference change, if we have something like
    ```
    {"f1":1}
    [1,2,3]
    ```
    originally, we will get a DF without any column.
    After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`.
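    A hedged sketch of that behavior (assumes a spark-shell on branch-1.6 with `sc` and `sqlContext` in scope):

    ```scala
    // One valid JSON object and one line the parser cannot turn into a row.
    val lines = sc.parallelize(Seq("""{"f1":1}""", "[1,2,3]"))
    val df = sqlContext.read.json(lines)

    df.printSchema()  // expect both f1 and _corrupt_record columns
    df.show()         // the second row keeps "[1,2,3]" under _corrupt_record
    ```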
    
    When merging this PR, please make sure that the author is simplyianm.
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12057
    
    Closes #10043
    
    Author: Ian Macalinao <me@ian.pw>
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10288 from yhuai/handleCorruptJson.
    
    (cherry picked from commit 9d66c42)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    yhuai authored and rxin committed Dec 17, 2015
    Commit d509194
  7. Once driver register successfully, stop it to connect to master.

    This commit is to resolve SPARK-12396.
    
    Author: echo2mei <534384876@qq.com>
    
    Closes #10354 from echoTomei/master.
    
    (cherry picked from commit 5a514b6)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    echoTomei authored and davies committed Dec 17, 2015
    Commit da7542f
  8. Revert "Once driver register successfully, stop it to connect to master."
    
    This reverts commit da7542f.
    davies committed Dec 17, 2015
    Commit a846648
  9. [SPARK-12395] [SQL] fix resulting columns of outer join

    For API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null).
    
    The order of columns had been changed to match that with MySQL and PostgreSQL [1].
    
    This PR also fixes the nullability of the output for outer joins.
    
    [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
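    A hedged example of the API in question (assumes a spark-shell on branch-1.6 with `sqlContext` available; the data is illustrative):

    ```scala
    import sqlContext.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

    // With usingColumns, the shared "id" column should come back populated for
    // rows that exist on only one side of a right_outer/full_outer join.
    left.join(right, Seq("id"), "full_outer").show()
    ```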
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #10353 from davies/fix_join.
    
    (cherry picked from commit a170d34)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    Davies Liu authored and davies committed Dec 17, 2015
    Commit 1ebedb2
  10. [SQL] Update SQLContext.read.text doc

    Since we renamed the column from ```text``` to ```value``` for DataFrames loaded by ```SQLContext.read.text```, we need to update the doc.
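    A hedged usage sketch (assumes a spark-shell on branch-1.6 and a README.md in the working directory):

    ```scala
    val df = sqlContext.read.text("README.md")
    df.printSchema()            // a single string column, now named "value"
    df.select("value").show(3)
    ```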
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10349 from yanboliang/text-value.
    
    (cherry picked from commit 6e07716)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    yanboliang authored and rxin committed Dec 17, 2015
    Commit 41ad8ac
  11. [SPARK-12220][CORE] Make Utils.fetchFile support files that contain special characters
    
    This PR encodes and decodes the file name to fix the issue.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10208 from zsxwing/uri.
    
    (cherry picked from commit 86e405f)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 17, 2015
    Commit 1fbca41
  12. [SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos REST server
    
    Fixes a problem with #10332; this one should fix cluster mode on Mesos.
    
    Author: Iulian Dragos <jaguarul@gmail.com>
    
    Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
    
    (cherry picked from commit 8184568)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    dragos authored and sarutak committed Dec 17, 2015
    Commit 881f254
  13. [SPARK-12390] Clean up unused serializer parameter in BlockManager

    No change in functionality is intended. This only changes internal API.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #10343 from andrewor14/clean-bm-serializer.
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    Andrew Or committed Dec 17, 2015
    Commit 88bbb54
  14. [SPARK-12410][STREAMING] Fix places that use '.' and '|' directly in split
    
    String.split accepts a regular expression, so we should escape "." and "|".
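    A small self-contained Scala illustration of the pitfall (not taken from the patch itself):

    ```scala
    // "." as a regex matches any character, so every position becomes a split
    // point and Java drops the resulting trailing empty strings.
    "a.b.c".split(".")    // Array() -- not what was intended
    "a.b.c".split("\\.")  // Array(a, b, c)

    // "|" is regex alternation with the empty pattern; escape it to split literally.
    "x|y".split("\\|")    // Array(x, y)
    ```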
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10361 from zsxwing/reg-bug.
    
    (cherry picked from commit 540b5ae)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 17, 2015
    Commit c0ab14f
  15. [SPARK-12397][SQL] Improve error messages for data sources when they are not found
    
    Point users to spark-packages.org to find them.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #10351 from rxin/SPARK-12397.
    
    (cherry picked from commit e096a65)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    rxin authored and marmbrus committed Dec 17, 2015
    Commit 48dcee4
  16. [SPARK-12376][TESTS] Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method
    
    org.apache.spark.streaming.Java8APISuite.java is failing because it tries to sort an immutable list in the assertOrderInvariantEquals method.
    
    Author: Evan Chen <chene@us.ibm.com>
    
    Closes #10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
    Evan Chen authored and zsxwing committed Dec 17, 2015
    Commit 4df1dd4

Commits on Dec 18, 2015

  1. [SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when recovering from checkpoint data
    
    Add a transient flag `DStream.restoredFromCheckpointData` to control the restore processing in DStream and avoid duplicate work: check this flag first in `DStream.restoreCheckpointData`; only when it is `false` will the restore process be executed.
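    A hedged sketch of that guard-flag pattern (illustrative class, not the actual DStream source): the flag ensures the restore body runs at most once even if recovery calls it repeatedly.

    ```scala
    class CheckpointedThing {
      @transient private var restoredFromCheckpointData = false

      def restoreCheckpointData(): Unit = {
        if (!restoredFromCheckpointData) {
          // ... rebuild the generated RDDs from checkpoint data here ...
          restoredFromCheckpointData = true
        }
      }
    }
    ```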
    
    Author: jhu-chang <gt.hu.chang@gmail.com>
    
    Closes #9765 from jhu-chang/SPARK-11749.
    
    (cherry picked from commit f4346f6)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    jhu-chang authored and zsxwing committed Dec 18, 2015
    Commit 9177ea3
  2. [SPARK-12413] Fix Mesos ZK persistence

    I believe this fixes SPARK-12413.  I'm currently running an integration test to verify.
    
    Author: Michael Gummelt <mgummelt@mesosphere.io>
    
    Closes #10366 from mgummelt/fix-zk-mesos.
    
    (cherry picked from commit 2bebaa3)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    Michael Gummelt authored and sarutak committed Dec 18, 2015
    Commit df02319
  3. [SPARK-12218][SQL] Invalid splitting of nested AND expressions in Data Source filter API
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12218
    
    When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10362 from yhuai/SPARK-12218.
    
    (cherry picked from commit 41ee7c5)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    yhuai committed Dec 18, 2015
    Commit 1dc71ec
  4. Revert "[SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called"
    
    This reverts commit 4af6438.
    Andrew Or committed Dec 18, 2015
    Commit 3b903e4
  5. [SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable

    Now `StaticInvoke` receives `Any` as an object, and while `StaticInvoke` itself can be serialized, the object passed in is sometimes not serializable.

    For example, the following code raises an exception because `RowEncoder#extractorsFor`, invoked indirectly, creates a `StaticInvoke`.
    
    ```
    case class TimestampContainer(timestamp: java.sql.Timestamp)
    val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(System.currentTimeMillis))
    val df = rdd.toDF
    val ds = df.as[TimestampContainer]
    val rdd2 = ds.rdd                                 <----------------- invokes extractorsFor indirectly
    ```
    
    I'll add test cases.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #10357 from sarutak/SPARK-12404.
    
    (cherry picked from commit 6eba655)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    sarutak authored and marmbrus committed Dec 18, 2015
    Commit bd33d4e
  6. [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs

     - Provide an example of the `message handler`
     - Provide a bit on KPL record de-aggregation
     - Fix typos
    
    Author: Burak Yavuz <brkyvz@gmail.com>
    
    Closes #9970 from brkyvz/kinesis-docs.
    
    (cherry picked from commit 2377b70)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    brkyvz authored and zsxwing committed Dec 18, 2015
    Commit eca401e

Commits on Dec 19, 2015

  1. [SQL] Fix mistake doc of join type for dataframe.join

    Fix a mistake in the doc of the join type for ```dataframe.join```.
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10378 from yanboliang/leftsemi.
    
    (cherry picked from commit a073a73)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    yanboliang authored and rxin committed Dec 19, 2015
    Commit d6a519f

Commits on Dec 21, 2015

  1. Doc typo: ltrim = trim from left end, not right

    Author: pshearer <pshearer@massmutual.com>
    
    Closes #10414 from pshearer/patch-1.
    
    (cherry picked from commit fc6dbcc)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    shearerpmm authored and Andrew Or committed Dec 21, 2015
    Commit c754a08
  2. [SPARK-12466] Fix harmless NPE in tests

    ```
    [info] ReplayListenerSuite:
    [info] - Simple replay (58 milliseconds)
    java.lang.NullPointerException
    	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
    	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
    ```
    https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
    
    This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but doesn't actually fail the tests).
    
    Tested locally to verify that the NPE is gone.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #10417 from andrewor14/fix-harmless-npe.
    
    (cherry picked from commit d655d37)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Dec 21, 2015
    Commit ca39985

Commits on Dec 22, 2015

  1. Commit 4062cda
  2. Commit 5b19e7c
  3. [MINOR] Fix typos in JavaStreamingContext

    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10424 from zsxwing/typo.
    
    (cherry picked from commit 93da856)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zsxwing authored and rxin committed Dec 22, 2015
    Commit 309ef35
  4. [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite
    
    This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition which causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out.
    
    For more background, see my comments on #6207 (the PR which introduced this test).
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10425 from JoshRosen/SPARK-11823.
    
    (cherry picked from commit 2235cd4)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 22, 2015
    Commit 0f905d7
  5. [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10439 from zsxwing/kafka-message-handler-doc.
    
    (cherry picked from commit 93db50d)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 22, 2015
    Commit 94fb5e8

Commits on Dec 23, 2015

  1. [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming
    
    This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10385 from zsxwing/accumulator-broadcast-example.
    
    (cherry picked from commit 20591af)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 23, 2015
    Commit 942c057
  2. [SPARK-12477][SQL] - Tungsten projection fails for null values in array fields
    
    Accessing null elements in an array field fails when tungsten is enabled.
    It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
    
    This PR solves this by checking if the accessed element in the array field is null, in the generated code.
    
    Example:
    ```
    // Array of String
    case class AS( as: Seq[String] )
    val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
    dfAS.registerTempTable("T_AS")
    for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
    ```
    
    With Tungsten disabled:
    ```
    0 = [a]
    1 = [null]
    2 = [b]
    ```
    
    With Tungsten enabled:
    ```
    0 = [a]
    15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
    java.lang.NullPointerException
    	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    ```
    
    Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
    
    Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
    
    (cherry picked from commit 43b2a63)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    pierre-borckmans authored and rxin committed Dec 23, 2015
    Commit c6c9bf9

Commits on Dec 24, 2015

  1. [SPARK-12499][BUILD] don't force MAVEN_OPTS

    allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me)
    
    Author: Adrian Bridgett <adrian@smop.co.uk>
    
    Closes #10448 from abridgett/feature/do_not_force_maven_opts.
    
    (cherry picked from commit ead6abf)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    abridgett authored and JoshRosen committed Dec 24, 2015
    Commit 5987b16
  2. [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval
    
    Previously, the rpc timeout was the default network timeout, which is the same value
    the driver uses to determine dead executors. This means if there is a network issue,
    the executor is determined dead after one heartbeat attempt. There is a separate config
    for the heartbeat interval which is a better value to use for the heartbeat RPC. With
    this change, the executor will make multiple heartbeat attempts even with RPC issues.
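    As a hedged illustration of the two settings involved (the values are examples only, not recommendations):

    ```scala
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Interval between executor heartbeats; with this change it also bounds
      // each heartbeat RPC attempt instead of the much larger network timeout.
      .set("spark.executor.heartbeatInterval", "10s")
      // Separate, larger timeout the driver uses to declare an executor dead.
      .set("spark.network.timeout", "120s")
    ```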
    
    Author: Nong Li <nong@databricks.com>
    
    Closes #10365 from nongli/spark-12411.
    nongli authored and Andrew Or committed Dec 24, 2015
    Commit b49856a
  3. [SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is used
    
    Fix an exception with the IBM JDK by removing the update field from the JavaVersion tuple, because the IBM JDK does not provide update information ('_xx').
    
    Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
    
    Closes #10463 from kiszk/SPARK-12502.
    
    (cherry picked from commit 9e85bb7)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    kiszk authored and sarutak committed Dec 24, 2015
    Commit 4dd8712
  4. [SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax
    
    In the past, Spark JDBC writes only worked with technologies which support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):
    
    INSERT INTO $table VALUES ( ?, ?, ..., ? )
    
    But some technologies require a list of column names:
    
    INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
    
    This was blocking the use of e.g. the Progress JDBC Driver for Cassandra.
    
    Another limitation is that syntax 1 relies on the DataFrame field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().

    If the target table contains more columns (not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.

    This PR switches to the recommended second INSERT syntax. Column names are taken from DataFrame field names.
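    A hedged sketch of building the second form from a schema (an illustrative helper, not the actual JdbcUtils code):

    ```scala
    // Build "INSERT INTO tbl ( c1, c2, ... ) VALUES ( ?, ?, ... )" from column names.
    def insertStatement(table: String, columns: Seq[String]): String = {
      val cols = columns.mkString(", ")
      val placeholders = columns.map(_ => "?").mkString(", ")
      s"INSERT INTO $table ( $cols ) VALUES ( $placeholders )"
    }

    insertStatement("people", Seq("name", "age"))
    // INSERT INTO people ( name, age ) VALUES ( ?, ? )
    ```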
    
    Author: CK50 <christian.kurz@oracle.com>
    
    Closes #10380 from CK50/master-SPARK-12010-2.
    
    (cherry picked from commit 502476e)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    CK50 authored and srowen committed Dec 24, 2015
    Commit 865dd8b

Commits on Dec 28, 2015

  1. [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join
    
    After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.
    
    For example, users can do the Equi-Join like
      ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
    - There exists a bug in 1.5 and 1.4: the code just ignores the third parameter (join type) users pass. The join type actually used is `Inner`, even if the user specified another type (e.g., `Outer`).
    - After PR #8600, 1.6 does not have this issue, but the description has not been updated.
    
    Plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using Equi-Join.
    
    Author: gatorsmile <gatorsmile@gmail.com>
    
    Closes #10477 from gatorsmile/pyOuterJoin.
    gatorsmile authored and davies committed Dec 28, 2015
    Commit b8da77e
  2. [SPARK-12517] add default RDD name for one created via sc.textFile

    The feature was first added at commit: 7b877b2 but was later removed (probably by mistake) at commit: fc8b581.
    This change sets the default name of RDDs created via sc.textFile(...) to the path argument.
    
    Here is the symptom:
    
    * Using spark-1.5.2-bin-hadoop2.6:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res5: String = null
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res6: String = /home/root/.bashrc
    
    * while using Spark 1.3.1:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res0: String = /home/root/.bashrc
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res1: String = /home/root/.bashrc
    
    Author: Yaron Weinsberg <wyaron@gmail.com>
    Author: yaron <yaron@il.ibm.com>
    
    Closes #10456 from wyaron/master.
    
    (cherry picked from commit 73b70f0)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    wyaron authored and sarutak committed Dec 28, 2015
    Commit 1fbcb6e
  3. [SPARK-12424][ML] The implementation of ParamMap#filter is wrong.

    ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
    Also, the return type of Map#filterKeys is not Serializable. That is an issue in Scala itself (https://issues.scala-lang.org/browse/SI-6654).
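    A small self-contained Scala illustration of the first pitfall (not the ParamMap code itself):

    ```scala
    import scala.collection.mutable

    val m = mutable.Map(1 -> "a", 2 -> "b")

    // filterKeys returns a collection.Map view, not a mutable.Map ...
    val view: collection.Map[Int, String] = m.filterKeys(_ == 1)
    // ... so casting it back would fail at runtime:
    // view.asInstanceOf[mutable.Map[Int, String]]   // ClassCastException

    // Materializing the filtered entries into a new mutable.Map is safe.
    val safe = mutable.Map(m.toSeq.filter { case (k, _) => k == 1 }: _*)
    ```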
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes #10381 from sarutak/SPARK-12424.
    
    (cherry picked from commit 07165ca)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    sarutak committed Dec 28, 2015
    Commit 7c7d76f
  4. [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
    
    Since we only need to implement `def skipBytes(n: Int)`,
    code in #10213 could be simplified.
    davies scwf
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #10253 from adrian-wang/kryo.
    
    (cherry picked from commit a6d3853)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    adrian-wang authored and sarutak committed Dec 28, 2015
    Commit a9c52d4
  5. [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs

    Include the following changes:
    
    1. Close `java.sql.Statement`
    2. Fix incorrect `asInstanceOf`.
    3. Remove unnecessary `synchronized` and `ReentrantLock`.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10440 from zsxwing/findbugs.
    
    (cherry picked from commit 710b411)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 28, 2015
    Commit fd20248

Commits on Dec 29, 2015

  1. [SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql
    
    If a DataFrame has BYTE types, it throws an exception:
    org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
    
    Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
    
    Closes #9350 from maropu/FixBugInPostgreJdbc.
    
    (cherry picked from commit 73862a1)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    maropu authored and yhuai committed Dec 29, 2015
    Commit 85a8718
  2. [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value
    
    `ifelse`, `when`, `otherwise` are unable to take a `Column`-typed S4 object as value.
    
    For example:
    ```r
    ifelse(lit(1) == lit(1), lit(2), lit(3))
    ifelse(df$mpg > 0, df$mpg, 0)
    ```
    will both fail with
    ```r
    attempt to replicate an object of type 'environment'
    ```
    
    The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid an attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenarios in which these functions would want to be vectorized in SparkR.
    
    For reference, added test cases which trigger failures:
    ```r
    . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
    error in evaluating the argument 'x' in selecting a method for function 'collect':
      error in evaluating the argument 'col' in selecting a method for function 'select':
      attempt to replicate an object of type 'environment'
    Calls: when -> when -> ifelse -> ifelse
    
    1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
    2: eval(code, new_test_environment)
    3: eval(expr, envir, enclos)
    4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
    5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
    6: condition(object)
    7: compare(actual, expected, ...)
    8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
    Error: Test failures
    Execution halted
    ```
    
    Author: Forest Fang <forest.fang@outlook.com>
    
    Closes #10481 from saurfang/spark-12526.
    
    (cherry picked from commit d80cc90)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    saurfang authored and shivaram committed Dec 29, 2015
    Commit: c069ffc

Commits on Dec 30, 2015

  1. [SPARK-12300] [SQL] [PYSPARK] fix schema inferance on local collections

    Current schema inference for local python collections halts as soon as there are no NullTypes. This is different than when we specify a sampling ratio of 1.0 on a distributed collection. This could result in incomplete schema information.
    
    Author: Holden Karau <holden@us.ibm.com>
    
    Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
    
    (cherry picked from commit d1ca634)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    holdenk authored and davies committed Dec 30, 2015
    Commit: 8dc6549
  2. [SPARK-12399] Display correct error message when accessing REST API w…

    …ith an unknown app Id
    
    I got an exception when accessing the below REST API with an unknown application Id.
    `http://<server-url>:18080/api/v1/applications/xxx/jobs`
    Instead of an exception, I expect an error message "no such app: xxx", similar to the error message returned when I access `/api/v1/applications/xxx`.
    ```
    org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
    	at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
    	at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
    	at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    	at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    	at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
    	at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
    	at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
    	at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
    ```
    
    Author: Carson Wang <carson.wang@intel.com>
    
    Closes #10352 from carsonwang/unknownAppFix.
    
    (cherry picked from commit b244297)
    Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
    carsonwang authored and Marcelo Vanzin committed Dec 30, 2015
    Commit: cd86075

Commits on Jan 3, 2016

  1. [SPARK-12327][SPARKR] fix code for lintr warning for commented code

    shivaram
    
    Author: felixcheung <felixcheung_m@hotmail.com>
    
    Closes #10408 from felixcheung/rcodecomment.
    
    (cherry picked from commit c3d5056)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    felixcheung authored and shivaram committed Jan 3, 2016
    Commit: 4e9dd16

Commits on Jan 4, 2016

  1. [SPARK-12562][SQL] DataFrame.write.format(text) requires the column n…

    …ame to be called value
    
    Author: Xiu Guo <xguo27@gmail.com>
    
    Closes #10515 from xguo27/SPARK-12562.
    
    (cherry picked from commit 84f8492)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    xguo27 authored and rxin committed Jan 4, 2016
    Commit: f7a3223
  2. [SPARK-12486] Worker should kill the executors more forcefully if pos…

    …sible.
    
    This patch updates the ExecutorRunner's terminate path to use the new java 8 API
    to terminate processes more forcefully if possible. If the executor is unhealthy,
    it would previously ignore the destroy() call. Presumably, the new java API was
    added to handle cases like this.
    
    We could update the termination path in the future to use OS specific commands
    for older java versions.
    
    Author: Nong Li <nong@databricks.com>
    
    Closes #10438 from nongli/spark-12486-executors.
    
    (cherry picked from commit 8f65939)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    nongli authored and Andrew Or committed Jan 4, 2016
    Commit: cd02038
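    A minimal Scala sketch of the Java 8 termination path described above (the helper name and grace period are illustrative, not the actual ExecutorRunner code): ask the process to exit, then force-kill it if it ignores the request.

    ```scala
    import java.util.concurrent.TimeUnit

    // Polite shutdown request first; forceful kill only if the process keeps running.
    def terminate(process: Process, graceMillis: Long = 5000L): Unit = {
      process.destroy()
      if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly()   // Java 8+: forceful termination of an unhealthy process
      }
    }
    ```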
  3. [SPARK-12470] [SQL] Fix size reduction calculation

    also only allocate required buffer size
    
    Author: Pete Robbins <robbinspg@gmail.com>
    
    Closes #10421 from robbinspg/master.
    
    (cherry picked from commit b504b6a)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala
    robbinspg authored and davies committed Jan 4, 2016
    Commit: b5a1f56
  4. [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence

    Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
    
    In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
    
    This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
    
    If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).
    
    This patch is inspired by a similar patch that I made to the `spark-redshift` library (databricks/spark-redshift#143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10519 from JoshRosen/jdbc-driver-precedence.
    
    (cherry picked from commit 6c83d93)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    JoshRosen authored and yhuai committed Jan 4, 2016
    Commit: 7f37c1e
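    A hedged Scala sketch of the driver-selection idea described above (names are illustrative, not the Spark code itself): register the requested driver class, then pick it explicitly out of DriverManager's registered drivers instead of relying on DriverManager.getConnection's subprotocol lookup.

    ```scala
    import java.sql.{Connection, Driver, DriverManager}
    import java.util.Properties
    import scala.collection.JavaConverters._

    def connectWithExplicitDriver(url: String, driverClass: String, props: Properties): Connection = {
      Class.forName(driverClass)                       // loading the class makes the driver register itself
      val driver: Driver = DriverManager.getDrivers.asScala
        .find(_.getClass.getName == driverClass)
        .getOrElse(throw new IllegalStateException(s"Driver $driverClass is not registered"))
      driver.connect(url, props)                       // bypass subprotocol-based driver resolution
    }
    ```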
  5. [DOC] Adjust coverage for partitionBy()

    This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02&subj=Re+partitioning+json+data+in+spark
    
    Michael suggested fixing the doc.
    
    Please review.
    
    Author: tedyu <yuzhihong@gmail.com>
    
    Closes #10499 from ted-yu/master.
    
    (cherry picked from commit 40d0396)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    tedyu authored and marmbrus committed Jan 4, 2016
    Commit: 1005ee3
  6. [SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set t…

    …he row length.
    
    The reader was previously not setting the row length, meaning the row was wrong if there were variable-length columns.
    This problem does not usually manifest, since the value in the column is correct and projecting the row fixes the issue.
    
    Author: Nong Li <nong@databricks.com>
    
    Closes #10576 from nongli/spark-12589.
    
    (cherry picked from commit 34de24a)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    
    Conflicts:
    	sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
    nongli authored and yhuai committed Jan 4, 2016
    Commit: 8ac9198

Commits on Jan 5, 2016

  1. [SPARKR][DOC] minor doc update for version in migration guide

    checked that the change is in Spark 1.6.0.
    shivaram
    
    Author: felixcheung <felixcheung_m@hotmail.com>
    
    Closes #10574 from felixcheung/rwritemodedoc.
    
    (cherry picked from commit 8896ec9)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    felixcheung authored and shivaram committed Jan 5, 2016
    Commit: 8950482
  2. [SPARK-12568][SQL] Add BINARY to Encoders

    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #10516 from marmbrus/datasetCleanup.
    
    (cherry picked from commit 53beddc)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Jan 5, 2016
    Commit: d9e4438
  3. [SPARK-12647][SQL] Fix o.a.s.sqlexecution.ExchangeCoordinatorSuite.de…

    …termining the number of reducers: aggregate operator
    
    change expected partition sizes
    
    Author: Pete Robbins <robbinspg@gmail.com>
    
    Closes #10599 from robbinspg/branch-1.6.
    robbinspg authored and yhuai committed Jan 5, 2016
    Commit: 5afa62b
  4. [SPARK-12617] [PYSPARK] Clean up the leak sockets of Py4J

    This patch added Py4jCallbackConnectionCleaner to clean the leak sockets of Py4J every 30 seconds. This is a workaround before Py4J fixes the leak issue py4j/py4j#187
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10579 from zsxwing/SPARK-12617.
    
    (cherry picked from commit 047a31b)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    zsxwing authored and davies committed Jan 5, 2016
    Commit: f31d0fd
  5. [SPARK-12511] [PYSPARK] [STREAMING] Make sure PythonDStream.registerS…

    …erializer is called only once
    
    There is an issue that Py4J's PythonProxyHandler.finalize blocks forever. (py4j/py4j#184)
    
    Py4j will create a PythonProxyHandler in Java for "transformer_serializer" when calling "registerSerializer". If we call "registerSerializer" twice, the second PythonProxyHandler will override the first one, then the first one will be GCed and trigger "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the "PythonProxyHandler" on the Java side won't be GCed.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10514 from zsxwing/SPARK-12511.
    
    (cherry picked from commit 6cfe341)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    zsxwing authored and davies committed Jan 5, 2016
    Commit: 83fe5cf
  6. [SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans

    SPARK-12450 . Un-persist broadcasted variables in KMeans.
    
    Author: RJ Nowling <rnowling@gmail.com>
    
    Closes #10415 from rnowling/spark-12450.
    
    (cherry picked from commit 78015a8)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    rnowling authored and jkbradley committed Jan 5, 2016
    Commit: 0afad66
  7. [SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk

    Successfully ran kinesis demo on a live, aws hosted kinesis stream against master and 1.6 branches.  For reasons I don't entirely understand it required a manual merge to 1.5 which I did as shown here: BrianLondon@075c22e
    
    The demo ran successfully on the 1.5 branch as well.
    
    According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the kinesis regression in 1.5.2.
    
    Author: BrianLondon <brian@seatgeek.com>
    
    Closes #10492 from BrianLondon/remove-only.
    
    (cherry picked from commit ff89975)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    BrianLondon authored and srowen committed Jan 5, 2016
    Commit: bf3dca2

Commits on Jan 6, 2016

  1. [SPARK-12393][SPARKR] Add read.text and write.text for SparkR

    Add ```read.text``` and ```write.text``` for SparkR.
    cc sun-rui felixcheung shivaram
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10348 from yanboliang/spark-12393.
    
    (cherry picked from commit d1fea41)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    yanboliang authored and shivaram committed Jan 6, 2016
    Commit: c3135d0
  2. [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If initial model passed to GMM is not empty it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`.
    
    Author: zero323 <matthew.szymkiewicz@gmail.com>
    
    Closes #9986 from zero323/SPARK-12006.
    
    (cherry picked from commit fcd013c)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    zero323 authored and jkbradley committed Jan 6, 2016
    Commit: 1756819
  3. [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

    Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10621 from zsxwing/SPARK-12617-2.
    
    (cherry picked from commit 1e6648d)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Jan 6, 2016
    Commit: d821fae
  4. [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of defau…

    …lt root path to gain the streaming batch url.
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #10617 from SaintBacchus/SPARK-12672.
    SaintBacchus authored and zsxwing committed Jan 6, 2016
    Commit: 8f0ead3
  5. Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead …

    …of default root path to gain the streaming batch url."
    
    This reverts commit 8f0ead3. Will merge #10618 instead.
    zsxwing committed Jan 6, 2016
    Commit: 39b0a34

Commits on Jan 7, 2016

  1. [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in…

    … pyspark
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12016
    
    We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.
    
    Author: Liang-Chi Hsieh <viirya@appier.com>
    
    Closes #10100 from viirya/fix-load-py-wordvecmodel.
    
    (cherry picked from commit b51a4cd)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    viirya authored and jkbradley committed Jan 7, 2016
    Commit: 11b901b
  2. [SPARK-12673][UI] Add missing uri prepending for job description

    Otherwise the URL will fail to be proxied to the right one when in YARN mode. Here is the screenshot:
    
    ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)
    
    Author: jerryshao <sshao@hortonworks.com>
    
    Closes #10618 from jerryshao/SPARK-12673.
    
    (cherry picked from commit 174e72c)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    jerryshao authored and zsxwing committed Jan 7, 2016
    Commit: 94af69c
  3. [SPARK-12678][CORE] MapPartitionsRDD clearDependencies

    MapPartitionsRDD was keeping a reference to `prev` after a call to
    `clearDependencies`, which could lead to a memory leak.
    
    Author: Guillaume Poulin <poulin.guillaume@gmail.com>
    
    Closes #10623 from gpoulin/map_partition_deps.
    
    (cherry picked from commit b673852)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    gpoulin authored and rxin committed Jan 7, 2016
    Commit: d061b85
  4. Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is …

    …not None"
    
    This reverts commit fcd013c.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10632 from yhuai/pythonStyle.
    
    (cherry picked from commit e5cde7a)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    yhuai committed Jan 7, 2016
    Commit: 34effc4
  5. [DOC] fix 'spark.memory.offHeap.enabled' default value to false

    modify 'spark.memory.offHeap.enabled' default value to false
    
    Author: zzcclp <xm_zzc@sina.com>
    
    Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
    
    (cherry picked from commit 84e77a1)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zzcclp authored and rxin committed Jan 7, 2016
    Commit: 47a58c7
  6. [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If initial model passed to GMM is not empty it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to list.
    
    Author: zero323 <matthew.szymkiewicz@gmail.com>
    
    Closes #10644 from zero323/SPARK-12006.
    
    (cherry picked from commit 592f649)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    zero323 authored and jkbradley committed Jan 7, 2016
    Commit: 69a885a
  7. [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overla…

    …pping splits
    
    https://issues.apache.org/jira/browse/SPARK-12662
    
    cc yhuai
    
    Author: Sameer Agarwal <sameer@databricks.com>
    
    Closes #10626 from sameeragarwal/randomsplit.
    
    (cherry picked from commit f194d99)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    sameeragarwal authored and rxin committed Jan 7, 2016
    Commit: 017b73e
  8. [SPARK-12598][CORE] bug in setMinPartitions

    There is a bug in the calculation of ```maxSplitSize```.  The ```totalLen``` should be divided by ```minPartitions``` and not by ```files.size```.
    
    Author: Darek Blasiak <darek.blasiak@640labs.com>
    
    Closes #10546 from datafarmer/setminpartitionsbug.
    
    (cherry picked from commit 8346518)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    datafarmer authored and srowen committed Jan 7, 2016
    Commit: 6ef8235
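    A one-line Scala sketch of the corrected calculation described above (names are illustrative): the split size comes from dividing the total length by the requested number of partitions, not by the number of files.

    ```scala
    // Before the fix: totalLen / files.size; after the fix: totalLen / minPartitions.
    def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
      math.ceil(totalLen * 1.0 / math.max(minPartitions, 1)).toLong
    ```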

Commits on Jan 8, 2016

  1. [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and all…

    …owBatching configurations for Streaming
    
    /cc tdas brkyvz
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10453 from zsxwing/streaming-conf.
    
    (cherry picked from commit c94199e)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Jan 8, 2016
    Commit: a7c3636
  2. [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (…

    …branch 1.6)
    
    backport #10609 to branch 1.6
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10656 from zsxwing/SPARK-12591-branch-1.6.
    zsxwing authored and tdas committed Jan 8, 2016
    Commit: 0d96c54
  3. [DOCUMENTATION] doc fix of job scheduling

    spark.shuffle.service.enabled is a Spark application-related configuration; it is not necessary to set it in yarn-site.xml.
    
    Author: Jeff Zhang <zjffdu@apache.org>
    
    Closes #10657 from zjffdu/doc-fix.
    
    (cherry picked from commit 00d9261)
    Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
    zjffdu authored and Marcelo Vanzin committed Jan 8, 2016
    Commit: fe2cf34
  4. fixed numVertices in transitive closure example

    Author: Udo Klein <git@blinkenlight.net>
    
    Closes #10642 from udoklein/patch-2.
    
    (cherry picked from commit 8c70cb4)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    udoklein authored and srowen committed Jan 8, 2016
    Commit: e4227cb
  5. [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail…

    …s on secure Hadoop
    
    https://issues.apache.org/jira/browse/SPARK-12654
    
    So the bug here is that WholeTextFileRDD.getPartitions has:
    val conf = getConf
    In getConf, if cloneConf=true it creates a new Hadoop Configuration, and then uses that to create a new newJobContext.
    The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when it clones the Hadoop configuration, it changes it from a JobConf to a Configuration and drops the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.
    
    Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
    
    Closes #10651 from tgravescs/SPARK-12654.
    
    (cherry picked from commit 553fd7b)
    Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
    tgravescs authored and Tom Graves committed Jan 8, 2016
    Commit: faf094c
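    A hedged Scala illustration of the distinction described above (assumes Hadoop on the classpath; this is not the Spark patch itself): copying a JobConf into a plain Configuration loses the credentials, while staying a JobConf keeps them.

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.JobConf

    val secureConf = new JobConf()                   // on secure Hadoop this carries the job credentials
    val asPlainConf = new Configuration(secureConf)  // plain Configuration: credentials are not part of it
    val asJobConf = new JobConf(secureConf)          // JobConf copy: credentials travel with it
    ```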
  6. [SPARK-12696] Backport Dataset Bug fixes to 1.6

    We've fixed a lot of bugs in master, and since this is experimental in 1.6 we should consider back porting the fixes.  The only thing that is obviously risky to me is 0e07ed3, we might try to remove that.
    
    Author: Wenchen Fan <wenchen@databricks.com>
    Author: gatorsmile <gatorsmile@gmail.com>
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Author: Cheng Lian <lian@databricks.com>
    Author: Nong Li <nong@databricks.com>
    
    Closes #10650 from marmbrus/dataset-backports.
    marmbrus committed Jan 8, 2016
    Commit: a619050

Commits on Jan 9, 2016

  1. [SPARK-12645][SPARKR] SparkR support hash function

    Add ```hash``` function for SparkR ```DataFrame```.
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #10597 from yanboliang/spark-12645.
    
    (cherry picked from commit 3d77cff)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    yanboliang authored and shivaram committed Jan 9, 2016
    Commit: 8b5f230

Commits on Jan 10, 2016

  1. [SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to…

    … branch-1.6
    
    This patch backports the `dev/test-dependencies` script (from #10461) to branch-1.6.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10680 from JoshRosen/test-deps-16-backport.
    JoshRosen committed Jan 10, 2016
    Commit: 7903b06

Commits on Jan 11, 2016

  1. [SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes …

    …to branch-1.6
    
    This patch backports the Netty exclusion fixes from #10672 to branch-1.6.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10691 from JoshRosen/netty-exclude-16-backport.
    JoshRosen committed Jan 11, 2016
    Commit: 43b72d8
  2. removed lambda from sortByKey()

    According to the documentation, the sortByKey method does not take a lambda as an argument, so the example is flawed. Removed the argument completely, as this defaults to an ascending sort.
    
    Author: Udo Klein <git@blinkenlight.net>
    
    Closes #10640 from udoklein/patch-1.
    
    (cherry picked from commit bd723bd)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    udoklein authored and srowen committed Jan 11, 2016
    Commit: d4cfd2a
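    An illustrative Scala sketch of the API shape behind the fix above (assumes an existing SparkContext `sc`; the patched example itself is Python): sortByKey takes no comparison function, only an optional ascending flag and partition count.

    ```scala
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    val ascending  = pairs.sortByKey()                   // defaults to ascending order
    val descending = pairs.sortByKey(ascending = false)  // descending order, still no lambda
    ```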
  3. [STREAMING][MINOR] Typo fixes

    Author: Jacek Laskowski <jacek@japila.pl>
    
    Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.
    
    (cherry picked from commit b313bad)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    jaceklaskowski authored and zsxwing committed Jan 11, 2016
    Commit: ce906b3
  4. [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean aft…

    …er install in dep tests
    
    This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
    
    First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
    
    I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
    
    /cc zsxwing
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10704 from JoshRosen/fix-build-test-problems.
    
    (cherry picked from commit a449914)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Jan 11, 2016
    Commit: 3b32aa9
  5. [SPARK-12758][SQL] add note to Spark SQL Migration guide about Timest…

    …ampType casting
    
    Warning users about casting changes.
    
    Author: Brandon Bradley <bradleytastic@gmail.com>
    
    Closes #10708 from blbradley/spark-12758.
    
    (cherry picked from commit a767ee8)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    blbradley authored and marmbrus committed Jan 11, 2016
    Commit: dd2cf64

Commits on Jan 12, 2016

  1. [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel

    https://issues.apache.org/jira/browse/SPARK-11823
    
    This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10715 from yhuai/SPARK-11823-ignore.
    
    (cherry picked from commit aaa2c3b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    yhuai authored and JoshRosen committed Jan 12, 2016
    Commit: a6c9c68
  2. [SPARK-12638][API DOC] Parameter explanation not very accurate for rd…

    …d function "aggregate"
    
    Currently, the documentation for the RDD function aggregate does not explain its parameters well, especially "zeroValue".
    It is helpful to let junior Scala users know that "zeroValue" takes part in both the "seqOp" and "combOp" phases.
    
    Author: Tommy YU <tummyyu@163.com>
    
    Closes #10587 from Wenpei/rdd_aggregate_doc.
    
    (cherry picked from commit 9f0995b)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Wenpei authored and srowen committed Jan 12, 2016
    Commit: 46fc7a1
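    An illustrative Scala sketch (assumes an existing SparkContext `sc`) showing that zeroValue seeds both the per-partition seqOp and the cross-partition combOp:

    ```scala
    val rdd = sc.parallelize(1 to 4, numSlices = 2)
    val (sum, count) = rdd.aggregate((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),   // seqOp: starts from zeroValue in every partition
      (a, b)   => (a._1 + b._1, a._2 + b._2)  // combOp: zeroValue is also the starting accumulator here
    )
    // sum == 10, count == 4; a non-neutral zeroValue would therefore be applied more than once
    ```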
  3. [SPARK-12582][TEST] IndexShuffleBlockResolverSuite fails in windows

    [SPARK-12582][Test] IndexShuffleBlockResolverSuite fails in windows
    
    * IndexShuffleBlockResolverSuite fails in windows due to file is not closed.
    * mv IndexShuffleBlockResolverSuite.scala from "test/java" to "test/scala".
    
    https://issues.apache.org/jira/browse/SPARK-12582
    
    Author: Yucai Yu <yucai.yu@intel.com>
    
    Closes #10526 from yucai/master.
    
    (cherry picked from commit 7e15044)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Yucai Yu authored and srowen committed Jan 12, 2016
    Commit: 3221a7d
  4. [SPARK-5273][MLLIB][DOCS] Improve documentation examples for LinearRe…

    …gression
    
    Use a much smaller step size in LinearRegressionWithSGD MLlib examples to achieve a reasonable RMSE.
    
    Our training folks hit this exact same issue when concocting an example and had the same solution.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #10675 from srowen/SPARK-5273.
    
    (cherry picked from commit 9c7f34a)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    srowen committed Jan 12, 2016
    Commit: 4c67d55
  5. [SPARK-7615][MLLIB] MLLIB Word2Vec wordVectors divided by Euclidean N…

    …orm equals to zero
    
    Cosine similarity with 0 vector should be 0
    
    Related to #10152
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #10696 from srowen/SPARK-7615.
    
    (cherry picked from commit c48f2a3)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    srowen committed Jan 12, 2016
    Commit: 94b39f7
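    A minimal Scala sketch of the guard described above (not MLlib's code): return 0 when either norm is zero instead of dividing by zero and producing NaN.

    ```scala
    def cosineSimilarity(v1: Array[Double], v2: Array[Double]): Double = {
      val dot   = v1.zip(v2).map { case (a, b) => a * b }.sum
      val norm1 = math.sqrt(v1.map(x => x * x).sum)
      val norm2 = math.sqrt(v2.map(x => x * x).sum)
      if (norm1 == 0.0 || norm2 == 0.0) 0.0 else dot / (norm1 * norm2)
    }
    ```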
  6. Revert "[SPARK-12645][SPARKR] SparkR support hash function"

    This reverts commit 8b5f230.
    yhuai committed Jan 12, 2016
    Commit: 03e523e

Commits on Jan 13, 2016

  1. [HOT-FIX] bypass hive test when parse logical plan to json

    #10311 introduces some rare, non-deterministic flakiness for hive udf tests, see #10311 (comment)
    
    I can't reproduce it locally, and may need more time to investigate, a quick solution is: bypass hive tests for json serialization.
    
    Author: Wenchen Fan <wenchen@databricks.com>
    
    Closes #10430 from cloud-fan/hot-fix.
    
    (cherry picked from commit 8543997)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    cloud-fan authored and marmbrus committed Jan 13, 2016
    Commit: f71e5cc
  2. [SPARK-12558][SQL] AnalysisException when multiple functions applied …

    …in GROUP BY clause
    
    cloud-fan Can you please take a look ?
    
    In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue.
    
    Author: Dilip Biswal <dbiswal@us.ibm.com>
    
    Closes #10520 from dilipbiswal/spark-12558.
    
    (cherry picked from commit dc7b387)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    
    Conflicts:
    	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
    dilipbiswal authored and yhuai committed Jan 13, 2016
    Commit: dcdc864
  3. [SPARK-12805][MESOS] Fixes documentation on Mesos run modes

    The default run mode has changed, but the documentation didn't fully reflect the change.
    
    Author: Luc Bourlier <luc.bourlier@typesafe.com>
    
    Closes #10740 from skyluc/issue/mesos-modes-doc.
    
    (cherry picked from commit cc91e21)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    Luc Bourlier authored and rxin committed Jan 13, 2016
    Commit: f9ecd3a
  4. [SPARK-12685][MLLIB][BACKPORT TO 1.4] word2vec trainWordsCount gets o…

    …verflow
    
    jira: https://issues.apache.org/jira/browse/SPARK-12685
    
    master PR: #10627
    
    the log of word2vec reports
    trainWordsCount = -785727483
    during computation over a large dataset.
    
    Update the priority as it will affect the computation process.
    alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
    
    Author: Yuhao Yang <hhbyyh@gmail.com>
    
    Closes #10721 from hhbyyh/branch-1.4.
    
    (cherry picked from commit 7bd2564)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    hhbyyh authored and jkbradley committed Jan 13, 2016
    Commit: 364f799
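    A small Scala illustration of the overflow reported above and the usual remedy (the numbers are made up): accumulating the word count as an Int wraps into negative territory on large corpora, while a Long does not.

    ```scala
    val countsPerPartition = Seq(1500000000, 1500000000)       // ~3 billion words in total
    val overflowed: Int = countsPerPartition.sum               // wraps around to a negative value
    val safe: Long = countsPerPartition.map(_.toLong).sum      // 3000000000L
    ```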
  5. [SPARK-12268][PYSPARK] Make pyspark shell pythonstartup work under py…

    …thon3
    
    This replaces the `execfile` used for running custom python shell scripts
    with explicit open, compile and exec (as recommended by 2to3). The reason
    for this change is to make the pythonstartup option compatible with python3.
    
    Author: Erik Selin <erik.selin@gmail.com>
    
    Closes #10255 from tyro89/pythonstartup-python3.
    
    (cherry picked from commit e4e0b3f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    erikselin authored and JoshRosen committed Jan 13, 2016
    Commit: cf6d506
  6. [SPARK-12690][CORE] Fix NPE in UnsafeInMemorySorter.free()

    I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
    
    ```
    ERROR spark.TaskContextImpl: Error in TaskCompletionListener
    java.lang.NullPointerException
            at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
            at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
            at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
            at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
            at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
            at org.apache.spark.scheduler.Task.run(Task.scala:91)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
            at java.lang.Thread.run(Thread.java:722)
    ```
    
    Author: Carson Wang <carson.wang@intel.com>
    
    Closes #10637 from carsonwang/FixNPE.
    
    (cherry picked from commit eabc7b8)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    carsonwang authored and JoshRosen committed Jan 13, 2016
    Commit: 26f13fa
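    A hedged Scala sketch of the guard added above (types and names are illustrative, not Spark's actual classes): only ask the consumer to free the array when a consumer was actually supplied.

    ```scala
    trait MemoryConsumerLike { def freeArray(array: Array[Long]): Unit }

    // Returns null so the caller can drop its reference to the array, mirroring the cleanup path.
    def free(consumer: MemoryConsumerLike, array: Array[Long]): Array[Long] = {
      if (consumer != null) {   // consumer may legitimately be null, as described in the entry above
        consumer.freeArray(array)
      }
      null
    }
    ```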

Commits on Jan 14, 2016

  1. [SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when …

    …number of features is large
    
    jira: https://issues.apache.org/jira/browse/SPARK-12026
    
    The issue is valid as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger.
    
    I tested on local and the change can improve the performance and the running time was stable.
    
    Author: Yuhao Yang <hhbyyh@gmail.com>
    
    Closes #10146 from hhbyyh/chiSq.
    
    (cherry picked from commit 021dafc)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    hhbyyh authored and jkbradley committed Jan 14, 2016
    Commit: a490787
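    A hedged Scala sketch of the idea (not the MLlib patch verbatim): index the desired column range directly instead of slicing a zipped view, whose cost grows with startCol.

    ```scala
    def columnChunk(features: Array[Double], startCol: Int, endCol: Int): Array[(Double, Int)] = {
      val out = new Array[(Double, Int)](endCol - startCol)
      var i = startCol
      while (i < endCol) {            // O(endCol - startCol), independent of where startCol sits
        out(i - startCol) = (features(i), i)
        i += 1
      }
      out
    }
    ```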
  2. [SPARK-9844][CORE] File appender race condition during shutdown

    When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that there is no IOException when the flag is set.
    
    Author: Bryan Cutler <cutlerb@gmail.com>
    
    Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.
    
    (cherry picked from commit 56cdbd6)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    BryanCutler authored and srowen committed Jan 14, 2016
    Commit: 0c67993
  3. [SPARK-12784][UI] Fix Spark UI IndexOutOfBoundsException with dynamic…

    … allocation
    
    Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10728 from zsxwing/SPARK-12784.
    
    (cherry picked from commit 501e99e)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Jan 14, 2016
    Commit: d1855ad
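    A hedged Scala sketch of the pattern described above (the class and its fields are illustrative, not Spark's listener): take both reads inside one synchronized block so the UI renders a consistent snapshot while executors are added or removed.

    ```scala
    class ExecutorsListenerSketch {
      private var storageStatusList: Seq[String] = Nil
      private var execInfo: Seq[String] = Nil

      // Callers get a consistent pair instead of two reads that may interleave with updates.
      def snapshot(): (Seq[String], Seq[String]) = synchronized {
        (storageStatusList, execInfo)
      }

      def update(storage: Seq[String], execs: Seq[String]): Unit = synchronized {
        storageStatusList = storage
        execInfo = execs
      }
    }
    ```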

Commits on Jan 15, 2016

  1. [SPARK-12708][UI] Sorting task error in Stages Page when yarn mode.

    If the sort column contains a slash (e.g. "Executor ID / Host") in yarn mode, sorting fails with the following message.
    
    ![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png)
    
    It's similar to SPARK-4313.
    
    Author: root <root@R520T1.(none)>
    Author: Koyo Yoshida <koyo0615@gmail.com>
    
    Closes #10663 from yoshidakuy/SPARK-12708.
    
    (cherry picked from commit 32cca93)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    yoshidakuy authored and sarutak committed Jan 15, 2016
    Commit: d23e57d
  2. [SPARK-11031][SPARKR] Method str() on a DataFrame

    Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
    Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
    Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
    Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
    
    Closes #9613 from olarayej/SPARK-11031.
    
    (cherry picked from commit ba4a641)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    Oscar D. Lara Yejas authored and shivaram committed Jan 15, 2016
    Commit: 5a00528
  3. [SPARK-12701][CORE] FileAppender should use join to ensure writing th…

    …read completion
    
    Changed Logging FileAppender to use join in `awaitTermination` to ensure that thread is properly finished before returning.
    
    Author: Bryan Cutler <cutlerb@gmail.com>
    
    Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
    
    (cherry picked from commit ea104b8)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    BryanCutler authored and srowen committed Jan 15, 2016
    Commit: 7733668
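    A minimal Scala sketch of the change described above (not the actual FileAppender): awaitTermination joins the writing thread rather than polling a flag, so the appender has fully finished before the call returns.

    ```scala
    class AppenderSketch {
      @volatile private var markedForStop = false

      private val writer = new Thread(new Runnable {
        override def run(): Unit = {
          while (!markedForStop) { Thread.sleep(10) } // stand-in for copying the input stream to the file
        }
      })
      writer.start()

      def awaitTermination(): Unit = {
        markedForStop = true
        writer.join()   // returns only once the writing thread has completed
      }
    }
    ```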

Commits on Jan 16, 2016

  1. [SPARK-12722][DOCS] Fixed typo in Pipeline example

    http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
    ```
    val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
    ```
    should be
    ```
    val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
    ```
    cc: jkbradley
    
    Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu>
    
    Closes #10769 from Agent007/SPARK-12722.
    
    (cherry picked from commit 86972fa)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Jeff Lam authored and srowen committed Jan 16, 2016
    Commit: 5803fce

Commits on Jan 18, 2016

  1. [SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions ap…

    …plied in GROUP BY clause
    
    Addresses the comments from Yin.
    #10520
    
    Author: Dilip Biswal <dbiswal@us.ibm.com>
    
    Closes #10758 from dilipbiswal/spark-12558-followup.
    
    (cherry picked from commit db9a860)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    
    Conflicts:
    	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
    dilipbiswal authored and yhuai committed Jan 18, 2016
    Commit: 53184ce
  2. [SPARK-12346][ML] Missing attribute names in GLM for vector-type feat…

    …ures
    
    Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names.
    
    cc mengxr
    
    Author: Eric Liang <ekl@databricks.com>
    
    Closes #10323 from ericl/spark-12346.
    
    (cherry picked from commit 5e492e9)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    ericl authored and mengxr committed Jan 18, 2016
    Commit: 8c2b67f
  3. [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume i…

    …ntegration doc
    
    This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10746 from zsxwing/flume-doc.
    
    (cherry picked from commit a973f48)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Jan 18, 2016
    Commit: 7482c7b

Commits on Jan 19, 2016

  1. [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis…

    … integration doc
    
    This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10822 from zsxwing/kinesis-doc.
    
    (cherry picked from commit 721845c)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Jan 19, 2016
    Commit: d43704d
  2. [SPARK-12841][SQL][BRANCH-1.6] fix cast in filter

    In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be resolved and actually we don't need a better alias for this case. This PR moves the cast wrapping logic to `Column.named` so that we will only do it when we need an alias name.
    
    backport #10781 to 1.6
    
    Author: Wenchen Fan <wenchen@databricks.com>
    
    Closes #10819 from cloud-fan/bug.
    cloud-fan authored and yhuai committed Jan 19, 2016
    Commit: 68265ac
  3. [SQL][MINOR] Fix one little mismatched comment according to the codes…

    … in interface.scala
    
    Author: proflin <proflin.me@gmail.com>
    
    Closes #10824 from proflin/master.
    
    (cherry picked from commit c00744e)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    lw-lin authored and rxin committed Jan 19, 2016
    Commit: 30f55e5
  4. [MLLIB] Fix CholeskyDecomposition assertion's message

    Change the assertion's message so it's consistent with the code. The old message says that the invoked method was lapack.dports, whereas in fact it was the lapack.dppsv method.
    
    Author: Wojciech Jurczyk <wojtek.jurczyk@gmail.com>
    
    Closes #10818 from wjur/wjur/rename_error_message.
    
    (cherry picked from commit ebd9ce0)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    wjur authored and srowen committed Jan 19, 2016
    Commit: 962e618

Commits on Jan 21, 2016

  1. [SPARK-12921] Use SparkHadoopUtil reflection in SpecificParquetRecord…

    …ReaderBase
    
    It looks like there's one place left in the codebase, SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's reflective accesses of TaskAttemptContext methods, which could create problems when using a single Spark artifact with both Hadoop 1.x and 2.x.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #10843 from JoshRosen/SPARK-12921.
    JoshRosen committed Jan 21, 2016
    Commit: 40fa218

Commits on Jan 22, 2016

  1. [SPARK-12747][SQL] Use correct type name for Postgres JDBC's real array

    https://issues.apache.org/jira/browse/SPARK-12747
    
    Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real".
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes #10695 from viirya/fix-postgres-jdbc.
    
    (cherry picked from commit 55c7dd0)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    viirya authored and rxin committed Jan 22, 2016
    Commit: b5d7dbe

Commits on Jan 23, 2016

  1. [SPARK-12859][STREAMING][WEB UI] Names of input streams with receiver…

    …s don't fit in Streaming page
    
    Added CSS style to force names of input streams with receivers to wrap
    
    Author: Alex Bozarth <ajbozart@us.ibm.com>
    
    Closes #10873 from ajbozarth/spark12859.
    
    (cherry picked from commit 358a33b)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    ajbozarth authored and sarutak committed Jan 23, 2016
    Commit: dca238a
  2. [SPARK-12760][DOCS] invalid lambda expression in python example for …

    …local vs cluster
    
    srowen thanks for the PR at #10866! sorry it took me a while.
    
    This is related to #10866, basically the assignment in the lambda expression in the python example is actually invalid
    
    ```
    In [1]: data = [1, 2, 3, 4, 5]
    In [2]: counter = 0
    In [3]: rdd = sc.parallelize(data)
    In [4]: rdd.foreach(lambda x: counter += x)
      File "<ipython-input-4-fcb86c182bad>", line 1
        rdd.foreach(lambda x: counter += x)
                                       ^
    SyntaxError: invalid syntax
    ```
    
    Author: Mortada Mehyar <mortada.mehyar@gmail.com>
    
    Closes #10867 from mortada/doc_python_fix.
    
    (cherry picked from commit 56f57f8)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    mortada authored and srowen committed Jan 23, 2016
    Commit: e8ae242
  3. [SPARK-12760][DOCS] inaccurate description for difference between loc…

    …al vs cluster mode in closure handling
    
    Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #10866 from srowen/SPARK-12760.
    
    (cherry picked from commit aca2a01)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    srowen committed Jan 23, 2016
    Commit: f13a3d1

Commits on Jan 24, 2016

  1. [SPARK-12120][PYSPARK] Improve exception message when failing to init…

    …ialize HiveContext in PySpark
    
    davies Mind to review ?
    
    This is the error message after this PR
    
    ```
    15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    /Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
      warnings.warn("You must build Spark with Hive. "
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
        return DataFrameReader(self)
      File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
        self._jreader = sqlContext._ssql_ctx.read()
      File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
        raise e
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
    : java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
    	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
    	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
    	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
    	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
    	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
    	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
    	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
    	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
    	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    	at py4j.Gateway.invoke(Gateway.java:214)
    	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    	at py4j.GatewayConnection.run(GatewayConnection.java:209)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    Author: Jeff Zhang <zjffdu@apache.org>
    
    Closes #10126 from zjffdu/SPARK-12120.
    
    (cherry picked from commit e789b1d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    zjffdu authored and JoshRosen committed Jan 24, 2016
    Commit: f913f7e

Commits on Jan 25, 2016

  1. [SPARK-12624][PYSPARK] Checks row length when converting Java arrays …

    …to Python rows
    
    When actual row length doesn't conform to specified schema field length, we should give a better error message instead of throwing an unintuitive `ArrayOutOfBoundsException`.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #10886 from liancheng/spark-12624.
    
    (cherry picked from commit 3327fd2)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    liancheng authored and yhuai committed Jan 25, 2016
    Commit: 88614dd
  2. [SPARK-12932][JAVA API] improved error message for java type inferenc…

    …e failure
    
    Author: Andy Grove <andygrove73@gmail.com>
    
    Closes #10865 from andygrove/SPARK-12932.
    
    (cherry picked from commit d8e4805)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    andygrove authored and srowen committed Jan 25, 2016
    Configuration menu
    Copy the full SHA
    88114d3 View commit details
    Browse the repository at this point in the history
  3. [SPARK-12755][CORE] Stop the event logger before the DAG scheduler

    [SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped.
    
    This contribution is my original work, and I license this work to the Spark project under the project's open source license.
    
    Author: Michael Allman <michael@videoamp.com>
    
    Closes #10700 from mallman/stop_event_logger_first.
    
    (cherry picked from commit 4ee8191)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Michael Allman authored and srowen committed Jan 25, 2016
    Configuration menu
    Copy the full SHA
    b40e58c View commit details
    Browse the repository at this point in the history

Commits on Jan 26, 2016

  1. [SPARK-12961][CORE] Prevent snappy-java memory leak

    JIRA: https://issues.apache.org/jira/browse/SPARK-12961
    
    To prevent a memory leak in snappy-java, call the method once and cache the result. Once the library releases a new version, we can remove this workaround.
    
    JoshRosen
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes #10875 from viirya/prevent-snappy-memory-leak.
    
    (cherry picked from commit 5936bf9)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    viirya authored and srowen committed Jan 26, 2016
    Configuration menu
    Copy the full SHA
    572bc39 View commit details
    Browse the repository at this point in the history
  2. [SPARK-12682][SQL] Add support for (optionally) not storing tables in…

    … hive metadata format
    
    This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table metadata in hive metadata format. While this could be useful in general, the specific use-case for this change is that Hive doesn't handle wide schemas well (see https://issues.apache.org/jira/browse/SPARK-12682 and https://issues.apache.org/jira/browse/SPARK-6024) which in turn prevents such tables from being queried in SparkSQL.
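
    A hedged sketch of how the option might be supplied (the option name comes from this PR; routing it through the DataFrame writer's options is an assumption, not necessarily the patch's test setup):

    ```python
    # Hypothetical usage: ask the data source table not to persist Hive-compatible
    # metadata, which helps with very wide schemas.
    df.write \
        .format("parquet") \
        .option("skip_hive_metadata", "true") \
        .saveAsTable("wide_table")
    ```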
    
    Author: Sameer Agarwal <sameer@databricks.com>
    
    Closes #10826 from sameeragarwal/skip-hive-metadata.
    
    (cherry picked from commit 08c781c)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    sameeragarwal authored and yhuai committed Jan 26, 2016
    Configuration menu
    Copy the full SHA
    f0c98a6 View commit details
    Browse the repository at this point in the history
  3. [SPARK-12682][SQL][HOT-FIX] Fix test compilation

    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #10925 from yhuai/branch-1.6-hot-fix.
    yhuai committed Jan 26, 2016
    Configuration menu
    Copy the full SHA
    6ce3dd9 View commit details
    Browse the repository at this point in the history
  4. [SPARK-12611][SQL][PYSPARK][TESTS] Fix test_infer_schema_to_local

    Previously (when the PR was first created) not specifying `b=` explicitly was fine and was treated as a default null; the test is now explicit about `b` being `None`.
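
    A minimal sketch of the adjusted pattern (hypothetical values):

    ```python
    from pyspark.sql import Row

    # Be explicit that b is None rather than relying on an implicit default.
    rows = [Row(a=1, b=None), Row(a=2, b="x")]
    df = sqlContext.createDataFrame(rows)
    ```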
    
    Author: Holden Karau <holden@us.ibm.com>
    
    Closes #10564 from holdenk/SPARK-12611-fix-test-infer-schema-local.
    
    (cherry picked from commit 13dab9c)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    holdenk authored and yhuai committed Jan 26, 2016
    Configuration menu
    Copy the full SHA
    85518ed View commit details
    Browse the repository at this point in the history

Commits on Jan 27, 2016

  1. [SPARK-12834][ML][PYTHON][BACKPORT] Change ser/de of JavaArray and Ja…

    …vaList
    
    Backport of SPARK-12834 for branch-1.6
    
    Original PR: #10772
    
    Original commit message:
    We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in `PythonMLLibAPI`, then deserialize them with `PickleSerializer` on the Python side. However, there is no need to transform them in such an inefficient way; instead, we can use type conversion, e.g. `list(JavaArray)` or `list(JavaList)`. What's more, there is an issue with Ser/De of Scala Array, as noted in https://issues.apache.org/jira/browse/SPARK-12780
    
    Author: Xusen Yin <yinxusen@gmail.com>
    
    Closes #10941 from jkbradley/yinxusen-SPARK-12834-1.6.
    yinxusen authored and jkbradley committed Jan 27, 2016
    Configuration menu
    Copy the full SHA
    17d1071 View commit details
    Browse the repository at this point in the history
  2. [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata w…

    …ith `None` triggers cryptic failure
    
    The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type class org.json4s.JsonAST$JNull$" to be more informative about what is not supported. Also, StructType metadata now handles JNull correctly, i.e., {'a': None}. test_metadata_null is added to tests.py to show the fix works.
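
    A minimal sketch of the previously failing case (hypothetical field names):

    ```python
    from pyspark.sql.types import StructType, StructField, IntegerType

    # Metadata containing None used to surface the cryptic Tuple2 error described above.
    schema = StructType([
        StructField("id", IntegerType(), True, metadata={"a": None}),
    ])
    df = sqlContext.createDataFrame([(1,)], schema)
    ```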
    
    Author: Jason Lee <cjlee@us.ibm.com>
    
    Closes #8969 from jasoncl/SPARK-10847.
    
    (cherry picked from commit edd4737)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    jasoncl authored and yhuai committed Jan 27, 2016
    Configuration menu
    Copy the full SHA
    96e32db View commit details
    Browse the repository at this point in the history

Commits on Jan 29, 2016

  1. [SPARK-13082][PYSPARK] Backport the fix of 'read.json(rdd)' in #10559

    …to branch-1.6
    
    SPARK-13082 was actually fixed by #10559. However, that is a big PR and was not backported to 1.6. This PR backports only the 'read.json(rdd)' fix to branch-1.6.
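
    A minimal sketch of the restored usage (hypothetical records):

    ```python
    # Read JSON records from an RDD of strings.
    rdd = sc.parallelize(['{"a": 1}', '{"a": 2}'])
    df = sqlContext.read.json(rdd)
    df.show()
    ```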
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #10988 from zsxwing/json-rdd.
    zsxwing committed Jan 29, 2016
    Configuration menu
    Copy the full SHA
    84dab72 View commit details
    Browse the repository at this point in the history

Commits on Jan 30, 2016

  1. [SPARK-13088] Fix DAG viz in latest version of chrome

    Apparently chrome removed `SVGElement.prototype.getTransformToElement`, which is used by our JS library dagre-d3 when creating edges. The real diff can be found here: andrewor14/dagre-d3@7d6c000, which is taken from the fix in the main repo: dagrejs/dagre-d3@1ef067f
    
    Upstream issue: dagrejs/dagre-d3#202
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #10986 from andrewor14/fix-dag-viz.
    
    (cherry picked from commit 70e69fc)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Jan 30, 2016
    Configuration menu
    Copy the full SHA
    bb01cbe View commit details
    Browse the repository at this point in the history

Commits on Feb 1, 2016

  1. [SPARK-12231][SQL] create a combineFilters' projection when we call b…

    …uildPartitionedTableScan
    
    Hello Michael & All:
    
    We had some issues submitting the new code in the other PR (#10299), so we closed that PR and opened this one with the fix.
    
    The reason for the previous failure is that the projection for the scan, when there is a filter that is not pushed down (the "left-over" filter), could be different, in elements or ordering, from the original projection.
    
    With the new code, the approach to solving this problem is:
    
    Insert a new Project if the "left-over" filter is nonempty and (the original projection is not empty and the projection for the scan has more than one element, which could otherwise cause a different ordering in the projection).
    
    We created three test cases to cover the previously failing cases.
    
    Author: Kevin Yu <qyu@us.ibm.com>
    
    Closes #10388 from kevinyu98/spark-12231.
    
    (cherry picked from commit fd50df4)
    Signed-off-by: Cheng Lian <lian@databricks.com>
    kevinyu98 authored and liancheng committed Feb 1, 2016
    Configuration menu
    Copy the full SHA
    ddb9633 View commit details
    Browse the repository at this point in the history
  2. [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions

    JIRA: https://issues.apache.org/jira/browse/SPARK-12989
    
    In the rule `ExtractWindowExpressions`, we simply replace an alias by the corresponding attribute. However, this causes an issue exposed by the following case:
    
    ```scala
    val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
      .withColumn("Data", struct("A", "B", "C"))
      .drop("A")
      .drop("B")
      .drop("C")
    
    val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
    data.select($"*", max("num").over(winSpec) as "max").explain(true)
    ```
    In this case, both `Data.A` and `Data.B` are aliases in `WindowSpecDefinition`. If we replace these alias expressions by their alias names, we can no longer resolve them, since they are not added to `missingExpr` either.
    
    Author: gatorsmile <gatorsmile@gmail.com>
    Author: xiaoli <lixiao1983@gmail.com>
    Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
    
    Closes #10963 from gatorsmile/seletStarAfterColDrop.
    
    (cherry picked from commit 33c8a49)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    gatorsmile authored and marmbrus committed Feb 1, 2016
    Configuration menu
    Copy the full SHA
    9a5b25d View commit details
    Browse the repository at this point in the history
  3. [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md

    It seems to me that `lib` is better because the `datanucleus` jars are located in `lib` for release builds.
    
    Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
    
    Closes #10901 from maropu/DocFix.
    
    (cherry picked from commit da9146c)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    maropu authored and marmbrus committed Feb 1, 2016
    Configuration menu
    Copy the full SHA
    215d5d8 View commit details
    Browse the repository at this point in the history
  4. [SPARK-11780][SQL] Add catalyst type aliases backwards compatibility

    Changed a target at branch-1.6 from #10635.
    
    Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
    
    Closes #10915 from maropu/pr9935-v3.
    maropu authored and marmbrus committed Feb 1, 2016
    Configuration menu
    Copy the full SHA
    70fcbf6 View commit details
    Browse the repository at this point in the history

Commits on Feb 2, 2016

  1. [SPARK-13087][SQL] Fix group by function for sort based aggregation

    It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`.  The current code worked fine when the grouping expressions were simple, but when they were a derived value this blew up at execution time.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #11011 from marmbrus/groupByFunction.
    marmbrus authored and yhuai committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    bd8efba View commit details
    Browse the repository at this point in the history
  2. [SPARK-13094][SQL] Add encoders for seq/array of primitives

    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #11014 from marmbrus/seqEncoders.
    
    (cherry picked from commit 29d9218)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    99594b2 View commit details
    Browse the repository at this point in the history
  3. [SPARK-12780][ML][PYTHON][BACKPORT] Inconsistency returning value of …

    …ML python models' properties
    
    Backport of [SPARK-12780] for branch-1.6
    
    Original PR for master: #10724
    
    This fixes StringIndexerModel.labels in pyspark.
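
    A minimal sketch of the property this fixes (hypothetical DataFrame and column names):

    ```python
    from pyspark.ml.feature import StringIndexer

    indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
    model = indexer.fit(df)
    print(model.labels)  # the fitted labels, now returned consistently with Scala
    ```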
    
    Author: Xusen Yin <yinxusen@gmail.com>
    
    Closes #10950 from jkbradley/yinxusen-spark-12780-backport.
    yinxusen authored and jkbradley committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    9a3d1bd View commit details
    Browse the repository at this point in the history
  4. [SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method

    I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629
    Please let me know what you think.
    Thanks!
    
    Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
    
    Closes #10580 from NarineK/sparkrSavaAsRable.
    
    (cherry picked from commit 8a88e12)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    NarineK authored and shivaram committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    53f518a View commit details
    Browse the repository at this point in the history
  5. [SPARK-13121][STREAMING] java mapWithState mishandles scala Option

    Java mapWithState with Function3 had a wrong conversion of Java `Optional` to Scala `Option`; the fixed code uses the same conversion as the mapWithState call that takes a Function4 as input. `Optional.fromNullable(v.get)` fails if `v` is `None`; it is better to use `JavaUtils.optionToOptional(v)` instead.
    
    Author: Gabriele Nizzoli <mail@nizzoli.net>
    
    Closes #11007 from gabrielenizzoli/branch-1.6.
    gabrielenizzoli authored and zsxwing committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    4c28b4c View commit details
    Browse the repository at this point in the history
  6. [SPARK-12711][ML] ML StopWordsRemover does not protect itself from co…

    …lumn name duplication
    
    Fixes the problem and verifies the fix with a test suite.
    Also adds an optional `nullable` (Boolean) parameter to `SchemaUtils.appendColumn`
    and deduplicates the `SchemaUtils.appendColumn` functions.
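
    A sketch of the guarded case (hypothetical column names, not taken from the patch's test suite):

    ```python
    from pyspark.ml.feature import StopWordsRemover

    remover = StopWordsRemover(inputCol="words", outputCol="filtered")
    # If the input DataFrame already has a "filtered" column, schema validation
    # should now fail fast instead of producing a duplicated column name.
    result = remover.transform(df)
    ```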
    
    Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com>
    
    Closes #10741 from grzegorz-chilkiewicz/master.
    
    (cherry picked from commit b1835d7)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    grzegorz-chilkiewicz authored and jkbradley committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    9c0cf22 View commit details
    Browse the repository at this point in the history
  7. [SPARK-13056][SQL] map column would throw NPE if value is null

    Jira:
    https://issues.apache.org/jira/browse/SPARK-13056
    
    Create a map like
    { "a": "somestring", "b": null}
    Query like
    SELECT col["b"] FROM t1;
    NPE would be thrown.
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #10964 from adrian-wang/npewriter.
    
    (cherry picked from commit 358300c)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    
    Conflicts:
    	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
    adrian-wang authored and marmbrus committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    3c92333 View commit details
    Browse the repository at this point in the history
  8. [DOCS] Update StructType.scala

    The example will throw an error like
    `<console>:20: error: not found: value StructType`
    
    Need to add this line:
    `import org.apache.spark.sql.types._`
    
    Author: Kevin (Sangwoo) Kim <sangwookim.me@gmail.com>
    
    Closes #10141 from swkimme/patch-1.
    
    (cherry picked from commit b377b03)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    swkimme authored and marmbrus committed Feb 2, 2016
    Configuration menu
    Copy the full SHA
    e81333b View commit details
    Browse the repository at this point in the history

Commits on Feb 3, 2016

  1. [SPARK-13122] Fix race condition in MemoryStore.unrollSafely()

    https://issues.apache.org/jira/browse/SPARK-13122
    
    A race condition can occur in MemoryStore's unrollSafely() method if two threads that
    return the same value for currentTaskAttemptId() execute this method concurrently. This
    change makes the operation of reading the initial amount of unroll memory used, performing
    the unroll, and updating the associated memory maps atomic in order to avoid this race
    condition.
    
    The initially proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID.
    
    Author: Adam Budde <budde@amazon.com>
    
    Closes #11012 from budde/master.
    
    (cherry picked from commit ff71261)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
    Adam Budde authored and Andrew Or committed Feb 3, 2016
    Configuration menu
    Copy the full SHA
    2f8abb4 View commit details
    Browse the repository at this point in the history
  2. [SPARK-12739][STREAMING] Details of batch in Streaming tab uses two D…

    …uration columns
    
    I have prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'.
    
    Author: Mario Briggs <mario.briggs@in.ibm.com>
    Author: mariobriggs <mariobriggs@in.ibm.com>
    
    Closes #11022 from mariobriggs/spark-12739.
    
    (cherry picked from commit e9eb248)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    mariobriggs authored and zsxwing committed Feb 3, 2016
    Configuration menu
    Copy the full SHA
    5fe8796 View commit details
    Browse the repository at this point in the history

Commits on Feb 4, 2016

  1. [SPARK-13101][SQL][BRANCH-1.6] nullability of array type element shou…

    …ld not fail analysis of encoder
    
    Nullability should only be considered an optimization rather than part of the type system, so instead of failing analysis for mismatched nullability, we should pass analysis and add a runtime null check.
    
    backport #11035 to 1.6
    
    Author: Wenchen Fan <wenchen@databricks.com>
    
    Closes #11042 from cloud-fan/branch-1.6.
    cloud-fan authored and marmbrus committed Feb 4, 2016
    Configuration menu
    Copy the full SHA
    cdfb2a1 View commit details
    Browse the repository at this point in the history
  2. [ML][DOC] fix wrong api link in ml onevsrest

    minor fix for api link in ml onevsrest
    
    Author: Yuhao Yang <hhbyyh@gmail.com>
    
    Closes #11068 from hhbyyh/onevsrestDoc.
    
    (cherry picked from commit c2c956b)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    hhbyyh authored and mengxr committed Feb 4, 2016
    Configuration menu
    Copy the full SHA
    2f390d3 View commit details
    Browse the repository at this point in the history
  3. [SPARK-13195][STREAMING] Fix NoSuchElementException when a state is n…

    …ot set but timeoutThreshold is defined
    
    Check whether the state exists before calling `get`.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11081 from zsxwing/SPARK-13195.
    
    (cherry picked from commit 8e2f296)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Feb 4, 2016
    Configuration menu
    Copy the full SHA
    a907c7c View commit details
    Browse the repository at this point in the history

Commits on Feb 5, 2016

  1. [SPARK-13214][DOCS] update dynamicAllocation documentation

    Author: Bill Chambers <bill@databricks.com>
    
    Closes #11094 from anabranch/dynamic-docs.
    
    (cherry picked from commit 66e1383)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Bill Chambers authored and Andrew Or committed Feb 5, 2016
    Configuration menu
    Copy the full SHA
    3ca5dc3 View commit details
    Browse the repository at this point in the history

Commits on Feb 8, 2016

  1. [SPARK-13210][SQL] catch OOM when allocate memory and expand array

    There is a bug when we try to grow the buffer: an OOM is wrongly ignored (the assert is also skipped by the JVM), then we try to grow the array again, which triggers spilling that frees the current page, so the record we just inserted becomes invalid.
    
    The root cause is that the JVM has less free memory than the MemoryManager thought, so it OOMs when allocating a page without triggering spilling. We should catch the OOM and acquire memory again to trigger spilling.
    
    Also, we should not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing).
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #11095 from davies/fix_expand.
    Davies Liu authored and JoshRosen committed Feb 8, 2016
    Configuration menu
    Copy the full SHA
    9b30096 View commit details
    Browse the repository at this point in the history

Commits on Feb 9, 2016

  1. [SPARK-12807][YARN] Spark External Shuffle not working in Hadoop clus…

    …ters with Jackson 2.2.3
    
    Patch to
    
    1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation
    2. Use maven antrun to verify the JAR has the renamed classes
    
    Being Maven-based, I don't know if the verification phase kicks in on an SBT/jenkins build. It will on a `mvn install`
    
    Author: Steve Loughran <stevel@hortonworks.com>
    
    Closes #10780 from steveloughran/stevel/patches/SPARK-12807-master-shuffle.
    
    (cherry picked from commit 34d0b70)
    Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
    steveloughran authored and Marcelo Vanzin committed Feb 9, 2016
    Configuration menu
    Copy the full SHA
    82fa864 View commit details
    Browse the repository at this point in the history

Commits on Feb 10, 2016

  1. [SPARK-10524][ML] Use the soft prediction to order categories' bins

    JIRA: https://issues.apache.org/jira/browse/SPARK-10524
    
    Currently we use the hard prediction (`ImpurityCalculator.predict`) to order categories' bins. But we should use the soft prediction.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Author: Liang-Chi Hsieh <viirya@appier.com>
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #8734 from viirya/dt-soft-centroids.
    
    (cherry picked from commit 9267bc6)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
    viirya authored and jkbradley committed Feb 10, 2016
    Configuration menu
    Copy the full SHA
    89818cb View commit details
    Browse the repository at this point in the history
  2. [SPARK-12921] Fix another non-reflective TaskAttemptContext access in…

    … SpecificParquetRecordReaderBase
    
    This is a minor followup to #10843 to fix one remaining place where we forgot to use reflective access of TaskAttemptContext methods.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #11131 from JoshRosen/SPARK-12921-take-2.
    JoshRosen committed Feb 10, 2016
    Configuration menu
    Copy the full SHA
    93f1d91 View commit details
    Browse the repository at this point in the history

Commits on Feb 11, 2016

  1. [SPARK-13274] Fix Aggregator Links on GroupedDataset Scala API

    Update Aggregator links to point to #org.apache.spark.sql.expressions.Aggregator
    
    Author: raela <raela@databricks.com>
    
    Closes #11158 from raelawang/master.
    
    (cherry picked from commit 719973b)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    raelawang authored and rxin committed Feb 11, 2016
    Configuration menu
    Copy the full SHA
    b57fac5 View commit details
    Browse the repository at this point in the history
  2. [SPARK-13265][ML] Refactoring of basic ML import/export for other fil…

    …e system besides HDFS
    
    jkbradley I tried to improve the function that exports a model. When I tried to export a model to S3 under Spark 1.6, it wasn't possible, so the export should support S3 besides HDFS. Can you review it when you have time? Thanks!
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #11151 from yu-iskw/SPARK-13265.
    
    (cherry picked from commit efb65e0)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    yu-iskw authored and mengxr committed Feb 11, 2016
    Configuration menu
    Copy the full SHA
    91a5ca5 View commit details
    Browse the repository at this point in the history

Commits on Feb 12, 2016

  1. [SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw a…

    …n error
    
    Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.
    
    In Python:
    ```python
    from pyspark.ml.classification import NaiveBayes
    nb = NaiveBayes()
    print nb.hasParam("smoothing")
    print nb.hasParam("notAParam")
    ```
    produces:
    > True
    > AttributeError: 'NaiveBayes' object has no attribute 'notAParam'
    
    However, in Scala:
    ```scala
    import org.apache.spark.ml.classification.NaiveBayes
    val nb  = new NaiveBayes()
    nb.hasParam("smoothing")
    nb.hasParam("notAParam")
    ```
    produces:
    > true
    > false
    
    cc holdenk
    
    Author: sethah <seth.hendrickson16@gmail.com>
    
    Closes #10962 from sethah/SPARK-13047.
    
    (cherry picked from commit b354673)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    sethah authored and mengxr committed Feb 12, 2016
    Configuration menu
    Copy the full SHA
    9d45ec4 View commit details
    Browse the repository at this point in the history
  2. [SPARK-13153][PYSPARK] ML persistence failed when handle no default v…

    …alue parameter
    
    Fix this defect by checking whether a default value exists.
    
    yanboliang Please help to review.
    
    Author: Tommy YU <tummyyu@163.com>
    
    Closes #11043 from Wenpei/spark-13153-handle-param-withnodefaultvalue.
    
    (cherry picked from commit d3e2e20)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    Wenpei authored and mengxr committed Feb 12, 2016
    Configuration menu
    Copy the full SHA
    18661a2 View commit details
    Browse the repository at this point in the history

Commits on Feb 13, 2016

  1. [SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft…

    … Windows
    
    Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK.
    
    Is it worth considering also including this fix in any future 1.5.x releases (if any)?
    
    I confirm this is my own original work and license it to the Spark project under its open source license.
    
    Author: markpavey <mark.pavey@thefilter.com>
    
    Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
    
    (cherry picked from commit 374c4b2)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    markpavey authored and srowen committed Feb 13, 2016
    Configuration menu
    Copy the full SHA
    93a55f3 View commit details
    Browse the repository at this point in the history
  2. [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering f…

    …ailed test
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12363
    
    This issue was pointed out by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests fails. I found that some `dstAttr`s of the normalized graph do not hold the correct values but 0.0. Setting `TripletFields.All` in `mapTriplets` makes it work.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #10539 from viirya/fix-poweriter.
    
    (cherry picked from commit e3441e3)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    viirya authored and mengxr committed Feb 13, 2016
    Configuration menu
    Copy the full SHA
    107290c View commit details
    Browse the repository at this point in the history

Commits on Feb 14, 2016

  1. [SPARK-13300][DOCUMENTATION] Added pygments.rb dependancy

    Looks like the pygments.rb gem is also required for the Jekyll build to work. At least on Ubuntu/RHEL I could not build without this dependency, so I added it to the steps.
    
    Author: Amit Dev <amitdev@gmail.com>
    
    Closes #11180 from amitdev/master.
    
    (cherry picked from commit 331293c)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    amitdev authored and srowen committed Feb 14, 2016
    Configuration menu
    Copy the full SHA
    ec40c5a View commit details
    Browse the repository at this point in the history

Commits on Feb 15, 2016

  1. [SPARK-13312][MLLIB] Update java train-validation-split example in ml…

    …-guide
    
    Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312.
    
    This contribution is my original work and I license the work to this project.
    
    Author: JeremyNixon <jnixon2@gmail.com>
    
    Closes #11199 from JeremyNixon/update_train_val_split_example.
    
    (cherry picked from commit adb5483)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    JeremyNixon authored and srowen committed Feb 15, 2016
    Configuration menu
    Copy the full SHA
    71f53ed View commit details
    Browse the repository at this point in the history

Commits on Feb 16, 2016

  1. Correct SparseVector.parse documentation

    There's a small typo in the SparseVector.parse docstring (which says that it returns a DenseVector rather than a SparseVector), which seems to be incorrect.
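
    A quick check of the corrected behaviour (the string format follows the method's own doctest):

    ```python
    from pyspark.mllib.linalg import SparseVector

    v = SparseVector.parse("(4, [0, 3], [1.0, 5.5])")
    print(type(v).__name__)  # SparseVector, as the corrected docstring states
    ```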
    
    Author: Miles Yucht <miles@databricks.com>
    
    Closes #11213 from mgyucht/fix-sparsevector-docs.
    
    (cherry picked from commit 827ed1c)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    mgyucht authored and srowen committed Feb 16, 2016
    Configuration menu
    Copy the full SHA
    d950891 View commit details
    Browse the repository at this point in the history

Commits on Feb 17, 2016

  1. [SPARK-13279] Remove O(n^2) operation from scheduler.

    This commit removes an unnecessary duplicate check in addPendingTask that meant
    that scheduling a task set took time proportional to (# tasks)^2.
    
    Author: Sital Kedia <skedia@fb.com>
    
    Closes #11175 from sitalkedia/fix_stuck_driver.
    
    (cherry picked from commit 1e1e31e)
    Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
    Sital Kedia authored and kayousterhout committed Feb 17, 2016
    Configuration menu
    Copy the full SHA
    98354ca View commit details
    Browse the repository at this point in the history
  2. [SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's…

    … default is "python2.7"
    
    Author: Christopher C. Aycock <chris@chrisaycock.com>
    
    Closes #11239 from chrisaycock/master.
    
    (cherry picked from commit a7c74d7)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    chrisaycock authored and JoshRosen committed Feb 17, 2016
    Configuration menu
    Copy the full SHA
    66106a6 View commit details
    Browse the repository at this point in the history

Commits on Feb 18, 2016

  1. [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask com…

    …pares Option and String directly.
    
    ## What changes were proposed in this pull request?
    
    Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager)
    
    ## How was this patch tested?
    
    Running Jenkins tests
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #11253 from srowen/SPARK-13371.
    
    (cherry picked from commit 7856253)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    srowen authored and Andrew Or committed Feb 18, 2016
    Configuration menu
    Copy the full SHA
    16f35c4 View commit details
    Browse the repository at this point in the history

Commits on Feb 22, 2016

  1. [SPARK-12546][SQL] Change default number of open parquet files

    A common problem that users encounter with Spark 1.6.0 is that writing to a partitioned parquet table OOMs.  The root cause is that parquet allocates a significant amount of memory that is not accounted for by our own mechanisms.  As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #11308 from marmbrus/parquetWriteOOM.
    
    (cherry picked from commit 173aa94)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Feb 22, 2016
    Configuration menu
    Copy the full SHA
    699644c View commit details
    Browse the repository at this point in the history

Commits on Feb 23, 2016

  1. [SPARK-13298][CORE][UI] Escape "label" to avoid DAG being broken by s…

    …ome special character
    
    ## What changes were proposed in this pull request?
    
    When there are some special characters (e.g., `"`, `\`) in `label`, DAG will be broken. This patch just escapes `label` to avoid DAG being broken by some special characters
    
    ## How was this patch tested?
    
    Jenkins tests
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11309 from zsxwing/SPARK-13298.
    
    (cherry picked from commit a11b399)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    zsxwing authored and Andrew Or committed Feb 23, 2016
    Configuration menu
    Copy the full SHA
    85e6a22 View commit details
    Browse the repository at this point in the history
  2. [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec

    In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which starts another `SessionState`. This leads to an exception because `processCmd` needs to get the `CliSessionState` instance by calling `SessionState.get()`, but the return value is an instance of `SessionState`. See the exception below.
    
    spark-sql> !echo "test";
    Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
    	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
    	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #9589 from adrian-wang/clicommand.
    
    (cherry picked from commit 5d80fac)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    
    Conflicts:
    	sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala
    adrian-wang authored and marmbrus committed Feb 23, 2016
    Configuration menu
    Copy the full SHA
    f7898f9 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    40d11d0 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    152252f View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    2902798 View commit details
    Browse the repository at this point in the history
  6. [SPARK-12746][ML] ArrayType(_, true) should also accept ArrayType(_, …

    …false) fix for branch-1.6
    
    https://issues.apache.org/jira/browse/SPARK-13359
    
    Author: Earthson Lu <Earthson.Lu@gmail.com>
    
    Closes #11237 from Earthson/SPARK-13359.
    Earthson authored and mengxr committed Feb 23, 2016
    Configuration menu
    Copy the full SHA
    d31854d View commit details
    Browse the repository at this point in the history
  7. [SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply

    `GraphImpl.fromExistingRDDs` expects preprocessed vertex RDD as input. We call it in LDA without validating this requirement. So it might introduce errors. Replacing it by `Graph.apply` would be safer and more proper because it is a public API. The tests still pass. So maybe it is safe to use `fromExistingRDDs` here (though it doesn't seem so based on the implementation) or the test cases are special. jkbradley ankurdave
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #11226 from mengxr/SPARK-13355.
    
    (cherry picked from commit 764ca18)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    mengxr committed Feb 23, 2016
    Configuration menu
    Copy the full SHA
    0784e02 View commit details
    Browse the repository at this point in the history
  8. [SPARK-13410][SQL] Support unionAll for DataFrames with UDT columns.

    ## What changes were proposed in this pull request?
    
    This PR adds equality operators to UDT classes so that they can be correctly tested for dataType equality during union operations.
    
    This was previously causing `"AnalysisException: u"unresolved operator 'Union;""` when trying to unionAll two dataframes with UDT columns as below.
    
    ```
    from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
    from pyspark.sql import types
    
    schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])
    
    a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
    b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
    
    c = a.unionAll(b)
    ```
    
    ## How was this patch tested?
    
    Tested using two unit tests in sql/test.py and the DataFrameSuite.
    
    Additional information here : https://issues.apache.org/jira/browse/SPARK-13410
    
    rxin
    
    Author: Franklyn D'souza <franklynd@gmail.com>
    
    Closes #11333 from damnMeddlingKid/udt-union-patch.
    damnMeddlingKid authored and rxin committed Feb 23, 2016
    Configuration menu
    Copy the full SHA
    573a2c9 View commit details
    Browse the repository at this point in the history

Commits on Feb 24, 2016

  1. [SPARK-13390][SQL][BRANCH-1.6] Fix the issue that Iterator.map().toSe…

    …q is not Serializable
    
    ## What changes were proposed in this pull request?
    
    `scala.collection.Iterator`'s methods (e.g., map, filter) will return an `AbstractIterator` which is not Serializable. E.g.,
    ```Scala
    scala> val iter = Array(1, 2, 3).iterator.map(_ + 1)
    iter: Iterator[Int] = non-empty iterator
    
    scala> println(iter.isInstanceOf[Serializable])
    false
    ```
    If we call something like `Iterator.map(...).toSeq`, it will create a `Stream` that contains a non-serializable `AbstractIterator` field and make the `Stream` be non-serializable.
    
    This PR uses `toArray` instead of `toSeq` to fix such issue in `def createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame`.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11334 from zsxwing/SPARK-13390.
    zsxwing authored and srowen committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    06f4fce View commit details
    Browse the repository at this point in the history
  2. [SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in …

    …PR builder even if a PR only changes sql/core
    
    ## What changes were proposed in this pull request?
    
    `HiveCompatibilitySuite` should still run in the PR builder even if a PR only changes sql/core. So, I am going to remove the `ExtendedHiveTest` annotation from `HiveCompatibilitySuite`.
    
    https://issues.apache.org/jira/browse/SPARK-13475
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #11351 from yhuai/SPARK-13475.
    
    (cherry picked from commit bc35380)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    yhuai committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    fe71cab View commit details
    Browse the repository at this point in the history

Commits on Feb 25, 2016

  1. [SPARK-13482][MINOR][CONFIGURATION] Make consistency of the configura…

    …iton named in TransportConf.
    
    `spark.storage.memoryMapThreshold` has two kinds of values: one is 2*1024*1024 as an integer and the other is '2m' as a string.
    "2m" is recommended in the documentation, but it goes wrong if the code path reaches `TransportConf#memoryMapBytes`.
    
    [Jira](https://issues.apache.org/jira/browse/SPARK-13482)
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #11360 from SaintBacchus/SPARK-13482.
    
    (cherry picked from commit 264533b)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    SaintBacchus authored and rxin committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    8975996 View commit details
    Browse the repository at this point in the history
  2. [SPARK-13473][SQL] Don't push predicate through project with nondeter…

    …ministic field(s)
    
    ## What changes were proposed in this pull request?
    
    Predicates shouldn't be pushed through project with nondeterministic field(s).
    
    See graphframes/graphframes#23 and SPARK-13473 for more details.
    
    This PR targets master, branch-1.6, and branch-1.5.
    
    ## How was this patch tested?
    
    A test case is added in `FilterPushdownSuite`. It constructs a query plan where a filter is over a project with a nondeterministic field. The optimized query plan shouldn't change in this case.
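
    A sketch of the plan shape in question (hypothetical column names; the actual test is written against the Catalyst plans directly):

    ```python
    from pyspark.sql import functions as F

    projected = df.select(F.col("id"), F.rand().alias("r"))  # nondeterministic output field
    filtered = projected.filter(F.col("r") > 0.5)            # must stay above the Project
    ```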
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #11348 from liancheng/spark-13473-no-ppd-through-nondeterministic-project-field.
    
    (cherry picked from commit 3fa6491)
    Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    liancheng authored and cloud-fan committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    3cc938a View commit details
    Browse the repository at this point in the history
  3. [SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits on large …

    …DataFrames
    
    Change line 113 of QuantileDiscretizer.scala to
    
    `val requiredSamples = math.max(numBins * numBins, 10000.0)`
    
    so that `requiredSamples` is a `Double`.  This will fix the division in line 114 which currently results in zero if `requiredSamples < dataset.count`
    
    Manual tests. I was having problems using QuantileDiscretizer with my dataset, and after making this change QuantileDiscretizer behaves as expected.
    
    Author: Oliver Pierson <ocp@gatech.edu>
    Author: Oliver Pierson <opierson@umd.edu>
    
    Closes #11319 from oliverpierson/SPARK-13444.
    
    (cherry picked from commit 6f8e835)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Oliver Pierson authored and srowen committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    cb869a1 View commit details
    Browse the repository at this point in the history
  4. [SPARK-13441][YARN] Fix NPE in yarn Client.createConfArchive method

    ## What changes were proposed in this pull request?
    
    Instead of using the result of File.listFiles() directly, which may be null and lead to an NPE, check for null first. If it is null, log a warning instead.
    
    ## How was this patch tested?
    
    Ran the ./dev/run-tests locally
    Tested manually on a cluster
    
    Author: Terence Yim <terence@cask.co>
    
    Closes #11337 from chtyim/fixes/SPARK-13441-null-check.
    
    (cherry picked from commit fae88af)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    chtyim authored and srowen committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    1f03163 View commit details
    Browse the repository at this point in the history
  5. [SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated

    Author: Michael Gummelt <mgummelt@mesosphere.io>
    
    Closes #11311 from mgummelt/document_csv.
    
    (cherry picked from commit c98a93d)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Michael Gummelt authored and srowen committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    e3802a7 View commit details
    Browse the repository at this point in the history
  6. [SPARK-12316] Wait a minutes to avoid cycle calling.

    When the application ends, the AM cleans the staging dir.
    But if the driver then triggers a delegation token update, it cannot find the right token file and ends up calling the method 'updateCredentialsIfRequired' in an endless cycle.
    This leads to a driver StackOverflowError.
    https://issues.apache.org/jira/browse/SPARK-12316
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #10475 from SaintBacchus/SPARK-12316.
    
    (cherry picked from commit 5fcf4c2)
    Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
    SaintBacchus authored and Tom Graves committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    5f7440b View commit details
    Browse the repository at this point in the history
  7. Revert "[SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits o…

    …n large DataFrames"
    
    This reverts commit cb869a1.
    mengxr committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    d59a08f View commit details
    Browse the repository at this point in the history
  8. [SPARK-12874][ML] ML StringIndexer does not protect itself from colum…

    …n name duplication
    
    ## What changes were proposed in this pull request?
    ML StringIndexer does not protect itself from column name duplication.
    
    We should still improve the way we validate the schema of `StringIndexer` and `StringIndexerModel`; however, that is better addressed in another issue.
    
    ## How was this patch tested?
    unit test
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #11370 from yu-iskw/SPARK-12874.
    
    (cherry picked from commit 14e2700)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    yu-iskw authored and mengxr committed Feb 25, 2016
    Configuration menu
    Copy the full SHA
    abe8f99 View commit details
    Browse the repository at this point in the history

Commits on Feb 26, 2016

  1. [SPARK-13454][SQL] Allow users to drop a table with a name starting w…

    …ith an underscore.
    
    ## What changes were proposed in this pull request?
    
    This change adds a workaround to allow users to drop a table with a name starting with an underscore. Without this patch, we can create such a table, but we cannot drop it. The reason is that Hive's parser unquotes a quoted identifier (see https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L453). So, when we issue a drop table command to Hive, a table name starting with an underscore is actually not quoted. Then, Hive complains because it does not support a table name starting with an underscore without using backticks (underscores are allowed as long as they are not the first character, though).
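
    A sketch of the case covered by the new test (hypothetical table name):

    ```python
    sqlContext.sql("CREATE TABLE `_tmp_events` (id INT)")
    sqlContext.sql("DROP TABLE `_tmp_events`")  # previously failed because Hive saw the name unquoted
    ```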
    
    ## How was this patch tested?
    
    Add a test to make sure we can drop a table with a name starting with an underscore.
    
    https://issues.apache.org/jira/browse/SPARK-13454
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes #11349 from yhuai/fixDropTable.
    yhuai committed Feb 26, 2016
    Configuration menu
    Copy the full SHA
    a57f87e View commit details
    Browse the repository at this point in the history

Commits on Feb 27, 2016

  1. [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifac…

    …ts to home.apache.org
    
    Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
    
    (cherry picked from commit f77dc4e)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Feb 27, 2016
    Configuration menu
    Copy the full SHA
    8a43c3b View commit details
    Browse the repository at this point in the history
  2. Update CHANGES.txt and spark-ec2 and R package versions for 1.6.1

    This patch updates a few more 1.6.0 version numbers for the 1.6.1 release candidate.
    
    Verified this by running
    
    ```
    git grep "1\.6\.0" | grep -v since | grep -v deprecated | grep -v Since | grep -v versionadded | grep 1.6.0
    ```
    
    and inspecting the output.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #11407 from JoshRosen/version-number-updates.
    JoshRosen committed Feb 27, 2016
    Configuration menu
    Copy the full SHA
    eb6f6da View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    15de51c View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    dcf60d7 View commit details
    Browse the repository at this point in the history

Commits on Feb 29, 2016

  1. [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map…

    … string datatypes to Oracle VARCHAR datatype
    
    ## What changes were proposed in this pull request?
    
    This pull request is for the SPARK-12941 fix, creating a data type mapping to Oracle for the DataFrame data type "StringType". This PR is for the master branch fix, whereas another PR has already been tested against branch 1.4.
    
    ## How was this patch tested?
    
    This patch was tested using the Oracle docker image; a new integration suite was created for it. The oracle.jdbc jar was supposed to be downloaded from the Maven repository, but since no JDBC jar was available there, the jar was downloaded manually from the Oracle site, installed locally, and the tests were run against it. So, for a SparkQA test run, the ojdbc jar might need to be placed manually in the local Maven repository (com/oracle/ojdbc6/11.2.0.2.0).
    
    Author: thomastechs <thomas.sebastian@tcs.com>
    
    Closes #11306 from thomastechs/master.
    
    (cherry picked from commit 8afe491)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    thomastechs authored and yhuai committed Feb 29, 2016
    Configuration menu
    Copy the full SHA
    fedb813 View commit details
    Browse the repository at this point in the history

Commits on Mar 3, 2016

  1. [SPARK-13465] Add a task failure listener to TaskContext

    ## What changes were proposed in this pull request?
    
    TaskContext supports task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there is an error. This patch adds a new listener that gets called when a task fails.
    
    ## How was this patch tested?
    
    New unit test case and integration test case covering the code path
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #11478 from davies/add_failure_1.6.
    Davies Liu authored and davies committed Mar 3, 2016
    Configuration menu
    Copy the full SHA
    1ce2c12 View commit details
    Browse the repository at this point in the history
  2. [SPARK-13601] call failure callbacks before writer.close()

    In order to tell the OutputStream whether the task has failed or not, we should call the failure callbacks BEFORE calling writer.close().
    
    Added new unit tests.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #11450 from davies/callback.
    davies committed Mar 3, 2016
    Configuration menu
    Copy the full SHA
    fa86dc4 View commit details
    Browse the repository at this point in the history

Commits on Mar 4, 2016

  1. [SPARK-13601] [TESTS] use 1 partition in tests to avoid race conditions

    Fix race conditions when cleanup files.
    
    Existing tests.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #11507 from davies/flaky.
    
    (cherry picked from commit d062587)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    
    Conflicts:
    	sql/hive/src/test/scala/org/apache/spark/sql/sources/CommitFailureTestRelationSuite.scala
    Davies Liu authored and davies committed Mar 4, 2016
    Configuration menu
    Copy the full SHA
    b3a5129 View commit details
    Browse the repository at this point in the history
  2. [SPARK-13652][CORE] Copy ByteBuffer in sendRpcSync as it will be recy…

    …cled
    
    ## What changes were proposed in this pull request?
    
    `sendRpcSync` should copy the response content because the underlying buffer will be recycled and reused.
    
    ## How was this patch tested?
    
    Jenkins unit tests.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11499 from zsxwing/SPARK-13652.
    
    (cherry picked from commit 465c665)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Mar 4, 2016
    Configuration menu
    Copy the full SHA
    51c676e View commit details
    Browse the repository at this point in the history
  3. [SPARK-11515][ML] QuantileDiscretizer should take random seed

    cc jkbradley
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #9535 from yu-iskw/SPARK-11515.
    
    (cherry picked from commit 574571c)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    yu-iskw authored and srowen committed Mar 4, 2016
    Configuration menu
    Copy the full SHA
    5a27129 View commit details
    Browse the repository at this point in the history
  4. [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map…

    … string datatypes to Oracle VARCHAR datatype mapping
    
    A test suite was added for the bug fix SPARK-12941, covering the mapping of StringType to the corresponding Oracle type.
    
    Manual tests were done.
    
    Author: thomastechs <thomas.sebastian@tcs.com>
    Author: THOMAS SEBASTIAN <thomas.sebastian@tcs.com>
    
    Closes #11489 from thomastechs/thomastechs-12941-master-new.
    
    (cherry picked from commit f6ac7c3)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    
    Conflicts:
    	sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
    thomastechs authored and yhuai committed Mar 4, 2016
    Configuration menu
    Copy the full SHA
    528e373 View commit details
    Browse the repository at this point in the history
  5. [SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits on large …

    …DataFrames
    
    ## What changes were proposed in this pull request?
    
    Change line 113 of QuantileDiscretizer.scala to
    
    `val requiredSamples = math.max(numBins * numBins, 10000.0)`
    
    so that `requiredSamples` is a `Double`.  This will fix the division in line 114 which currently results in zero if `requiredSamples < dataset.count`
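
    The arithmetic pitfall, illustrated with plain integer division (hypothetical numbers, not the Scala code itself):

    ```python
    num_bins = 10
    count = 1000000                                        # dataset.count()
    required_samples = max(num_bins * num_bins, 10000)     # an Int, as before the fix
    print(required_samples // count)                       # 0 -> sampling fraction of zero
    required_samples = max(num_bins * num_bins, 10000.0)   # a Double, as in the fix
    print(required_samples / count)                        # 0.01
    ```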
    
    ## How was this patch tested?
    Manual tests. I was having problems using QuantileDiscretizer with my dataset, and after making this change QuantileDiscretizer behaves as expected.
    
    Author: Oliver Pierson <ocp@gatech.edu>
    Author: Oliver Pierson <opierson@umd.edu>
    
    Closes #11319 from oliverpierson/SPARK-13444.
    Oliver Pierson authored and mengxr committed Mar 4, 2016
    Configuration menu
    Copy the full SHA
    f0cc511 View commit details
    Browse the repository at this point in the history
  6. Commit ffaf7c0

Commits on Mar 6, 2016

  1. [SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads
    
    ## What changes were proposed in this pull request?
    
    Set the function's module name to `__main__` if it's missing in `TransformFunctionSerializer.loads`.
    
    ## How was this patch tested?
    
    Manually test in the shell.
    
    Before this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x106ac8b18>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    None
    >>> print(func2.rdd_wrap_func.__module__)
    None
    >>>
    ```
    After this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x108bf1b90>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    __main__
    >>> print(func2.rdd_wrap_func.__module__)
    __main__
    >>>
    ```
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11535 from zsxwing/loads-module.
    
    (cherry picked from commit ee913e6)
    Signed-off-by: Davies Liu <davies.liu@gmail.com>
    zsxwing authored and davies committed Mar 6, 2016
    Commit 704a54c

Commits on Mar 7, 2016

  1. [SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
    
    ## What changes were proposed in this pull request?
    The reference to StatefulNetworkWordCount.scala should be removed from the updateStateByKey documentation until there is an example for updateStateByKey.
    
    ## How was this patch tested?
    Have tested the new documentation with jekyll build.
    
    Author: rmishra <rmishra@pivotal.io>
    
    Closes #11545 from rishitesh/SPARK-13705.
    
    (cherry picked from commit 4b13896)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    rmishra authored and srowen committed Mar 7, 2016
    Commit 18ef2f2
  2. [SPARK-13599][BUILD] remove transitive groovy dependencies from spark-hive and spark-hiveserver (branch 1.6)
    
    ## What changes were proposed in this pull request?
    
    This is just the patch of #11449 cherry picked to branch-1.6; the enforcer and dep/ diffs are cut
    
    Modifies the dependency declarations of the all the hive artifacts, to explicitly exclude the groovy-all JAR.
    
    This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.
    
    ## How was this patch tested?
    
    1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
    1. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
    1. A maven dependency tree was created `mvn dependency:tree -Pyarn,hive,hive-thriftserver  -Dverbose > target/dependencies.txt`
    1. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
    1. Patch applied
    1. Repeated step1: clean build of project with ` -Pyarn,hive,hive-thriftserver` set
    1. Examined created spark-assembly, verified no org.codehaus packages
    1. Verified that the maven dependency tree no longer references groovy
    
    The `master` version updates the dependency files and an enforcer rule to keep groovy out; this patch strips it out.
    
    Author: Steve Loughran <stevel@hortonworks.com>
    
    Closes #11473 from steveloughran/fixes/SPARK-13599-groovy+branch-1.6.
    steveloughran authored and srowen committed Mar 7, 2016
    Commit 2434f16
  3. [MINOR][DOC] improve the doc for "spark.memory.offHeap.size"

    The description of "spark.memory.offHeap.size" in the current documentation does not clearly state that the memory is counted in bytes.
    
    This PR contains a small fix for this tiny issue.
    
    Documentation fix only.
    
    Author: CodingCat <zhunansjtu@gmail.com>
    
    Closes #11561 from CodingCat/master.
    
    (cherry picked from commit a3ec50a)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    CodingCat authored and zsxwing committed Mar 7, 2016
    Commit cf4e62e
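    As a quick illustration of what "counted in bytes" means in practice (a sketch using the standard SparkConf API; the 2 GB figure is arbitrary):
    
    ```scala
    import org.apache.spark.SparkConf
    
    // spark.memory.offHeap.size expects a plain byte count, so 2 GB must be spelled out.
    val conf = new SparkConf()
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", (2L * 1024 * 1024 * 1024).toString) // 2147483648 bytes
    ```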
  4. [SPARK-13648] Add Hive Cli to classes for isolated classloader

    ## What changes were proposed in this pull request?
    
    Adding the hive-cli classes to the classloader
    
    ## How was this patch tested?
    
    The Hive VersionsSuite tests were run.
    
    This is my original work and I license the work to the project under the project's open source license.
    
    Author: Tim Preece <tim.preece.in.oz@gmail.com>
    
    Closes #11495 from preecet/master.
    
    (cherry picked from commit 46f25c2)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    preecet authored and marmbrus committed Mar 7, 2016
    Commit 695c8a2

Commits on Mar 8, 2016

  1. [SPARK-13711][CORE] Don't call SparkUncaughtExceptionHandler in AppClient as it's in driver
    
    ## What changes were proposed in this pull request?
    
    AppClient runs on the driver side. It should not call `Utils.tryOrExit`, because that sends the exception to SparkUncaughtExceptionHandler and calls `System.exit`. This PR just removed `Utils.tryOrExit`.
    
    ## How was this patch tested?
    
    manual tests.
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11566 from zsxwing/SPARK-13711.
    zsxwing committed Mar 8, 2016
    Commit bace137
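    A generic sketch of the two behaviours being contrasted above; the helper names are illustrative and do not mirror Spark's internal `Utils` API:
    
    ```scala
    import scala.util.control.NonFatal
    
    // "Try or exit" style: route any exception to the process-wide uncaught
    // exception handler, which is expected to terminate the JVM.
    def tryOrExit(block: => Unit): Unit =
      try block catch {
        case t: Throwable =>
          Option(Thread.getDefaultUncaughtExceptionHandler) match {
            case Some(handler) => handler.uncaughtException(Thread.currentThread(), t) // may call System.exit
            case None          => throw t
          }
      }
    
    // Driver-side friendly alternative: log the failure and keep the process alive.
    def tryLogNonFatal(block: => Unit): Unit =
      try block catch {
        case NonFatal(t) => System.err.println(s"Error logged, driver keeps running: $t")
      }
    ```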

Commits on Mar 9, 2016

  1. [SPARK-13755] Escape quotes in SQL plan visualization node labels

    When generating Graphviz DOT files in the SQL query visualization, we need to escape double-quotes inside node labels. This is a followup to #11309, which fixed a similar issue in Spark Core's DAG visualization.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #11587 from JoshRosen/graphviz-escaping.
    
    (cherry picked from commit 81f54ac)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Mar 9, 2016
    Commit 8ec4f15
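    A minimal sketch of the escaping being added, assuming nothing about the actual DOT-generation code beyond what the description says; `escapeDotLabel` is a made-up helper:
    
    ```scala
    // Escape embedded double quotes so the DOT parser does not end the label early.
    def escapeDotLabel(label: String): String =
      label.replace("\"", "\\\"") // " -> \"
    
    val plan = """Filter (name = "spark")"""
    val node = s"""  1 [label="${escapeDotLabel(plan)}"];"""
    
    println(node) // 1 [label="Filter (name = \"spark\")"];
    ```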
  2. [SPARK-13631][CORE] Thread-safe getLocationsWithLargestOutputs

    ## What changes were proposed in this pull request?
    
    If a job is being scheduled in one thread and depends on an RDD whose
    shuffle is currently executing in another thread, Spark could throw a
    NullPointerException. This patch synchronizes access to `mapStatuses` and
    skips null status entries (which correspond to in-progress shuffle tasks).
    
    ## How was this patch tested?
    
    Our client code unit test suite, which was reliably reproducing the race
    condition with 10 threads, shows that this fixes it. I have not found a minimal
    test case to add to Spark, but I will attempt to do so if desired.
    
    The same test case was tripping up on SPARK-4454, which was fixed by
    making other DAGScheduler code thread-safe.
    
    shivaram srowen
    
    Author: Andy Sloane <asloane@tetrationanalytics.com>
    
    Closes #11505 from a1k0n/SPARK-13631.
    
    (cherry picked from commit cbff280)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    Andy Sloane authored and srowen committed Mar 9, 2016
    Commit 95105b0
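    A sketch of the reader-side pattern the fix describes, with made-up stand-ins for the shared state (these are not the real MapOutputTracker fields or the full scoring logic):
    
    ```scala
    import scala.collection.mutable
    
    final case class Status(location: String, bytes: Long)
    val mapStatuses = mutable.Map[Int, Array[Status]]() // shuffleId -> per-map-task statuses
    
    // Hold the lock while iterating, and skip null entries, which correspond to
    // map tasks whose shuffle output has not been registered yet.
    def locationsByOutputSize(shuffleId: Int): Seq[String] =
      mapStatuses.synchronized {
        mapStatuses.get(shuffleId) match {
          case Some(statuses) =>
            statuses.iterator
              .filter(_ != null) // in-progress tasks have no status yet
              .toSeq
              .groupBy(_.location)
              .mapValues(_.map(_.bytes).sum)
              .toSeq
              .sortBy(-_._2) // largest output first
              .map(_._1)
          case None => Seq.empty
        }
      }
    ```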
  3. [SPARK-13242] [SQL] codegen fallback in case-when if there many branches

    ## What changes were proposed in this pull request?
    
    If there are many branches in a CaseWhen expression, the generated code could go above the 64K bytecode limit for a single Java method and fail to compile. This PR changes it to fall back to interpreted mode if there are more than 20 branches.
    
    ## How was this patch tested?
    
    Add tests
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #11606 from davies/fix_when_16.
    Davies Liu authored and davies committed Mar 9, 2016
    Commit bea91a9
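    For context, a many-branch CaseWhen is easy to build up from the public DataFrame API; the helper below only illustrates how such an expression arises, not the fallback itself (which is internal to codegen):
    
    ```scala
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{lit, when}
    
    // Build CASE WHEN ... with `branches` arms (branches must be >= 1).
    // With enough arms, the code generated for this single expression would
    // exceed the 64K bytecode limit of one Java method without the fallback.
    def bucketize(c: Column, branches: Int): Column =
      (0 until branches).foldLeft(null: Column) { (acc, i) =>
        val cond = c >= lit(i * 10) && c < lit((i + 1) * 10)
        if (acc == null) when(cond, lit(i)) else acc.when(cond, lit(i))
      }.otherwise(lit(-1))
    
    // e.g. df.withColumn("bucket", bucketize(col("value"), 50))
    ```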

Commits on Mar 10, 2016

  1. [SPARK-13760][SQL] Fix BigDecimal constructor for FloatType

    ## What changes were proposed in this pull request?
    
    A very minor change for using `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: float)`. The latter is deprecated and can result in inconsistencies due to an implicit conversion to `Double`.
    
    ## How was this patch tested?
    
    N/A
    
    cc yhuai
    
    Author: Sameer Agarwal <sameer@databricks.com>
    
    Closes #11597 from sameeragarwal/bigdecimal.
    
    (cherry picked from commit 926e9c4)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
    sameeragarwal authored and yhuai committed Mar 10, 2016
    Commit 8a1bd58
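    A small illustration of the difference (the exact digits printed depend on the float-to-double widening, so treat them as approximate):
    
    ```scala
    // 0.1f widened to Double is not exactly 0.1, so building the BigDecimal from
    // the widened Double keeps the widening error, while BigDecimal.decimal(0.1f)
    // parses the float's own, shorter decimal representation.
    val viaDouble  = BigDecimal(0.1f.toDouble) // picks up extra digits, e.g. 0.100000001...
    val viaDecimal = BigDecimal.decimal(0.1f)  // 0.1
    
    println(viaDouble)
    println(viaDecimal)
    ```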
  2. Commit 60cb270
  3. [SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1

    Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around.
    Supersedes #11524
    
    Jenkins tests.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #11631 from srowen/SPARK-13663.
    
    (cherry picked from commit 927e22e)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    srowen committed Mar 10, 2016
    Commit 07ace27

Commits on Mar 11, 2016

  1. [MINOR][DOC] Fix supported hive version in doc

    ## What changes were proposed in this pull request?
    
    Today, Spark 1.6.1 and the updated docs were released. Unfortunately, there is obsolete Hive version information in the docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
    ```
    -By default Spark will build with Hive 0.13.1 bindings.
    +By default Spark will build with Hive 1.2.1 bindings.
    -# Apache Hadoop 2.4.X with Hive 13 support
    +# Apache Hadoop 2.4.X with Hive 1.2.1 support
    ```
    The `sql/README.md` file also describes this.
    
    ## How was this patch tested?
    
    Manual.
    
    
    Author: Dongjoon Hyun <dongjoon@apache.org>
    
    Closes #11639 from dongjoon-hyun/fix_doc_hive_version.
    
    (cherry picked from commit 88fa866)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    dongjoon-hyun authored and rxin committed Mar 11, 2016
    Commit 078c714
  2. [SPARK-13327][SPARKR] Added parameter validations for colnames<-

    Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
    Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
    
    Closes #11220 from olarayej/SPARK-13312-3.
    
    (cherry picked from commit 416e71a)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    Oscar D. Lara Yejas authored and shivaram committed Mar 11, 2016
    Commit db4795a

Commits on Mar 13, 2016

  1. [SPARK-13810][CORE] Add Port Configuration Suggestions on Bind Exceptions
    
    ## What changes were proposed in this pull request?
    Currently, when a java.net.BindException is thrown, it displays the following message:
    
    java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries!
    
    This change adds port configuration suggestions to the BindException, for example, for the UI, it now displays
    
    java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    
    ## How was this patch tested?
    Manual tests
    
    Author: Bjorn Jonsson <bjornjon@gmail.com>
    
    Closes #11644 from bjornjon/master.
    
    (cherry picked from commit 515e4af)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    bjornjon authored and srowen committed Mar 13, 2016
    Commit 5e08db3
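    Following the suggestion in the new message, the relevant settings can be pinned up front; a small sketch with arbitrary values:
    
    ```scala
    import org.apache.spark.SparkConf
    
    val conf = new SparkConf()
      .setAppName("port-config-example")
      .set("spark.ui.port", "4050")       // explicit port for the 'SparkUI' service
      .set("spark.port.maxRetries", "32") // default is 16 retries
    ```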

Commits on Mar 14, 2016

  1. [SQL] fix typo in DataSourceRegister

    ## What changes were proposed in this pull request?
    fix typo in DataSourceRegister
    
    ## How was this patch tested?
    
    Found when going through the latest code.
    
    Author: Jacky Li <jacky.likun@huawei.com>
    
    Closes #11686 from jackylk/patch-12.
    
    (cherry picked from commit f3daa09)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    jackylk authored and rxin committed Mar 14, 2016
    Commit 3519ce9