
[SPARK-6747] [SQL] Support List<> as a return type in Hive UDF #6179

Closed
wants to merge 818 commits
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jun 22, 2015

  1. [SPARK-8482] Added M4 instances to the list.

    AWS recently added M4 instances (https://aws.amazon.com/blogs/aws/the-new-m4-instance-type-bonus-price-reduction-on-m3-c4/).
    
    Author: Pradeep Chhetri <pradeep.chhetri89@gmail.com>
    
    Closes apache#6899 from pradeepchhetri/master and squashes the following commits:
    
    4f4ea79 [Pradeep Chhetri] Added t2.large instance
    3d2bb6c [Pradeep Chhetri] Added M4 instances to the list
    pradeepchhetri authored and shivaram committed Jun 22, 2015
    Commit: ba8a453
  2. [SPARK-8511] [PYSPARK] Modify a test to remove a saved model in `regression.py`
    
    [[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511)
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6926 from yu-iskw/SPARK-8511 and squashes the following commits:
    
    7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()`
    4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py`
    yu-iskw authored and jkbradley committed Jun 22, 2015
    Commit: 5d89d9f
  3. [SPARK-8104] [SQL] auto alias expressions in analyzer

Currently we auto-alias expressions in the parser. However, during the parsing phase we don't have enough information to choose the right alias. For example, a Generator that produces more than one kind of element needs MultiAlias, and ExtractValue doesn't need an Alias if it sits in the middle of an ExtractValue chain.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6647 from cloud-fan/alias and squashes the following commits:
    
    552eba4 [Wenchen Fan] fix python
    5b5786d [Wenchen Fan] fix agg
    73a90cb [Wenchen Fan] fix case-preserve of ExtractValue
    4cfd23c [Wenchen Fan] fix order by
    d18f401 [Wenchen Fan] refine
    9f07359 [Wenchen Fan] address comments
    39c1aef [Wenchen Fan] small fix
    33640ec [Wenchen Fan] auto alias expressions in analyzer
    cloud-fan authored and marmbrus committed Jun 22, 2015
    Commit: da7bbb9
  4. [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode
    
    https://issues.apache.org/jira/browse/SPARK-8532
    
    This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#6937 from yhuai/SPARK-8532 and squashes the following commits:
    
    f972d5d [Yin Huai] davies's comment.
    d37abd2 [Yin Huai] style.
    d21290a [Yin Huai] Python doc.
    889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
    7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
    d696dff [Yin Huai] Python style.
    88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
    c40c461 [Yin Huai] Regression test.
    yhuai committed Jun 22, 2015
    Commit: 5ab9fcf
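    The second part of the fix above (not clobbering the JVM-side default mode) can be sketched in plain Python. `WriterSketch` is a hypothetical stand-in for the PySpark `DataFrameWriter`, not the actual implementation; the point is to forward `mode` only when the caller explicitly supplied one.

```python
class WriterSketch:
    """Hypothetical stand-in for a writer whose backend default mode is 'error'."""

    def __init__(self):
        self.calls = []  # record which backend methods were invoked

    def mode(self, save_mode):
        self.calls.append(("mode", save_mode))
        return self

    def save(self, path, mode=None):
        # Before the fix, save() always called mode(...), overriding the
        # backend default. After: forward mode only when explicitly given,
        # so the backend's own "error" default stays in effect.
        if mode is not None:
            self.mode(mode)
        self.calls.append(("save", path))
```

    With this shape, `WriterSketch().save("/tmp/out")` leaves the backend default untouched, while `save("/tmp/out", mode="overwrite")` forwards the explicit choice.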
  5. [SPARK-8455] [ML] Implement n-gram feature transformer

    Implementation of n-gram feature transformer for ML.
    
    Author: Feynman Liang <fliang@databricks.com>
    
    Closes apache#6887 from feynmanliang/ngram-featurizer and squashes the following commits:
    
    d2c839f [Feynman Liang] Make n > input length yield empty output
    9fadd36 [Feynman Liang] Add empty and corner test cases, fix names and spaces
    fe93873 [Feynman Liang] Implement n-gram feature transformer
    Feynman Liang authored and jkbradley committed Jun 22, 2015
    Commit: afe35f0
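    The transformer's behavior, including the corner case from the commit message where n exceeds the input length and the output is empty, can be sketched in plain Python (an illustration of the idea, not the ML pipeline code):

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as space-joined strings.

    When n > len(tokens), the range below is empty, so the result is [].
    """
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```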
  6. [SPARK-8537] [SPARKR] Add a validation rule about the curly braces in SparkR to `.lintr`
    
    [[SPARK-8537] Add a validation rule about the curly braces in SparkR to `.lintr` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8537)
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6940 from yu-iskw/SPARK-8537 and squashes the following commits:
    
    7eec1a0 [Yu ISHIKAWA] [SPARK-8537][SparkR] Add a validation rule about the curly braces in SparkR to `.lintr`
    yu-iskw authored and shivaram committed Jun 22, 2015
    Commit: b1f3a48
  7. [SPARK-8356] [SQL] Reconcile callUDF and callUdf

    Deprecates ```callUdf``` in favor of ```callUDF```.
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes apache#6902 from BenFradet/SPARK-8356 and squashes the following commits:
    
    ef4e9d8 [BenFradet] deprecated callUDF, use udf instead
    9b1de4d [BenFradet] reinstated unit test for the deprecated callUdf
    cbd80a5 [BenFradet] deprecated callUdf in favor of callUDF
    BenFradet authored and marmbrus committed Jun 22, 2015
    Commit: 50d3242
  8. [SPARK-8492] [SQL] support binaryType in UnsafeRow

    Support BinaryType in UnsafeRow, just like StringType.
    
Also changes the layout of StringType and BinaryType in UnsafeRow by combining offset and size into a single Long, which limits the size of a Row to under 2 GB (given the fact that no single buffer can be bigger than 2 GB in the JVM).
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6911 from davies/unsafe_bin and squashes the following commits:
    
    d68706f [Davies Liu] update comment
    519f698 [Davies Liu] address comment
    98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin
    180b49d [Davies Liu] fix zero-out
    22e4c0a [Davies Liu] zero-out padding bytes
    6abfe93 [Davies Liu] fix style
    447dea0 [Davies Liu] support binaryType in UnsafeRow
    Davies Liu committed Jun 22, 2015
    Commit: 96aa013
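    The offset-and-size-as-one-Long layout can be illustrated with simple bit packing (a sketch of the idea, not UnsafeRow's actual code); each half is a 32-bit int, hence the ~2 GB cap per field:

```python
def pack(offset, size):
    # Upper 32 bits hold the offset, lower 32 bits hold the size;
    # both must fit in 31 bits, which is where the ~2 GB limit comes from.
    assert 0 <= offset < 2**31 and 0 <= size < 2**31
    return (offset << 32) | size


def unpack(packed):
    """Recover (offset, size) from a single packed value."""
    return packed >> 32, packed & 0xFFFFFFFF
```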
  9. [HOTFIX] [TESTS] Typo mqqt -> mqtt

    This was introduced in apache#6866.
    Andrew Or committed Jun 22, 2015
    Commit: 1dfb0f7

Commits on Jun 23, 2015

  1. [SPARK-7153] [SQL] support all integral type ordinal in GetArrayItem

First convert `ordinal` to `Number`, then convert it to an int.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#5706 from cloud-fan/7153 and squashes the following commits:
    
    915db79 [Wenchen Fan] fix 7153
    cloud-fan authored and marmbrus committed Jun 23, 2015
    Commit: 860a49e
  2. [SPARK-8307] [SQL] improve timestamp from parquet

This PR converts Julian day to Unix timestamp directly (without Calendar and Timestamp).
    
    cc adrian-wang rxin
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6759 from davies/improve_ts and squashes the following commits:
    
    849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
    b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
    8e2d56f [Davies Liu] address comments
    634b9f5 [Davies Liu] fix mima
    4891efb [Davies Liu] address comment
    bfc437c [Davies Liu] fix build
    ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
    602b969 [Davies Liu] remove jodd
    2f2e48c [Davies Liu] fix test
    8ace611 [Davies Liu] fix mima
    212143b [Davies Liu] fix mina
    c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
    a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
    5233974 [Davies Liu] fix scala style
    361fd62 [Davies Liu] address comments
    ea196d4 [Davies Liu] improve timestamp from parquet
    Davies Liu committed Jun 23, 2015
    Commit: 6b7f2ce
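    The core of the direct conversion is plain arithmetic: the Unix epoch (1970-01-01) falls on Julian day 2440588, so no Calendar or Timestamp objects are needed. A minimal sketch, assuming the Parquet INT96 layout of a Julian day plus nanoseconds within the day:

```python
JULIAN_DAY_OF_EPOCH = 2440588        # Julian day number of 1970-01-01
MICROS_PER_DAY = 86400 * 1_000_000


def julian_day_to_micros(julian_day, nanos_of_day=0):
    """Convert (Julian day, nanos within day) to microseconds since the Unix epoch."""
    return (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day // 1000
```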
  3. [SPARK-7859] [SQL] Collect_set() behavior differences which fails the unit test under jdk8
    
    To reproduce that:
    ```
    JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 | build/sbt -Phadoop-2.3 -Phive  'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite'
    ```
    
A simple workaround is to update the original query to check the output size instead of the exact elements of the array produced by collect_set().
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes apache#6402 from chenghao-intel/windowing and squashes the following commits:
    
    99312ad [Cheng Hao] add order by for the select clause
    edf8ce3 [Cheng Hao] update the code as suggested
    7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK
    chenghao-intel authored and yhuai committed Jun 23, 2015
    Commit: 13321e6
  4. MAINTENANCE: Automated closing of pull requests.

    This commit exists to close the following pull requests on Github:
    
    Closes apache#2849 (close requested by 'srowen')
    Closes apache#2786 (close requested by 'andrewor14')
    Closes apache#4678 (close requested by 'JoshRosen')
    Closes apache#5457 (close requested by 'andrewor14')
    Closes apache#3346 (close requested by 'andrewor14')
    Closes apache#6518 (close requested by 'andrewor14')
    Closes apache#5403 (close requested by 'pwendell')
    Closes apache#2110 (close requested by 'srowen')
    pwendell committed Jun 23, 2015
    Commit: c4d2343
  5. [SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR files
    
    [[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548)
    
    - This is the result of `lint-r`
        https://gist.github.com/yu-iskw/0019b37a2c1167f33986
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6945 from yu-iskw/SPARK-8548 and squashes the following commits:
    
    0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files
    yu-iskw authored and shivaram committed Jun 23, 2015
    Commit: 44fa7df
  6. [SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing max bins
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits:
    
    2894695 [Holden Karau] remove extra blank line
    2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too
    3a09170 [Holden Karau] add maxBins to to the train method as well
    af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100
    holdenk authored and jkbradley committed Jun 23, 2015
    Commit: 164fe2a
  7. [SPARK-8431] [SPARKR] Add in operator to DataFrame Column in SparkR

    [[SPARK-8431] Add in operator to DataFrame Column in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8431)
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6941 from yu-iskw/SPARK-8431 and squashes the following commits:
    
    1f64423 [Yu ISHIKAWA] Modify the comment
    f4309a7 [Yu ISHIKAWA] Make a `setMethod` for `%in%` be independent
    6e37936 [Yu ISHIKAWA] Modify a variable name
    c196173 [Yu ISHIKAWA] [SPARK-8431][SparkR] Add in operator to DataFrame Column in SparkR
    yu-iskw authored and Davies Liu committed Jun 23, 2015
    Commit: d4f6335
  8. [SPARK-8359] [SQL] Fix incorrect decimal precision after multiplication

    JIRA: https://issues.apache.org/jira/browse/SPARK-8359
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#6814 from viirya/fix_decimal2 and squashes the following commits:
    
    071a757 [Liang-Chi Hsieh] Remove maximum precision and use MathContext.UNLIMITED.
    df217d4 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
    a43bfc3 [Liang-Chi Hsieh] Add MathContext with maximum supported precision.
    72eeb3f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
    44c9348 [Liang-Chi Hsieh] Fix incorrect decimal precision after multiplication.
    viirya authored and Davies Liu committed Jun 23, 2015
    Commit: 31bd306
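    The effect of bounding the math context can be reproduced with Python's `decimal` module (an analogue of the Scala `MathContext` change, not the actual code): a bounded precision rounds the product, while a sufficiently large one keeps every digit.

```python
from decimal import Decimal, localcontext

a = Decimal("1.23456789")
b = Decimal("9.87654321")

with localcontext() as ctx:
    ctx.prec = 10            # bounded context: product rounded to 10 significant digits
    rounded = a * b

exact = a * b                # default precision (28) keeps the full 18-digit product
```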
  9. [SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
    
    Author: Hari Shreedharan <hshreedharan@apache.org>
    
    Closes apache#6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:
    
    9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
    ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
    harishreedharan authored and tdas committed Jun 23, 2015
    Commit: 9b618fb
  10. [SPARK-8541] [PYSPARK] test the absolute error in approx doctests

    A minor change but one which is (presumably) visible on the public api docs webpage.
    
    Author: Scott Taylor <github@megatron.me.uk>
    
    Closes apache#6942 from megatron-me-uk/patch-3 and squashes the following commits:
    
    fbed000 [Scott Taylor] test the absolute error in approx doctests
    megatron-me-uk authored and JoshRosen committed Jun 23, 2015
    Commit: f0dcbe8
  11. [SPARK-8300] DataFrame hint for broadcast join.

    Users can now do
    ```scala
    left.join(broadcast(right), "joinKey")
    ```
    to give the query planner a hint that "right" DataFrame is small and should be broadcasted.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#6751 from rxin/broadcastjoin-hint and squashes the following commits:
    
    953eec2 [Reynold Xin] Code review feedback.
    88752d8 [Reynold Xin] Fixed import.
    8187b88 [Reynold Xin] [SPARK-8300] DataFrame hint for broadcast join.
    rxin authored and marmbrus committed Jun 23, 2015
    Commit: 6ceb169
  12. [SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:
    
    f807832 [Holden Karau] Log error if we can't throw it
    855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
    039d620 [Holden Karau] Add missing closeandwriteoutput
    30e558d [Holden Karau] go back to try/finally
    e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
    ae0b7a7 [Holden Karau] Fix the test
    2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions
    holdenk authored and JoshRosen committed Jun 23, 2015
    Commit: 0f92be5
  13. [SQL] [DOCS] updated the documentation for explode

The syntax was incorrect in the example for explode.
    
    Author: lockwobr <lockwobr@gmail.com>
    
    Closes apache#6943 from lockwobr/master and squashes the following commits:
    
    3d864d1 [lockwobr] updated the documentation for explode
    lockwobr authored and sarutak committed Jun 23, 2015
    Commit: 4f7fbef
  14. [SPARK-7235] [SQL] Refactor the grouping sets

The logical plan `Expand` takes the `output` as a constructor argument, which breaks the reference chain. We need to refactor the code, as well as the column pruning.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes apache#5780 from chenghao-intel/expand and squashes the following commits:
    
    76e4aa4 [Cheng Hao] revert the change for case insenstive
    7c10a83 [Cheng Hao] refactor the grouping sets
    chenghao-intel authored and marmbrus committed Jun 23, 2015
    Commit: 7b1450b
  15. [SPARK-8432] [SQL] fix hashCode() and equals() of BinaryType in Row

    Also added more tests in LiteralExpressionSuite
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6876 from davies/fix_hashcode and squashes the following commits:
    
    429c2c0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
    32d9811 [Davies Liu] fix test
    a0626ed [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
    89c2432 [Davies Liu] fix style
    bd20780 [Davies Liu] check with catalyst types
    41caec6 [Davies Liu] change for to while
    d96929b [Davies Liu] address comment
    6ad2a90 [Davies Liu] fix style
    5819d33 [Davies Liu] unify equals() and hashCode()
    0fff25d [Davies Liu] fix style
    53c38b1 [Davies Liu] fix hashCode() and equals() of BinaryType in Row
    Davies Liu authored and marmbrus committed Jun 23, 2015
    Commit: 6f4cadf
  16. [SPARK-7888] Be able to disable intercept in linear regression in ml package
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#6927 from holdenk/SPARK-7888-Be-able-to-disable-intercept-in-Linear-Regression-in-ML-package and squashes the following commits:
    
    0ad384c [Holden Karau] Add MiMa excludes
    4016fac [Holden Karau] Switch to wild card import, remove extra blank lines
    ae5baa8 [Holden Karau] CR feedback, move the fitIntercept down rather than changing ymean and etc above
    f34971c [Holden Karau] Fix some more long lines
    319bd3f [Holden Karau] Fix long lines
    3bb9ee1 [Holden Karau] Update the regression suite tests
    7015b9f [Holden Karau] Our code performs the same with R, except we need more than one data point but that seems reasonable
    0b0c8c0 [Holden Karau] fix the issue with the sample R code
    e2140ba [Holden Karau] Add a test, it fails!
    5e84a0b [Holden Karau] Write out thoughts and use the correct trait
    91ffc0a [Holden Karau] more murh
    006246c [Holden Karau] murp?
    holdenk authored and DB Tsai committed Jun 23, 2015
    Commit: 2b1111d
  17. [SPARK-8265] [MLLIB] [PYSPARK] Add LinearDataGenerator to pyspark.mllib.utils
    
    It is useful to generate linear data for easy testing of linear models and in general. Scala already has it. This is just a wrapper around the Scala code.
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#6715 from MechCoder/generate_linear_input and squashes the following commits:
    
    6182884 [MechCoder] Minor changes
    8bda047 [MechCoder] Minor style fixes
    0f1053c [MechCoder] [SPARK-8265] Add LinearDataGenerator to pyspark.mllib.utils
    MechCoder authored and mengxr committed Jun 23, 2015
    Commit: f2022fa
  18. [SPARK-8111] [SPARKR] SparkR shell should display Spark logo and version banner on startup.
    
    spark version is taken from the environment variable SPARK_VERSION
    
    Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
    Author: Alok  Singh <singhal@aloks-mbp.usca.ibm.com>
    
    Closes apache#6944 from aloknsingh/aloknsingh_spark_jiras and squashes the following commits:
    
    ed607bd [Alok  Singh] [SPARK-8111][SparkR] As per suggestion, 1) using the version from sparkContext rather than the Sys.env. 2) change "Welcome to SparkR!" to "Welcome to" followed by Spark logo and version
    acd5b85 [Alok  Singh] fix the jira SPARK-8111 to add the spark version and logo. Currently spark version is taken from the environment variable SPARK_VERSION
    Alok Singh authored and shivaram committed Jun 23, 2015
    Commit: f2fb028
  19. [SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace between label and features vector
    
    fix LabeledPoint parser when there is a whitespace between label and features vector, e.g.
    (y, [x1, x2, x3])
    
    Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
    
    Closes apache#6954 from fe2s/SPARK-8525 and squashes the following commits:
    
    0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang
    c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position
    fe2s authored and mengxr committed Jun 23, 2015
    Commit: a803118
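    The whitespace tolerance can be sketched with a tiny parser in plain Python. This is illustrative only, not the actual MLlib parser; `parse_labeled_point` is a hypothetical helper:

```python
def parse_labeled_point(s):
    """Parse '(label, [f1, f2, ...])', tolerating whitespace after each comma."""
    body = s.strip().lstrip("(").rstrip(")")
    label_str, features_str = body.split(",", 1)
    features_str = features_str.strip().lstrip("[").rstrip("]")
    features = [float(x) for x in features_str.split(",")] if features_str else []
    return float(label_str), features
```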
  20. [DOC] [SQL] Adds Hive metastore Parquet table conversion section

    This PR adds a section about Hive metastore Parquet table conversion. It documents:
    
    1. Schema reconciliation rules introduced in apache#5214 (see [this comment] [1] in apache#5188)
    2. Metadata refreshing requirement introduced in apache#5339
    
    [1]: apache#5188 (comment)
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:
    
    42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
    4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
    756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
    50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
    liancheng committed Jun 23, 2015
    Commit: d96d7b5
  21. [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in boolean expression
    
It's a common mistake for users to put a Column in a boolean expression (together with `and`/`or`), which does not work as expected. We should raise an exception in that case and suggest using `&` and `|` instead.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6961 from davies/column_bool and squashes the following commits:
    
    9f19beb [Davies Liu] update message
    af74bd6 [Davies Liu] fix tests
    07dff84 [Davies Liu] address comments, fix tests
    f70c08e [Davies Liu] raise Exception if column is used in booelan expression
    Davies Liu committed Jun 23, 2015
    Commit: 7fb5ae5
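    The mechanism behind the new exception can be modeled in a few lines: Python's `and`/`or` call `bool()` on their operands and cannot be overloaded, so the only option is to make `bool(column)` fail loudly while `&`/`|` keep building expressions. A simplified model, not PySpark's actual `Column` class:

```python
class Column:
    """Toy column that records a SQL-like expression string."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):      # c1 & c2 builds a boolean expression
        return Column(f"({self.expr} AND {other.expr})")

    def __or__(self, other):       # c1 | c2 likewise
        return Column(f"({self.expr} OR {other.expr})")

    def __bool__(self):            # reached by `and`, `or`, `not`, and `if`
        raise ValueError("Cannot convert a column into bool: "
                         "use '&' for 'and' and '|' for 'or'.")
```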

Commits on Jun 24, 2015

  1. [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options
    
    This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#6683 from liancheng/output-committer-docs and squashes the following commits:
    
    b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
    ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
    liancheng committed Jun 24, 2015
    Commit: 111d6b9
  2. [SPARK-7157][SQL] add sampleBy to DataFrame

    Add `sampleBy` to DataFrame. rxin
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes apache#6769 from mengxr/SPARK-7157 and squashes the following commits:
    
    991f26f [Xiangrui Meng] fix seed
    4a14834 [Xiangrui Meng] move sampleBy to stat
    832f7cc [Xiangrui Meng] add sampleBy to DataFrame
    mengxr authored and rxin committed Jun 24, 2015
    Commit: 0401cba
  3. Revert "[SPARK-7157][SQL] add sampleBy to DataFrame"

    This reverts commit 0401cba.
    
    The new test case on Jenkins is failing.
    rxin committed Jun 24, 2015
    Commit: a458efc
  4. [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss
    
    This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on thrift exceptions. We attempt to emulate the proper hive behavior by retrying only as configured by hiveconf.
    
    Author: Eric Liang <ekl@databricks.com>
    
    Closes apache#6912 from ericl/spark-6749 and squashes the following commits:
    
    2d54b55 [Eric Liang] use conf from state
    0e3a74e [Eric Liang] use shim properly
    980b3e5 [Eric Liang] Fix conf parsing hive 0.14 conf.
    92459b6 [Eric Liang] Work around RetryingMetaStoreClient bug
    ericl authored and yhuai committed Jun 24, 2015
    Commit: 50c3a86
  5. [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project
    
    This commit changes the MiMa tests to test against the released 1.4.0 artifacts rather than 1.4.0-rc4; this change is necessary to fix a Jenkins build break since it seems that the RC4 snapshot is no longer available via Maven.
    
    I also enabled MiMa checks for the `launcher` subproject, which we should have done right after 1.4.0 was released.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#6974 from JoshRosen/mima-hotfix and squashes the following commits:
    
    4b4175a [Josh Rosen] [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project
    JoshRosen authored and rxin committed Jun 24, 2015
    Commit: 13ae806
  6. [SPARK-8371] [SQL] improve unit test for MaxOf and MinOf and fix bugs

    a follow up of apache#6813
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6825 from cloud-fan/cg and squashes the following commits:
    
    43170cc [Wenchen Fan] fix bugs in code gen
    cloud-fan authored and Davies Liu committed Jun 24, 2015
    Commit: 09fcf96
  7. [SPARK-8138] [SQL] Improves error message when conflicting partition columns are found
    
    This PR improves the error message shown when conflicting partition column names are detected.  This can be particularly annoying and confusing when there are a large number of partitions while a handful of them happened to contain unexpected temporary file(s).  Now all suspicious directories are listed as below:
    
    ```
    java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
    
            Partition column name list #0: b, c, d
            Partition column name list #1: b, c
            Partition column name list #2: b
    
    For partitioned table directories, data files should only live in leaf directories. Please check the following directories for unexpected files:
    
            file:/tmp/foo/b=0
            file:/tmp/foo/b=1
            file:/tmp/foo/b=1/c=1
            file:/tmp/foo/b=0/c=0
    ```
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#6610 from liancheng/part-errmsg and squashes the following commits:
    
    7d05f2c [Cheng Lian] Fixes Scala style issue
    a149250 [Cheng Lian] Adds test case for the error message
    6b74dd8 [Cheng Lian] Also lists suspicious non-leaf partition directories
    a935eb8 [Cheng Lian] Improves error message when conflicting partition columns are found
    liancheng committed Jun 24, 2015
    Commit: cc465fd
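    The detection itself amounts to grouping partition directories by the column-name sequence their path segments imply; more than one distinct sequence is a conflict. A rough sketch of that idea (hypothetical helpers, not the actual partition-discovery code):

```python
from collections import defaultdict


def partition_column_lists(paths):
    """Map each distinct partition-column name sequence to the paths implying it."""
    groups = defaultdict(list)
    for p in paths:
        # 'b=0/c=1' segments imply the column name sequence ('b', 'c')
        cols = tuple(seg.split("=", 1)[0] for seg in p.split("/") if "=" in seg)
        groups[cols].append(p)
    return dict(groups)


def has_conflict(paths):
    return len(partition_column_lists(paths)) > 1
```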
  8. [SPARK-8567] [SQL] Debugging flaky HiveSparkSubmitSuite

Using a similar approach to the one used in `HiveThriftServer2Suite`: print stdout/stderr of the spawned process instead of logging them, to see what happens on Jenkins. (This test suite only fails on Jenkins and doesn't spill out any log...)
    
    cc yhuai
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#6978 from liancheng/debug-hive-spark-submit-suite and squashes the following commits:
    
    b031647 [Cheng Lian] Prints process stdout/stderr instead of logging them
    liancheng authored and yhuai committed Jun 24, 2015
    Commit: 9d36ec2
  9. [SPARK-8578] [SQL] Should ignore user defined output committer when appending data
    
    https://issues.apache.org/jira/browse/SPARK-8578
    
It is not very safe to use a custom output committer when appending data to an existing directory. This change adds logic to check if we are appending data and, if so, use the output committer associated with the file output format.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#6964 from yhuai/SPARK-8578 and squashes the following commits:
    
    43544c4 [Yin Huai] Do not use a custom output commiter when appendiing data.
    yhuai committed Jun 24, 2015
    bba6699
  10. [SPARK-8576] Add spark-ec2 options to set IAM roles and instance-init…

    …iated shutdown behavior
    
    Both of these options are useful when spark-ec2 is being used as part of an automated pipeline and the engineers want to minimize the need to pass around AWS keys for access to things like S3 (keys are replaced by the IAM role) and to be able to launch a cluster that can terminate itself cleanly.
    
    Author: Nicholas Chammas <nicholas.chammas@gmail.com>
    
    Closes apache#6962 from nchammas/additional-ec2-options and squashes the following commits:
    
    fcf252e [Nicholas Chammas] PEP8 fixes
    efba9ee [Nicholas Chammas] add help for --instance-initiated-shutdown-behavior
    598aecf [Nicholas Chammas] option to launch instances into IAM role
    2743632 [Nicholas Chammas] add option for instance initiated shutdown
    nchammas authored and shivaram committed Jun 24, 2015
    31f48e5
  11. [SPARK-8399] [STREAMING] [WEB UI] Overlap between histograms and axis…

    …' name in Spark Streaming UI
    
    Moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui so the histograms and the axis' name do not overlap.
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes apache#6845 from BenFradet/SPARK-8399 and squashes the following commits:
    
    b63695f [BenFradet] adjusted inner histograms
    eb610ee [BenFradet] readjusted #batches on the x axis
    dd46f98 [BenFradet] aligned all unit labels and ticks
    0564b62 [BenFradet] readjusted #batches placement
    edd0936 [BenFradet] moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui
    BenFradet authored and tdas committed Jun 24, 2015
    1173483
  12. [SPARK-8506] Add packages to R context created through init.

    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:
    
    b60dd63 [Holden Karau] Add an example with the spark-csv package
    fa8bc92 [Holden Karau] typo: sparm -> spark
    865a90c [Holden Karau] strip spaces for comparision
    c7a4471 [Holden Karau] Add some documentation
    c1a9233 [Holden Karau] refactor for testing
    c818556 [Holden Karau] Add pakages to R
    holdenk authored and shivaram committed Jun 24, 2015
    43e6619
  13. [SPARK-7088] [SQL] Fix analysis for 3rd party logical plan.

    ResolveReferences analysis rule now does not throw when it cannot resolve references in a self-join.
    
    Author: Santiago M. Mola <smola@stratio.com>
    
    Closes apache#6853 from smola/SPARK-7088 and squashes the following commits:
    
    af71ac7 [Santiago M. Mola] [SPARK-7088] Fix analysis for 3rd party logical plan.
    smola authored and marmbrus committed Jun 24, 2015
    b84d4b4
  14. [SPARK-7289] handle project -> limit -> sort efficiently

    Make the `TakeOrdered` strategy and operator more general, so that they can optionally handle a projection when necessary.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6780 from cloud-fan/limit and squashes the following commits:
    
    34aa07b [Wenchen Fan] revert
    07d5456 [Wenchen Fan] clean closure
    20821ec [Wenchen Fan] fix
    3676a82 [Wenchen Fan] address comments
    b558549 [Wenchen Fan] address comments
    214842b [Wenchen Fan] fix style
    2d8be83 [Wenchen Fan] add LimitPushDown
    948f740 [Wenchen Fan] fix existing
    cloud-fan authored and marmbrus committed Jun 24, 2015
    f04b567
  15. [SPARK-7633] [MLLIB] [PYSPARK] Python bindings for StreamingLogisticR…

    …egressionwithSGD
    
    Add Python bindings to StreamingLogisticRegressionwithSGD.
    
    No Java wrappers are needed as models are updated directly using train.
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#6849 from MechCoder/spark-3258 and squashes the following commits:
    
    b4376a5 [MechCoder] minor
    d7e5fc1 [MechCoder] Refactor into StreamingLinearAlgorithm Better docs
    9c09d4e [MechCoder] [SPARK-7633] Python bindings for StreamingLogisticRegressionwithSGD
    MechCoder authored and mengxr committed Jun 24, 2015
    fb32c38
  16. [SPARK-6777] [SQL] Implements backwards compatibility rules in Cataly…

    …stSchemaConverter
    
    This PR introduces `CatalystSchemaConverter` for converting Parquet schema to Spark SQL schema and vice versa.  Original conversion code in `ParquetTypesConverter` is removed. Benefits of the new version are:
    
    1. When converting Spark SQL schemas, it generates standard Parquet schemas conforming to [the most updated Parquet format spec] [1]. Converting to old style Parquet schemas is also supported via feature flag `spark.sql.parquet.followParquetFormatSpec` (which is set to `false` for now, and should be set to `true` after both read and write paths are fixed).
    
       Note that although this version of the Parquet format spec hasn't been officially released yet, Parquet MR 1.7.0 already sticks to it, so it should be safe to follow.
    
    1. It implements backwards-compatibility rules described in the most updated Parquet format spec. Thus can recognize more schema patterns generated by other/legacy systems/tools.
    1. Code organization follows convention used in [parquet-mr] [2], which is easier to follow. (Structure of `CatalystSchemaConverter` is similar to `AvroSchemaConverter`).
    
    To fully implement backwards-compatibility rules in both read and write path, we also need to update `CatalystRowConverter` (which is responsible for converting Parquet records to `Row`s), `RowReadSupport`, and `RowWriteSupport`. These would be done in follow-up PRs.
    
    TODO
    
    - [x] More schema conversion test cases for legacy schema patterns.
    
    [1]: https://github.com/apache/parquet-format/blob/ea095226597fdbecd60c2419d96b54b2fdb4ae6c/LogicalTypes.md
    [2]: https://github.com/apache/parquet-mr/
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#6617 from liancheng/spark-6777 and squashes the following commits:
    
    2a2062d [Cheng Lian] Don't convert decimals without precision information
    b60979b [Cheng Lian] Adds a constructor which accepts a Configuration, and fixes default value of assumeBinaryIsString
    743730f [Cheng Lian] Decimal scale shouldn't be larger than precision
    a104a9e [Cheng Lian] Fixes Scala style issue
    1f71d8d [Cheng Lian] Adds feature flag to allow falling back to old style Parquet schema conversion
    ba84f4b [Cheng Lian] Fixes MapType schema conversion bug
    13cb8d5 [Cheng Lian] Fixes MiMa failure
    81de5b0 [Cheng Lian] Fixes UDT, workaround read path, and add tests
    28ef95b [Cheng Lian] More AnalysisExceptions
    b10c322 [Cheng Lian] Replaces require() with analysisRequire() which throws AnalysisException
    cceaf3f [Cheng Lian] Implements backwards compatibility rules in CatalystSchemaConverter
    liancheng committed Jun 24, 2015
    8ab5076
  17. [SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS e…

    …nv var set
    
    Author: fe2s <aka.fe2s@gmail.com>
    Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
    
    Closes apache#6956 from fe2s/fix-run-tests and squashes the following commits:
    
    31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters
    7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS
    cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number
    ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
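    The fix described in the squashed commits above — search the whole `java -version` output for a line containing the word "version", rather than assuming it appears on a fixed line number — can be sketched like this. The function name and regex are illustrative, not the actual code in `dev/run-tests`.

    ```python
    import re

    def extract_java_version(version_output):
        """Find the Java version by scanning every line for the word 'version'
        (surrounded by spaces), so _JAVA_OPTIONS noise on earlier lines is
        harmless. (Illustrative sketch of the approach, not the real script.)"""
        for line in version_output.splitlines():
            if " version " in " " + line + " ":
                match = re.search(r'"(\d+\.\d+)', line)
                if match:
                    return match.group(1)
        return None

    # OpenJDK prints version info on stderr, sometimes after _JAVA_OPTIONS noise:
    output = ('Picked up _JAVA_OPTIONS: -Xmx2g\n'
              'openjdk version "1.8.0_252"\n'
              'OpenJDK Runtime Environment')
    print(extract_java_version(output))  # -> 1.8
    ```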
    fe2s authored and JoshRosen committed Jun 24, 2015
    dca21a8
  18. [SPARK-8567] [SQL] Increase the timeout of HiveSparkSubmitSuite

    https://issues.apache.org/jira/browse/SPARK-8567
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#6957 from yhuai/SPARK-8567 and squashes the following commits:
    
    62dff5b [Yin Huai] Increase the timeout.
    yhuai authored and Andrew Or committed Jun 24, 2015
    7daa702
  19. [SPARK-8075] [SQL] apply type check interface to more expressions

    a follow up of apache#6405.
    Note: It's not a big change, a lot of changing is due to I swap some code in `aggregates.scala` to make aggregate functions right below its corresponding aggregate expressions.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6723 from cloud-fan/type-check and squashes the following commits:
    
    2124301 [Wenchen Fan] fix tests
    5a658bb [Wenchen Fan] add tests
    287d3bb [Wenchen Fan] apply type check interface to more expressions
    cloud-fan authored and marmbrus committed Jun 24, 2015
    b71d325

Commits on Jun 25, 2015

  1. Two minor SQL cleanup (compiler warning & indent).

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7000 from rxin/minor-cleanup and squashes the following commits:
    
    046044c [Reynold Xin] Two minor SQL cleanup (compiler warning & indent).
    rxin committed Jun 25, 2015
    82f80c1
  2. [SPARK-7884] Move block deserialization from BlockStoreShuffleFetcher…

    … to ShuffleReader
    
    This commit updates the shuffle read path to give ShuffleReader implementations more control over the deserialization process.
    
    The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method.
    
    This change creates a cleaner separation of concerns and gives ShuffleReader implementations more flexibility in how records are retrieved.
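    The split described above can be sketched with two toy functions: the fetcher yields `(blockId, stream)` pairs and never touches records, while the reader owns deserialization. All names here are hypothetical stand-ins for the Scala methods the commit renames, shown as a minimal sketch only.

    ```python
    import io

    def fetch_block_streams(block_data):
        """Stand-in for BlockStoreShuffleFetcher.fetchBlockStreams():
        yield (block_id, raw input stream) pairs; no deserialization here."""
        for block_id, raw in block_data.items():
            yield block_id, io.BytesIO(raw)

    def shuffle_read(block_streams, deserialize):
        """Stand-in for ShuffleReader.read(): the reader, not the fetcher,
        decides how bytes become records."""
        for _block_id, stream in block_streams:
            for record in deserialize(stream):
                yield record

    blocks = {"shuffle_0_0_0": b"1,2", "shuffle_0_1_0": b"3"}
    records = list(shuffle_read(
        fetch_block_streams(blocks),
        lambda s: [int(x) for x in s.read().decode().split(",")]))
    print(sorted(records))  # -> [1, 2, 3]
    ```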
    
    Author: Matt Massie <massie@cs.berkeley.edu>
    Author: Kay Ousterhout <kayousterhout@gmail.com>
    
    Closes apache#6423 from massie/shuffle-api-cleanup and squashes the following commits:
    
    8b0632c [Matt Massie] Minor Scala style fixes
    d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup
    290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read()
    5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources"
    f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources
    a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed
    4ea1712 [Matt Massie] Small code cleanup for readability
    7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream
    f458489 [Matt Massie] Remove unnecessary map() on return Iterator
    4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterrubtibleIterator is needed.
    5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash]
    7eedd1d [Matt Massie] Small Scala import cleanup
    28f8085 [Matt Massie] Small import nit
    f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher.
    7e8e0fe [Matt Massie] Minor Scala style fixes
    01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity
    7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read
    208b7a5 [Matt Massie] Small code style changes
    b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package
    19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
    massie authored and kayousterhout committed Jun 25, 2015
    7bac2fe
  3. [SPARK-8604] [SQL] HadoopFsRelation subclasses should set their outpu…

    …t format class
    
    `HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class so that the default output committer can be set up correctly when appending (where we ignore user-defined output committers).
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#6998 from liancheng/spark-8604 and squashes the following commits:
    
    9be51d1 [Cheng Lian] Adds more comments
    6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
    liancheng committed Jun 25, 2015
    c337844
  4. [SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI

    Fix for incorrect memory in Spark UI as per SPARK-5768
    
    Author: Joshi <rekhajoshm@gmail.com>
    Author: Rekha Joshi <rekhajoshm@gmail.com>
    
    Closes apache#6972 from rekhajoshm/SPARK-5768 and squashes the following commits:
    
    b678a91 [Joshi] Fix for incorrect memory in Spark UI
    2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
    eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
    0be142d [Rekha Joshi] Merge pull request #3 from apache/master
    106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
    e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
    rekhajoshm authored and sarutak committed Jun 25, 2015
    085a721
  5. [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/ta…

    …rget versions.
    
    I basically copied the compatibility rules from the top-level pom.xml into here. Someone more familiar with all the options in the top-level pom may want to make sure nothing else should be copied down.
    
    With this change I can build with JDK 8 and run with lower versions. The compiled source shows it targets JDK 6, as it's supposed to.
    
    Author: Tom Graves <tgraves@yahoo-inc.com>
    Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
    
    Closes apache#6989 from tgravescs/SPARK-8574 and squashes the following commits:
    
    e1ea2d4 [Thomas Graves] Change to use combine.children="append"
    150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions
    Tom Graves committed Jun 25, 2015
    e988adb
  6. [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmit…

    …Suite.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#7009 from yhuai/SPARK-8567 and squashes the following commits:
    
    62fb1f9 [Yin Huai] Add sc.stop().
    b22cf7d [Yin Huai] Add logs.
    yhuai committed Jun 25, 2015
    f9b397f
  7. [MINOR] [MLLIB] rename some functions of PythonMLLibAPI

    Keep the same naming conventions for PythonMLLibAPI.
    Only the following three functions differ from the others:
    ```scala
    trainNaiveBayes
    trainGaussianMixture
    trainWord2Vec
    ```
    So change them to
    ```scala
    trainNaiveBayesModel
    trainGaussianMixtureModel
    trainWord2VecModel
    ```
    It does not affect users or public APIs; it only makes the code easier to understand for developers.
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes apache#7011 from yanboliang/py-mllib-api-rename and squashes the following commits:
    
    771ffec [Yanbo Liang] rename some functions of PythonMLLibAPI
    yanboliang authored and mengxr committed Jun 25, 2015
    2519dcc
  8. [SPARK-8637] [SPARKR] [HOTFIX] Fix packages argument, sparkSubmitBinName

    cc cafreeman
    
    Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    
    Closes apache#7022 from shivaram/sparkr-init-hotfix and squashes the following commits:
    
    9178d15 [Shivaram Venkataraman] Fix packages argument, sparkSubmitBinName
    shivaram committed Jun 25, 2015
    c392a9e

Commits on Jun 26, 2015

  1. [SPARK-8237] [SQL] Add misc function sha2

    JIRA: https://issues.apache.org/jira/browse/SPARK-8237
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#6934 from viirya/expr_sha2 and squashes the following commits:
    
    35e0bb3 [Liang-Chi Hsieh] For comments.
    68b5284 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
    8573aff [Liang-Chi Hsieh] Remove unnecessary Product.
    ee61e06 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
    59e41aa [Liang-Chi Hsieh] Add misc function: sha2.
    viirya authored and Davies Liu committed Jun 26, 2015
    47c874b
  2. [SPARK-8620] [SQL] cleanup CodeGenContext

    Fix docs, remove nativeTypes, and use the Java type to get the boxed type, default value, etc., to avoid handling `DateType` and `TimestampType` as int and long again and again.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7010 from cloud-fan/cg and squashes the following commits:
    
    aa01cf9 [Wenchen Fan] cleanup CodeGenContext
    cloud-fan authored and Davies Liu committed Jun 26, 2015
    4036011
  3. [SPARK-8635] [SQL] improve performance of CatalystTypeConverters

    In `CatalystTypeConverters.createToCatalystConverter`, we add special handling for primitive types. We can apply this strategy to more places to improve performance.
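    The strategy can be illustrated with a tiny sketch: build the converter once per type, with an identity fast path for primitives, so per-value conversion avoids repeated type dispatch. Everything here (`create_converter`, the type names) is a hypothetical Python analogue, not the Scala code in `CatalystTypeConverters`.

    ```python
    def create_converter(data_type):
        """Build a per-type converter once, up front.
        Primitives get an identity fast path; dispatch happens only here,
        not once per value. (Hypothetical analogue of the Scala code.)"""
        if data_type in ("int", "long", "double", "boolean"):
            return lambda v: v  # primitives need no conversion work
        if data_type == "string":
            return lambda v: str(v)
        raise ValueError("unsupported type: %s" % data_type)

    to_catalyst = create_converter("int")  # type dispatch happens once, here
    print([to_catalyst(v) for v in [1, 2, 3]])  # -> [1, 2, 3]
    ```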
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7018 from cloud-fan/converter and squashes the following commits:
    
    8b16630 [Wenchen Fan] another fix
    326c82c [Wenchen Fan] optimize type converter
    cloud-fan authored and Davies Liu committed Jun 26, 2015
    1a79f0e
  4. [SPARK-8344] Add message processing time metric to DAGScheduler

    This commit adds a new metric, `messageProcessingTime`, to the DAGScheduler metrics source. This metric tracks the time taken to process messages in the scheduler's event processing loop, which is a helpful debugging aid for diagnosing performance issues in the scheduler (such as SPARK-4961).
    
    In order to do this, I moved the creation of the DAGSchedulerSource metrics source into DAGScheduler itself, similar to how MasterSource is created and registered in Master.
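    The mechanics of such a metric are simple: wrap the event-loop handler in a timer that records one duration sample per message. The `Timer` class and handler below are a minimal sketch, not the Codahale `Timer` the Spark metrics system actually uses.

    ```python
    import time

    class Timer:
        """Minimal stand-in for a metrics Timer: one duration sample per call."""
        def __init__(self):
            self.durations = []

        def time(self, fn, *args):
            start = time.perf_counter()
            try:
                return fn(*args)
            finally:
                # record even if the handler raised
                self.durations.append(time.perf_counter() - start)

    message_processing_time = Timer()

    def on_receive(event):
        # The event-loop handler body goes here; timing it surfaces slow messages.
        return event.upper()

    # Each message processed through the timer contributes one sample:
    for event in ["jobsubmitted", "taskcompleted"]:
        message_processing_time.time(on_receive, event)

    print(len(message_processing_time.durations))  # -> 2
    ```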
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7002 from JoshRosen/SPARK-8344 and squashes the following commits:
    
    57f914b [Josh Rosen] Fix import ordering
    7d6bb83 [Josh Rosen] Add message processing time metrics to DAGScheduler
    JoshRosen committed Jun 26, 2015
    9fed6ab
  5. [SPARK-8613] [ML] [TRIVIAL] add param to disable linear feature scaling

    Add a param to disable linear feature scaling (to be implemented later in linear & logistic regression). Done as a separate PR so we can use the same param and not conflict while working on the sub-tasks.
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#7024 from holdenk/SPARK-8522-Disable-Linear_featureScaling-Spark-8613-Add-param and squashes the following commits:
    
    ce8931a [Holden Karau] Regenerate the sharedParams code
    fa6427e [Holden Karau] update text for standardization param.
    7b24a2b [Holden Karau] generate the new standardization param
    3c190af [Holden Karau] Add the standardization param to sharedparamscodegen
    holdenk authored and DB Tsai committed Jun 26, 2015
    c9e05a3
  6. [SPARK-8302] Support heterogeneous cluster install paths on YARN.

    Some users have Hadoop installations on different paths across
    their cluster. Currently, that makes it hard to set up some
    configuration in Spark since that requires hardcoding paths to
    jar files or native libraries, which wouldn't work on such a cluster.
    
    This change introduces a couple of YARN-specific configurations
    that instruct the backend to replace certain paths when launching
    remote processes. That way, if the configuration says the Spark
    jar is in "/spark/spark.jar", and also says that "/spark" should be
    replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers
    in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location
    of the jar.
    
    Coupled with YARN's environment whitelist (which allows certain
    env variables to be exposed to containers), this allows users to
    support such heterogeneous environments, as long as a single
    replacement is enough. (Otherwise, this feature would need to be
    extended to support multiple path replacements.)
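    The substitution the description walks through — "/spark/spark.jar" plus a rule mapping "/spark" to "{{SPARK_INSTALL_DIR}}" yields "{{SPARK_INSTALL_DIR}}/spark.jar" — can be sketched as a prefix rewrite. The function name and the shape of the replacement table are illustrative; the actual YARN-specific configuration keys are in the PR's docs.

    ```python
    def replace_paths(uri, replacements):
        """Rewrite a configured path prefix before launching remote processes.
        (Sketch of the substitution described above; a single replacement
        per path, as the description notes.)"""
        for prefix, replacement in replacements.items():
            if uri.startswith(prefix):
                return replacement + uri[len(prefix):]
        return uri  # paths that need no replacement pass through unchanged

    replacements = {"/spark": "{{SPARK_INSTALL_DIR}}"}
    print(replace_paths("/spark/spark.jar", replacements))
    # -> {{SPARK_INSTALL_DIR}}/spark.jar
    ```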
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes apache#6752 from vanzin/SPARK-8302 and squashes the following commits:
    
    4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
    0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
    2e9cc9d [Marcelo Vanzin] Style.
    a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
    Marcelo Vanzin authored and squito committed Jun 26, 2015
    37bf76a
  7. [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.tes…

    …tmod()
    
    This patch addresses a critical issue in the PySpark tests:
    
    Several of our Python modules' `__main__` methods call `doctest.testmod()` in order to run doctests but forget to check and handle its return value. As a result, some PySpark test failures can go unnoticed because they will not fail the build.
    
    Fortunately, there was only one test failure which was masked by this bug: a `pyspark.profiler` doctest was failing due to changes in RDD pipelining.
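    The canonical fix looks like the following: `doctest.testmod()` returns a `(failed, attempted)` pair, and ignoring it lets failures pass silently, which is exactly the bug this patch fixed. The `square` function is just a placeholder module under test.

    ```python
    import doctest
    import sys

    def square(x):
        """
        >>> square(3)
        9
        """
        return x * x

    if __name__ == "__main__":
        # testmod() returns (failure_count, attempted_count); a nonzero
        # failure count must be turned into a nonzero exit code so the
        # build actually fails.
        (failure_count, test_count) = doctest.testmod()
        if failure_count:
            sys.exit(-1)
    ```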
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7032 from JoshRosen/testmod-fix and squashes the following commits:
    
    60dbdc0 [Josh Rosen] Account for int vs. long formatting change in Python 3
    8b8d80a [Josh Rosen] Fix failing test.
    e6423f9 [Josh Rosen] Check return code for all uses of doctest.testmod().
    JoshRosen authored and Davies Liu committed Jun 26, 2015
    41afa16
  8. [SPARK-8662] SparkR Update SparkSQL Test

    Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior changed in R 3.2, the old test could no longer pass.
    
    JIRA here:
    https://issues.apache.org/jira/browse/SPARK-8662
    
    Author: cafreeman <cfreeman@alteryx.com>
    
    Closes apache#7045 from cafreeman/R32_Test and squashes the following commits:
    
    b97cc52 [cafreeman] Add `checkStructField` utility
    3381e5c [cafreeman] Update SparkSQL Test
    
    (cherry picked from commit 78b31a2)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    cafreeman authored and shivaram committed Jun 26, 2015
    a56516f

Commits on Jun 27, 2015

  1. [SPARK-8607] SparkR -- jars not being added to application classpath …

    …correctly
    
    Add `getStaticClass` method in SparkR's `RBackendHandler`
    
    This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).
    
    cc shivaram
    
    Author: cafreeman <cfreeman@alteryx.com>
    
    Closes apache#7001 from cafreeman/branch-1.4 and squashes the following commits:
    
    8f81194 [cafreeman] Add missing license
    31aedcf [cafreeman] Refactor test to call an external R script
    2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
    0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
    ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
    9a5c362 [cafreeman] test for including JAR when launching sparkContext
    9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
    5a80844 [cafreeman] Fix style nits
    7c6bd0c [cafreeman] [SPARK-8607] SparkR
    
    (cherry picked from commit 2579948)
    Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    cafreeman authored and shivaram committed Jun 27, 2015
    9d11817
  2. [SPARK-8639] [DOCS] Fixed Minor Typos in Documentation

    Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639)
    
    fixed minor typos in docs/README.md and docs/api.md
    
    Author: Rosstin <asterazul@gmail.com>
    
    Closes apache#7046 from Rosstin/SPARK-8639 and squashes the following commits:
    
    6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
    Rosstin authored and srowen committed Jun 27, 2015
    b5a6663
  3. [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN…

    …" document
    
    As per the description in the JIRA, I moved the contents of the page and added some additional content.
    
    Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
    
    Closes apache#6924 from nssalian/SPARK-3629 and squashes the following commits:
    
    944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
    40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
    9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
    8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
    151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document
    Neelesh Srinivas Salian authored and srowen committed Jun 27, 2015
    d48e789
  4. [SPARK-8623] Hadoop RDDs fail to properly serialize configuration

    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes apache#7050 from sryza/sandy-spark-8623 and squashes the following commits:
    
    58a8079 [Sandy Ryza] SPARK-8623. Hadoop RDDs fail to properly serialize configuration
    sryza authored and JoshRosen committed Jun 27, 2015
    4153776
  5. [SPARK-8606] Prevent exceptions in RDD.getPreferredLocations() from c…

    …rashing DAGScheduler
    
    If `RDD.getPreferredLocations()` throws an exception, it may crash the DAGScheduler and SparkContext. This patch guards against that by adding a try-catch block.
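    The defensive pattern is small: treat a buggy RDD as having no locality preference instead of letting its exception escape the scheduling loop. The names below (`get_preferred_locs`, `BuggyRDD`) are illustrative, not the DAGScheduler's actual API.

    ```python
    def get_preferred_locs(rdd, partition):
        """Sketch of the defensive wrapper: a user RDD whose preferred-location
        callback throws should not bring down the scheduler."""
        try:
            return rdd.get_preferred_locations(partition)
        except Exception:
            # Fall back to "no preference" rather than crashing the loop.
            return []

    class BuggyRDD:
        def get_preferred_locations(self, partition):
            raise RuntimeError("user code failed")

    print(get_preferred_locs(BuggyRDD(), 0))  # -> []
    ```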
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7023 from JoshRosen/SPARK-8606 and squashes the following commits:
    
    770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block.
    44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method
    19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash
    JoshRosen committed Jun 27, 2015
    0b5abbf

Commits on Jun 28, 2015

  1. [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integr…

    …ate with dev/run-tests module system
    
    This patch refactors the `python/run-tests` script:
    
    - It's now written in Python instead of Bash.
    - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
    - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):
    
      ```
    Usage: run-tests [options]
    
    Options:
      -h, --help            show this help message and exit
      --python-executables=PYTHON_EXECUTABLES
                            A comma-separated list of Python executables to test
                            against (default: python2.6,python3.4,pypy)
      --modules=MODULES     A comma-separated list of Python modules to test
                            (default: pyspark-core,pyspark-ml,pyspark-mllib
                            ,pyspark-sql,pyspark-streaming)
       ```
    - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.
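    The comma-separated flags shown in the usage text above can be handled with `optparse` (the standard choice for Python 2.6-compatible scripts of that era). This is a sketch of the flag parsing only; the defaults and function name are illustrative, not the real `python/run-tests.py`.

    ```python
    from optparse import OptionParser

    def parse_opts(argv):
        """Parse the --python-executables / --modules flags described above.
        (Illustrative defaults; see the actual script for the real ones.)"""
        parser = OptionParser()
        parser.add_option("--python-executables",
                          default="python2.6,python3.4,pypy")
        parser.add_option("--modules",
                          default="pyspark-core,pyspark-sql")
        opts, _ = parser.parse_args(argv)
        # Both flags are comma-separated lists:
        return opts.python_executables.split(","), opts.modules.split(",")

    executables, modules = parse_opts(["--modules=pyspark-sql,pyspark-streaming"])
    print(modules)  # -> ['pyspark-sql', 'pyspark-streaming']
    ```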
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#6967 from JoshRosen/run-tests-python-modules and squashes the following commits:
    
    f578d6d [Josh Rosen] Fix print for Python 2.x
    8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
    34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
    8f65ed0 [Josh Rosen] Fix handling of  module in python/run-tests
    37aff00 [Josh Rosen] Python 3 fix
    27a389f [Josh Rosen] Skip MLLib tests for PyPy
    c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
    568a3fd [Josh Rosen] Fix hashbang
    3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
    f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
    9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
    d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
    4f8902c [Josh Rosen] Python lint fixes.
    8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
    f542ac5 [Josh Rosen] Fix lint check for Python 3
    fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
    2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
    b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
    caeb040 [Josh Rosen] Fixes to PySpark test module definitions
    d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
    def2d8a [Josh Rosen] Two minor fixes
    aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
    04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
    4c97136 [Josh Rosen] PYTHONPATH fixes
    dcc9c09 [Josh Rosen] Fix time division
    32660fc [Josh Rosen] Initial cut at Python test runner refactoring
    311c6a9 [Josh Rosen] Move shell utility functions to own module.
    1bdeb87 [Josh Rosen] Move module definitions to separate file.
    JoshRosen authored and Davies Liu committed Jun 28, 2015
    Commit 40648c5
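The refactored runner exposes `--python-executables` and `--modules` flags as shown in the usage text above. A minimal sketch of such a command-line interface with `argparse` (the real script's option handling may differ):

```python
import argparse

# Hypothetical re-creation of the runner's CLI, not the actual script.
parser = argparse.ArgumentParser(prog="run-tests")
parser.add_argument("--python-executables",
                    default="python2.6,python3.4,pypy",
                    help="comma-separated Python executables to test against")
parser.add_argument("--modules",
                    default="pyspark-core,pyspark-ml,pyspark-mllib,"
                            "pyspark-sql,pyspark-streaming",
                    help="comma-separated Python modules to test")

# Parse an explicit argument list so the sketch is self-contained.
args = parser.parse_args(["--modules", "pyspark-sql,pyspark-streaming"])
modules = args.modules.split(",")
print(modules)  # ['pyspark-sql', 'pyspark-streaming']
```

Comma-separated defaults let the builder run everything when no flag is given, while a single flag narrows the run to one suite.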
  2. Commit 42db3a1
  3. [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all

    Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
    
    See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
    
    70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
    JoshRosen committed Jun 28, 2015
    Commit f510045
  4. [SPARK-8649] [BUILD] Mapr repository is not defined properly

    The previous committer on this part was pwendell.
    
    The previous URL gives a 404; the new one seems to be OK.
    
    This patch is added under the Apache License 2.0.
    
    The JIRA link: https://issues.apache.org/jira/browse/SPARK-8649
    
    Author: Thomas Szymanski <develop@tszymanski.com>
    
    Closes apache#7054 from tszym/SPARK-8649 and squashes the following commits:
    
    bfda9c4 [Thomas Szymanski] [SPARK-8649] [BUILD] Mapr repository is not defined properly
    tszym authored and pwendell committed Jun 28, 2015
    Commit 52d1281
  5. [SPARK-8610] [SQL] Separate Row and InternalRow (part 2)

    Currently, we use GenericRow for both Row and InternalRow, which is confusing because it could contain Scala types as well as Catalyst types.
    
    This PR changes to use GenericInternalRow for InternalRow (contains catalyst types), GenericRow for Row (contains Scala types).
    
    Also fixes some incorrect use of InternalRow or Row.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7003 from davies/internalrow and squashes the following commits:
    
    d05866c [Davies Liu] fix test: rollback changes for pyspark
    72878dd [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
    efd0b25 [Davies Liu] fix copy of MutableRow
    87b13cf [Davies Liu] fix test
    d2ebd72 [Davies Liu] fix style
    eb4b473 [Davies Liu] mark expensive API as final
    bd4e99c [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
    bdfb78f [Davies Liu] remove BaseMutableRow
    6f99a97 [Davies Liu] fix catalyst test
    defe931 [Davies Liu] remove BaseRow
    288b31f [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
    9d24350 [Davies Liu] separate Row and InternalRow (part 2)
    Davies Liu committed Jun 28, 2015
    Commit 77da5be
  6. [SPARK-8686] [SQL] DataFrame should support where with expression represented by String
    
    DataFrame supports the `filter` function with two types of argument, `Column` and `String`, but `where` doesn't.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#7063 from sarutak/SPARK-8686 and squashes the following commits:
    
    180f9a4 [Kousuke Saruta] Added test
    d61aec4 [Kousuke Saruta] Add "where" method with String argument to DataFrame
    sarutak authored and Davies Liu committed Jun 28, 2015
    Commit ec78438
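The pattern behind this change, accepting either a predicate object or a string expression in the same method, can be sketched with a hypothetical mini-DataFrame (illustrative only, not Spark's implementation):

```python
# Hypothetical sketch: filter/where accepting a callable OR a string
# expression, mirroring DataFrame.filter taking a Column or a String.
class MiniFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per row

    def filter(self, condition):
        if isinstance(condition, str):
            # Evaluate the string expression against each row's columns.
            pred = lambda row: eval(condition, {}, dict(row))
        else:
            pred = condition
        return MiniFrame([r for r in self.rows if pred(r)])

    # `where` is simply an alias of `filter`, as the commit above adds.
    where = filter

people = MiniFrame([{"name": "a", "age": 10}, {"name": "b", "age": 20}])
print([r["name"] for r in people.where("age > 15").rows])  # ['b']
```

Making `where` an alias keeps the two entry points in sync automatically; only `filter` needs the dual-type handling.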
  7. [SPARK-8596] [EC2] Added port for Rstudio

    This would otherwise need to be set manually by R users in AWS.
    
    https://issues.apache.org/jira/browse/SPARK-8596
    
    Author: Vincent D. Warmerdam <vincentwarmerdam@gmail.com>
    Author: vincent <vincentwarmerdam@gmail.com>
    
    Closes apache#7068 from koaning/rstudio-port-number and squashes the following commits:
    
    ac8100d [vincent] Update spark_ec2.py
    ce6ad88 [Vincent D. Warmerdam] added port number for rstudio
    koaning authored and shivaram committed Jun 28, 2015
    Commit 9ce78b4
  8. [SPARK-8677] [SQL] Fix non-terminating decimal expansion for decimal divide operation
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-8677
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#7056 from viirya/fix_decimal3 and squashes the following commits:
    
    34d7419 [Liang-Chi Hsieh] Fix Non-terminating decimal expansion for decimal divide operation.
    viirya authored and Davies Liu committed Jun 28, 2015
    Commit 24fda73
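The underlying issue can be illustrated with Python's `decimal` module (an analogy, not Spark code): 1/3 has a non-terminating decimal expansion, and Java's `BigDecimal.divide` with unlimited precision throws `ArithmeticException` in that case, so the division must be performed with an explicit precision/rounding mode.

```python
from decimal import Decimal, ROUND_HALF_UP, getcontext

# Mimic a fixed working precision, as a SQL Decimal type would have.
getcontext().prec = 38
q = Decimal(1) / Decimal(3)  # rounded to context precision, no exception

# Round the result to an explicit scale.
fixed = q.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
print(fixed)  # 0.333333
```

Python's `decimal` always rounds to the context precision, which is exactly the behavior the fix gives the Spark divide operator.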

Commits on Jun 29, 2015

  1. [SPARK-7845] [BUILD] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
    
    PR apache#5694 reverted PR apache#6384 while refactoring `dev/run-tests` into `dev/run-tests.py`. Also, PR apache#6384 didn't bump the Hadoop 1 version defined in the POM.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#7062 from liancheng/spark-7845 and squashes the following commits:
    
    c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
    liancheng authored and yhuai committed Jun 29, 2015
    Commit 00a9d22
  2. [SPARK-7212] [MLLIB] Add sequence learning flag

    Support mining of ordered frequent item sequences.
    
    Author: Feynman Liang <fliang@databricks.com>
    
    Closes apache#6997 from feynmanliang/fp-sequence and squashes the following commits:
    
    7c14e15 [Feynman Liang] Improve scalatests with R code and Seq
    0d3e4b6 [Feynman Liang] Fix python test
    ce987cb [Feynman Liang] Backwards compatibility aux constructor
    34ef8f2 [Feynman Liang] Fix failing test due to reverse orderering
    f04bd50 [Feynman Liang] Naming, add ordered to FreqItemsets, test ordering using Seq
    648d4d4 [Feynman Liang] Test case for frequent item sequences
    252a36a [Feynman Liang] Add sequence learning flag
    Feynman Liang authored and mengxr committed Jun 29, 2015
    Commit 25f574e
  3. [SPARK-5962] [MLLIB] Python support for Power Iteration Clustering

    Python support for Power Iteration Clustering
    https://issues.apache.org/jira/browse/SPARK-5962
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes apache#6992 from yanboliang/pyspark-pic and squashes the following commits:
    
    6b03d82 [Yanbo Liang] address comments
    4be4423 [Yanbo Liang] Python support for Power Iteration Clustering
    yanboliang authored and mengxr committed Jun 29, 2015
    Commit dfde31d
  4. [SPARK-8575] [SQL] Deprecate callUDF in favor of udf

    Follow up of [SPARK-8356](https://issues.apache.org/jira/browse/SPARK-8356) and apache#6902.
    - Removes the unit test for the now deprecated `callUdf`
    - The unit test in SQLQuerySuite now uses `udf` instead of `callUDF`
    - Replaced `callUDF` with `udf` where possible in mllib
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes apache#6993 from BenFradet/SPARK-8575 and squashes the following commits:
    
    26f5a7a [BenFradet] 2 spaces instead of 1
    1ddb452 [BenFradet] renamed initUDF in order to be consistent in OneVsRest
    48ca15e [BenFradet] used vector type tag for udf call in VectorIndexer
    0ebd0da [BenFradet] replace the now deprecated callUDF by udf in VectorIndexer
    8013409 [BenFradet] replaced the now deprecated callUDF by udf in Predictor
    94345b5 [BenFradet] unifomized udf calls in ProbabilisticClassifier
    1305492 [BenFradet] uniformized udf calls in Classifier
    a672228 [BenFradet] uniformized udf calls in OneVsRest
    49e4904 [BenFradet] Revert "removal of the unit test for the now deprecated callUdf"
    bbdeaf3 [BenFradet] fixed syntax for init udf in OneVsRest
    fe2a10b [BenFradet] callUDF => udf in ProbabilisticClassifier
    0ea30b3 [BenFradet] callUDF => udf in Classifier where possible
    197ec82 [BenFradet] callUDF => udf in OneVsRest
    84d6780 [BenFradet] modified unit test in SQLQuerySuite to use udf instead of callUDF
    477709f [BenFradet] removal of the unit test for the now deprecated callUdf
    BenFradet authored and mengxr committed Jun 29, 2015
    Commit 0b10662
  5. [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala

    I compared the PySpark DataFrameReader/Writer against the Scala ones. The `option` function is missing in both the reader and the writer, but the rest all match.
    
    I added `option` to the reader and writer and updated the `pyspark-sql` test.
    
    Author: Cheolsoo Park <cheolsoop@netflix.com>
    
    Closes apache#7078 from piaozhexiu/SPARK-8355 and squashes the following commits:
    
    c63d419 [Cheolsoo Park] Fix version
    524e0aa [Cheolsoo Park] Add option function to df reader and writer
    Cheolsoo Park authored and rxin committed Jun 29, 2015
    Commit ac2e17b
  6. [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7079 from rxin/SPARK-8698 and squashes the following commits:
    
    8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
    rxin committed Jun 29, 2015
    Commit 660c6ce
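The general Python API pattern this fix follows can be sketched with a hypothetical writer class (not the real PySpark one): defaulting to `None` lets the code distinguish "not specified" from "explicitly empty", instead of always forwarding an empty tuple.

```python
# Hypothetical writer illustrating the None-vs-empty-tuple default.
class Writer:
    def __init__(self):
        self.calls = []  # record of method calls for demonstration

    def partitionBy(self, *cols):
        self.calls.append(("partitionBy", cols))
        return self

    def save(self, path, partitionBy=None):
        # Only forward partitionBy when the caller actually supplied it.
        if partitionBy is not None:
            self.partitionBy(*partitionBy)
        self.calls.append(("save", path))

w = Writer()
w.save("/tmp/out")  # no spurious partitionBy() call
print(w.calls)      # [('save', '/tmp/out')]
```

With a `()` default, every `save` would route through `partitionBy`, which is exactly the behavior the commit removes.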
  7. [SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript

    When there are massive tasks, such as `sc.parallelize(1 to 100000, 10000).count()`, the generated JS code performs a lot of string concatenation in the stage page, nearly 40 concatenations per task.
    
    We can generate the whole string for a task at once instead of performing the concatenations in the browser.
    
    Before this patch, the load time of the page is about 21 seconds.
    ![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png)
    
    After this patch, it reduces to about 17 seconds.
    
    ![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png)
    
    One disadvantage is that the generated JS code becomes hard to read.
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#7082 from zsxwing/js-string and squashes the following commits:
    
    b29231d [zsxwing] Avoid massive concating strings in Javascript
    zsxwing authored and sarutak committed Jun 29, 2015
    Commit 630bd5f
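The technique itself is language-agnostic: pre-build one string per task and join once, rather than concatenating many small pieces incrementally. Shown here in Python rather than the generated JavaScript:

```python
# Sample task data standing in for the stage page's per-task fields.
tasks = [{"id": i, "time": i * 2} for i in range(5)]

# Before: many incremental concatenations (what the old page generated).
s1 = ""
for t in tasks:
    s1 = s1 + "[" + str(t["id"]) + "," + str(t["time"]) + "]"

# After: emit one pre-built string per task and join once.
s2 = "".join("[%d,%d]" % (t["id"], t["time"]) for t in tasks)

assert s1 == s2
print(s2)  # [0,0][1,2][2,4][3,6][4,8]
```

With ~40 concatenations per task across thousands of tasks, collapsing them into a single pre-built string is what cut the page load time from ~21 to ~17 seconds.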
  8. [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way
    
    Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like:
    
    ```
    -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
    ```
    
    Author: Brennon York <brennon.york@capitalone.com>
    
    Closes apache#7085 from brennonyork/SPARK-8693 and squashes the following commits:
    
    c5575f1 [Brennon York] added commas to end of print statements for proper printing
    Brennon York authored and JoshRosen committed Jun 29, 2015
    Commit 5c796d5
  9. [SPARK-8554] Add the SparkR document files to .rat-excludes for `./dev/check-license`
    
    [[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8554)
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6947 from yu-iskw/SPARK-8554 and squashes the following commits:
    
    5ca240c [Yu ISHIKAWA] [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license`
    yu-iskw authored and JoshRosen committed Jun 29, 2015
    Commit 715f084
  10. Revert "[SPARK-8372] History server shows incorrect information for application not started"
    
    This reverts commit 2837e06.
    Andrew Or committed Jun 29, 2015
    Commit ea88b1a
  11. [SPARK-8692] [SQL] re-order the case statements that handle catalyst data types
    
    Use the same order: boolean, byte, short, int, date, long, timestamp, float, double, string, binary, decimal.
    
    Then we can easily check whether some data types are missing at a glance, and make sure we handle date/timestamp just like int/long.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7073 from cloud-fan/fix-date and squashes the following commits:
    
    463044d [Wenchen Fan] fix style
    51cd347 [Wenchen Fan] refactor handling of date and timestmap
    cloud-fan authored and liancheng committed Jun 29, 2015
    Commit ed413bc
  12. [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.

    Allow HiveContext to connect to metastores of those versions; some new shims
    had to be added to account for changing internal APIs.
    
    A new test was added to exercise the "reset()" path which now also requires
    a shim; and the test code was changed to use a directory under the build's
    target to store Ivy dependencies. Without that, I consistently ran
    into issues with Ivy messing up (or being confused by) my existing caches.
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes apache#7026 from vanzin/SPARK-8067 and squashes the following commits:
    
    3e2e67b [Marcelo Vanzin] [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.
    Marcelo Vanzin authored and marmbrus committed Jun 29, 2015
    Commit 3664ee2
  13. [SPARK-8235] [SQL] misc function sha / sha1

    Jira: https://issues.apache.org/jira/browse/SPARK-8235
    
    I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they?
    
    Please take a close look on the Python part. This is adopted from apache#6934
    
    Author: Tarek Auel <tarek.auel@gmail.com>
    Author: Tarek Auel <tarek.auel@googlemail.com>
    
    Closes apache#6963 from tarekauel/SPARK-8235 and squashes the following commits:
    
    f064563 [Tarek Auel] change to shaHex
    7ce3cdc [Tarek Auel] rely on automatic cast
    a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235
    68eb043 [Tarek Auel] added docstring
    be5aff1 [Tarek Auel] improved error message
    7336c96 [Tarek Auel] added type check
    cf23a80 [Tarek Auel] simplified example
    ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala
    6d6ff0d [Tarek Auel] [SPARK-8233] added docstring
    ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc
    e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__
    e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
    tarekbecker authored and Davies Liu committed Jun 29, 2015
    Commit a5c2961
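To the question in the commit message: yes, `sha` and `sha1` compute the same algorithm, SHA-1, which always yields a 40-character hex digest. Illustrated with Python's `hashlib` (not the Spark expression itself):

```python
import hashlib

# SHA-1 of the classic FIPS test vector "abc".
digest = hashlib.sha1(b"abc").hexdigest()
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(digest))  # 40
```

Since the two names denote one algorithm, exposing `sha` as an alias of `sha1` (as this patch does) is safe.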
  14. [SPARK-8528] Expose SparkContext.applicationId in PySpark

    Use case: we want to log the applicationId (YARN in our case) to request troubleshooting help from DevOps.
    
    Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>
    
    Closes apache#6936 from smartkiwi/master and squashes the following commits:
    
    870338b [Vladimir Vladimirov] this would make doctest to run in python3
    0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals
    14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS
    b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring
    223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string
    3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala
    2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark
    Vladimir Vladimirov authored and JoshRosen committed Jun 29, 2015
    Commit 492dca3
  15. [SQL][DOCS] Remove wrong example from DataFrame.scala

    In DataFrame.scala, there are examples like the following:
    
    ```
     * // The following are equivalent:
     * peopleDf.filter($"age" > 15)
     * peopleDf.where($"age" > 15)
     * peopleDf($"age" > 15)
    ```
    
    But, I think the last example doesn't work.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#6977 from sarutak/fix-dataframe-example and squashes the following commits:
    
    46efbd7 [Kousuke Saruta] Removed wrong example
    sarutak authored and rxin committed Jun 29, 2015
    Commit 94e040d
  16. [SPARK-8214] [SQL] Add function hex

    cc chenghao-intel  adrian-wang
    
    Author: zhichao.li <zhichao.li@intel.com>
    
    Closes apache#6976 from zhichao-li/hex and squashes the following commits:
    
    e218d1b [zhichao.li] turn off scalastyle for non-ascii
    de3f5ea [zhichao.li] non-ascii char
    cf9c936 [zhichao.li] give separated buffer for each hex method
    967ec90 [zhichao.li] Make 'value' as a feild of Hex
    3b2fa13 [zhichao.li] tiny fix
    a647641 [zhichao.li] remove duplicate null check
    7cab020 [zhichao.li] tiny refactoring
    35ecfe5 [zhichao.li] add function hex
    zhichao-li authored and Davies Liu committed Jun 29, 2015
    Commit 637b4ee
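What a SQL `hex` function computes can be shown in plain Python (an illustration, not Spark's implementation): integers map to their hexadecimal representation, and binary values map to the hex encoding of each byte.

```python
# Integer input: hex representation of the number.
num_hex = format(255, "X")
print(num_hex)  # FF

# Binary input: hex encoding of each byte of the value.
bin_hex = b"Spark".hex().upper()
print(bin_hex)  # 537061726B
```

The separate code paths for numeric and binary input in the sketch mirror why the patch gives each hex method its own buffer.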
  17. [SPARK-7862] [SQL] Disable the error message redirect to stderr

    This is a follow up of apache#6404: ScriptTransformation prints the error message to stderr directly, which can be a disaster for the application log.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes apache#6882 from chenghao-intel/verbose and squashes the following commits:
    
    bfedd77 [Cheng Hao] revert the write
    76ff46b [Cheng Hao] update the CircularBuffer
    692b19e [Cheng Hao] check the process exitValue for ScriptTransform
    47e0970 [Cheng Hao] Use the RedirectThread instead
    1de771d [Cheng Hao] naming the threads in ScriptTransformation
    8536e81 [Cheng Hao] disable the error message redirection for stderr
    chenghao-intel authored and marmbrus committed Jun 29, 2015
    Commit c6ba2ea
  18. [SPARK-8681] fixed wrong ordering of columns in crosstab

    I specifically randomized the test. What crosstab does is equivalent to a countByKey; therefore, if this test fails again for any reason, we will know that we hit a corner case or something.
    
    cc rxin marmbrus
    
    Author: Burak Yavuz <brkyvz@gmail.com>
    
    Closes apache#7060 from brkyvz/crosstab-fixes and squashes the following commits:
    
    0a65234 [Burak Yavuz] addressed comments v1
    d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab
    brkyvz authored and rxin committed Jun 29, 2015
    Commit be7ef06
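The countByKey equivalence noted above can be sketched in plain Python: crosstab amounts to counting `(rowValue, colValue)` pairs, then pivoting those counts into a table with deterministically ordered columns.

```python
from collections import Counter

# Sample (rowValue, colValue) pairs standing in for two DataFrame columns.
rows = [("a", 1), ("a", 2), ("b", 1), ("a", 1)]

# The countByKey step: count each (row, col) pair.
counts = Counter(rows)
print(counts[("a", 1)])  # 2

# Pivot into a cross-tabulation with sorted (i.e. stable) column order,
# which is the ordering property the commit fixes.
row_keys = sorted({r for r, _ in rows})
col_keys = sorted({c for _, c in rows})
table = {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}
print(table)  # {'a': {1: 2, 2: 1}, 'b': {1: 1, 2: 0}}
```

Because the result is just pair counts, a randomized test that compares against an independent counter will catch any ordering bug.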
  19. [SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame

    Avoid unnecessary Spark jobs when inferring the schema from a list.
    
    cc yhuai mengxr
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6606 from davies/improve_create and squashes the following commits:
    
    a5928bf [Davies Liu] Update MimaExcludes.scala
    62da911 [Davies Liu] fix mima
    bab4d7d [Davies Liu] Merge branch 'improve_create' of github.com:davies/spark into improve_create
    eee44a8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create
    8d9292d [Davies Liu] Update context.py
    eb24531 [Davies Liu] Update context.py
    c969997 [Davies Liu] bug fix
    d5a8ab0 [Davies Liu] fix tests
    8c3f10d [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create
    6ea5925 [Davies Liu] address comments
    6ceaeff [Davies Liu] avoid spark jobs in createDataFrame
    Davies Liu authored and rxin committed Jun 29, 2015
    Commit afae976
  20. [SPARK-8709] Exclude hadoop-client's mockito-all dependency

    This patch excludes `hadoop-client`'s dependency on `mockito-all`.  As of apache#7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7090 from JoshRosen/SPARK-8709 and squashes the following commits:
    
    e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
    JoshRosen committed Jun 29, 2015
    Commit 27ef854
  21. [SPARK-8056][SQL] Design an easier way to construct schema for both Scala and Python
    
    I've added functionality to create a new StructType, similar to how we add parameters to a new SparkContext.
    
    I've also added tests for this type of creation.
    
    Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
    
    Closes apache#6686 from ilganeli/SPARK-8056B and squashes the following commits:
    
    27c1de1 [Ilya Ganelin] Rename
    467d836 [Ilya Ganelin] Removed from_string in favor of _parse_Datatype_json_value
    5fef5a4 [Ilya Ganelin] Updates for type parsing
    4085489 [Ilya Ganelin] Style errors
    3670cf5 [Ilya Ganelin] added string to DataType conversion
    8109e00 [Ilya Ganelin] Fixed error in tests
    41ab686 [Ilya Ganelin] Fixed style errors
    e7ba7e0 [Ilya Ganelin] Moved some python tests to tests.py. Added cleaner handling of null data type and added test for correctness of input format
    15868fa [Ilya Ganelin] Fixed python errors
    b79b992 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-8056B
    a3369fc [Ilya Ganelin] Fixing space errors
    e240040 [Ilya Ganelin] Style
    bab7823 [Ilya Ganelin] Constructor error
    73d4677 [Ilya Ganelin] Style
    4ed00d9 [Ilya Ganelin] Fixed default arg
    67df57a [Ilya Ganelin] Removed Foo
    04cbf0c [Ilya Ganelin] Added comments for single object
    0484d7a [Ilya Ganelin] Restored second method
    6aeb740 [Ilya Ganelin] Style
    689e54d [Ilya Ganelin] Style
    f497e9e [Ilya Ganelin] Got rid of old code
    e3c7a88 [Ilya Ganelin] Fixed doctest failure
    a62ccde [Ilya Ganelin] Style
    966ac06 [Ilya Ganelin] style checks
    dabb7e6 [Ilya Ganelin] Added Python tests
    a3f4152 [Ilya Ganelin] added python bindings and better comments
    e6e536c [Ilya Ganelin] Added extra space
    7529a2e [Ilya Ganelin] Fixed formatting
    d388f86 [Ilya Ganelin] Fixed small bug
    c4e3bf5 [Ilya Ganelin] Reverted to using parse. Updated parse to support long
    d7634b6 [Ilya Ganelin] Reverted to fromString to properly support types
    22c39d5 [Ilya Ganelin] replaced FromString with DataTypeParser.parse. Replaced empty constructor initializing a null to have it instead create a new array to allow appends to it.
    faca398 [Ilya Ganelin] [SPARK-8056] Replaced default argument usage. Updated usage and code for DataType.fromString
    1acf76e [Ilya Ganelin] Scala style
    e31c674 [Ilya Ganelin] Fixed bug in test
    8dc0795 [Ilya Ganelin] Added tests for creation of StructType object with new methods
    fdf7e9f [Ilya Ganelin] [SPARK-8056] Created add methods to facilitate building new StructType objects.
    Ilya Ganelin authored and rxin committed Jun 29, 2015
    Commit f6fc254
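The "easier construction" idea is a fluent `add()` method that appends a field and returns the struct. A stripped-down sketch with hypothetical classes (not the real `pyspark.sql.types` API):

```python
class StructField:
    def __init__(self, name, data_type, nullable=True):
        self.name, self.data_type, self.nullable = name, data_type, nullable

class StructType:
    def __init__(self, fields=None):
        # Create a fresh list rather than appending to a shared default,
        # matching the commit's note about the empty constructor.
        self.fields = list(fields) if fields is not None else []

    def add(self, name, data_type, nullable=True):
        self.fields.append(StructField(name, data_type, nullable))
        return self  # return self to enable chaining

schema = StructType().add("id", "long", False).add("name", "string")
print([f.name for f in schema.fields])  # ['id', 'name']
```

Chained `add()` calls replace building an explicit list of `StructField`s up front, which is what makes schema construction feel like configuring a SparkContext.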
  22. [SPARK-7810] [PYSPARK] solve python rdd socket connection problem

    Method "_load_from_socket" in rdd.py cannot load data from the JVM socket when IPv6 is used; the current method only works with IPv4. The new modification works with both protocols.
    
    Author: Ai He <ai.he@ussuning.com>
    Author: AiHe <ai.he@ussuning.com>
    
    Closes apache#6338 from AiHe/pyspark-networking-issue and squashes the following commits:
    
    d4fc9c4 [Ai He] handle code review 2
    e75c5c8 [Ai He] handle code review
    5644953 [AiHe] solve python rdd socket connection problem to jvm
    Ai He authored and Davies Liu committed Jun 29, 2015
    Commit ecd3aac
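The essence of the fix, shown as a generic sketch rather than the PySpark code: instead of hard-coding `AF_INET`, iterate over `socket.getaddrinfo()` results so the first usable address family (IPv4 or IPv6) wins.

```python
import socket

def connect(host, port):
    """Connect using the first address family getaddrinfo offers."""
    err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.connect(addr)
            return sock
        except OSError as e:
            err = e
    raise err if err else OSError("getaddrinfo returned no results")

# Demo against a local listener on an ephemeral port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
client = connect("127.0.0.1", server.getsockname()[1])
conn, _ = server.accept()
conn.sendall(b"ok")
data = client.recv(2)
print(data)  # b'ok'
client.close(); conn.close(); server.close()
```

`AF_UNSPEC` asks the resolver for every family it supports, so the same loop works whether the JVM side bound an IPv4 or an IPv6 socket.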
  23. [SPARK-8660][ML] Convert JavaDoc style comments in LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier
    
    Converted JavaDoc style comments in mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier.
    
    Author: Rosstin <asterazul@gmail.com>
    
    Closes apache#7096 from Rosstin/SPARK-8660 and squashes the following commits:
    
    242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
    2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
    Rosstin authored and rxin committed Jun 29, 2015
    Commit c8ae887
  24. [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf
    
    Follow-up of apache#6902 for being coherent between ```Udf``` and ```UDF```
    
    Author: BenFradet <benjamin.fradet@gmail.com>
    
    Closes apache#6920 from BenFradet/SPARK-8478 and squashes the following commits:
    
    c500f29 [BenFradet] renamed a few variables in functions to use UDF
    8ab0f2d [BenFradet] renamed idUdf to idUDF in SQLQuerySuite
    98696c2 [BenFradet] renamed originalUdfs in TestHive to originalUDFs
    7738f74 [BenFradet] modified HiveUDFSuite to use only UDF
    c52608d [BenFradet] renamed HiveUdfSuite to HiveUDFSuite
    e51b9ac [BenFradet] renamed ExtractPythonUdfs to ExtractPythonUDFs
    8c756f1 [BenFradet] renamed Hive UDF related code
    2a1ca76 [BenFradet] renamed pythonUdfs to pythonUDFs
    261e6fb [BenFradet] renamed ScalaUdf to ScalaUDF
    BenFradet authored and marmbrus committed Jun 29, 2015
    Commit 931da5c
  25. [SPARK-8579] [SQL] support arbitrary object in UnsafeRow

    This PR brings arbitrary object support in UnsafeRow (both in grouping key and aggregation buffer).
    
    Two object pools will be created to hold those non-primitive objects, and their indexes are stored in the UnsafeRow. In order to compare the grouping key as bytes, the objects in the key will be stored in a unique object pool, to make sure equal objects get the same index (used as hashCode).
    
    For StringType and BinaryType, we still put them as var-length in UnsafeRow when initializing, for better performance. But on update, they become objects inside the object pools (leaving some garbage in the buffer).
    
    BTW: Will create a JIRA once issue.apache.org is available.
    
    cc JoshRosen rxin
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#6959 from davies/unsafe_obj and squashes the following commits:
    
    5ce39da [Davies Liu] fix comment
    5e797bf [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
    5803d64 [Davies Liu] fix conflict
    461d304 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
    2f41c90 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
    b04d69c [Davies Liu] address comments
    4859b80 [Davies Liu] fix comments
    f38011c [Davies Liu] add a test for grouping by decimal
    d2cf7ab [Davies Liu] add more tests for null checking
    71983c5 [Davies Liu] add test for timestamp
    e8a1649 [Davies Liu] reuse buffer for string
    39f09ca [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
    035501e [Davies Liu] fix style
    236d6de [Davies Liu] support arbitrary object in UnsafeRow
    Davies Liu committed Jun 29, 2015
    Commit: ed359de
  26. [SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-sty…

    …le comments to regular multiline comments, to make copy-pasting R code simpler
    
    For mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code simpler.
    
    Author: Rosstin <asterazul@gmail.com>
    
    Closes apache#7098 from Rosstin/SPARK-8661 and squashes the following commits:
    
    5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code.
    bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660
    242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
    2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
    Rosstin authored and rxin committed Jun 29, 2015
    Commit: 4e880cf
  27. [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def.

    jira: https://issues.apache.org/jira/browse/SPARK-8710
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#7094 from yhuai/SPARK-8710 and squashes the following commits:
    
    c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def.
    yhuai authored and rxin committed Jun 29, 2015
    Commit: 4b497a7
  28. [SPARK-8589] [SQL] cleanup DateTimeUtils

    Move date/time-related operations into `DateTimeUtils` and rename some methods to make them clearer.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6980 from cloud-fan/datetime and squashes the following commits:
    
    9373a9d [Wenchen Fan] cleanup DateTimeUtil
    cloud-fan authored and marmbrus committed Jun 29, 2015
    Commit: 881662e

Commits on Jun 30, 2015

  1. [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuit…

    …e "receiver info reporting"
    
    As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/
    
    ```
    15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers
    15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22
    ...
    15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping
    15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping
    ```
    
    It needs at least 4 seconds to receive all receiver events on this slow machine, but the `timeout` for `eventually` is only 2 seconds.
    This PR increases `timeout` to make this test stable.
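    The `eventually` pattern this fix tunes can be sketched as a generic retry-until-timeout helper (an editor's Python illustration, not ScalaTest's actual implementation):

    ```python
    import time

    def eventually(condition, timeout=4.0, interval=0.05):
        """Retry a condition until it holds or the timeout elapses,
        mirroring ScalaTest's eventually(timeout(...)) { ... } idiom."""
        deadline = time.monotonic() + timeout
        while True:
            if condition():
                return
            if time.monotonic() >= deadline:
                raise AssertionError("condition not met within %.1fs" % timeout)
            time.sleep(interval)

    # A condition that only becomes true after some delay, like the
    # receiver events that take a few seconds to arrive in the test.
    start = time.monotonic()
    eventually(lambda: time.monotonic() - start > 0.1, timeout=2.0)
    ```

    Raising the `timeout` argument is exactly what the PR does: the condition is unchanged, it is simply given more time to become true.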
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#7017 from zsxwing/SPARK-8634 and squashes the following commits:
    
    719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting"
    zsxwing authored and Andrew Or committed Jun 30, 2015
    Commit: cec9852
  2. [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in Spark…

    …SubmitSuite
    
    Hopefully, this suite will not be flaky anymore.
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#7027 from yhuai/SPARK-8567 and squashes the following commits:
    
    c0167e2 [Yin Huai] Add sc.stop().
    yhuai authored and Andrew Or committed Jun 30, 2015
    Commit: fbf7573
  3. [SPARK-8437] [DOCS] Using directory path without wildcard for filenam…

    …e slow for large number of files with wholeTextFiles and binaryFiles
    
    Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#7036 from srowen/SPARK-8437 and squashes the following commits:
    
    0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
    srowen authored and Andrew Or committed Jun 30, 2015
    Commit: 5d30eae
  4. [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using s…

    …park-submit
    
    This PR also includes re-ordering the order that repositories are used when resolving packages. User provided repositories will be prioritized.
    
    cc andrewor14
    
    Author: Burak Yavuz <brkyvz@gmail.com>
    
    Closes apache#7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits:
    
    a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
    brkyvz authored and Andrew Or committed Jun 30, 2015
    Commit: d7f796d
  5. [SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with…

    … a command other than Rscript
    
    This is a simple change to add a new environment variable
    "spark.sparkr.r.command" that specifies the command that SparkR will
    use when creating an R engine process.  If this is not specified,
    "Rscript" will be used by default.
    
    I did not add any documentation, since I couldn't find any place where
    environment variables (such as "spark.sparkr.use.daemon") are
    documented.
    
    I also did not add a unit test.  The only test that would work
    generally would be one starting SparkR with
    sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just
    using the default value.  I think that this is a low-risk change.
    
    Likely committers: shivaram
    
    Author: Michael Sannella x268 <msannell@tibco.com>
    
    Closes apache#6557 from msannell/altR and squashes the following commits:
    
    7eac142 [Michael Sannella x268] add spark.sparkr.r.command config parameter
    msannell authored and Andrew Or committed Jun 30, 2015
    Commit: 4a9e03f
  6. Revert "[SPARK-8437] [DOCS] Using directory path without wildcard for…

    … filename slow for large number of files with wholeTextFiles and binaryFiles"
    
    This reverts commit 5d30eae.
    Andrew Or committed Jun 30, 2015
    Commit: 4c1808b
  7. [SPARK-8456] [ML] Ngram featurizer python

    Python API for N-gram feature transformer
    
    Author: Feynman Liang <fliang@databricks.com>
    
    Closes apache#6960 from feynmanliang/ngram-featurizer-python and squashes the following commits:
    
    f9e37c9 [Feynman Liang] Remove debugging code
    4dd81f4 [Feynman Liang] Fix typo and doctest
    06c79ac [Feynman Liang] Style guide
    26c1175 [Feynman Liang] Add python NGram API
    Feynman Liang authored and jkbradley committed Jun 30, 2015
    Commit: 620605a
  8. [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.c…

    …rosstab
    
    cc yhuai
    
    Author: Burak Yavuz <brkyvz@gmail.com>
    
    Closes apache#7100 from brkyvz/ct-flakiness-fix and squashes the following commits:
    
    abc299a [Burak Yavuz] change 'to' to until
    7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab
    brkyvz authored and yhuai committed Jun 30, 2015
    Commit: ecacb1e
  9. [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7

    Patch to fix crash with BINARY fields with ENUM original types.
    
    Author: Steven She <steven@canopylabs.com>
    
    Closes apache#7048 from stevencanopy/SPARK-8669 and squashes the following commits:
    
    2e72979 [Steven She] [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7
    stshe authored and marmbrus committed Jun 30, 2015
    Commit: 4915e9e
  10. [SPARK-7667] [MLLIB] MLlib Python API consistency check

    MLlib Python API consistency check
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes apache#6856 from yanboliang/spark-7667 and squashes the following commits:
    
    21bae35 [Yanbo Liang] remove duplicate code
    eb12f95 [Yanbo Liang] fix doc inherit problem
    9e7ec3c [Yanbo Liang] address comments
    e763d32 [Yanbo Liang] MLlib Python API consistency check
    yanboliang authored and jkbradley committed Jun 30, 2015
    Commit: f9b6bf2
  11. [SPARK-5161] Parallelize Python test execution

    This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
    
    To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
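    The parallel-execution idea can be sketched as a thread pool driving independent test subprocesses (an editor's minimal illustration, not the actual `python/run-tests` script):

    ```python
    import subprocess
    import sys
    from concurrent.futures import ThreadPoolExecutor

    def run_suite(cmd):
        """Run one test command; return (command, exit code, captured output)."""
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return cmd, proc.returncode, proc.stdout

    # Four independent "suites"; max_workers=4 mirrors the default -p value.
    suites = [[sys.executable, "-c", "print('suite %d ok')" % i]
              for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_suite, suites))

    # Only surface output from failed suites, as the PR's logging change does.
    for cmd, code, out in results:
        if code != 0:
            print(out)
    ```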
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
    
    feb3763 [Josh Rosen] Re-enable other tests
    f87ea81 [Josh Rosen] Only log output from failed tests
    d4ded73 [Josh Rosen] Logging improvements
    a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
    1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
    110cd9d [Josh Rosen] Fix universal_newlines for Python 3
    cd13db8 [Josh Rosen] Also log python_implementation
    9e31127 [Josh Rosen] Log Python --version output for each executable.
    a2b9094 [Josh Rosen] Bump up parallelism.
    5552380 [Josh Rosen] Python 3 fix
    866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
    87cb988 [Josh Rosen] Skip MLLib tests for PyPy
    8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
    9129027 [Josh Rosen] Disable Spark UI in Python tests
    037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
    af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
    JoshRosen authored and Davies Liu committed Jun 30, 2015
    Commit: 7bbbe38
  12. MAINTENANCE: Automated closing of pull requests.

    This commit exists to close the following pull requests on Github:
    
    Closes apache#1767 (close requested by 'andrewor14')
    Closes apache#6952 (close requested by 'andrewor14')
    Closes apache#7051 (close requested by 'andrewor14')
    Closes apache#5357 (close requested by 'marmbrus')
    Closes apache#5233 (close requested by 'andrewor14')
    Closes apache#6930 (close requested by 'JoshRosen')
    Closes apache#5502 (close requested by 'andrewor14')
    Closes apache#6778 (close requested by 'andrewor14')
    Closes apache#7006 (close requested by 'andrewor14')
    pwendell committed Jun 30, 2015
    Commit: ea775b0
  13. [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7109 from rxin/auto-cast and squashes the following commits:
    
    a914cc3 [Reynold Xin] [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.
    rxin committed Jun 30, 2015
    Commit: f79410c
  14. [SPARK-8650] [SQL] Use the user-specified app name priority in SparkS…

    …QLCLIDriver or HiveThriftServer2
    
    When running `./bin/spark-sql --name query1.sql`:
    [Before]
    ![before](https://cloud.githubusercontent.com/assets/1400819/8370336/fa20b75a-1bf8-11e5-9171-040049a53240.png)
    
    [After]
    ![after](https://cloud.githubusercontent.com/assets/1400819/8370189/dcc35cb4-1bf6-11e5-8796-a0694140bffb.png)
    
    Author: Yadong Qi <qiyadong2010@gmail.com>
    
    Closes apache#7030 from watermen/SPARK-8650 and squashes the following commits:
    
    51b5134 [Yadong Qi] Improve code and add comment.
    e3d7647 [Yadong Qi] use spark.app.name priority.
    watermen authored and yhuai committed Jun 30, 2015
    Commit: e6c3f74
  15. [SPARK-5161] [HOTFIX] Fix bug in Python test failure reporting

    This patch fixes a bug introduced in apache#7031 which can cause Jenkins to incorrectly report a build with failed Python tests as passing if an error occurred while printing the test failure message.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7112 from JoshRosen/python-tests-hotfix and squashes the following commits:
    
    c3f2961 [Josh Rosen] Hotfix for bug in Python test failure reporting
    JoshRosen committed Jun 30, 2015
    Commit: 6c5a6db
  16. [SPARK-8434][SQL]Add a "pretty" parameter to the "show" method to dis…

    …play long strings
    
    Sometimes the user may want to see the complete content of cells. Currently, `sql("set -v").show()` displays:
    
    ![screen shot 2015-06-18 at 4 34 51 pm](https://cloud.githubusercontent.com/assets/1000778/8227339/14d3c5ea-15d9-11e5-99b9-f00b7e93beef.png)
    
    The user needs to use something like `sql("set -v").collect().foreach(r => r.toSeq.mkString("\t"))` to show the complete content.
    
    This PR adds a `pretty` parameter to show. If `pretty` is false, `show` won't truncate strings or align cells right.
    
    ![screen shot 2015-06-18 at 4 21 44 pm](https://cloud.githubusercontent.com/assets/1000778/8227407/b6f8dcac-15d9-11e5-8219-8079280d76fc.png)
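    The default truncation that `show` applies can be sketched as follows (an editor's hypothetical helper; the 20-character width approximates Spark's cell-truncation behavior):

    ```python
    def render_cell(value, truncate=True, width=20):
        """Render one cell as show() would: long values are cut and
        suffixed with '...' unless truncation is disabled."""
        s = str(value)
        if truncate and len(s) > width:
            return s[:width - 3] + "..."
        return s

    print(render_cell("spark.sql.shuffle.partitions=200"))         # truncated
    print(render_cell("spark.sql.shuffle.partitions=200", False))  # full value
    ```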
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#6877 from zsxwing/show and squashes the following commits:
    
    22e28e9 [zsxwing] pretty -> truncate
    e582628 [zsxwing] Add pretty parameter to the show method in R
    a3cd55b [zsxwing] Fix calling showString in R
    923cee4 [zsxwing] Add a "pretty" parameter to show to display long strings
    zsxwing authored and rxin committed Jun 30, 2015
    Commit: 12671dd
  17. [SPARK-8551] [ML] Elastic net python code example

    Author: Shuo Xiang <shuoxiangpub@gmail.com>
    
    Closes apache#6946 from coderxiang/en-java-code-example and squashes the following commits:
    
    7a4bdf8 [Shuo Xiang] address comments
    cddb02b [Shuo Xiang] add elastic net python example code
    f4fa534 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    6ad4865 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
    98804c9 [Shuo Xiang] fix bug in topBykey and update test
    coderxiang authored and DB Tsai committed Jun 30, 2015
    Commit: 5452457
  18. [SPARK-7756] [CORE] More robust SSL options processing.

    Subset the enabled algorithms in an SSLOptions to the elements that are supported by the protocol provider.
    
    Update the list of ciphers in the sample config to include modern algorithms, and specify both Oracle and IBM names.  In practice the user would either specify their own chosen cipher suites, or specify none, and delegate the decision to the provider.
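    The subsetting logic amounts to an order-preserving intersection (an editor's illustration only; the real code lives in Spark's SSLOptions):

    ```python
    def enabled_algorithms(requested, provider_supported):
        """Keep only the requested cipher suites the provider supports,
        preserving the user's preference order."""
        supported = set(provider_supported)
        return [suite for suite in requested if suite in supported]

    requested = ["TLS_AES_256_GCM_SHA384", "LEGACY_FAKE_CIPHER",
                 "TLS_AES_128_GCM_SHA256"]
    provider = ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"]
    # Unsupported suites are silently dropped rather than causing a failure.
    print(enabled_algorithms(requested, provider))
    ```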
    
    Author: Tim Ellison <t.p.ellison@gmail.com>
    
    Closes apache#7043 from tellison/SSLEnhancements and squashes the following commits:
    
    034efa5 [Tim Ellison] Ensure Java imports are grouped and ordered by package.
    3797f8b [Tim Ellison] Remove unnecessary use of Option to improve clarity, and fix import style ordering.
    4b5c89f [Tim Ellison] More robust SSL options processing.
    tellison authored and srowen committed Jun 30, 2015
    Commit: 2ed0c0a
  19. [SPARK-8590] [SQL] add code gen for ExtractValue

    TODO:  use array instead of Seq as internal representation for `ArrayType`
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#6982 from cloud-fan/extract-value and squashes the following commits:
    
    e203bc1 [Wenchen Fan] address comments
    4da0f0b [Wenchen Fan] some clean up
    f679969 [Wenchen Fan] fix bug
    e64f942 [Wenchen Fan] remove generic
    e3f8427 [Wenchen Fan] fix style and address comments
    fc694e8 [Wenchen Fan] add code gen for extract value
    cloud-fan authored and Davies Liu committed Jun 30, 2015
    Commit: 08fab48
  20. [SPARK-8723] [SQL] improve divide and remainder code gen

    We can avoid execution of both left and right expression by null and zero check.
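    The short-circuit can be sketched with lazily evaluated operands (a plain-Python illustration of the generated code's control flow, not the codegen itself):

    ```python
    def sql_divide(eval_left, eval_right):
        """Evaluate the divisor first; on NULL (None) or zero, return NULL
        without ever evaluating the left-hand expression."""
        right = eval_right()
        if right is None or right == 0:
            return None
        left = eval_left()
        if left is None:
            return None
        return left / right

    # The left side is skipped entirely when the divisor is zero or NULL.
    print(sql_divide(lambda: 10, lambda: 2))   # 5.0
    print(sql_divide(lambda: 10, lambda: 0))   # None
    ```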
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7111 from cloud-fan/cg and squashes the following commits:
    
    d6b12ef [Wenchen Fan] improve divide and remainder code gen
    cloud-fan authored and Davies Liu committed Jun 30, 2015
    Commit: 865a834
  21. [SPARK-8680] [SQL] Slightly improve PropagateTypes

    JIRA: https://issues.apache.org/jira/browse/SPARK-8680
    
    This PR slightly improves `PropagateTypes` in `HiveTypeCoercion`. It moves `q.inputSet` outside `q transformExpressions` instead of calling `inputSet` multiple times, and builds a map of attributes for easy lookup.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#7087 from viirya/improve_propagatetypes and squashes the following commits:
    
    5c314c1 [Liang-Chi Hsieh] For comments.
    913f6ad [Liang-Chi Hsieh] Slightly improve PropagateTypes.
    viirya authored and Davies Liu committed Jun 30, 2015
    Commit: a48e619
  22. [SPARK-8236] [SQL] misc functions: crc32

    https://issues.apache.org/jira/browse/SPARK-8236
    
    Author: Shilei <shilei.qian@intel.com>
    
    Closes apache#7108 from qiansl127/Crc32 and squashes the following commits:
    
    5477352 [Shilei] Change to AutoCastInputTypes
    5f16e5d [Shilei] Add misc function crc32
    qiansl127 authored and Davies Liu committed Jun 30, 2015
    Commit: 722aa5f
  23. [SPARK-8592] [CORE] CoarseGrainedExecutorBackend: Cannot register wit…

    …h driver => NPE
    
    See the details of this issue at [SPARK-8592](https://issues.apache.org/jira/browse/SPARK-8592).
    
    **CoarseGrainedExecutorBackend** should exit when **RegisterExecutor** fails.
    
    Author: xuchenCN <chenxu198511@gmail.com>
    
    Closes apache#7110 from xuchenCN/SPARK-8592 and squashes the following commits:
    
    71e0077 [xuchenCN] [SPARK-8592] [CORE] CoarseGrainedExecutorBackend: Cannot register with driver => NPE
    xuchenCN authored and JoshRosen committed Jun 30, 2015
    Commit: 689da28
  24. [SPARK-8437] [DOCS] Corrected: Using directory path without wildcard …

    …for filename slow for large number of files with wholeTextFiles and binaryFiles
    
    Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using HTML entity for *)
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#7126 from srowen/SPARK-8437.2 and squashes the following commits:
    
    7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *)
    srowen authored and Andrew Or committed Jun 30, 2015
    Commit: ada384b
  25. [SPARK-4127] [MLLIB] [PYSPARK] Python bindings for StreamingLinearReg…

    …ressionWithSGD
    
    Python bindings for StreamingLinearRegressionWithSGD
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#6744 from MechCoder/spark-4127 and squashes the following commits:
    
    d8f6457 [MechCoder] Moved StreamingLinearAlgorithm to pyspark.mllib.regression
    d47cc24 [MechCoder] Inherit from StreamingLinearAlgorithm
    1b4ddd6 [MechCoder] minor
    4de6c68 [MechCoder] Minor refactor
    5e85a3b [MechCoder] Add tests for simultaneous training and prediction
    fb27889 [MechCoder] Add example and docs
    505380b [MechCoder] Add tests
    d42bdae [MechCoder] [SPARK-4127] Python bindings for StreamingLinearRegressionWithSGD
    MechCoder authored and mengxr committed Jun 30, 2015
    Commit: 4528166
  26. [SPARK-8679] [PYSPARK] [MLLIB] Default values in Pipeline API should …

    …be immutable
    
    It is dangerous to use a mutable value as a default parameter. (http://stackoverflow.com/a/11416002/1170730)
    
    e.g
    
    ```python
    def func(example, f={}):
        f[example] = 1
        return f

    func(2)
    # => {2: 1}
    func(3)
    # => {2: 1, 3: 1}  (the same default dict is shared across calls)
    ```
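    The standard remedy (shown here in plain Python, independent of the Pipeline API) is to use `None` as a sentinel and build the mutable value inside the function:

    ```python
    def func(example, f=None):
        if f is None:      # a fresh dict per call, never shared between calls
            f = {}
        f[example] = 1
        return f

    print(func(2))   # {2: 1}
    print(func(3))   # {3: 1} -- no leakage from the previous call
    ```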
    
    mengxr
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#7058 from MechCoder/pipeline_api_playground and squashes the following commits:
    
    40a5eb2 [MechCoder] copy
    95f7ff2 [MechCoder] [SPARK-8679] [PySpark] [MLlib] Default values in Pipeline API should be immutable
    MechCoder authored and mengxr committed Jun 30, 2015
    Commit: 5fa0863
  27. [SPARK-8713] Make codegen thread safe

    Codegen takes three steps:
    
    1. Take a list of expressions, convert them into Java source code, and collect the expressions that do not support codegen (these fall back to interpreted mode).
    2. Compile the Java source into a Java class (bytecode).
    3. Use the Java class and the list of expressions to build a Projection.
    
    Currently, we cache all three steps: the key is the list of expressions, and the result is the projection. Because some expressions (which may not be thread-safe, for example, Random) are held by the Projection, the projection may not be thread safe.
    
    This PR changes the cache to cover only the second step, so we can build projections using codegen even when some expressions are not thread-safe, because the cache no longer holds any expressions.
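    The caching change can be sketched in Python: memoize only the compiled artifact (step 2), and bind possibly stateful values outside the cache each time (an editor's analogy; Spark actually compiles generated Java source):

    ```python
    import functools

    @functools.lru_cache(maxsize=None)
    def compile_source(source):
        """Step 2: expensive compilation, keyed by source text only.
        Safe to share because bytecode holds no per-query state."""
        return compile(source, "<generated>", "eval")

    def build_projection(source, state):
        """Step 3: rebuild the projection around fresh state per use."""
        code = compile_source(source)
        return lambda row: eval(code, {"row": row, "state": state})

    expr = "row * 2 + state"
    p1 = build_projection(expr, 1)
    p2 = build_projection(expr, 100)   # reuses the cached bytecode
    print(p1(5), p2(5))                # 11 110
    ```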
    
    cc marmbrus rxin JoshRosen
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7101 from davies/codegen_safe and squashes the following commits:
    
    7dd41f1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into codegen_safe
    847bd08 [Davies Liu] don't use scala.refect
    4ddaaed [Davies Liu] Merge branch 'master' of github.com:apache/spark into codegen_safe
    1793cf1 [Davies Liu] make codegen thread safe
    Davies Liu committed Jun 30, 2015
    Commit: fbb267e
  28. [SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code

    Modified the deprecated jdbc api in the documentation.
    
    Author: Tijo Thomas <tijoparacka@gmail.com>
    
    Closes apache#7039 from tijoparacka/JIRA_8615 and squashes the following commits:
    
    6e73b8a [Tijo Thomas] Reverted new lines
    4042fcf [Tijo Thomas] updated to sql documentation
    a27949c [Tijo Thomas] Fixed Sample deprecated code
    tijoparacka authored and liancheng committed Jun 30, 2015
    Commit: 9213f73
  29. [SPARK-7988] [STREAMING] Round-robin scheduling of receivers by default

    Minimal PR for round-robin scheduling of receivers. Dense scheduling can be enabled by setting preferredLocation, so a new config parameter isn't really needed. Tested this on a cluster of 6 nodes and noticed 20-25% gain in throughput compared to random scheduling.
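    Round-robin placement can be sketched with a cycling iterator (a toy model for illustration; the real logic lives in `ReceiverTracker`):

    ```python
    from itertools import cycle

    def schedule_receivers(receivers, executors):
        """Assign receivers to executors in round-robin order, instead of
        letting them all land wherever the first tasks happen to run."""
        return dict(zip(receivers, cycle(executors)))

    placement = schedule_receivers(["r0", "r1", "r2", "r3"],
                                   ["host-a", "host-b", "host-c"])
    print(placement)
    ```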
    
    tdas pwendell
    
    Author: nishkamravi2 <nishkamravi@gmail.com>
    Author: Nishkam Ravi <nravi@cloudera.com>
    
    Closes apache#6607 from nishkamravi2/master_nravi and squashes the following commits:
    
    1918819 [Nishkam Ravi] Update ReceiverTrackerSuite.scala
    f747739 [Nishkam Ravi] Update ReceiverTrackerSuite.scala
    6127e58 [Nishkam Ravi] Update ReceiverTracker and ReceiverTrackerSuite
    9f1abc2 [nishkamravi2] Update ReceiverTrackerSuite.scala
    ae29152 [Nishkam Ravi] Update test suite with TD's suggestions
    48a4a97 [nishkamravi2] Update ReceiverTracker.scala
    bc23907 [nishkamravi2] Update ReceiverTracker.scala
    68e8540 [nishkamravi2] Update SchedulerSuite.scala
    4604f28 [nishkamravi2] Update SchedulerSuite.scala
    179b90f [nishkamravi2] Update ReceiverTracker.scala
    242e677 [nishkamravi2] Update SchedulerSuite.scala
    7f3e028 [Nishkam Ravi] Update ReceiverTracker.scala, add unit test cases in SchedulerSuite
    f8a3e05 [nishkamravi2] Update ReceiverTracker.scala
    4cf97b6 [nishkamravi2] Update ReceiverTracker.scala
    16e84ec [Nishkam Ravi] Update ReceiverTracker.scala
    45e3a99 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
    02dbdb8 [Nishkam Ravi] Update ReceiverTracker.scala
    07b9dfa [nishkamravi2] Update ReceiverTracker.scala
    6caeefe [nishkamravi2] Update ReceiverTracker.scala
    7888257 [nishkamravi2] Update ReceiverTracker.scala
    6e3515c [Nishkam Ravi] Minor changes
    975b8d8 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
    3cac21b [Nishkam Ravi] Generalize the scheduling algorithm
    b05ee2f [nishkamravi2] Update ReceiverTracker.scala
    bb5e09b [Nishkam Ravi] Add a new var in receiver to store location information for round-robin scheduling
    41705de [nishkamravi2] Update ReceiverTracker.scala
    fff1b2e [Nishkam Ravi] Round-robin scheduling of streaming receivers
    nishkamravi2 authored and tdas committed Jun 30, 2015
    Commit: ca7e460
  30. [SPARK-8630] [STREAMING] Prevent from checkpointing QueueInputDStream

    This PR throws an exception in `QueueInputDStream.writeObject` so that the application fails when calling `StreamingContext.start`, rather than failing later while recovering the QueueInputDStream.
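    Failing at serialization time can be sketched in Python with `__reduce__` (an editor's analogy for Java's `writeObject` hook; the class name here is a hypothetical stand-in):

    ```python
    import pickle

    class QueueInputStream:
        """Stand-in for QueueInputDStream: refuses to be serialized, so a
        checkpoint attempt fails at start() instead of during recovery."""

        def __reduce__(self):   # invoked by pickle during checkpointing
            raise NotImplementedError(
                "queueStream does not support checkpointing")

    try:
        pickle.dumps(QueueInputStream())
    except NotImplementedError as e:
        print("checkpoint rejected:", e)
    ```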
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#7016 from zsxwing/queueStream-checkpoint and squashes the following commits:
    
    89a3d73 [zsxwing] Fix JavaAPISuite.testQueueStream
    cc40fd7 [zsxwing] Prevent from checkpointing QueueInputDStream
    zsxwing authored and tdas committed Jun 30, 2015
    Commit: 5726440
  31. [SPARK-8619] [STREAMING] Don't recover keytab and principal configura…

    …tion within Streaming checkpoint
    
    [Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786) changes these configurations, which causes the problem that the Streaming recovery logic can't find the local keytab file (since the configuration was changed):
    ```scala
          sparkConf.set("spark.yarn.keytab", keytabFileName)
          sparkConf.set("spark.yarn.principal", args.principal)
    ```
    
    Problem described at [Jira](https://issues.apache.org/jira/browse/SPARK-8619)
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes apache#7008 from SaintBacchus/SPARK-8619 and squashes the following commits:
    
    d50dbdf [huangzhaowei] Delect one blank space
    9b8e92c [huangzhaowei] Fix code style and add a short comment.
    0d8f800 [huangzhaowei] Don't recover keytab and principal configuration within Streaming checkpoint.
    SaintBacchus authored and tdas committed Jun 30, 2015
    Commit: d16a944
  32. [SPARK-6785] [SQL] fix DateTimeUtils for dates before 1970

    Hi Michael,
    this Pull-Request is a follow-up to [PR-6242](apache#6242). I removed the two obsolete test cases from the HiveQuerySuite and deleted the corresponding golden answer files.
    Thanks for your review!
    
    Author: Christian Kadner <ckadner@us.ibm.com>
    
    Closes apache#6983 from ckadner/SPARK-6785 and squashes the following commits:
    
    ab1e79b [Christian Kadner] Merge remote-tracking branch 'origin/SPARK-6785' into SPARK-6785
    1fed877 [Christian Kadner] [SPARK-6785][SQL] failed Scala style test, remove spaces on empty line DateTimeUtils.scala:61
    9d8021d [Christian Kadner] [SPARK-6785][SQL] merge recent changes in DateTimeUtils & MiscFunctionsSuite
    b97c3fb [Christian Kadner] [SPARK-6785][SQL] move test case for DateTimeUtils to DateTimeUtilsSuite
    a451184 [Christian Kadner] [SPARK-6785][SQL] fix DateTimeUtils.fromJavaDate(java.util.Date) for Dates before 1970
    ckadner authored and marmbrus committed Jun 30, 2015
    Commit 1e1f339
  33. [SPARK-8664] [ML] Add PCA transformer

    Add PCA transformer for ML pipeline
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes apache#7065 from yanboliang/spark-8664 and squashes the following commits:
    
    4afae45 [Yanbo Liang] address comments
    e9effd7 [Yanbo Liang] Add PCA transformer
    yanboliang authored and mengxr committed Jun 30, 2015
    Commit c1befd7
  34. [SPARK-8628] [SQL] Race condition in AbstractSparkSQLParser.parse

    Made lexical initialization a lazy val.
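Scala's `lazy val` gives thread-safe, one-time initialization. A rough Python analogue of the same idea, using double-checked locking (the `Parser`/`lexical` names here are illustrative, not Spark's actual classes):

```python
import threading

class Parser:
    """Sketch of one-time, thread-safe lazy initialization, like Scala's `lazy val`."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lexical = None
        self.init_count = 0  # for demonstration: how many times we initialized

    @property
    def lexical(self):
        if self._lexical is None:             # fast path without the lock
            with self._lock:
                if self._lexical is None:     # double-checked under the lock
                    self.init_count += 1
                    self._lexical = object()  # stand-in for the real lexer
        return self._lexical

p = Parser()
threads = [threading.Thread(target=lambda: p.lexical) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(p.init_count)  # 1 -- initialized exactly once despite 8 concurrent readers
```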
    
    Author: Vinod K C <vinod.kc@huawei.com>
    
    Closes apache#7015 from vinodkc/handle_lexical_initialize_schronization and squashes the following commits:
    
    b6d1c74 [Vinod K C] Avoided repeated lexical  initialization
    5863cf7 [Vinod K C] Removed space
    e27c66c [Vinod K C] Avoid reinitialization of lexical in parse method
    ef4f60f [Vinod K C] Reverted import order
    e9fc49a [Vinod K C] handle  synchronization in SqlLexical.initialize
    Vinod K C authored and marmbrus committed Jun 30, 2015
    Commit b8e5bb6
  35. [SPARK-8471] [ML] Discrete Cosine Transform Feature Transformer

    Implementation and tests for Discrete Cosine Transformer.
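For reference, the DCT-II that such a transformer computes can be sketched in a few lines of Python (unnormalized variant; the ML transformer may use a scaled convention):

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X_k = sum_i x_i * cos(pi/N * (i + 0.5) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

out = dct2([1.0, 1.0, 1.0, 1.0])
# A constant signal puts all its energy into the k = 0 coefficient.
print(out[0])                                # 4.0
print(max(abs(c) for c in out[1:]) < 1e-9)   # True
```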
    
    Author: Feynman Liang <fliang@databricks.com>
    
    Closes apache#6894 from feynmanliang/dct-features and squashes the following commits:
    
    433dbc7 [Feynman Liang] Test refactoring
    91e9636 [Feynman Liang] Style guide and test helper refactor
    b5ac19c [Feynman Liang] Use Vector types, add Java test
    530983a [Feynman Liang] Tests for other numeric datatypes
    195d7aa [Feynman Liang] Implement support for arbitrary numeric types
    95d4939 [Feynman Liang] Working DCT for 1D Doubles
    Feynman Liang authored and jkbradley committed Jun 30, 2015
    Commit 74cc16d
  36. [SPARK-7514] [MLLIB] Add MinMaxScaler to feature transformation

    jira: https://issues.apache.org/jira/browse/SPARK-7514
    Add a popular scaling method to feature component, which is commonly known as min-max normalization or Rescaling.
    
    Core function is,
    Normalized(x) = (x - min) / (max - min) * scale + newBase
    
    where `newBase` and `scale` are parameters (type Double) of the `VectorTransformer`. `newBase` is the new minimum value for the features, and `scale` controls the range after transformation. This is a little more complicated than basic MinMax normalization, yet it provides flexibility so that users can control the range more specifically, e.g. [0.1, 0.9] in some NN applications.
    
    For the case where `max == min`, 0.5 is used as the raw value (0.5 * scale + newBase).
    I'll add unit tests once the design is settled (and it is not considered too naive).
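The core formula can be sketched as (a minimal Python illustration, including the `max == min` special case; not the MLlib implementation):

```python
def min_max_scale(x, xmin, xmax, scale=1.0, new_base=0.0):
    """Normalized(x) = (x - min) / (max - min) * scale + newBase."""
    if xmax == xmin:
        raw = 0.5  # degenerate case: all feature values are equal
    else:
        raw = (x - xmin) / (xmax - xmin)
    return raw * scale + new_base

data = [1.0, 2.0, 3.0]
lo, hi = min(data), max(data)
# Rescale into [0.1, 0.9]: newBase = 0.1, scale = 0.8
print([min_max_scale(v, lo, hi, scale=0.8, new_base=0.1) for v in data])
# approximately [0.1, 0.5, 0.9]
```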
    
    reference:
     http://en.wikipedia.org/wiki/Feature_scaling
    http://stn.spotfire.com/spotfire_client_help/index.htm#norm/norm_scale_between_0_and_1.htm
    
    Author: Yuhao Yang <hhbyyh@gmail.com>
    
    Closes apache#6039 from hhbyyh/minMaxNorm and squashes the following commits:
    
    f942e9f [Yuhao Yang] add todo for metadata
    8b37bbc [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    4894dbc [Yuhao Yang] add copy
    fa2989f [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    29db415 [Yuhao Yang] add clue and minor adjustment
    5b8f7cc [Yuhao Yang] style fix
    9b133d0 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    22f20f2 [Yuhao Yang] style change and bug fix
    747c9bb [Yuhao Yang] add ut and remove mllib version
    a5ba0aa [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    585cc07 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    1c6dcb1 [Yuhao Yang] minor change
    0f1bc80 [Yuhao Yang] add MinMaxScaler to ml
    8e7436e [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    3663165 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm
    1247c27 [Yuhao Yang] some comments improvement
    d285a19 [Yuhao Yang] initial checkin for minMaxNorm
    hhbyyh authored and jkbradley committed Jun 30, 2015
    Commit 61d7b53
  37. [SPARK-8560] [UI] The Executors page will have negative if having res…

    …ubmitted tasks
    
    When ```taskEnd.reason``` is ```Resubmitted```, the task shouldn't be counted in the statistics, because it already had a ```SUCCESS``` taskEnd before.
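The counting rule can be sketched as (illustrative Python, not the actual Executors page code):

```python
def on_task_end(stats, reason):
    """Update task counters for one TaskEnd event (illustrative)."""
    if reason == "Resubmitted":
        # This task already produced a SUCCESS TaskEnd earlier; counting it
        # again would drive the executor's task counters negative.
        return stats
    stats["completed"] += 1
    stats["active"] -= 1
    return stats

stats = {"active": 2, "completed": 0}
on_task_end(stats, "Success")
on_task_end(stats, "Resubmitted")  # ignored
print(stats)  # {'active': 1, 'completed': 1}
```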
    
    Author: xutingjun <xutingjun@huawei.com>
    
    Closes apache#6950 from XuTingjun/pageError and squashes the following commits:
    
    af35dc3 [xutingjun] When taskEnd is Resubmitted, don't do statistics
    XuTingjun authored and Andrew Or committed Jun 30, 2015
    Commit 79f0b37
  38. [SPARK-2645] [CORE] Allow SparkEnv.stop() to be called multiple times…

    … without side effects.
    
    Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects.
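The idempotent-stop pattern can be sketched as (illustrative, not Spark's actual SparkEnv):

```python
class StoppableEnv:
    """Sketch of a stop() that is safe to call multiple times."""

    def __init__(self):
        self._stopped = False
        self.release_count = 0  # how many times resources were released

    def stop(self):
        if self._stopped:
            return  # subsequent calls are harmless no-ops
        self._stopped = True
        self.release_count += 1  # release resources exactly once

env = StoppableEnv()
env.stop(); env.stop(); env.stop()
print(env.release_count)  # 1
```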
    
    Author: Joshi <rekhajoshm@gmail.com>
    Author: Rekha Joshi <rekhajoshm@gmail.com>
    
    Closes apache#6973 from rekhajoshm/SPARK-2645 and squashes the following commits:
    
    277043e [Joshi] Fix for SparkContext stop behavior
    446b0a4 [Joshi] Fix for SparkContext stop behavior
    2ce5760 [Joshi] Fix for SparkContext stop behavior
    c97839a [Joshi] Fix for SparkContext stop behavior
    1aff39c [Joshi] Fix for SparkContext stop behavior
    12f66b5 [Joshi] Fix for SparkContext stop behavior
    72bb484 [Joshi] Fix for SparkContext stop behavior
    a5a7d7f [Joshi] Fix for SparkContext stop behavior
    9193a0c [Joshi] Fix for SparkContext stop behavior
    58dba70 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
    380c5b0 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
    b566b66 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
    0be142d [Rekha Joshi] Merge pull request #3 from apache/master
    106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
    e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
    rekhajoshm authored and Andrew Or committed Jun 30, 2015
    Commit 7dda084
  39. [SPARK-8372] Do not show applications that haven't recorded their app…

    … ID yet.
    
    Showing these applications may lead to weird behavior in the History Server. For old logs, if
    the app ID is recorded later, you may end up with a duplicate entry. For new logs, the app might
    be listed with a ".inprogress" suffix.
    
    So ignore those, but still allow old applications that don't record app IDs at all (1.0 and 1.1) to be shown.
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    Author: Carson Wang <carson.wang@intel.com>
    
    Closes apache#7097 from vanzin/SPARK-8372 and squashes the following commits:
    
    a24eab2 [Marcelo Vanzin] Feedback.
    112ae8f [Marcelo Vanzin] Merge branch 'master' into SPARK-8372
    7b91b74 [Marcelo Vanzin] Handle logs generated by 1.0 and 1.1.
    1eca3fe [Carson Wang] [SPARK-8372] History server shows incorrect information for application not started
    Marcelo Vanzin authored and Andrew Or committed Jun 30, 2015
    Commit 4bb8375
  40. [SPARK-8736] [ML] GBTRegressor should not threshold prediction

    Changed GBTRegressor so it does NOT threshold the prediction.  Added test which fails with bug but works after fix.
    
    CC: feynmanliang  mengxr
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes apache#7134 from jkbradley/gbrt-fix and squashes the following commits:
    
    613b90e [Joseph K. Bradley] Changed GBTRegressor so it does NOT threshold the prediction
    jkbradley authored and mengxr committed Jun 30, 2015
    Commit 3ba23ff
  41. [SPARK-8705] [WEBUI] Don't display rects when totalExecutionTime is 0

    Because `System.currentTimeMillis()` is not accurate for tasks that only take several milliseconds, sometimes `totalExecutionTime` in `makeTimeline` will be 0. If `totalExecutionTime` is 0, there will be the following error in the console.
    
    ![screen shot 2015-06-29 at 7 08 55 pm](https://cloud.githubusercontent.com/assets/1000778/8406776/5cd38e04-1e92-11e5-89f2-0c5134fe4b6b.png)
    
    This PR fixes it by using an empty svg tag when `totalExecutionTime` is 0. This is a screenshot, after the fix, of a task whose totalExecutionTime is 0.
    
    ![screen shot 2015-06-30 at 12 26 52 am](https://cloud.githubusercontent.com/assets/1000778/8412896/7b33b4be-1ebf-11e5-9100-d6d656af3747.png)
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#7088 from zsxwing/SPARK-8705 and squashes the following commits:
    
    9ee4ef5 [zsxwing] Address comments
    ef2ecfa [zsxwing] Don't display rects when totalExecutionTime is 0
    zsxwing authored and Andrew Or committed Jun 30, 2015
    Commit 8c89896
  42. [SPARK-8563] [MLLIB] Fixed a bug so that IndexedRowMatrix.computeSVD(…

    …).U.numCols = k
    
    I'm sorry that I closed apache#6949 by mistake, so I pushed the code again.
    
    And, I added a test code.
    
    >
    There is a bug that `U.numCols() = self.nCols` in `IndexedRowMatrix.computeSVD()`
    It should have been `U.numCols() = k = svd.U.numCols()`
    
    >
    ```
    self = U * sigma * V.transpose
    (m x n) = (m x n) * (k x k) * (k x n) //ASIS
    -->
    (m x n) = (m x k) * (k x k) * (k x n) //TOBE
    ```
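The TOBE shapes above can be checked with a quick sketch (assuming NumPy is available; illustrative, not the MLlib code):

```python
import numpy as np

m, n, k = 6, 4, 2
a = np.random.rand(m, n)

u, s, vt = np.linalg.svd(a, full_matrices=False)
# Keep only the top-k singular triplets: U must come back as m x k, not m x n.
u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
print(u_k.shape, np.diag(s_k).shape, vt_k.shape)  # (6, 2) (2, 2) (2, 4)
```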
    
    Author: lee19 <lee19@live.co.kr>
    
    Closes apache#6953 from lee19/MLlibBugfix and squashes the following commits:
    
    c1812a0 [lee19] [SPARK-8563] [MLlib] Used nRows instead of numRows() to reduce a burden.
    4b9803b [lee19] [SPARK-8563] [MLlib] Fixed a build error.
    c2ccd89 [lee19] Added a unit test that validates matrix sizes of svd for [SPARK-8563][MLlib]
    8373424 [lee19] [SPARK-8563][MLlib] Fixed a bug so that IndexedRowMatrix.computeSVD().U.numCols = k
    lee19 authored and mengxr committed Jun 30, 2015
    Commit e725262
  43. [SPARK-8739] [WEB UI] [WINDOWS] An illegal character \r can be conta…

    …ined in StagePage.
    
    This issue was reported by saurfang. Thanks!
    
    There is a following code in StagePage.scala.
    
    ```
                       |width="$serializationTimeProportion%"></rect>
                     |<rect class="getting-result-time-proportion"
                       |x="$gettingResultTimeProportionPos%" y="0px" height="26px"
                       |width="$gettingResultTimeProportion%"></rect></svg>',
                   |'start': new Date($launchTime),
                   |'end': new Date($finishTime)
                 |}
               |""".stripMargin.replaceAll("\n", " ")
    ```
    
    The last `replaceAll("\n", " ")` doesn't work when we check out and build the source code on Windows and deploy on Linux.
    That's because when we check out the source code on Windows, the newline code becomes `"\r\n"`, and `replaceAll("\n", " ")` replaces only the `"\n"`, leaving a stray `"\r"`.
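A minimal Python illustration of why replacing only `\n` fails on a Windows checkout:

```python
windows_text = "line1\r\nline2\r\n"        # what a Windows checkout produces

bad = windows_text.replace("\n", " ")      # leaves "\r" behind: "line1\r line2\r "
good = windows_text.replace("\r\n", " ")   # handles the Windows line ending

print("\r" in bad)   # True  -- the illegal character survives
print("\r" in good)  # False
```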
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#7133 from sarutak/SPARK-8739 and squashes the following commits:
    
    17fb044 [Kousuke Saruta] Fixed a new-line-code issue
    sarutak authored and Andrew Or committed Jun 30, 2015
    Commit d2495f7
  44. [SPARK-8738] [SQL] [PYSPARK] capture SQL AnalysisException in Python API

    Capture the AnalysisException in SQL, hide the long java stack trace, only show the error message.
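The general pattern can be sketched as a Python decorator that catches the Java-side error and surfaces only the short message (names here are illustrative, not PySpark's actual API; the commit log mentions the real hook lives in `utils.py`):

```python
class AnalysisException(Exception):
    """Python-side error carrying only the short message (illustrative)."""

def capture_analysis_errors(f):
    # Hypothetical stand-in for PySpark's error translation: keep the first
    # line of the Java-side message and drop the long Java stack trace.
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except RuntimeError as e:
            raise AnalysisException(str(e).splitlines()[0])
    return wrapper

@capture_analysis_errors
def run_query():
    # Simulated Java-side failure with a multi-line stack trace attached.
    raise RuntimeError("cannot resolve 'badCol'\n\tat org.apache.spark.sql...")

try:
    run_query()
except AnalysisException as e:
    print(e)  # cannot resolve 'badCol'
```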
    
    cc rxin
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7135 from davies/ananylis and squashes the following commits:
    
    dad7ae7 [Davies Liu] add comment
    ec0c0e8 [Davies Liu] Update utils.py
    cdd7edd [Davies Liu] add doc
    7b044c2 [Davies Liu] fix python 3
    f84d3bd [Davies Liu] capture SQL AnalysisException in Python API
    Davies Liu committed Jun 30, 2015
    Commit 58ee2a2
  45. [SPARK-7739] [MLLIB] Improve ChiSqSelector example code in user guide

    Author: sethah <seth.hendrickson16@gmail.com>
    
    Closes apache#7029 from sethah/working_on_SPARK-7739 and squashes the following commits:
    
    ef96916 [sethah] Fixing some style issues
    efea1f8 [sethah] adding clarification to ChiSqSelector example
    sethah authored and jkbradley committed Jun 30, 2015
    Commit 8d23587
  46. [SPARK-8741] [SQL] Remove e and pi from DataFrame functions.

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7137 from rxin/SPARK-8741 and squashes the following commits:
    
    32c7e75 [Reynold Xin] [SPARK-8741][SQL] Remove e and pi from DataFrame functions.
    rxin authored and Davies Liu committed Jun 30, 2015
    Commit 8133125

Commits on Jul 1, 2015

  1. [SPARK-8727] [SQL] Missing python api; md5, log2

    Jira: https://issues.apache.org/jira/browse/SPARK-8727
    
    Author: Tarek Auel <tarek.auel@gmail.com>
    Author: Tarek Auel <tarek.auel@googlemail.com>
    
    Closes apache#7114 from tarekauel/missing-python and squashes the following commits:
    
    ef4c61b [Tarek Auel] [SPARK-8727] revert dataframe change
    4029d4d [Tarek Auel] removed dataframe pi and e unit test
    66f0d2b [Tarek Auel] removed pi and e from python api and dataframe api; added _to_java_column(col) for strlen
    4d07318 [Tarek Auel] fixed python unit test
    45f2bee [Tarek Auel] fixed result of pi and e
    c39f47b [Tarek Auel] add python api
    bd50a3a [Tarek Auel] add missing python functions
    tarekbecker authored and Davies Liu committed Jul 1, 2015
    Commit ccdb052
  2. [SPARK-6602][Core] Update Master, Worker, Client, AppClient and relat…

    …ed classes to use RpcEndpoint
    
    This PR updates the rest Actors in core to RpcEndpoint.
    
    Because there is no `ActorSelection` in RpcEnv, I changed the logic of `registerWithMaster` in Worker and AppClient to avoid blocking the message loop. These changes need to be reviewed carefully.
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#5392 from zsxwing/rpc-rewrite-part3 and squashes the following commits:
    
    2de7bed [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    f12d943 [zsxwing] Address comments
    9137b82 [zsxwing] Fix the code style
    e734c71 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    2d24fb5 [zsxwing] Fix the code style
    5a82374 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    fa47110 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    72304f0 [zsxwing] Update the error strategy for AkkaRpcEnv
    e56cb16 [zsxwing] Always send failure back to the sender
    a7b86e6 [zsxwing] Use JFuture for java.util.concurrent.Future
    aa34b9b [zsxwing] Fix the code style
    bd541e7 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    25a84d8 [zsxwing] Use ThreadUtils
    060ff31 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    dbfc916 [zsxwing] Improve the docs and comments
    837927e [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    5c27f97 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
    fadbb9e [zsxwing] Fix the code style
    6637e3c [zsxwing] Merge remote-tracking branch 'origin/master' into rpc-rewrite-part3
    7fdee0e [zsxwing] Fix the return type to ExecutorService and ScheduledExecutorService
    e8ad0a5 [zsxwing] Fix the code style
    6b2a104 [zsxwing] Log error and use SparkExitCode.UNCAUGHT_EXCEPTION exit code
    fbf3194 [zsxwing] Add Utils.newDaemonSingleThreadExecutor and newDaemonSingleThreadScheduledExecutor
    b776817 [zsxwing] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint
    zsxwing authored and rxin committed Jul 1, 2015
    Commit 3bee0f1
  3. [SPARK-8471] [ML] Rename DiscreteCosineTransformer to DCT

    Rename DiscreteCosineTransformer and related classes to DCT.
    
    Author: Feynman Liang <fliang@databricks.com>
    
    Closes apache#7138 from feynmanliang/dct-features and squashes the following commits:
    
    e547b3e [Feynman Liang] Fix renaming bug
    9d5c9e4 [Feynman Liang] Lowercase JavaDCTSuite variable
    f9a8958 [Feynman Liang] Remove old files
    f8fe794 [Feynman Liang] Merge branch 'master' into dct-features
    894d0b2 [Feynman Liang] Rename DiscreteCosineTransformer to DCT
    433dbc7 [Feynman Liang] Test refactoring
    91e9636 [Feynman Liang] Style guide and test helper refactor
    b5ac19c [Feynman Liang] Use Vector types, add Java test
    530983a [Feynman Liang] Tests for other numeric datatypes
    195d7aa [Feynman Liang] Implement support for arbitrary numeric types
    95d4939 [Feynman Liang] Working DCT for 1D Doubles
    Feynman Liang authored and jkbradley committed Jul 1, 2015
    Commit f457569
  4. [SPARK-8535] [PYSPARK] PySpark : Can't create DataFrame from Pandas d…

    …ataframe with no explicit column name
    
    Because the implicit names of `pandas.columns` are Int, but `StructField` JSON expects `String`,
    the `pandas.columns` values should be converted to `String`.
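The needed conversion is just stringifying the integer column labels, sketched here with plain Python standing in for a pandas DataFrame's columns:

```python
# Hypothetical stand-in for pandas: a DataFrame created without explicit
# column names labels its columns with the ints 0, 1, 2, ...
implicit_columns = [0, 1, 2]

# StructField serializes its name into JSON as a string, so convert first.
field_names = [str(c) for c in implicit_columns]
print(field_names)  # ['0', '1', '2']
```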
    
    ### issue
    
    * [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)
    
    Author: x1- <viva008@gmail.com>
    
    Closes apache#7124 from x1-/SPARK-8535 and squashes the following commits:
    
    d68fd38 [x1-] modify unit-test using pandas.
    ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.
    x1- authored and Davies Liu committed Jul 1, 2015
    Commit b6e76ed
  5. [SPARK-6602][Core]Remove unnecessary synchronized

    A follow-up pr to address apache#5392 (comment)
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#7141 from zsxwing/pr5392-follow-up and squashes the following commits:
    
    fcf7b50 [zsxwing] Remove unnecessary synchronized
    zsxwing authored and rxin committed Jul 1, 2015
    Commit 64c1461
  6. [SPARK-8748][SQL] Move castability test out from Cast case class into…

    … Cast object.
    
    This patch moved the resolve function in the Cast case class into the companion object and renamed it canCast. We can then use it in the analyzer without a Cast expr.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7145 from rxin/cast and squashes the following commits:
    
    cd086a9 [Reynold Xin] Whitespace changes.
    4d2d989 [Reynold Xin] [SPARK-8748][SQL] Move castability test out from Cast case class into Cast object.
    rxin committed Jul 1, 2015
    Commit 365c140
  7. [SPARK-8749][SQL] Remove HiveTypeCoercion trait.

    Moved all the rules into the companion object.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7147 from rxin/SPARK-8749 and squashes the following commits:
    
    c1c6dc0 [Reynold Xin] [SPARK-8749][SQL] Remove HiveTypeCoercion trait.
    rxin committed Jul 1, 2015
    Commit fc3a6fe
  8. [SQL] [MINOR] remove internalRowRDD in DataFrame

    Developers are already familiar with `queryExecution.toRDD` as the internal row RDD, and we should not add a new concept.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7116 from cloud-fan/internal-rdd and squashes the following commits:
    
    24756ca [Wenchen Fan] remove internalRowRDD
    cloud-fan authored and marmbrus committed Jul 1, 2015
    Commit 0eee061
  9. [SPARK-8750][SQL] Remove the closure in functions.callUdf.

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7148 from rxin/calludf-closure and squashes the following commits:
    
    00df372 [Reynold Xin] Fixed index out of bound exception.
    4beba76 [Reynold Xin] [SPARK-8750][SQL] Remove the closure in functions.callUdf.
    rxin committed Jul 1, 2015
    Commit 9765241
  10. [SPARK-8763] [PYSPARK] executing run-tests.py with Python 2.6 fails w…

    …ith absence of subprocess.check_output function
    
    Running run-tests.py with Python 2.6 causes the following error:
    
    ```
    Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
    Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Traceback (most recent call last):
      File "./python/run-tests.py", line 196, in <module>
        main()
      File "./python/run-tests.py", line 159, in main
        python_implementation = subprocess.check_output(
    AttributeError: 'module' object has no attribute 'check_output'
    ...
    ```
    
    The cause of this error is the use of the subprocess.check_output function, which has only existed since Python 2.7.
    (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)
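A minimal backport along the lines of Python 2.7's implementation (a sketch under the assumption that only stdout capture is needed):

```python
import subprocess

def check_output(*popenargs, **kwargs):
    """Minimal stand-in for subprocess.check_output on Python 2.6 (sketch)."""
    process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
    output, _ = process.communicate()
    retcode = process.poll()
    if retcode:
        raise subprocess.CalledProcessError(retcode, popenargs[0], output=output)
    return output

print(check_output(["echo", "hello"]))  # b'hello\n' on a POSIX system
```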
    
    Author: cocoatomo <cocoatomo77@gmail.com>
    
    Closes apache#7161 from cocoatomo/issues/8763-test-fails-py26 and squashes the following commits:
    
    cf4f901 [cocoatomo] [SPARK-8763] backport process.check_output function from Python 2.7
    cocoatomo authored and Davies Liu committed Jul 1, 2015
    Commit fdcad6e
  11. [SPARK-7714] [SPARKR] SparkR tests should use more specific expectati…

    …ons than expect_true
    
    1. Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
    2. Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
    3. Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
    
    Author: Sun Rui <rui.sun@intel.com>
    
    Closes apache#7152 from sun-rui/SPARK-7714 and squashes the following commits:
    
    8ad2440 [Sun Rui] Fix test case errors.
    8fe9f0c [Sun Rui] Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
    f1b8005 [Sun Rui] Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
    f631e94 [Sun Rui] Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
    Sun Rui authored and shivaram committed Jul 1, 2015
    Commit 69c5dee
  12. [SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected i…

    …nput types.
    
    This patch doesn't actually introduce any code that uses the new ExpectsInputTypes. It just adds the trait so others can use it. Also renamed the old expectsInputTypes function to just inputTypes.
    
    We should add implicit type casting also in the future.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7151 from rxin/expects-input-types and squashes the following commits:
    
    16cf07b [Reynold Xin] [SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected input types.
    rxin committed Jul 1, 2015
    Commit 4137f76
  13. [SPARK-8621] [SQL] support empty string as column name

    Improve the empty check in `parseAttributeName` so that we can allow an empty string as a column name.
    Closes apache#7117
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7149 from cloud-fan/8621 and squashes the following commits:
    
    efa9e3e [Wenchen Fan] support empty string
    cloud-fan authored and rxin committed Jul 1, 2015
    Commit 31b4a3d
  14. [SPARK-6263] [MLLIB] Python MLlib API missing items: Utils

    Implement missing API in pyspark.
    
    MLUtils
    * appendBias
    * loadVectors
    
    `kFold` is also missing; however, I am not sure whether a `ClassTag` can be passed or restored through Python.
    
    Author: lewuathe <lewuathe@me.com>
    
    Closes apache#5707 from Lewuathe/SPARK-6263 and squashes the following commits:
    
    16863ea [lewuathe] Merge master
    3fc27e7 [lewuathe] Merge branch 'master' into SPARK-6263
    6084e9c [lewuathe] Resolv conflict
    d2aa2a0 [lewuathe] Resolv conflict
    9c329d8 [lewuathe] Fix efficiency
    3a12a2d [lewuathe] Merge branch 'master' into SPARK-6263
    1d4714b [lewuathe] Fix style
    b29e2bc [lewuathe] Remove scipy dependencies
    e32eb40 [lewuathe] Merge branch 'master' into SPARK-6263
    25d3c9d [lewuathe] Remove unnecessary imports
    7ec04db [lewuathe] Resolv conflict
    1502d13 [lewuathe] Resolv conflict
    d6bd416 [lewuathe] Check existence of scipy.sparse
    5d555b1 [lewuathe] Construct scipy.sparse matrix
    c345a44 [lewuathe] Merge branch 'master' into SPARK-6263
    b8b5ef7 [lewuathe] Fix unnecessary sort method
    d254be7 [lewuathe] Merge branch 'master' into SPARK-6263
    62a9c7e [lewuathe] Fix appendBias return type
    454c73d [lewuathe] Merge branch 'master' into SPARK-6263
    a353354 [lewuathe] Remove unnecessary appendBias implementation
    44295c2 [lewuathe] Merge branch 'master' into SPARK-6263
    64f72ad [lewuathe] Merge branch 'master' into SPARK-6263
    c728046 [lewuathe] Fix style
    2980569 [lewuathe] [SPARK-6263] Python MLlib API missing items: Utils
    Lewuathe authored and jkbradley committed Jul 1, 2015
    Commit 184de91
  15. [SPARK-8308] [MLLIB] add missing save load for python example

    jira: https://issues.apache.org/jira/browse/SPARK-8308
    
    1. Add some missing save/load calls in the Python examples: LogisticRegression, LinearRegression and NaiveBayes.
    2. Tune down the iterations for MatrixFactorization, since the current number triggers a StackOverflow with the default Java configuration (>1M).
    
    Author: Yuhao Yang <hhbyyh@gmail.com>
    
    Closes apache#6760 from hhbyyh/docUpdate and squashes the following commits:
    
    9bd3383 [Yuhao Yang] update scala example
    8a44692 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
    077cbb8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
    3e948dc [Yuhao Yang] add missing save load for python example
    hhbyyh authored and jkbradley committed Jul 1, 2015
    Commit 2012913
  16. [SPARK-8765] [MLLIB] [PYTHON] removed flaky python PIC test

    See failure: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
    
    CC yanboliang  mengxr
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes apache#7164 from jkbradley/pic-python-test and squashes the following commits:
    
    156d55b [Joseph K. Bradley] removed flaky python PIC test
    jkbradley authored and mengxr committed Jul 1, 2015
    Commit b8faa32
  17. [SPARK-8378] [STREAMING] Add the Python API for Flume

    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#6830 from zsxwing/flume-python and squashes the following commits:
    
    78dfdac [zsxwing] Fix the compile error in the test code
    f1bf3c0 [zsxwing] Address TD's comments
    0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
    e93736b [zsxwing] Fix the test case for determine_modules_to_test
    9d5821e [zsxwing] Fix pyspark_core dependencies
    f9ee681 [zsxwing] Merge branch 'master' into flume-python
    7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
    b96b0de [zsxwing] Merge branch 'master' into flume-python
    ce85e83 [zsxwing] Fix incompatible issues for Python 3
    01cbb3d [zsxwing] Add import sys
    152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
    14ba0ff [zsxwing] Add flume-assembly for sbt building
    b8d5551 [zsxwing] Merge branch 'master' into flume-python
    4762c34 [zsxwing] Fix the doc
    0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
    9f33873 [zsxwing] Add the Python API for Flume
    zsxwing authored and tdas committed Jul 1, 2015
    Commit 75b9fe4
  18. [SPARK-7820] [BUILD] Fix Java8-tests suite compile and test error und…

    …er sbt
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes apache#7120 from jerryshao/SPARK-7820 and squashes the following commits:
    
    6902439 [jerryshao] fix Java8-tests suite compile error under sbt
    jerryshao authored and JoshRosen committed Jul 1, 2015
    Commit 9f7db34
  19. [QUICKFIX] [SQL] fix copy of generated row

    copy() of generated Row doesn't check nullability of columns
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7163 from davies/fix_copy and squashes the following commits:
    
    661a206 [Davies Liu] fix copy of generated row
    Davies Liu committed Jul 1, 2015
    Commit 3083e17
  20. [SPARK-3444] [CORE] Restore INFO level after log4j test.

    Otherwise other tests don't log anything useful...
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes apache#7140 from vanzin/SPARK-3444 and squashes the following commits:
    
    de14836 [Marcelo Vanzin] Better fix.
    6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
    Marcelo Vanzin authored and srowen committed Jul 1, 2015
    Commit 1ce6428
  21. [SPARK-8766] support non-ascii character in column names

    Use UTF-8 to encode column names in Python 2; otherwise they may fail to encode with the default encoding ('ascii').
    
    This PR also fixes a bug that occurs when a Java exception has no error message.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7165 from davies/non_ascii and squashes the following commits:
    
    02cb61a [Davies Liu] fix tests
    3b09d31 [Davies Liu] add encoding in header
    867754a [Davies Liu] support non-ascii character in column names
    Davies Liu committed Jul 1, 2015
    Commit f958f27
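    The failure mode above can be sketched outside Spark: in Python 2, implicitly coercing a unicode column name to bytes uses the ASCII codec and fails for non-ASCII names, so the name must be encoded explicitly. A minimal illustration in plain Python (`encode_column_name` is a hypothetical helper, not PySpark's actual code):

```python
# -*- coding: utf-8 -*-
# Hypothetical helper illustrating the fix: encode column names to UTF-8
# explicitly instead of relying on an implicit ASCII default (Python 2).

def encode_column_name(name):
    if isinstance(name, bytes):
        return name                  # already encoded: pass through
    return name.encode("utf-8")      # text: encode explicitly

col = encode_column_name(u"数量")    # a non-ASCII column name
```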
  22. [SPARK-8770][SQL] Create BinaryOperator abstract class.

    Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
    
    This patch creates a new BinaryOperator abstract class and updates the analyzer to apply the type casting rule only there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7170 from rxin/binaryoperator and squashes the following commits:
    
    51264a5 [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
    rxin committed Jul 1, 2015
    Commit 2727789
  23. Commit 3a342de

Commits on Jul 2, 2015

  1. [SPARK-8770][SQL] Create BinaryOperator abstract class.

    Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
    
    This patch creates a new BinaryOperator abstract class and updates the analyzer to apply the type casting rule only there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7174 from rxin/binary-opterator and squashes the following commits:
    
    f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
    fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
    d8518cf [Reynold Xin] Updated Python tests.
    rxin committed Jul 2, 2015
    Commit 9fd13d5
  2. [SPARK-8660] [MLLIB] removed > symbols from comments in LogisticRegre…

    …ssionSuite.scala for ease of copypaste
    
    '>' symbols removed from comments in LogisticRegressionSuite.scala, for ease of copypaste
    
    also single-lined the multiline commands (is this desirable, or does it violate style?)
    
    Author: Rosstin <asterazul@gmail.com>
    
    Closes apache#7167 from Rosstin/SPARK-8660-2 and squashes the following commits:
    
    f4b9bc8 [Rosstin] SPARK-8660 restored character limit on multiline comments in LogisticRegressionSuite.scala
    fe6b112 [Rosstin] SPARK-8660 > symbols removed from LogisticRegressionSuite.scala for easy of copypaste
    39ddd50 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8661
    5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code.
    bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660
    242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
    2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
    6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
    Rosstin authored and mengxr committed Jul 2, 2015
    Commit 4e4f74b
  3. [SPARK-8227] [SQL] Add function unhex

    cc chenghao-intel  adrian-wang
    
    Author: zhichao.li <zhichao.li@intel.com>
    
    Closes apache#7113 from zhichao-li/unhex and squashes the following commits:
    
    379356e [zhichao.li] remove exception checking
    a4ae6dc [zhichao.li] add udf_unhex to whitelist
    fe5c14a [zhichao.li] add todigit
    607d7a3 [zhichao.li] use checkInputTypes
    bffd37f [zhichao.li] change to use Hex in apache common package
    cde73f5 [zhichao.li] update to use AutoCastInputTypes
    11945c7 [zhichao.li] style
    c852d46 [zhichao.li] Add function unhex
    zhichao-li authored and Davies Liu committed Jul 2, 2015
    Commit b285ac5
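    For intuition, `unhex` decodes a hexadecimal string back into raw bytes, returning NULL on malformed input. A rough sketch of those semantics in plain Python (illustrative only, not the actual Spark implementation):

```python
def unhex(s):
    # Decode a hex string into raw bytes; return None on malformed input,
    # mirroring SQL's NULL-on-error convention.
    try:
        return bytes.fromhex(s)
    except ValueError:
        return None

print(unhex("537061726B"))  # b'Spark'
print(unhex("zz"))          # None
```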
  4. [SPARK-8754] [YARN] YarnClientSchedulerBackend doesn't stop gracefull…

    …y in failure conditions
    
    In YarnClientSchedulerBackend.stop(), added a check for monitorThread.
    
    Author: Devaraj K <devaraj@apache.org>
    
    Closes apache#7153 from devaraj-kavali/master and squashes the following commits:
    
    66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
    Devaraj K authored and Andrew Or committed Jul 2, 2015
    Commit 792fcd8
  5. [SPARK-8688] [YARN] Bug fix: disable the cache fs to gain the HDFS co…

    …nnection.
    
    If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will use the cached `DFSClient`, which uses the old token.
    [AMDelegationTokenRenewer](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala#L196)
    ```scala
        val credentials = UserGroupInformation.getCurrentUser.getCredentials
        credentials.writeTokenStorageFile(tempTokenPath, discachedConfiguration)
    ```
    Although the `credentials` contain the new token, the cached client still uses the old one.
    So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid token expiration.
    
    [Jira](https://issues.apache.org/jira/browse/SPARK-8688)
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes apache#7069 from SaintBacchus/SPARK-8688 and squashes the following commits:
    
    f94cd0b [huangzhaowei] modify function parameter
    8fb9eb9 [huangzhaowei] explicit  the comment
    0cd55c9 [huangzhaowei] Rename function name to be an accurate one
    cf776a1 [huangzhaowei] [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
    SaintBacchus authored and Andrew Or committed Jul 2, 2015
    Commit 646366b
  6. [SPARK-8771] [TRIVIAL] Add a version to the deprecated annotation for…

    … the actorSystem
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#7172 from holdenk/SPARK-8771-actor-system-deprecation-tag-uses-deprecated-deprecation-tag and squashes the following commits:
    
    7f1455b [Holden Karau] Add .0s to the versions for the derpecated anotations in SparkEnv.scala
    ca13c9d [Holden Karau] Add a version to the deprecated annotation for the actorSystem in SparkEnv
    holdenk authored and Andrew Or committed Jul 2, 2015
    Commit d14338e
  7. [SPARK-8769] [TRIVIAL] [DOCS] toLocalIterator should mention it resul…

    …ts in many jobs
    
    Author: Holden Karau <holden@pigscanfly.ca>
    
    Closes apache#7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits:
    
    97ddd99 [Holden Karau] Add note
    holdenk authored and Andrew Or committed Jul 2, 2015
    Commit 15d41cc
  8. [SPARK-8740] [PROJECT INFRA] Support GitHub OAuth tokens in dev/merge…

    …_spark_pr.py
    
    This commit allows `dev/merge_spark_pr.py` to use personal GitHub OAuth tokens in order to make authenticated requests. This is necessary to work around per-IP rate limiting issues.
    
    To use a token, just set the `GITHUB_OAUTH_KEY` environment variable.  You can create a personal token at https://github.com/settings/tokens; we only require `public_repo` scope.
    
    If the script fails due to a rate-limit issue, it now logs a useful message directing the user to the OAuth token instructions.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7136 from JoshRosen/pr-merge-script-oauth-authentication and squashes the following commits:
    
    4d011bd [Josh Rosen] Fix error message
    23d92ff [Josh Rosen] Support GitHub OAuth tokens in dev/merge_spark_pr.py
    JoshRosen authored and Andrew Or committed Jul 2, 2015
    Commit 377ff4c
  9. [SPARK-3071] Increase default driver memory

    I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.
    
    Please let me know if I've missed anything.
    
    Will the spark-shell use the value within the command line builder during instantiation?
    
    Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
    
    Closes apache#7132 from ilganeli/SPARK-3071 and squashes the following commits:
    
    4074164 [Ilya Ganelin] String fix
    271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
    273b6e9 [Ilya Ganelin] Test fix
    fd67721 [Ilya Ganelin] Update JavaUtils.java
    26cc177 [Ilya Ganelin] test fix
    e5db35d [Ilya Ganelin] Fixed test failure
    39732a1 [Ilya Ganelin] merge fix
    a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
    09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
    19b6f25 [Ilya Ganelin] Missed one doc update
    2698a3d [Ilya Ganelin] Updated default value for driver memory
    Ilya Ganelin authored and Andrew Or committed Jul 2, 2015
    Commit 3697232
  10. [SPARK-8687] [YARN] Fix bug: Executor can't fetch the new set configu…

    …ration in yarn-client
    
    Spark initializes the properties in CoarseGrainedSchedulerBackend.start:
    ```scala
        // TODO (prashant) send conf instead of properties
        driverEndpoint = rpcEnv.setupEndpoint(
          CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
    ```
    Then the YARN logic sets some configuration but does not update this `properties`,
    so the `Executor` never receives those settings.
    
    [Jira](https://issues.apache.org/jira/browse/SPARK-8687)
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes apache#7066 from SaintBacchus/SPARK-8687 and squashes the following commits:
    
    1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
    SaintBacchus authored and Andrew Or committed Jul 2, 2015
    Commit 1b0c8e6
  11. [DOCS] Fix minor wrong lambda expression example.

    It's a really minor issue, but there is an example with wrong lambda-expression usage in `SQLContext.scala`, as follows.
    
    ```
    sqlContext.udf().register("myUDF",
           (Integer arg1, String arg2) -> arg2 + arg1),  <- We have an extra `)` here.
           DataTypes.StringType);
    ```
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#7187 from sarutak/fix-minor-wrong-lambda-expression and squashes the following commits:
    
    a13196d [Kousuke Saruta] Fixed minor wrong lambda expression example.
    sarutak committed Jul 2, 2015
    Commit 4158836
  12. [SPARK-8787] [SQL] Changed parameter order of @deprecated in package …

    …object sql
    
    The parameter order of the deprecated annotation in package object sql is wrong:
    `deprecated("1.3.0", "use DataFrame")`.
    
    This has to be changed to `deprecated("use DataFrame", "1.3.0")`.
    
    Author: Vinod K C <vinod.kc@huawei.com>
    
    Closes apache#7183 from vinodkc/fix_deprecated_param_order and squashes the following commits:
    
    1cbdbe8 [Vinod K C] Modified the message
    700911c [Vinod K C] Changed order of parameters
    Vinod K C authored and srowen committed Jul 2, 2015
    Commit c572e25
  13. [SPARK-8746] [SQL] update download link for Hive 0.13.1

    updated the [Hive 0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) download link in `sql/README.md`
    
    Author: Christian Kadner <ckadner@us.ibm.com>
    
    Closes apache#7144 from ckadner/SPARK-8746 and squashes the following commits:
    
    65d80f7 [Christian Kadner] [SPARK-8746][SQL] update download link for Hive 0.13.1
    ckadner authored and srowen committed Jul 2, 2015
    Commit 1bbdf9e
  14. [SPARK-8690] [SQL] Add a setting to disable SparkSQL parquet schema m…

    …erge by using datasource API
    
    The detail problem story is in https://issues.apache.org/jira/browse/SPARK-8690
    
    Generally speaking, I add a config spark.sql.parquet.mergeSchema to achieve sqlContext.load("parquet", Map("path" -> "...", "mergeSchema" -> "false")).
    
    It becomes a simple flag without any side effects.
    
    Author: Wisely Chen <wiselychen@appier.com>
    
    Closes apache#7070 from thegiive/SPARK8690 and squashes the following commits:
    
    c6f3e86 [Wisely Chen] Refactor some code style and merge the test case to ParquetSchemaMergeConfigSuite
    94c9307 [Wisely Chen] Remove some style problem
    db8ef1b [Wisely Chen] Change config to SQLConf and add test case
    b6806fb [Wisely Chen] remove text
    c0edb8c [Wisely Chen] [SPARK-8690] add a config spark.sql.parquet.mergeSchema to disable datasource API schema merge feature.
    Wisely Chen authored and liancheng committed Jul 2, 2015
    Commit 246265f
  15. [SPARK-8647] [MLLIB] Potential issue with constant hashCode

    I added the following code,
      // see [SPARK-8647], this achieves the needed constant hash code without constant no.
      override def hashCode(): Int = this.getClass.getName.hashCode()
    
    which derives the constant hash code from the class name, as described in the JIRA.
    
    Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
    
    Closes apache#7146 from aloknsingh/aloknsingh_SPARK-8647 and squashes the following commits:
    
    e58bccf [Alok  Singh] [SPARK-8647][MLlib] to avoid the class derivation issues, change the constant hashCode to override def hashCode(): Int = classOf[MatrixUDT].getName.hashCode()
    43cdb89 [Alok  Singh] [SPARK-8647][MLlib] Potential issue with constant hashCode
    Alok Singh authored and mengxr committed Jul 2, 2015
    Commit 99c40cd
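    The contract in language-neutral terms: all instances of a UDT class are interchangeable, so equality and hashing should depend on the class, not the instance, and the hash should be derived from the class name rather than a hard-coded number. A toy Python sketch of that contract (names are illustrative, not Spark's API):

```python
class MatrixUDT:
    # Every instance of this type descriptor is interchangeable, so
    # equality is based on the class itself...
    def __eq__(self, other):
        return isinstance(other, MatrixUDT)

    # ...and the hash is constant per class, but derived from the class
    # name instead of a magic number (the point of SPARK-8647).
    def __hash__(self):
        return hash(type(self).__name__)

a, b = MatrixUDT(), MatrixUDT()
```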
  16. [SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering

    Add Python user guide for PowerIterationClustering
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes apache#7155 from yanboliang/spark-8758 and squashes the following commits:
    
    18d803b [Yanbo Liang] address comments
    dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
    yanboliang authored and mengxr committed Jul 2, 2015
    Commit 0a468a4
  17. [SPARK-8223] [SPARK-8224] [SQL] shift left and shift right

    Jira:
    https://issues.apache.org/jira/browse/SPARK-8223
    https://issues.apache.org/jira/browse/SPARK-8224
    
    ~~I am aware of apache#7174 and will update this pr, if it's merged.~~ Done
    I don't know if apache#7034 can simplify this, but we can have a look on it, if it gets merged
    
    rxin In the Jira ticket the function has no second argument. I added a `numBits` argument that allows specifying the number of bits, which I think improves usability. I wanted to add `shiftLeft(value)` as well, but the `selectExpr` dataframe tests crash if I have both. In order to do this, I added the following to functions.scala: `def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)`, but as I mentioned, this doesn't pass tests like `df.selectExpr("shiftRight(a)", ...` (not enough arguments exception).
    
    If we need the bitwise shift in order to be hive compatible, I suggest to add `shiftLeft` and something like `shiftLeftX`
    
    Author: Tarek Auel <tarek.auel@googlemail.com>
    
    Closes apache#7178 from tarekauel/8223 and squashes the following commits:
    
    8023bb5 [Tarek Auel] [SPARK-8223][SPARK-8224] fixed test
    f3f64e6 [Tarek Auel] [SPARK-8223][SPARK-8224] Integer -> Int
    f628706 [Tarek Auel] [SPARK-8223][SPARK-8224] removed toString; updated function description
    3b56f2a [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
    5189690 [Tarek Auel] [SPARK-8223][SPARK-8224] minor fix and style fix
    9434a28 [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
    44ee324 [Tarek Auel] [SPARK-8223][SPARK-8224] docu fix
    ac7fe9d [Tarek Auel] [SPARK-8223][SPARK-8224] right and left bit shift
    tarekbecker authored and Davies Liu committed Jul 2, 2015
    Commit 5b33381
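    The semantics of the two functions reduce to ordinary bitwise shifts, with `shiftRight` being an arithmetic (sign-preserving) shift. A plain-Python sketch of that behavior (not Spark's generated code):

```python
def shift_left(value, num_bits):
    # Equivalent to value * 2**num_bits for integers.
    return value << num_bits

def shift_right(value, num_bits):
    # Python's >> is an arithmetic shift: the sign bit is preserved,
    # matching shiftRight's behavior on signed integer columns.
    return value >> num_bits

print(shift_left(21, 1))   # 42
print(shift_right(-4, 1))  # -2
```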
  18. [SPARK-8747] [SQL] fix EqualNullSafe for binary type

    also improve tests for binary comparison.
    
    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7143 from cloud-fan/binary and squashes the following commits:
    
    28a5b76 [Wenchen Fan] improve test
    04ef4b0 [Wenchen Fan] fix equalNullSafe
    cloud-fan authored and Davies Liu committed Jul 2, 2015
    Commit afa021e
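    `<=>` (EqualNullSafe) differs from `=` only in how NULLs are handled; for binary columns, the fix ensures payloads are compared element-wise rather than by reference. The truth table can be sketched in plain Python (illustrative, not the Catalyst implementation):

```python
def eq_null_safe(a, b):
    # NULL <=> NULL is true and NULL <=> x is false, unlike plain SQL
    # equality, which yields NULL in both cases.
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    # Python's bytes compare by content, which is the behavior the
    # binary-type fix guarantees (rather than reference equality).
    return a == b

print(eq_null_safe(None, None))    # True
print(eq_null_safe(None, b"ab"))   # False
print(eq_null_safe(b"ab", b"ab"))  # True
```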
  19. [SPARK-8407] [SQL] complex type constructors: struct and named_struct

    This is a follow up of [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](apache#6828)), to support both `struct` and `named_struct` in Spark SQL.
    
    After [apache#6725](apache#6828), the semantics of the [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) method have changed a little: it is no longer limited to cols of `NamedExpressions`, and it will name non-NamedExpression fields following the Hive convention, col1, col2 ...
    
    This PR would both loosen [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and add `named_struct` support.
    
    Author: Yijie Shen <henry.yijieshen@gmail.com>
    
    Closes apache#6874 from yijieshen/SPARK-8283 and squashes the following commits:
    
    4cd3375 [Yijie Shen] change struct documentation
    d599d0b [Yijie Shen] rebase code
    9a7039e [Yijie Shen] fix reviews and regenerate golden answers
    b487354 [Yijie Shen] replace assert using checkAnswer
    f07e114 [Yijie Shen] tiny fix
    9613be9 [Yijie Shen] review fix
    7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable
    60812a7 [Yijie Shen] Fix type check
    828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method
    fd3cd8e [Yijie Shen] remove type check from eval
    7a71255 [Yijie Shen] tiny fix
    ccbbd86 [Yijie Shen] Fix reviews
    47da332 [Yijie Shen] remove nameStruct API from DataFrame
    917e680 [Yijie Shen] Fix reviews
    4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children
    0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
    yjshen authored and marmbrus committed Jul 2, 2015
    Commit 52302a8
  20. [SPARK-8708] [MLLIB] Paritition ALS ratings based on both users and p…

    …roducts
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-8708
    
    Previously the partitioning of the ratings was based only on the given products. So if the `usersProducts` given for prediction contains only a few products, or even a single product, the generated ratings are pushed into few (or one) partitions and can't exploit high parallelism.
    
    The following code is the example reported in the JIRA. Because it asks for predictions for all users on product 2, only one partition in the result is non-empty.
    
        >>> r1 = (1, 1, 1.0)
        >>> r2 = (1, 2, 2.0)
        >>> r3 = (2, 1, 2.0)
        >>> r4 = (2, 2, 2.0)
        >>> r5 = (3, 1, 1.0)
        >>> ratings = sc.parallelize([r1, r2, r3, r4, r5], 5)
        >>> users = ratings.map(itemgetter(0)).distinct()
        >>> model = ALS.trainImplicit(ratings, 1, seed=10)
        >>> predictions_for_2 = model.predictAll(users.map(lambda u: (u, 2)))
        >>> predictions_for_2.glom().map(len).collect()
        [0, 0, 3, 0, 0]
    
    This PR uses user and product instead of only product to partition the ratings.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Author: Liang-Chi Hsieh <viirya@appier.com>
    
    Closes apache#7121 from viirya/mfm_fix_partition and squashes the following commits:
    
    779946d [Liang-Chi Hsieh] Calculate approximate numbers of users and products in one pass.
    4336dc2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into mfm_fix_partition
    83e56c1 [Liang-Chi Hsieh] Instead of additional join, use the numbers of users and products to decide how to perform join.
    b534dc8 [Liang-Chi Hsieh] Paritition ratings based on both users and products.
    viirya authored and mengxr committed Jul 2, 2015
    Commit 0e553a3
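    The skew described above can be reproduced with a toy hash partitioner: keying by product alone sends all records for one product to a single partition, while keying by the full (user, product) pair spreads them out. A plain-Python illustration (`partition_sizes` is a hypothetical helper, not Spark's partitioner):

```python
def partition_sizes(pairs, num_partitions, key_fn):
    # Count how many records land in each hash partition.
    sizes = [0] * num_partitions
    for p in pairs:
        sizes[hash(key_fn(p)) % num_partitions] += 1
    return sizes

# 100 users all asking about product 2, as in the JIRA example.
pairs = [(user, 2) for user in range(100)]

by_product = partition_sizes(pairs, 5, lambda p: p[1])  # key: product only
by_both = partition_sizes(pairs, 5, lambda p: p)        # key: (user, product)
```

With the product-only key, every record hashes to the same bucket; with the pair key, the records spread across partitions.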
  21. [SPARK-8581] [SPARK-8584] Simplify checkpointing code + better error …

    …message
    
    This patch rewrites the old checkpointing code in a way that is easier to understand. It also adds a guard against an invalid specification of checkpoint directory to provide a clearer error message. Most of the changes here are relatively minor.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#6968 from andrewor14/checkpoint-cleanup and squashes the following commits:
    
    4ef8263 [Andrew Or] Use global synchronized instead
    6f6fd84 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup
    b1437ad [Andrew Or] Warn instead of throw
    5484293 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup
    7fb4af5 [Andrew Or] Guard against bad settings of checkpoint directory
    691da98 [Andrew Or] Simplify checkpoint code / code style / comments
    Andrew Or committed Jul 2, 2015
    Commit 2e2f326
  22. [SPARK-8479] [MLLIB] Add numNonzeros and numActives to linalg.Matrices

    Matrices allow zeros to be stored in values. Sometimes a method is handy to check whether numNonzeros is the same as the number of active values.
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#6904 from MechCoder/nnz_matrix and squashes the following commits:
    
    252c6b7 [MechCoder] Add to MiMa excludes
    e2390f5 [MechCoder] Use count instead of foreach
    2f62b2f [MechCoder] Add to MiMa excludes
    d6e96ef [MechCoder] [SPARK-8479] Add numNonzeros and numActives to linalg.Matrices
    MechCoder authored and mengxr committed Jul 2, 2015
    Commit 34d448d
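    The distinction matters because a matrix may store explicit zeros among its active (stored) values: `numActives` counts stored entries, while `numNonzeros` counts only the ones that are actually non-zero. A minimal sketch (illustrative Python, not the `linalg.Matrices` API):

```python
class ToyMatrix:
    def __init__(self, values):
        self.values = values  # explicitly stored (active) entries

    def num_actives(self):
        # Every stored entry counts, including explicit zeros.
        return len(self.values)

    def num_nonzeros(self):
        # Only entries that are actually non-zero.
        return sum(1 for v in self.values if v != 0.0)

m = ToyMatrix([1.0, 0.0, 3.0])
print(m.num_actives(), m.num_nonzeros())  # 3 2
```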
  23. [SPARK-8781] Fix variables in published pom.xml are not resolved

    The issue is summarized in the JIRA and is caused by this commit: 984ad60.
    
    This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#7193 from andrewor14/fix-kinesis-pom and squashes the following commits:
    
    ca3d5d4 [Andrew Or] Limit kinesis test dependencies
    f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"
    Andrew Or committed Jul 2, 2015
    Commit 82cf331
  24. [SPARK-1564] [DOCS] Added Javascript to Javadocs to create badges for…

    … tags like :: Experimental ::
    
    Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags
    
    Author: Deron Eriksson <deron@us.ibm.com>
    
    Closes apache#7169 from deroneriksson/SPARK-1564_JavaDocs_badges and squashes the following commits:
    
    a8353db [Deron Eriksson] added license headers to api-docs.css and api-javadocs.css
    07feb07 [Deron Eriksson] added linebreaks to make jquery more readable when adding html badge tags
    65b4930 [Deron Eriksson] Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags
    deroneriksson authored and Andrew Or committed Jul 2, 2015
    Commit fcbcba6
  25. [SPARK-7835] Refactor HeartbeatReceiverSuite for coverage + cleanup

    The existing test suite has a lot of duplicate code and doesn't even cover the most fundamental feature of the HeartbeatReceiver, which is expiring hosts that have not responded in a while.
    
    This introduces manual clocks in `HeartbeatReceiver` and makes it respond to heartbeats only for registered executors. A few internal messages are moved to `receiveAndReply` to increase determinism of the tests so we don't have to rely on flaky constructs like `eventually`.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#7173 from andrewor14/heartbeat-receiver-tests and squashes the following commits:
    
    4a903d6 [Andrew Or] Increase HeartReceiverSuite coverage and clean up
    Andrew Or committed Jul 2, 2015
    Commit cd20355
  26. [SPARK-8772][SQL] Implement implicit type cast for expressions that d…

    …efine input types.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7175 from rxin/implicitCast and squashes the following commits:
    
    88080a2 [Reynold Xin] Clearer definition of implicit type cast.
    f0ff97f [Reynold Xin] Added missing file.
    c65e532 [Reynold Xin] [SPARK-8772][SQL] Implement implicit type cast for expressions that defines input types.
    rxin committed Jul 2, 2015
    Commit 52508be
  27. [SPARK-3382] [MLLIB] GradientDescent convergence tolerance

    GradientDescent can receive a convergence tolerance value; the default is 0.0.
    When the change in the loss value becomes less than the tolerance set by the user, iteration is terminated.
    
    Author: lewuathe <lewuathe@me.com>
    
    Closes apache#3636 from Lewuathe/gd-convergence-tolerance and squashes the following commits:
    
    0b8a9a8 [lewuathe] Update doc
    ce91b15 [lewuathe] Merge branch 'master' into gd-convergence-tolerance
    4f22c2b [lewuathe] Modify based on SPARK-1503
    5e47b82 [lewuathe] Merge branch 'master' into gd-convergence-tolerance
    abadb7e [lewuathe] Fix LassoSuite
    8fadebd [lewuathe] Fix failed unit tests
    ee5de46 [lewuathe] Merge branch 'master' into gd-convergence-tolerance
    8313ba2 [lewuathe] Fix styles
    0ead94c [lewuathe] Merge branch 'master' into gd-convergence-tolerance
    a94cfd5 [lewuathe] Modify some styles
    3aef0a2 [lewuathe] Modify converged logic to do relative comparison
    f7b19d5 [lewuathe] [SPARK-3382] Clarify comparison logic
    e6c9cd2 [lewuathe] [SPARK-3382] Compare with the diff of solution vector
    4b125d2 [lewuathe] [SPARK3382] Fix scala style
    e7c10dd [lewuathe] [SPARK-3382] format improvements
    f867eea [lewuathe] [SPARK-3382] Modify warning message statements
    b9d5e61 [lewuathe] [SPARK-3382] should compare diff inside loss history and convergence tolerance
    5433f71 [lewuathe] [SPARK-3382] GradientDescent convergence tolerance
    Lewuathe authored and jkbradley committed Jul 2, 2015
    Commit 7d9cc96
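    The stopping rule amounts to comparing successive iterates against a relative tolerance, with 0.0 disabling early termination. A toy 1-D version under those assumptions (not MLlib's implementation; `step`, `tol`, and the gradient are illustrative choices):

```python
def gradient_descent(grad, x0, step, tol, max_iter=200):
    x = x0
    for _ in range(max_iter):
        x_new = x - step * grad(x)
        # Relative convergence check; tol == 0.0 (the default) means the
        # loop always runs for the full max_iter iterations.
        if tol > 0.0 and abs(x_new - x) <= tol * max(1.0, abs(x)):
            return x_new
        x = x_new
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
result = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, step=0.1, tol=1e-6)
```

With a positive `tol`, the loop exits as soon as the update becomes negligible instead of burning the full iteration budget.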
  28. [SPARK-8784] [SQL] Add Python API for hex and unhex

    Also improve the performance of hex/unhex
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7181 from davies/hex and squashes the following commits:
    
    f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
    49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
    b31fc9a [Davies Liu] Update math.scala
    25156b7 [Davies Liu] address comments and fix test
    c3af78c [Davies Liu] address commments
    1a24082 [Davies Liu] Add Python API for hex and unhex
    Davies Liu authored and rxin committed Jul 2, 2015
    fc7aebd
  29. [SPARK-7104] [MLLIB] Support model save/load in Python's Word2Vec

    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#6821 from yu-iskw/SPARK-7104 and squashes the following commits:
    
    975136b [Yu ISHIKAWA] Organize import
    0ef58b6 [Yu ISHIKAWA] Use rmtree, instead of removedirs
    cb21653 [Yu ISHIKAWA] Add an explicit type for `Word2VecModelWrapper.save`
    1d468ef [Yu ISHIKAWA] [SPARK-7104][MLlib] Support model save/load in Python's Word2Vec
    yu-iskw authored and jkbradley committed Jul 2, 2015
    488bad3
  30. Revert "[SPARK-8784] [SQL] Add Python API for hex and unhex"

    This reverts commit fc7aebd.
    rxin committed Jul 2, 2015
    e589e71

Commits on Jul 3, 2015

  1. [SPARK-8782] [SQL] Fix code generation for ORDER BY NULL

    This fixes code generation for queries containing `ORDER BY NULL`.  Previously, the generated code would fail to compile.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7179 from JoshRosen/generate-order-fixes and squashes the following commits:
    
    6ef49a6 [Josh Rosen] Fix ORDER BY NULL
    0036696 [Josh Rosen] Add regression test for SPARK-8782 (ORDER BY NULL)
    JoshRosen authored and rxin committed Jul 3, 2015
    d983819
  2. [SPARK-6980] [CORE] Akka timeout exceptions indicate which conf controls them (RPC Layer)
    
    Latest changes after refactoring to the RPC layer.  I rebased against trunk to make sure to get any recent changes since it had been a while.  I wasn't crazy about the name `ConfigureTimeout` and `RpcTimeout` seemed to fit better, but I'm open to suggestions!
    
    I ran most of the tests and they pass, but others would get stuck with "WARN TaskSchedulerImpl: Initial job has not accepted any resources". I think it's just my machine, so I thought I would push what I have anyway.
    
    Still left to do:
    * I only added a couple unit tests so far, there are probably some more cases to test
    * Make sure all uses require a `RpcTimeout`
    * Right now, both the `ask` and `Await.result` use the same timeout, should we differentiate between these in the TimeoutException message?
    * I wrapped `Await.result` in `RpcTimeout`, should we also wrap `Await.ready`?
    * Proper scoping of classes and methods
    
    hardmettle, feel free to help out with any of these!
    
    Author: Bryan Cutler <bjcutler@us.ibm.com>
    Author: Harsh Gupta <harsh@Harshs-MacBook-Pro.local>
    Author: BryanCutler <cutlerb@gmail.com>
    
    Closes apache#6205 from BryanCutler/configTimeout-6980 and squashes the following commits:
    
    46c8d48 [Bryan Cutler] [SPARK-6980] Changed RpcEnvSuite test to never reply instead of just sleeping, to avoid possible sync issues
    06afa53 [Bryan Cutler] [SPARK-6980] RpcTimeout class extends Serializable, was causing error in MasterSuite
    7bb70f1 [Bryan Cutler] Merge branch 'master' into configTimeout-6980
    dbd5f73 [Bryan Cutler] [SPARK-6980] Changed RpcUtils askRpcTimeout and lookupRpcTimeout scope to private[spark] and improved deprecation warning msg
    4e89c75 [Bryan Cutler] [SPARK-6980] Missed one usage of deprecated RpcUtils.askTimeout in YarnSchedulerBackend although it is not being used, and fixed SparkConfSuite UT to not use deprecated RpcUtils functions
    6a1c50d [Bryan Cutler] [SPARK-6980] Minor cleanup of test case
    7f4d78e [Bryan Cutler] [SPARK-6980] Fixed scala style checks
    287059a [Bryan Cutler] [SPARK-6980] Removed extra import in AkkaRpcEnvSuite
    3d8b1ff [Bryan Cutler] [SPARK-6980] Cleaned up imports in AkkaRpcEnvSuite
    3a168c7 [Bryan Cutler] [SPARK-6980] Rewrote Akka RpcTimeout UTs in RpcEnvSuite
    7636189 [Bryan Cutler] [SPARK-6980] Fixed call to askWithReply in DAGScheduler to use RpcTimeout - this was being compiled by auto-tupling and changing the message type of BlockManagerHeartbeat
    be11c4e [Bryan Cutler] Merge branch 'master' into configTimeout-6980
    039afed [Bryan Cutler] [SPARK-6980] Corrected import organization
    218aa50 [Bryan Cutler] [SPARK-6980] Corrected issues from feedback
    fadaf6f [Bryan Cutler] [SPARK-6980] Put back in deprecated RpcUtils askTimeout and lookupTimout to fix MiMa errors
    fa6ed82 [Bryan Cutler] [SPARK-6980] Had to increase timeout on positive test case because a processor slowdown could trigger an Future TimeoutException
    b05d449 [Bryan Cutler] [SPARK-6980] Changed constructor to use val duration instead of getter function, changed name of string property from conf to timeoutProp for consistency
    c6cfd33 [Bryan Cutler] [SPARK-6980] Changed UT ask message timeout to explicitly intercept a SparkException
    1394de6 [Bryan Cutler] [SPARK-6980] Moved MessagePrefix to createRpcTimeoutException directly
    1517721 [Bryan Cutler] [SPARK-6980] RpcTimeout object scope should be private[spark]
    2206b4d [Bryan Cutler] [SPARK-6980] Added unit test for ask then immediat awaitReply
    1b9beab [Bryan Cutler] [SPARK-6980] Cleaned up import ordering
    08f5afc [Bryan Cutler] [SPARK-6980] Added UT for constructing RpcTimeout with default value
    d3754d1 [Bryan Cutler] [SPARK-6980] Added akkaConf to prevent dead letter logging
    995d196 [Bryan Cutler] [SPARK-6980] Cleaned up import ordering, comments, spacing from PR feedback
    7774d56 [Bryan Cutler] [SPARK-6980] Cleaned up UT imports
    4351c48 [Bryan Cutler] [SPARK-6980] Added UT for addMessageIfTimeout, cleaned up UTs
    1607a5f [Bryan Cutler] [SPARK-6980] Changed addMessageIfTimeout to PartialFunction, cleanup from PR comments
    2f94095 [Bryan Cutler] [SPARK-6980] Added addMessageIfTimeout for when a Future is completed with TimeoutException
    235919b [Bryan Cutler] [SPARK-6980] Resolved conflicts after master merge
    c07d05c [Bryan Cutler] Merge branch 'master' into configTimeout-6980-tmp
    b7fb99f [BryanCutler] Merge pull request #2 from hardmettle/configTimeoutUpdates_6980
    4be3a8d [Harsh Gupta] Modifying loop condition to find property match
    0ee5642 [Harsh Gupta] Changing the loop condition to halt at the first match in the property list for RpcEnv exception catch
    f74064d [Harsh Gupta] Retrieving properties from property list using iterator and while loop instead of chained functions
    a294569 [Bryan Cutler] [SPARK-6980] Added creation of RpcTimeout with Seq of property keys
    23d2f26 [Bryan Cutler] [SPARK-6980] Fixed await result not being handled by RpcTimeout
    49f9f04 [Bryan Cutler] [SPARK-6980] Minor cleanup and scala style fix
    5b59a44 [Bryan Cutler] [SPARK-6980] Added some RpcTimeout unit tests
    78a2c0a [Bryan Cutler] [SPARK-6980] Using RpcTimeout.awaitResult for future in AppClient now
    97523e0 [Bryan Cutler] [SPARK-6980] Akka ask timeout description refactored to RPC layer
    BryanCutler authored and squito committed Jul 3, 2015
    aa7bbc1
  3. [SPARK-8213][SQL]Add function factorial

    Author: zhichao.li <zhichao.li@intel.com>
    
    Closes apache#6822 from zhichao-li/factorial and squashes the following commits:
    
    26edf4f [zhichao.li] add factorial
    zhichao-li authored and rxin committed Jul 3, 2015
    1a7a7d7
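For reference, a SQL-style `factorial` is typically defined only for 0 <= n <= 20, because 21! overflows a signed 64-bit long, and returns NULL outside that range. A sketch under that assumption (the exact domain is an assumption here, not quoted from the patch):

```python
def sql_factorial(n):
    """factorial(n) with SQL NULL semantics: None for NULL input and
    for any n outside [0, 20] (21! does not fit in a signed long)."""
    if n is None or n < 0 or n > 20:
        return None
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```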
  4. dfd8bac
  5. [SPARK-8501] [SQL] Avoids reading schema from empty ORC files

    ORC writes empty schema (`struct<>`) to ORC files containing zero rows.  This is OK for Hive since the table schema is managed by the metastore. But it causes trouble when reading raw ORC files via Spark SQL since we have to discover the schema from the files.
    
    Notice that the ORC data source always avoids writing empty ORC files, but it's still problematic when reading Hive tables which contain empty part-files.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#7199 from liancheng/spark-8501 and squashes the following commits:
    
    bb8cd95 [Cheng Lian] Addresses comments
    a290221 [Cheng Lian] Avoids reading schema from empty ORC files
    liancheng committed Jul 3, 2015
    20a4d7d
  6. [SPARK-8801][SQL] Support TypeCollection in ExpectsInputTypes

    This patch adds a new TypeCollection AbstractDataType that can be used by expressions to specify more than one expected input types.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7202 from rxin/type-collection and squashes the following commits:
    
    c714ca1 [Reynold Xin] Fixed style.
    a0c0d12 [Reynold Xin] Fixed bugs and unit tests.
    d8b8ae7 [Reynold Xin] Added TypeCollection.
    rxin committed Jul 3, 2015
    a59d14f
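The point of TypeCollection is that an expression can declare its expected input as "any of these types". A sketch of the acceptance check, with illustrative class and type names rather than Catalyst's actual API (types are modeled as plain strings; members may themselves be collections, which is what SPARK-8831 later enables):

```python
class TypeCollection:
    """An abstract data type that accepts a concrete type matching any
    of its members; members may be concrete types or nested collections."""

    def __init__(self, *types):
        self.types = types

    def accepts(self, concrete):
        # The input type checks out if any member accepts it.
        return any(
            t.accepts(concrete) if isinstance(t, TypeCollection) else t == concrete
            for t in self.types
        )
```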
  7. [SPARK-8776] Increase the default MaxPermSize

    I am increasing the perm gen size to 256m.
    
    https://issues.apache.org/jira/browse/SPARK-8776
    
    Author: Yin Huai <yhuai@databricks.com>
    
    Closes apache#7196 from yhuai/SPARK-8776 and squashes the following commits:
    
    60901b4 [Yin Huai] Fix test.
    d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size.
    30aaf8e [Yin Huai] Increase the default PermGen size to 256m.
    yhuai committed Jul 3, 2015
    f743c79
  8. [SPARK-8803] handle special characters in elements in crosstab

    cc rxin
    
    Having back ticks or null as elements causes problems.
    Since elements become column names, we have to drop back ticks from the element, as they are special characters.
    Having null throws exceptions, so we replace nulls with empty strings.
    
    Handling back ticks should be improved for 1.5
    
    Author: Burak Yavuz <brkyvz@gmail.com>
    
    Closes apache#7201 from brkyvz/weird-ct-elements and squashes the following commits:
    
    e06b840 [Burak Yavuz] fix scalastyle
    93a0d3f [Burak Yavuz] added tests for NaN and Infinity
    9dba6ce [Burak Yavuz] address cr1
    db71dbd [Burak Yavuz] handle special characters in elements in crosstab
    brkyvz authored and rxin committed Jul 3, 2015
    9b23e92
  9. [SPARK-8809][SQL] Remove ConvertNaNs analyzer rule.

    "NaN" from string to double is already handled by Cast expression itself.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7206 from rxin/convertnans and squashes the following commits:
    
    3d99c33 [Reynold Xin] [SPARK-8809][SQL] Remove ConvertNaNs analyzer rule.
    rxin committed Jul 3, 2015
    2848f4d
  10. [SPARK-8226] [SQL] Add function shiftrightunsigned

    Author: zhichao.li <zhichao.li@intel.com>
    
    Closes apache#7035 from zhichao-li/shiftRightUnsigned and squashes the following commits:
    
    6bcca5a [zhichao.li] change coding style
    3e9f5ae [zhichao.li] python style
    d85ae0b [zhichao.li] add shiftrightunsigned
    zhichao-li authored and davies committed Jul 3, 2015
    ab535b9
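An unsigned (logical) right shift has no native Python operator; for a 64-bit value it can be emulated by masking to 64 bits before shifting, which matches Java's `>>>` semantics, including the masking of the shift distance to the low 6 bits:

```python
MASK64 = (1 << 64) - 1  # 0xFFFFFFFFFFFFFFFF

def shift_right_unsigned(value, bits):
    """Logical right shift of a 64-bit signed integer: reinterpret the
    two's-complement bits as unsigned, then shift zeros in from the left.
    `bits & 63` mirrors Java's >>> shift-distance masking for longs."""
    return (value & MASK64) >> (bits & 63)
```

For non-negative inputs this agrees with the arithmetic shift `>>`; the two diverge only for negative values, where the sign bit is not propagated.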
  11. [SPARK-7401] [MLLIB] [PYSPARK] Vectorize dot product and sq_dist between SparseVector and DenseVector
    
    Currently we iterate over indices; this can be vectorized.
    
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    
    Closes apache#5946 from MechCoder/spark-7203 and squashes the following commits:
    
    034d086 [MechCoder] Vectorize dot calculation for numpy arrays for ndim=2
    bce2b07 [MechCoder] fix doctest
    fcad0a3 [MechCoder] Remove type checks for list, pyarray etc
    0ee5dd4 [MechCoder] Add tests and other isinstance changes
    e5f1de0 [MechCoder] [SPARK-7401] Vectorize dot product and sq_dist
    MechCoder authored and davies committed Jul 3, 2015
    f0fac2a

Commits on Jul 4, 2015

  1. [SPARK-8810] [SQL] Added several UDF unit tests for Spark SQL

    One test for each of the GROUP BY, WHERE and HAVING clauses, and one that combines all three with an additional UDF in the SELECT.
    
    (Since this is my first attempt at contributing to SPARK, meta-level guidance on anything I've screwed up would be greatly appreciated, whether important or minor.)
    
    Author: Spiro Michaylov <spiro@michaylov.com>
    
    Closes apache#7207 from spirom/udf-test-branch and squashes the following commits:
    
    6bbba9e [Spiro Michaylov] Responded to review comments on UDF unit tests
    1a3c5ff [Spiro Michaylov] Added several UDF unit tests for Spark SQL
    spirom authored and rxin committed Jul 4, 2015
    e92c24d
  2. [SPARK-8572] [SQL] Type coercion for ScalaUDFs

    Implemented type coercion for udf arguments in Scala. The changes include:
    * Add `with ExpectsInputTypes ` to `ScalaUDF` class.
    * Pass down argument types info from `UDFRegistration` and `functions`.
    
    With this patch, the example query in [SPARK-8572](https://issues.apache.org/jira/browse/SPARK-8572) no longer throws a type cast error at runtime.
    
    Also added a unit test to `UDFSuite` in which a decimal type is passed to a udf that expects an int.
    
    Author: Cheolsoo Park <cheolsoop@netflix.com>
    
    Closes apache#7203 from piaozhexiu/SPARK-8572 and squashes the following commits:
    
    2d0ed15 [Cheolsoo Park] Incorporate comments
    dce1efd [Cheolsoo Park] Fix unit tests and update the codegen script
    066deed [Cheolsoo Park] Type coercion for udf inputs
    Cheolsoo Park authored and rxin committed Jul 4, 2015
    4a22bce
  3. [SPARK-8192] [SPARK-8193] [SQL] udf current_date, current_timestamp

    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes apache#6985 from adrian-wang/udfcurrent and squashes the following commits:
    
    6a20b64 [Daoyuan Wang] remove codegen and add lazy in testsuite
    27c9f95 [Daoyuan Wang] refine tests..
    e11ae75 [Daoyuan Wang] refine tests
    61ed3d5 [Daoyuan Wang] add in functions
    98e8550 [Daoyuan Wang] fix sytle
    427d9dc [Daoyuan Wang] add tests and codegen
    0b69a1f [Daoyuan Wang] udf current
    adrian-wang authored and rxin committed Jul 4, 2015
    9fb6b83
  4. [SPARK-8777] [SQL] Add random data generator test utilities to Spark SQL

    This commit adds a set of random data generation utilities to Spark SQL, for use in its own unit tests.
    
    - `RandomDataGenerator.forType(DataType)` returns an `Option[() => Any]` that, if defined, contains a function for generating random values for the given DataType.  The random values use the external representations for the given DataType (for example, for DateType we return `java.sql.Date` instances instead of longs).
    - `DateTypeTestUtilities` defines some convenience fields for looping over instances of data types.  For example, `numericTypes` holds `DataType` instances for all supported numeric types.  These constants will help us to raise the level of abstraction in our tests.  For example, it's now very easy to write a test which is parameterized by all common data types.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#7176 from JoshRosen/sql-random-data-generators and squashes the following commits:
    
    f71634d [Josh Rosen] Roll back ScalaCheck usage
    e0d7d49 [Josh Rosen] Bump ScalaCheck version in LICENSE
    89d86b1 [Josh Rosen] Bump ScalaCheck version.
    0c20905 [Josh Rosen] Initial attempt at using ScalaCheck.
    b55875a [Josh Rosen] Generate doubles and floats over entire possible range.
    5acdd5c [Josh Rosen] Infinity and NaN are interesting.
    ab76cbd [Josh Rosen] Move code to Catalyst package.
    d2b4a4a [Josh Rosen] Add random data generator test utilities to Spark SQL.
    JoshRosen authored and rxin committed Jul 4, 2015
    f32487b
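The `forType` pattern described above — return an optional zero-argument generator per data type, producing values in each type's external representation — can be sketched as follows; the type names and value ranges here are illustrative:

```python
import random
import datetime

def for_type(data_type, seed=None):
    """Return a zero-arg generator of random values for data_type, or
    None if the type is unsupported -- mirroring Option[() => Any].
    External representations: e.g. DateType yields datetime.date
    objects rather than raw day counts."""
    rng = random.Random(seed)
    generators = {
        "IntegerType": lambda: rng.randint(-2**31, 2**31 - 1),
        "DoubleType": lambda: rng.uniform(-1e9, 1e9),
        "DateType": lambda: datetime.date(1970, 1, 1)
                            + datetime.timedelta(days=rng.randint(0, 20000)),
    }
    return generators.get(data_type)
```

A test parameterized over all common types then just loops over the supported names, skipping any for which `for_type` returns `None`.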
  5. [SPARK-8238][SPARK-8239][SPARK-8242][SPARK-8243][SPARK-8268][SQL] Add ascii/base64/unbase64/encode/decode functions
    
    Add `ascii`, `base64`, `unbase64`, `encode` and `decode` expressions.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes apache#6843 from chenghao-intel/str_funcs2 and squashes the following commits:
    
    78dee7d [Cheng Hao] base 64 -> base64
    9d6f9f4 [Cheng Hao] remove the toString method for expressions
    ed5c19c [Cheng Hao] update code as comments
    96170fc [Cheng Hao] scalastyle issues
    e2df768 [Cheng Hao] remove the unused import
    491ce7b [Cheng Hao] add ascii/base64/unbase64/encode/decode functions
    chenghao-intel authored and rxin committed Jul 4, 2015
    f35b0c3
  6. [SPARK-8270][SQL] levenshtein distance

    Jira: https://issues.apache.org/jira/browse/SPARK-8270
    
    Info: I cannot build the latest master; it gets stuck during the build process: `[INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml`
    
    Author: Tarek Auel <tarek.auel@googlemail.com>
    
    Closes apache#7214 from tarekauel/SPARK-8270 and squashes the following commits:
    
    ab348b9 [Tarek Auel] Merge branch 'master' into SPARK-8270
    a2ad318 [Tarek Auel] [SPARK-8270] changed order of fields
    d91b12c [Tarek Auel] [SPARK-8270] python fix
    adbd075 [Tarek Auel] [SPARK-8270] fixed typo
    23185c9 [Tarek Auel] [SPARK-8270] levenshtein distance
    tarekbecker authored and rxin committed Jul 4, 2015
    6b3574e
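Levenshtein distance is the classic edit-distance dynamic program: the minimum number of single-character insertions, deletions, and substitutions turning one string into the other. A two-row reference implementation:

```python
def levenshtein(s, t):
    """Edit distance between s and t via the standard DP, keeping only
    the previous and current rows of the (len(s)+1) x (len(t)+1) table."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,                 # delete cs
                curr[j - 1] + 1,             # insert ct
                prev[j - 1] + (cs != ct),    # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```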
  7. 48f7aed
  8. [SQL] More unit tests for implicit type cast & add simpleString to AbstractDataType.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7221 from rxin/implicit-cast-tests and squashes the following commits:
    
    64b13bd [Reynold Xin] Fixed a bug ..
    489b732 [Reynold Xin] [SQL] More unit tests for implicit type cast & add simpleString to AbstractDataType.
    rxin committed Jul 4, 2015
    347cab8
  9. [SPARK-8822][SQL] clean up type checking in math.scala.

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7220 from rxin/SPARK-8822 and squashes the following commits:
    
    0cda076 [Reynold Xin] Test cases.
    22d0463 [Reynold Xin] Fixed type precedence.
    beb2a97 [Reynold Xin] [SPARK-8822][SQL] clean up type checking in math.scala.
    rxin committed Jul 4, 2015
    c991ef5

Commits on Jul 5, 2015

  1. [MINOR] [SQL] Minor fix for CatalystSchemaConverter

    ping liancheng
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#7224 from viirya/few_fix_catalystschema and squashes the following commits:
    
    d994330 [Liang-Chi Hsieh] Minor fix for CatalystSchemaConverter.
    viirya authored and liancheng committed Jul 5, 2015
    2b820f2
  2. [SPARK-7137] [ML] Update SchemaUtils checkInputColumn to print more info if needed
    
    Author: Joshi <rekhajoshm@gmail.com>
    Author: Rekha Joshi <rekhajoshm@gmail.com>
    
    Closes apache#5992 from rekhajoshm/fix/SPARK-7137 and squashes the following commits:
    
    8c42b57 [Joshi] update checkInputColumn to print more info if needed
    33ddd2e [Joshi] update checkInputColumn to print more info if needed
    acf3e17 [Joshi] update checkInputColumn to print more info if needed
    8993c0e [Joshi] SPARK-7137: Add checkInputColumn back to Params and print more info
    e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
    rekhajoshm authored and jkbradley committed Jul 5, 2015
    f9c448d

Commits on Jul 6, 2015

  1. [SPARK-8549] [SPARKR] Fix the line length of SparkR

    [[SPARK-8549] Fix the line length of SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8549)
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes apache#7204 from yu-iskw/SPARK-8549 and squashes the following commits:
    
    6fb131a [Yu ISHIKAWA] Fix the typo
    1737598 [Yu ISHIKAWA] [SPARK-8549][SparkR] Fix the line length of SparkR
    yu-iskw authored and shivaram committed Jul 6, 2015
    a0cb111
  2. [SQL][Minor] Update the DataFrame API for encode/decode

    This is a the follow up of apache#6843.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes apache#7230 from chenghao-intel/str_funcs2_followup and squashes the following commits:
    
    52cc553 [Cheng Hao] update the code as comment
    chenghao-intel authored and rxin committed Jul 6, 2015
    6d0411b
  3. [SPARK-8831][SQL] Support AbstractDataType in TypeCollection.

    Otherwise it is impossible to declare an expression supporting DecimalType.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes apache#7232 from rxin/typecollection-adt and squashes the following commits:
    
    934d3d1 [Reynold Xin] [SPARK-8831][SQL] Support AbstractDataType in TypeCollection.
    rxin committed Jul 6, 2015
    86768b7
  4. [SPARK-8841] [SQL] Fix partition pruning percentage log message

    When pruning partitions for a query plan, a message is logged indicating how many partitions were selected based on predicate criteria, and what percent were pruned.
    
    The current release erroneously uses `1 - total/selected` to compute this quantity, leading to nonsense messages like "pruned -1000% partitions". The fix is simple and obvious.
    
    Author: Steve Lindemann <steve.lindemann@engineersgatelp.com>
    
    Closes apache#7227 from srlindemann/master and squashes the following commits:
    
    c788061 [Steve Lindemann] fix percentPruned log message
    eglp-slindemann authored and liancheng committed Jul 6, 2015
    39e4e7e
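The arithmetic of the fix: with 100 total partitions and 20 selected, the correct `1 - selected/total` reports 80% pruned, while the erroneous `1 - total/selected` would report -400%.

```python
def percent_pruned(total, selected):
    """Fraction of partitions pruned, as a percentage: the corrected
    1 - selected/total (the bug used 1 - total/selected)."""
    return 100.0 * (1.0 - selected / total)
```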
  5. [SPARK-8124] [SPARKR] Created more examples on SparkR DataFrames

    Here are more examples on SparkR DataFrames, including creating a Spark context and a SQL
    context, loading data, and simple data manipulation.
    
    Author: Daniel Emaasit (PhD Student) <daniel.emaasit@gmail.com>
    
    Closes apache#6668 from Emaasit/dan-dev and squashes the following commits:
    
    3a97867 [Daniel Emaasit (PhD Student)] Used fewer rows for createDataFrame
    f7227f9 [Daniel Emaasit (PhD Student)] Using command line arguments
    a550f70 [Daniel Emaasit (PhD Student)] Used base R functions
    33f9882 [Daniel Emaasit (PhD Student)] Renamed file
    b6603e3 [Daniel Emaasit (PhD Student)] changed "Describe" function to "describe"
    90565dd [Daniel Emaasit (PhD Student)] Deleted the getting-started file
    b95a103 [Daniel Emaasit (PhD Student)] Deleted this file
    cc55cd8 [Daniel Emaasit (PhD Student)] combined all the code into one .R file
    c6933af [Daniel Emaasit (PhD Student)] changed variable name to SQLContext
    8e0fe14 [Daniel Emaasit (PhD Student)] provided two options for creating DataFrames
    2653573 [Daniel Emaasit (PhD Student)] Updates to a comment and variable name
    275b787 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file
    2e8f724 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file
    486f44e [Daniel Emaasit (PhD Student)] Added the Apache License at the file
    d705112 [Daniel Emaasit (PhD Student)] Created more examples on SparkR DataFrames
    Emaasit authored and shivaram committed Jul 6, 2015
    293225e
  6. [SPARK-8837][SPARK-7114][SQL] support using keyword in column name

    Author: Wenchen Fan <cloud0fan@outlook.com>
    
    Closes apache#7237 from cloud-fan/parser and squashes the following commits:
    
    e7b49bb [Wenchen Fan] support using keyword in column name
    cloud-fan authored and rxin committed Jul 6, 2015
    0e19464
  7. Small update in the readme file

    Just change the attribute from -PsparkR to -Psparkr
    
    Author: Dirceu Semighini Filho <dirceu.semighini@gmail.com>
    
    Closes apache#7242 from dirceusemighini/patch-1 and squashes the following commits:
    
    fad5991 [Dirceu Semighini Filho] Small update in the readme file
    Dirceu Semighini Filho authored and rxin committed Jul 6, 2015
    57c72fc
  8. [SPARK-8784] [SQL] Add Python API for hex and unhex

    Add Python API for hex/unhex,  also cleanup Hex/Unhex
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes apache#7223 from davies/hex and squashes the following commits:
    
    6f1249d [Davies Liu] no explicit rule to cast string into binary
    711a6ed [Davies Liu] fix test
    f9fe5a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
    f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
    49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
    b31fc9a [Davies Liu] Update math.scala
    25156b7 [Davies Liu] address comments and fix test
    c3af78c [Davies Liu] address commments
    1a24082 [Davies Liu] Add Python API for hex and unhex
    Davies Liu authored and rxin committed Jul 6, 2015
    37e4d92
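The semantics being exposed — hex of an integer or of a string's bytes, and its inverse with NULL for malformed input — can be sketched with the standard library. This sketch is restricted to non-negative integers; a full implementation would also treat negative longs as unsigned 64-bit values:

```python
import binascii

def sql_hex(value):
    """hex(): a non-negative integer becomes its uppercase hex digits;
    a string is hexed byte by byte (UTF-8)."""
    if isinstance(value, int):
        return format(value, "X")
    return value.encode("utf-8").hex().upper()

def sql_unhex(hex_str):
    """Inverse of sql_hex for strings: pairs of hex digits back to
    bytes; returns None (the SQL NULL convention) for malformed input."""
    try:
        return binascii.unhexlify(hex_str)
    except (binascii.Error, ValueError):
        return None
```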
  9. [SPARK-4485] [SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest
    
    This pull request
    (1) extracts common functions used by hash outer joins and puts them in the HashOuterJoin interface,
    (2) adds ShuffledHashOuterJoin and BroadcastHashOuterJoin,
    (3) adds test cases for shuffled and broadcast hash outer join, and
    (4) makes SparkPlanTest support binary or more complex operators, and fixes bugs in plan composition in SparkPlanTest.
    
    Author: kai <kaizeng@eecs.berkeley.edu>
    
    Closes apache#7162 from kai-zeng/outer and squashes the following commits:
    
    3742359 [kai] Fix not-serializable exception for code-generated keys in broadcasted relations
    14e4bf8 [kai] Use CanBroadcast in broadcast outer join planning
    dc5127e [kai] code style fixes
    b5a4efa [kai] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest
    kai authored and marmbrus committed Jul 6, 2015
    2471c0b
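The shape of a broadcast hash outer join: build a hash table from the small (broadcast) side keyed by the join key, stream the other side, and emit null-padded rows where no match exists. A left-outer sketch (function and parameter names are illustrative, not Spark's operators):

```python
from collections import defaultdict

def broadcast_left_outer_join(stream_side, broadcast_side, key):
    """Left outer join: hash the broadcast side by key, then stream the
    left side, emitting (left, right) pairs and (left, None) when no
    match exists."""
    table = defaultdict(list)
    for row in broadcast_side:
        table[key(row)].append(row)
    out = []
    for row in stream_side:
        matches = table.get(key(row))
        if matches:
            out.extend((row, m) for m in matches)
        else:
            out.append((row, None))  # outer side: null-pad the miss
    return out
```

Broadcasting avoids a shuffle entirely: only the small side is replicated to every task, while the large side is read in place.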
  10. [MINOR] [SQL] remove unused code in Exchange

    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes apache#7234 from adrian-wang/exchangeclean and squashes the following commits:
    
    b093ec9 [Daoyuan Wang] remove unused code
    adrian-wang authored and JoshRosen committed Jul 6, 2015
    132e7fc
  11. [SPARK-8656] [WEBUI] Fix the webUI and JSON API number is not synced

    The Spark standalone master web UI shows the total cores, used cores, total memory, and used memory of "Alive Workers" only.
    But the JSON API page "http://MASTERURL:8088/json" shows the core and memory numbers of ALL workers.
    The web UI data is therefore not in sync with the JSON API.
    The proper fix is to make the web UI and the JSON API report the same numbers.
    
    Author: Wisely Chen <wiselychen@appier.com>
    
    Closes apache#7038 from thegiive/SPARK-8656 and squashes the following commits:
    
    9e54bf0 [Wisely Chen] Change variable name to camel case
    2c8ea89 [Wisely Chen] Change some styling and add local variable
    431d2b0 [Wisely Chen] Worker List should contain DEAD node also
    8b3b8e8 [Wisely Chen] [SPARK-8656] Fix the webUI and JSON API number is not synced
    Wisely Chen authored and Andrew Or committed Jul 6, 2015
    9ff2033
  12. [SPARK-6707] [CORE] [MESOS] Mesos Scheduler should allow the user to specify constraints based on slave attributes
    
    Currently, the mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usability of a resource offer from a mesos slave node. It may be preferable for the user to be able to ensure that spark jobs are only started on a certain set of nodes (based on attributes).
    
    For example, if the user sets the property `spark.mesos.constraints` to `tachyon=true;us-east-1=false`, then resource offers will be checked against both constraints, and only offers that satisfy them will be accepted to start new executors.
    
    Author: Ankur Chauhan <achauhan@brightcove.com>
    
    Closes apache#5563 from ankurcha/mesos_attribs and squashes the following commits:
    
    902535b [Ankur Chauhan] Fix line length
    d83801c [Ankur Chauhan] Update code as per code review comments
    8b73f2d [Ankur Chauhan] Fix imports
    c3523e7 [Ankur Chauhan] Added docs
    1a24d0b [Ankur Chauhan] Expand scope of attributes matching to include all data types
    482fd71 [Ankur Chauhan] Update access modifier to private[this] for offer constraints
    5ccc32d [Ankur Chauhan] Fix nit pick whitespace
    1bce782 [Ankur Chauhan] Fix nit pick whitespace
    c0cbc75 [Ankur Chauhan] Use offer id value for debug message
    7fee0ea [Ankur Chauhan] Add debug statements
    fc7eb5b [Ankur Chauhan] Fix import codestyle
    00be252 [Ankur Chauhan] Style changes as per code review comments
    662535f [Ankur Chauhan] Incorporate code review comments + use SparkFunSuite
    fdc0937 [Ankur Chauhan] Decline offers that did not meet criteria
    67b58a0 [Ankur Chauhan] Add documentation for spark.mesos.constraints
    63f53f4 [Ankur Chauhan] Update codestyle - uniform style for config values
    02031e4 [Ankur Chauhan] Fix scalastyle warnings in tests
    c09ed84 [Ankur Chauhan] Fixed the access modifier on offerConstraints val to private[mesos]
    0c64df6 [Ankur Chauhan] Rename overhead fractions to memory_*, fix spacing
    8cc1e8f [Ankur Chauhan] Make exception message more explicit about the source of the error
    addedba [Ankur Chauhan] Added test case for malformed constraint string
    ec9d9a6 [Ankur Chauhan] Add tests for parse constraint string
    72fe88a [Ankur Chauhan] Fix up tests + remove redundant method override, combine utility class into new mesos scheduler util trait
    92b47fd [Ankur Chauhan] Add attributes based constraints support to MesosScheduler
    Ankur Chauhan authored and Andrew Or committed Jul 6, 2015
    1165b17
  13. Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"

    This reverts commit 25f574e. After speaking with some users and developers, we realized that FP-growth doesn't meet the requirements for frequent sequence mining; PrefixSpan (SPARK-6487) would be the correct algorithm for it. feynmanliang
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes apache#7240 from mengxr/SPARK-7212.revert and squashes the following commits:
    
    2b3d66b [Xiangrui Meng] Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"
    mengxr committed Jul 6, 2015
    96c5eee
  14. ee232db
  15. 93e3d4e
  16. Apply review comments

    maropu committed Jul 6, 2015
    6984bf4
  17. Fix code-style errors

    maropu committed Jul 6, 2015
    7f812fd
  18. Remove a new type

    maropu committed Jul 6, 2015
    af61f2e
  19. fdb2ae4
  20. 7114a47
  21. Apply comments

    maropu committed Jul 6, 2015
    2844a8e
  22. 92ed7a6
  23. Fix conflicts

    maropu committed Jul 6, 2015
    feb1129