-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6747] [SQL] Support List<> as a return type in Hive UDF #6179
Commits on Jun 22, 2015
-
[SPARK-8482] Added M4 instances to the list.
AWS recently added M4 instances (https://aws.amazon.com/blogs/aws/the-new-m4-instance-type-bonus-price-reduction-on-m3-c4/). Author: Pradeep Chhetri <pradeep.chhetri89@gmail.com> Closes apache#6899 from pradeepchhetri/master and squashes the following commits: 4f4ea79 [Pradeep Chhetri] Added t2.large instance 3d2bb6c [Pradeep Chhetri] Added M4 instances to the list
Configuration menu - View commit details
-
Copy full SHA for ba8a453 - Browse repository at this point
Copy the full SHA ba8a453View commit details -
[SPARK-8511] [PYSPARK] Modify a test to remove a saved model in `regr…
…ession.py` [[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6926 from yu-iskw/SPARK-8511 and squashes the following commits: 7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()` 4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py`
Configuration menu - View commit details
-
Copy full SHA for 5d89d9f - Browse repository at this point
Copy the full SHA 5d89d9fView commit details -
[SPARK-8104] [SQL] auto alias expressions in analyzer
Currently we auto alias expression in parser. However, during parser phase we don't have enough information to do the right alias. For example, Generator that has more than 1 kind of element need MultiAlias, ExtractValue don't need Alias if it's in middle of a ExtractValue chain. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6647 from cloud-fan/alias and squashes the following commits: 552eba4 [Wenchen Fan] fix python 5b5786d [Wenchen Fan] fix agg 73a90cb [Wenchen Fan] fix case-preserve of ExtractValue 4cfd23c [Wenchen Fan] fix order by d18f401 [Wenchen Fan] refine 9f07359 [Wenchen Fan] address comments 39c1aef [Wenchen Fan] small fix 33640ec [Wenchen Fan] auto alias expressions in analyzer
Configuration menu - View commit details
-
Copy full SHA for da7bbb9 - Browse repository at this point
Copy the full SHA da7bbb9View commit details -
[SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json…
…/parquet/jdbc always override mode https://issues.apache.org/jira/browse/SPARK-8532 This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`. Author: Yin Huai <yhuai@databricks.com> Closes apache#6937 from yhuai/SPARK-8532 and squashes the following commits: f972d5d [Yin Huai] davies's comment. d37abd2 [Yin Huai] style. d21290a [Yin Huai] Python doc. 889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet. 7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value. d696dff [Yin Huai] Python style. 88eb6c4 [Yin Huai] If mode is "error", do not call mode method. c40c461 [Yin Huai] Regression test.
Configuration menu - View commit details
-
Copy full SHA for 5ab9fcf - Browse repository at this point
Copy the full SHA 5ab9fcfView commit details -
[SPARK-8455] [ML] Implement n-gram feature transformer
Implementation of n-gram feature transformer for ML. Author: Feynman Liang <fliang@databricks.com> Closes apache#6887 from feynmanliang/ngram-featurizer and squashes the following commits: d2c839f [Feynman Liang] Make n > input length yield empty output 9fadd36 [Feynman Liang] Add empty and corner test cases, fix names and spaces fe93873 [Feynman Liang] Implement n-gram feature transformer
Configuration menu - View commit details
-
Copy full SHA for afe35f0 - Browse repository at this point
Copy the full SHA afe35f0View commit details -
[SPARK-8537] [SPARKR] Add a validation rule about the curly braces in…
… SparkR to `.lintr` [[SPARK-8537] Add a validation rule about the curly braces in SparkR to `.lintr` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8537) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6940 from yu-iskw/SPARK-8537 and squashes the following commits: 7eec1a0 [Yu ISHIKAWA] [SPARK-8537][SparkR] Add a validation rule about the curly braces in SparkR to `.lintr`
Configuration menu - View commit details
-
Copy full SHA for b1f3a48 - Browse repository at this point
Copy the full SHA b1f3a48View commit details -
[SPARK-8356] [SQL] Reconcile callUDF and callUdf
Deprecates ```callUdf``` in favor of ```callUDF```. Author: BenFradet <benjamin.fradet@gmail.com> Closes apache#6902 from BenFradet/SPARK-8356 and squashes the following commits: ef4e9d8 [BenFradet] deprecated callUDF, use udf instead 9b1de4d [BenFradet] reinstated unit test for the deprecated callUdf cbd80a5 [BenFradet] deprecated callUdf in favor of callUDF
Configuration menu - View commit details
-
Copy full SHA for 50d3242 - Browse repository at this point
Copy the full SHA 50d3242View commit details -
[SPARK-8492] [SQL] support binaryType in UnsafeRow
Support BinaryType in UnsafeRow, just like StringType. Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM). Author: Davies Liu <davies@databricks.com> Closes apache#6911 from davies/unsafe_bin and squashes the following commits: d68706f [Davies Liu] update comment 519f698 [Davies Liu] address comment 98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin 180b49d [Davies Liu] fix zero-out 22e4c0a [Davies Liu] zero-out padding bytes 6abfe93 [Davies Liu] fix style 447dea0 [Davies Liu] support binaryType in UnsafeRow
Davies Liu committedJun 22, 2015 Configuration menu - View commit details
-
Copy full SHA for 96aa013 - Browse repository at this point
Copy the full SHA 96aa013View commit details -
[HOTFIX] [TESTS] Typo mqqt -> mqtt
This was introduced in apache#6866.
Andrew Or committedJun 22, 2015 Configuration menu - View commit details
-
Copy full SHA for 1dfb0f7 - Browse repository at this point
Copy the full SHA 1dfb0f7View commit details
Commits on Jun 23, 2015
-
[SPARK-7153] [SQL] support all integral type ordinal in GetArrayItem
first convert `ordinal` to `Number`, then convert to int type. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#5706 from cloud-fan/7153 and squashes the following commits: 915db79 [Wenchen Fan] fix 7153
Configuration menu - View commit details
-
Copy full SHA for 860a49e - Browse repository at this point
Copy the full SHA 860a49eView commit details -
[SPARK-8307] [SQL] improve timestamp from parquet
This PR change to convert julian day to unix timestamp directly (without Calendar and Timestamp). cc adrian-wang rxin Author: Davies Liu <davies@databricks.com> Closes apache#6759 from davies/improve_ts and squashes the following commits: 849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 8e2d56f [Davies Liu] address comments 634b9f5 [Davies Liu] fix mima 4891efb [Davies Liu] address comment bfc437c [Davies Liu] fix build ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 602b969 [Davies Liu] remove jodd 2f2e48c [Davies Liu] fix test 8ace611 [Davies Liu] fix mima 212143b [Davies Liu] fix mina c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 5233974 [Davies Liu] fix scala style 361fd62 [Davies Liu] address comments ea196d4 [Davies Liu] improve timestamp from parquet
Davies Liu committedJun 23, 2015 Configuration menu - View commit details
-
Copy full SHA for 6b7f2ce - Browse repository at this point
Copy the full SHA 6b7f2ceView commit details -
[SPARK-7859] [SQL] Collect_set() behavior differences which fails the…
… unit test under jdk8 To reproduce that: ``` JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 | build/sbt -Phadoop-2.3 -Phive 'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite' ``` A simple workaround to fix that is update the original query, for getting the output size instead of the exact elements of the array (output by collect_set()) Author: Cheng Hao <hao.cheng@intel.com> Closes apache#6402 from chenghao-intel/windowing and squashes the following commits: 99312ad [Cheng Hao] add order by for the select clause edf8ce3 [Cheng Hao] update the code as suggested 7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK
Configuration menu - View commit details
-
Copy full SHA for 13321e6 - Browse repository at this point
Copy the full SHA 13321e6View commit details -
MAINTENANCE: Automated closing of pull requests.
This commit exists to close the following pull requests on Github: Closes apache#2849 (close requested by 'srowen') Closes apache#2786 (close requested by 'andrewor14') Closes apache#4678 (close requested by 'JoshRosen') Closes apache#5457 (close requested by 'andrewor14') Closes apache#3346 (close requested by 'andrewor14') Closes apache#6518 (close requested by 'andrewor14') Closes apache#5403 (close requested by 'pwendell') Closes apache#2110 (close requested by 'srowen')
Configuration menu - View commit details
-
Copy full SHA for c4d2343 - Browse repository at this point
Copy the full SHA c4d2343View commit details -
[SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR…
… files [[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548) - This is the result of `lint-r` https://gist.github.com/yu-iskw/0019b37a2c1167f33986 Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6945 from yu-iskw/SPARK-8548 and squashes the following commits: 0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files
Configuration menu - View commit details
-
Copy full SHA for 44fa7df - Browse repository at this point
Copy the full SHA 44fa7dfView commit details -
[SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing m…
…ax bins Author: Holden Karau <holden@pigscanfly.ca> Closes apache#6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits: 2894695 [Holden Karau] remove extra blank line 2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too 3a09170 [Holden Karau] add maxBins to to the train method as well af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100
Configuration menu - View commit details
-
Copy full SHA for 164fe2a - Browse repository at this point
Copy the full SHA 164fe2aView commit details -
[SPARK-8431] [SPARKR] Add in operator to DataFrame Column in SparkR
[[SPARK-8431] Add in operator to DataFrame Column in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8431) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6941 from yu-iskw/SPARK-8431 and squashes the following commits: 1f64423 [Yu ISHIKAWA] Modify the comment f4309a7 [Yu ISHIKAWA] Make a `setMethod` for `%in%` be independent 6e37936 [Yu ISHIKAWA] Modify a variable name c196173 [Yu ISHIKAWA] [SPARK-8431][SparkR] Add in operator to DataFrame Column in SparkR
Configuration menu - View commit details
-
Copy full SHA for d4f6335 - Browse repository at this point
Copy the full SHA d4f6335View commit details -
[SPARK-8359] [SQL] Fix incorrect decimal precision after multiplication
JIRA: https://issues.apache.org/jira/browse/SPARK-8359 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#6814 from viirya/fix_decimal2 and squashes the following commits: 071a757 [Liang-Chi Hsieh] Remove maximum precision and use MathContext.UNLIMITED. df217d4 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2 a43bfc3 [Liang-Chi Hsieh] Add MathContext with maximum supported precision. 72eeb3f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2 44c9348 [Liang-Chi Hsieh] Fix incorrect decimal precision after multiplication.
5Configuration menu - View commit details
-
Copy full SHA for 31bd306 - Browse repository at this point
Copy the full SHA 31bd306View commit details -
[SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Si…
…nk. Also bump Flume version to 1.6.0 Author: Hari Shreedharan <hshreedharan@apache.org> Closes apache#6910 from harishreedharan/remove-commons-lang3 and squashes the following commits: 9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0 ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
Configuration menu - View commit details
-
Copy full SHA for 9b618fb - Browse repository at this point
Copy the full SHA 9b618fbView commit details -
[SPARK-8541] [PYSPARK] test the absolute error in approx doctests
A minor change but one which is (presumably) visible on the public api docs webpage. Author: Scott Taylor <github@megatron.me.uk> Closes apache#6942 from megatron-me-uk/patch-3 and squashes the following commits: fbed000 [Scott Taylor] test the absolute error in approx doctests
Configuration menu - View commit details
-
Copy full SHA for f0dcbe8 - Browse repository at this point
Copy the full SHA f0dcbe8View commit details -
[SPARK-8300] DataFrame hint for broadcast join.
Users can now do ```scala left.join(broadcast(right), "joinKey") ``` to give the query planner a hint that "right" DataFrame is small and should be broadcasted. Author: Reynold Xin <rxin@databricks.com> Closes apache#6751 from rxin/broadcastjoin-hint and squashes the following commits: 953eec2 [Reynold Xin] Code review feedback. 88752d8 [Reynold Xin] Fixed import. 8187b88 [Reynold Xin] [SPARK-8300] DataFrame hint for broadcast join.
Configuration menu - View commit details
-
Copy full SHA for 6ceb169 - Browse repository at this point
Copy the full SHA 6ceb169View commit details -
[SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffl…
…e writer Author: Holden Karau <holden@pigscanfly.ca> Closes apache#6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits: f807832 [Holden Karau] Log error if we can't throw it 855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates. 039d620 [Holden Karau] Add missing closeandwriteoutput 30e558d [Holden Karau] go back to try/finally e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception ae0b7a7 [Holden Karau] Fix the test 2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions
Configuration menu - View commit details
-
Copy full SHA for 0f92be5 - Browse repository at this point
Copy the full SHA 0f92be5View commit details -
[SQL] [DOCS] updated the documentation for explode
the syntax was incorrect in the example in explode Author: lockwobr <lockwobr@gmail.com> Closes apache#6943 from lockwobr/master and squashes the following commits: 3d864d1 [lockwobr] updated the documentation for explode
Configuration menu - View commit details
-
Copy full SHA for 4f7fbef - Browse repository at this point
Copy the full SHA 4f7fbefView commit details -
[SPARK-7235] [SQL] Refactor the grouping sets
The logical plan `Expand` takes the `output` as constructor argument, which break the references chain. We need to refactor the code, as well as the column pruning. Author: Cheng Hao <hao.cheng@intel.com> Closes apache#5780 from chenghao-intel/expand and squashes the following commits: 76e4aa4 [Cheng Hao] revert the change for case insenstive 7c10a83 [Cheng Hao] refactor the grouping sets
Configuration menu - View commit details
-
Copy full SHA for 7b1450b - Browse repository at this point
Copy the full SHA 7b1450bView commit details -
[SPARK-8432] [SQL] fix hashCode() and equals() of BinaryType in Row
Also added more tests in LiteralExpressionSuite Author: Davies Liu <davies@databricks.com> Closes apache#6876 from davies/fix_hashcode and squashes the following commits: 429c2c0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode 32d9811 [Davies Liu] fix test a0626ed [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode 89c2432 [Davies Liu] fix style bd20780 [Davies Liu] check with catalyst types 41caec6 [Davies Liu] change for to while d96929b [Davies Liu] address comment 6ad2a90 [Davies Liu] fix style 5819d33 [Davies Liu] unify equals() and hashCode() 0fff25d [Davies Liu] fix style 53c38b1 [Davies Liu] fix hashCode() and equals() of BinaryType in Row
Configuration menu - View commit details
-
Copy full SHA for 6f4cadf - Browse repository at this point
Copy the full SHA 6f4cadfView commit details -
[SPARK-7888] Be able to disable intercept in linear regression in ml …
…package Author: Holden Karau <holden@pigscanfly.ca> Closes apache#6927 from holdenk/SPARK-7888-Be-able-to-disable-intercept-in-Linear-Regression-in-ML-package and squashes the following commits: 0ad384c [Holden Karau] Add MiMa excludes 4016fac [Holden Karau] Switch to wild card import, remove extra blank lines ae5baa8 [Holden Karau] CR feedback, move the fitIntercept down rather than changing ymean and etc above f34971c [Holden Karau] Fix some more long lines 319bd3f [Holden Karau] Fix long lines 3bb9ee1 [Holden Karau] Update the regression suite tests 7015b9f [Holden Karau] Our code performs the same with R, except we need more than one data point but that seems reasonable 0b0c8c0 [Holden Karau] fix the issue with the sample R code e2140ba [Holden Karau] Add a test, it fails! 5e84a0b [Holden Karau] Write out thoughts and use the correct trait 91ffc0a [Holden Karau] more murh 006246c [Holden Karau] murp?
Configuration menu - View commit details
-
Copy full SHA for 2b1111d - Browse repository at this point
Copy the full SHA 2b1111dView commit details -
[SPARK-8265] [MLLIB] [PYSPARK] Add LinearDataGenerator to pyspark.mll…
…ib.utils It is useful to generate linear data for easy testing of linear models and in general. Scala already has it. This is just a wrapper around the Scala code. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#6715 from MechCoder/generate_linear_input and squashes the following commits: 6182884 [MechCoder] Minor changes 8bda047 [MechCoder] Minor style fixes 0f1053c [MechCoder] [SPARK-8265] Add LinearDataGenerator to pyspark.mllib.utils
Configuration menu - View commit details
-
Copy full SHA for f2022fa - Browse repository at this point
Copy the full SHA f2022faView commit details -
[SPARK-8111] [SPARKR] SparkR shell should display Spark logo and vers…
…ion banner on startup. spark version is taken from the environment variable SPARK_VERSION Author: Alok Singh <singhal@Aloks-MacBook-Pro.local> Author: Alok Singh <singhal@aloks-mbp.usca.ibm.com> Closes apache#6944 from aloknsingh/aloknsingh_spark_jiras and squashes the following commits: ed607bd [Alok Singh] [SPARK-8111][SparkR] As per suggestion, 1) using the version from sparkContext rather than the Sys.env. 2) change "Welcome to SparkR!" to "Welcome to" followed by Spark logo and version acd5b85 [Alok Singh] fix the jira SPARK-8111 to add the spark version and logo. Currently spark version is taken from the environment variable SPARK_VERSION
Configuration menu - View commit details
-
Copy full SHA for f2fb028 - Browse repository at this point
Copy the full SHA f2fb028View commit details -
[SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespa…
…ce between label and features vector fix LabeledPoint parser when there is a whitespace between label and features vector, e.g. (y, [x1, x2, x3]) Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com> Closes apache#6954 from fe2s/SPARK-8525 and squashes the following commits: 0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position
Configuration menu - View commit details
-
Copy full SHA for a803118 - Browse repository at this point
Copy the full SHA a803118View commit details -
[DOC] [SQL] Addes Hive metastore Parquet table conversion section
This PR adds a section about Hive metastore Parquet table conversion. It documents: 1. Schema reconciliation rules introduced in apache#5214 (see [this comment] [1] in apache#5188) 2. Metadata refreshing requirement introduced in apache#5339 [1]: apache#5188 (comment) Author: Cheng Lian <lian@databricks.com> Closes apache#5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits: 42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet 4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet 756e660 [Cheng Lian] Adds Python snippet for metadata refreshing 50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
Configuration menu - View commit details
-
Copy full SHA for d96d7b5 - Browse repository at this point
Copy the full SHA d96d7b5View commit details -
[SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column i…
…s used in booelan expression It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `|` instead. Author: Davies Liu <davies@databricks.com> Closes apache#6961 from davies/column_bool and squashes the following commits: 9f19beb [Davies Liu] update message af74bd6 [Davies Liu] fix tests 07dff84 [Davies Liu] address comments, fix tests f70c08e [Davies Liu] raise Exception if column is used in booelan expression
Davies Liu committedJun 23, 2015 Configuration menu - View commit details
-
Copy full SHA for 7fb5ae5 - Browse repository at this point
Copy the full SHA 7fb5ae5View commit details
Commits on Jun 24, 2015
-
[SPARK-8139] [SQL] Updates docs and comments of data sources and Parq…
…uet output committer options This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0. Author: Cheng Lian <lian@databricks.com> Closes apache#6683 from liancheng/output-committer-docs and squashes the following commits: b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
Configuration menu - View commit details
-
Copy full SHA for 111d6b9 - Browse repository at this point
Copy the full SHA 111d6b9View commit details -
[SPARK-7157][SQL] add sampleBy to DataFrame
Add `sampleBy` to DataFrame. rxin Author: Xiangrui Meng <meng@databricks.com> Closes apache#6769 from mengxr/SPARK-7157 and squashes the following commits: 991f26f [Xiangrui Meng] fix seed 4a14834 [Xiangrui Meng] move sampleBy to stat 832f7cc [Xiangrui Meng] add sampleBy to DataFrame
Configuration menu - View commit details
-
Copy full SHA for 0401cba - Browse repository at this point
Copy the full SHA 0401cbaView commit details -
Revert "[SPARK-7157][SQL] add sampleBy to DataFrame"
This reverts commit 0401cba. The new test case on Jenkins is failing.
Configuration menu - View commit details
-
Copy full SHA for a458efc - Browse repository at this point
Copy the full SHA a458efcView commit details -
[SPARK-6749] [SQL] Make metastore client robust to underlying socket …
…connection loss This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on thrift exceptions. We attempt to emulate the proper hive behavior by retrying only as configured by hiveconf. Author: Eric Liang <ekl@databricks.com> Closes apache#6912 from ericl/spark-6749 and squashes the following commits: 2d54b55 [Eric Liang] use conf from state 0e3a74e [Eric Liang] use shim properly 980b3e5 [Eric Liang] Fix conf parsing hive 0.14 conf. 92459b6 [Eric Liang] Work around RetryingMetaStoreClient bug
Configuration menu - View commit details
-
Copy full SHA for 50c3a86 - Browse repository at this point
Copy the full SHA 50c3a86View commit details -
[HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for la…
…uncher project This commit changes the MiMa tests to test against the released 1.4.0 artifacts rather than 1.4.0-rc4; this change is necessary to fix a Jenkins build break since it seems that the RC4 snapshot is no longer available via Maven. I also enabled MiMa checks for the `launcher` subproject, which we should have done right after 1.4.0 was released. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6974 from JoshRosen/mima-hotfix and squashes the following commits: 4b4175a [Josh Rosen] [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project
Configuration menu - View commit details
-
Copy full SHA for 13ae806 - Browse repository at this point
Copy the full SHA 13ae806View commit details -
[SPARK-8371] [SQL] improve unit test for MaxOf and MinOf and fix bugs
a follow up of apache#6813 Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6825 from cloud-fan/cg and squashes the following commits: 43170cc [Wenchen Fan] fix bugs in code gen
Configuration menu - View commit details
-
Copy full SHA for 09fcf96 - Browse repository at this point
Copy the full SHA 09fcf96View commit details -
[SPARK-8138] [SQL] Improves error message when conflicting partition …
…columns are found This PR improves the error message shown when conflicting partition column names are detected. This can be particularly annoying and confusing when there are a large number of partitions while a handful of them happened to contain unexpected temporary file(s). Now all suspicious directories are listed as below: ``` java.lang.AssertionError: assertion failed: Conflicting partition column names detected: Partition column name list #0: b, c, d Partition column name list #1: b, c Partition column name list #2: b For partitioned table directories, data files should only live in leaf directories. Please check the following directories for unexpected files: file:/tmp/foo/b=0 file:/tmp/foo/b=1 file:/tmp/foo/b=1/c=1 file:/tmp/foo/b=0/c=0 ``` Author: Cheng Lian <lian@databricks.com> Closes apache#6610 from liancheng/part-errmsg and squashes the following commits: 7d05f2c [Cheng Lian] Fixes Scala style issue a149250 [Cheng Lian] Adds test case for the error message 6b74dd8 [Cheng Lian] Also lists suspicious non-leaf partition directories a935eb8 [Cheng Lian] Improves error message when conflicting partition columns are found
Configuration menu - View commit details
-
Copy full SHA for cc465fd - Browse repository at this point
Copy the full SHA cc465fdView commit details -
[SPARK-8567] [SQL] Debugging flaky HiveSparkSubmitSuite
Using similar approach used in `HiveThriftServer2Suite` to print stdout/stderr of the spawned process instead of logging them to see what happens on Jenkins. (This test suite only fails on Jenkins and doesn't spill out any log...) cc yhuai Author: Cheng Lian <lian@databricks.com> Closes apache#6978 from liancheng/debug-hive-spark-submit-suite and squashes the following commits: b031647 [Cheng Lian] Prints process stdout/stderr instead of logging them
Configuration menu - View commit details
-
Copy full SHA for 9d36ec2 - Browse repository at this point
Copy the full SHA 9d36ec2View commit details -
[SPARK-8578] [SQL] Should ignore user defined output committer when a…
…ppending data https://issues.apache.org/jira/browse/SPARK-8578 It is not very safe to use a custom output committer when append data to an existing dir. This changes adds the logic to check if we are appending data, and if so, we use the output committer associated with the file output format. Author: Yin Huai <yhuai@databricks.com> Closes apache#6964 from yhuai/SPARK-8578 and squashes the following commits: 43544c4 [Yin Huai] Do not use a custom output commiter when appendiing data.
Configuration menu - View commit details
-
Copy full SHA for bba6699 - Browse repository at this point
Copy the full SHA bba6699View commit details -
[SPARK-8576] Add spark-ec2 options to set IAM roles and instance-init…
…iated shutdown behavior Both of these options are useful when spark-ec2 is being used as part of an automated pipeline and the engineers want to minimize the need to pass around AWS keys for access to things like S3 (keys are replaced by the IAM role) and to be able to launch a cluster that can terminate itself cleanly. Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes apache#6962 from nchammas/additional-ec2-options and squashes the following commits: fcf252e [Nicholas Chammas] PEP8 fixes efba9ee [Nicholas Chammas] add help for --instance-initiated-shutdown-behavior 598aecf [Nicholas Chammas] option to launch instances into IAM role 2743632 [Nicholas Chammas] add option for instance initiated shutdown
Configuration menu - View commit details
-
Copy full SHA for 31f48e5 - Browse repository at this point
Copy the full SHA 31f48e5View commit details -
[SPARK-8399] [STREAMING] [WEB UI] Overlap between histograms and axis…
…' name in Spark Streaming UI Moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui so the histograms and the axis' name do not overlap. Author: BenFradet <benjamin.fradet@gmail.com> Closes apache#6845 from BenFradet/SPARK-8399 and squashes the following commits: b63695f [BenFradet] adjusted inner histograms eb610ee [BenFradet] readjusted #batches on the x axis dd46f98 [BenFradet] aligned all unit labels and ticks 0564b62 [BenFradet] readjusted #batches placement edd0936 [BenFradet] moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui
Configuration menu - View commit details
-
Copy full SHA for 1173483 - Browse repository at this point
Copy the full SHA 1173483View commit details -
[SPARK-8506] Add pakages to R context created through init.
Author: Holden Karau <holden@pigscanfly.ca> Closes apache#6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits: b60dd63 [Holden Karau] Add an example with the spark-csv package fa8bc92 [Holden Karau] typo: sparm -> spark 865a90c [Holden Karau] strip spaces for comparision c7a4471 [Holden Karau] Add some documentation c1a9233 [Holden Karau] refactor for testing c818556 [Holden Karau] Add pakages to R
Configuration menu - View commit details
-
Copy full SHA for 43e6619 - Browse repository at this point
Copy the full SHA 43e6619View commit details -
[SPARK-7088] [SQL] Fix analysis for 3rd party logical plan.
ResolveReferences analysis rule now does not throw when it cannot resolve references in a self-join. Author: Santiago M. Mola <smola@stratio.com> Closes apache#6853 from smola/SPARK-7088 and squashes the following commits: af71ac7 [Santiago M. Mola] [SPARK-7088] Fix analysis for 3rd party logical plan.
1Configuration menu - View commit details
-
Copy full SHA for b84d4b4 - Browse repository at this point
Copy the full SHA b84d4b4View commit details -
[SPARK-7289] handle project -> limit -> sort efficiently
make the `TakeOrdered` strategy and operator more general, such that it can optionally handle a projection when necessary Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6780 from cloud-fan/limit and squashes the following commits: 34aa07b [Wenchen Fan] revert 07d5456 [Wenchen Fan] clean closure 20821ec [Wenchen Fan] fix 3676a82 [Wenchen Fan] address comments b558549 [Wenchen Fan] address comments 214842b [Wenchen Fan] fix style 2d8be83 [Wenchen Fan] add LimitPushDown 948f740 [Wenchen Fan] fix existing
Configuration menu - View commit details
-
Copy full SHA for f04b567 - Browse repository at this point
Copy the full SHA f04b567View commit details -
[SPARK-7633] [MLLIB] [PYSPARK] Python bindings for StreamingLogisticR…
…egressionwithSGD Add Python bindings to StreamingLogisticRegressionwithSGD. No Java wrappers are needed as models are updated directly using train. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#6849 from MechCoder/spark-3258 and squashes the following commits: b4376a5 [MechCoder] minor d7e5fc1 [MechCoder] Refactor into StreamingLinearAlgorithm Better docs 9c09d4e [MechCoder] [SPARK-7633] Python bindings for StreamingLogisticRegressionwithSGD
Configuration menu - View commit details
-
Copy full SHA for fb32c38 - Browse repository at this point
Copy the full SHA fb32c38View commit details -
[SPARK-6777] [SQL] Implements backwards compatibility rules in Cataly…
…stSchemaConverter This PR introduces `CatalystSchemaConverter` for converting Parquet schema to Spark SQL schema and vice versa. Original conversion code in `ParquetTypesConverter` is removed. Benefits of the new version are: 1. When converting Spark SQL schemas, it generates standard Parquet schemas conforming to [the most updated Parquet format spec] [1]. Converting to old style Parquet schemas is also supported via feature flag `spark.sql.parquet.followParquetFormatSpec` (which is set to `false` for now, and should be set to `true` after both read and write paths are fixed). Note that although this version of Parquet format spec hasn't been officially release yet, Parquet MR 1.7.0 already sticks to it. So it should be safe to follow. 1. It implements backwards-compatibility rules described in the most updated Parquet format spec. Thus can recognize more schema patterns generated by other/legacy systems/tools. 1. Code organization follows convention used in [parquet-mr] [2], which is easier to follow. (Structure of `CatalystSchemaConverter` is similar to `AvroSchemaConverter`). To fully implement backwards-compatibility rules in both read and write path, we also need to update `CatalystRowConverter` (which is responsible for converting Parquet records to `Row`s), `RowReadSupport`, and `RowWriteSupport`. These would be done in follow-up PRs. TODO - [x] More schema conversion test cases for legacy schema patterns. [1]: https://github.com/apache/parquet-format/blob/ea095226597fdbecd60c2419d96b54b2fdb4ae6c/LogicalTypes.md [2]: https://github.com/apache/parquet-mr/ Author: Cheng Lian <lian@databricks.com> Closes apache#6617 from liancheng/spark-6777 and squashes the following commits: 2a2062d [Cheng Lian] Don't convert decimals without precision information b60979b [Cheng Lian] Adds a constructor which accepts a Configuration, and fixes default value of assumeBinaryIsString 743730f [Cheng Lian] Decimal scale shouldn't be larger than precision a104a9e [Cheng Lian] Fixes Scala style issue 1f71d8d [Cheng Lian] Adds feature flag to allow falling back to old style Parquet schema conversion ba84f4b [Cheng Lian] Fixes MapType schema conversion bug 13cb8d5 [Cheng Lian] Fixes MiMa failure 81de5b0 [Cheng Lian] Fixes UDT, workaround read path, and add tests 28ef95b [Cheng Lian] More AnalysisExceptions b10c322 [Cheng Lian] Replaces require() with analysisRequire() which throws AnalysisException cceaf3f [Cheng Lian] Implements backwards compatibility rules in CatalystSchemaConverter
Configuration menu - View commit details
-
Copy full SHA for 8ab5076 - Browse repository at this point
Copy the full SHA 8ab5076View commit details -
[SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS e…
…nv var set Author: fe2s <aka.fe2s@gmail.com> Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com> Closes apache#6956 from fe2s/fix-run-tests and squashes the following commits: 31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters 7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
Configuration menu - View commit details
-
Copy full SHA for dca21a8 - Browse repository at this point
Copy the full SHA dca21a8View commit details -
[SPARK-8567] [SQL] Increase the timeout of HiveSparkSubmitSuite
https://issues.apache.org/jira/browse/SPARK-8567 Author: Yin Huai <yhuai@databricks.com> Closes apache#6957 from yhuai/SPARK-8567 and squashes the following commits: 62dff5b [Yin Huai] Increase the timeout.
Configuration menu - View commit details
-
Copy full SHA for 7daa702 - Browse repository at this point
Copy the full SHA 7daa702View commit details -
[SPARK-8075] [SQL] apply type check interface to more expressions
a follow up of apache#6405. Note: It's not a big change, a lot of changing is due to I swap some code in `aggregates.scala` to make aggregate functions right below its corresponding aggregate expressions. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6723 from cloud-fan/type-check and squashes the following commits: 2124301 [Wenchen Fan] fix tests 5a658bb [Wenchen Fan] add tests 287d3bb [Wenchen Fan] apply type check interface to more expressions
Configuration menu - View commit details
-
Copy full SHA for b71d325 - Browse repository at this point
Copy the full SHA b71d325View commit details
Commits on Jun 25, 2015
-
Two minor SQL cleanup (compiler warning & indent).
Author: Reynold Xin <rxin@databricks.com> Closes apache#7000 from rxin/minor-cleanup and squashes the following commits: 046044c [Reynold Xin] Two minor SQL cleanup (compiler warning & indent).
Configuration menu - View commit details
-
Copy full SHA for 82f80c1 - Browse repository at this point
Copy the full SHA 82f80c1View commit details -
[SPARK-7884] Move block deserialization from BlockStoreShuffleFetcher…
… to ShuffleReader This commit updates the shuffle read path to enable ShuffleReader implementations more control over the deserialization process. The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method. This change creates a cleaner separation of concerns and allows implementations of ShuffleReader more flexibility in how records are retrieved. Author: Matt Massie <massie@cs.berkeley.edu> Author: Kay Ousterhout <kayousterhout@gmail.com> Closes apache#6423 from massie/shuffle-api-cleanup and squashes the following commits: 8b0632c [Matt Massie] Minor Scala style fixes d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup 290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read() 5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources" f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed 4ea1712 [Matt Massie] Small code cleanup for readability 7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream f458489 [Matt Massie] Remove unnecessary map() on return Iterator 4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterrubtibleIterator is needed. 5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash] 7eedd1d [Matt Massie] Small Scala import cleanup 28f8085 [Matt Massie] Small import nit f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher. 7e8e0fe [Matt Massie] Minor Scala style fixes 01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity 7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read 208b7a5 [Matt Massie] Small code style changes b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package 19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
Configuration menu - View commit details
-
Copy full SHA for 7bac2fe - Browse repository at this point
Copy the full SHA 7bac2feView commit details -
[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their outpu…
…t format class `HadoopFsRelation` subclasses, especially `ParquetRelation2` should set its own output format class, so that the default output committer can be setup correctly when doing appending (where we ignore user defined output committers). Author: Cheng Lian <lian@databricks.com> Closes apache#6998 from liancheng/spark-8604 and squashes the following commits: 9be51d1 [Cheng Lian] Adds more comments 6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
Configuration menu - View commit details
-
Copy full SHA for c337844 - Browse repository at this point
Copy the full SHA c337844View commit details -
[SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI
Fix for incorrect memory in Spark UI as per SPARK-5768 Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes apache#6972 from rekhajoshm/SPARK-5768 and squashes the following commits: b678a91 [Joshi] Fix for incorrect memory in Spark UI 2fe53d9 [Joshi] Fix for incorrect memory in Spark UI eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI 0be142d [Rekha Joshi] Merge pull request #3 from apache/master 106fd8e [Rekha Joshi] Merge pull request #2 from apache/master e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
Configuration menu - View commit details
-
Copy full SHA for 085a721 - Browse repository at this point
Copy the full SHA 085a721View commit details -
[SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/ta…
…rget versions. I basically copied the compatibility rules from the top level pom.xml into here. Someone more familiar with all the options in the top level pom may want to make sure nothing else should be copied on down. With this is allows me to build with jdk8 and run with lower versions. Source shows compiled for jdk6 as its supposed to. Author: Tom Graves <tgraves@yahoo-inc.com> Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes apache#6989 from tgravescs/SPARK-8574 and squashes the following commits: e1ea2d4 [Thomas Graves] Change to use combine.children="append" 150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions
Tom Graves committedJun 25, 2015 Configuration menu - View commit details
-
Copy full SHA for e988adb - Browse repository at this point
Copy the full SHA e988adbView commit details -
[SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmit…
…Suite. Author: Yin Huai <yhuai@databricks.com> Closes apache#7009 from yhuai/SPARK-8567 and squashes the following commits: 62fb1f9 [Yin Huai] Add sc.stop(). b22cf7d [Yin Huai] Add logs.
Configuration menu - View commit details
-
Copy full SHA for f9b397f - Browse repository at this point
Copy the full SHA f9b397fView commit details -
[MINOR] [MLLIB] rename some functions of PythonMLLibAPI
Keep the same naming conventions for PythonMLLibAPI. Only the following three functions is different from others ```scala trainNaiveBayes trainGaussianMixture trainWord2Vec ``` So change them to ```scala trainNaiveBayesModel trainGaussianMixtureModel trainWord2VecModel ``` It does not affect any users and public APIs, only to make better understand for developer and code hacker. Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#7011 from yanboliang/py-mllib-api-rename and squashes the following commits: 771ffec [Yanbo Liang] rename some functions of PythonMLLibAPI
Configuration menu - View commit details
-
Copy full SHA for 2519dcc - Browse repository at this point
Copy the full SHA 2519dccView commit details -
[SPARK-8637] [SPARKR] [HOTFIX] Fix packages argument, sparkSubmitBinName
cc cafreeman Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes apache#7022 from shivaram/sparkr-init-hotfix and squashes the following commits: 9178d15 [Shivaram Venkataraman] Fix packages argument, sparkSubmitBinName
Configuration menu - View commit details
-
Copy full SHA for c392a9e - Browse repository at this point
Copy the full SHA c392a9eView commit details
Commits on Jun 26, 2015
-
[SPARK-8237] [SQL] Add misc function sha2
JIRA: https://issues.apache.org/jira/browse/SPARK-8237 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#6934 from viirya/expr_sha2 and squashes the following commits: 35e0bb3 [Liang-Chi Hsieh] For comments. 68b5284 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2 8573aff [Liang-Chi Hsieh] Remove unnecessary Product. ee61e06 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2 59e41aa [Liang-Chi Hsieh] Add misc function: sha2.
Configuration menu - View commit details
-
Copy full SHA for 47c874b - Browse repository at this point
Copy the full SHA 47c874bView commit details -
[SPARK-8620] [SQL] cleanup CodeGenContext
fix docs, remove nativeTypes , use java type to get boxed type ,default value, etc. to avoid handle `DateType` and `TimestampType` as int and long again and again. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7010 from cloud-fan/cg and squashes the following commits: aa01cf9 [Wenchen Fan] cleanup CodeGenContext
Configuration menu - View commit details
-
Copy full SHA for 4036011 - Browse repository at this point
Copy the full SHA 4036011View commit details -
[SPARK-8635] [SQL] improve performance of CatalystTypeConverters
In `CatalystTypeConverters.createToCatalystConverter`, we add special handling for primitive types. We can apply this strategy to more places to improve performance. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7018 from cloud-fan/converter and squashes the following commits: 8b16630 [Wenchen Fan] another fix 326c82c [Wenchen Fan] optimize type converter
Configuration menu - View commit details
-
Copy full SHA for 1a79f0e - Browse repository at this point
Copy the full SHA 1a79f0eView commit details -
[SPARK-8344] Add message processing time metric to DAGScheduler
This commit adds a new metric, `messageProcessingTime`, to the DAGScheduler metrics source. This metrics tracks the time taken to process messages in the scheduler's event processing loop, which is a helpful debugging aid for diagnosing performance issues in the scheduler (such as SPARK-4961). In order to do this, I moved the creation of the DAGSchedulerSource metrics source into DAGScheduler itself, similar to how MasterSource is created and registered in Master. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7002 from JoshRosen/SPARK-8344 and squashes the following commits: 57f914b [Josh Rosen] Fix import ordering 7d6bb83 [Josh Rosen] Add message processing time metrics to DAGScheduler
Configuration menu - View commit details
-
Copy full SHA for 9fed6ab - Browse repository at this point
Copy the full SHA 9fed6abView commit details -
[SPARK-8613] [ML] [TRIVIAL] add param to disable linear feature scaling
Add a param to disable linear feature scaling (to be implemented later in linear & logistic regression). Done as a seperate PR so we can use same param & not conflict while working on the sub-tasks. Author: Holden Karau <holden@pigscanfly.ca> Closes apache#7024 from holdenk/SPARK-8522-Disable-Linear_featureScaling-Spark-8613-Add-param and squashes the following commits: ce8931a [Holden Karau] Regenerate the sharedParams code fa6427e [Holden Karau] update text for standardization param. 7b24a2b [Holden Karau] generate the new standardization param 3c190af [Holden Karau] Add the standardization param to sharedparamscodegen
Configuration menu - View commit details
-
Copy full SHA for c9e05a3 - Browse repository at this point
Copy the full SHA c9e05a3View commit details -
[SPARK-8302] Support heterogeneous cluster install paths on YARN.
Some users have Hadoop installations on different paths across their cluster. Currently, that makes it hard to set up some configuration in Spark since that requires hardcoding paths to jar files or native libraries, which wouldn't work on such a cluster. This change introduces a couple of YARN-specific configurations that instruct the backend to replace certain paths when launching remote processes. That way, if the configuration says the Spark jar is in "/spark/spark.jar", and also says that "/spark" should be replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location of the jar. Coupled with YARN's environment whitelist (which allows certain env variables to be exposed to containers), this allows users to support such heterogeneous environments, as long as a single replacement is enough. (Otherwise, this feature would need to be extended to support multiple path replacements.) Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#6752 from vanzin/SPARK-8302 and squashes the following commits: 4bff8d4 [Marcelo Vanzin] Add docs, rename configs. 0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it. 2e9cc9d [Marcelo Vanzin] Style. a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
Configuration menu - View commit details
-
Copy full SHA for 37bf76a - Browse repository at this point
Copy the full SHA 37bf76aView commit details -
[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.tes…
…tmod() This patch addresses a critical issue in the PySpark tests: Several of our Python modules' `__main__` methods call `doctest.testmod()` in order to run doctests but forget to check and handle its return value. As a result, some PySpark test failures can go unnoticed because they will not fail the build. Fortunately, there was only one test failure which was masked by this bug: a `pyspark.profiler` doctest was failing due to changes in RDD pipelining. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7032 from JoshRosen/testmod-fix and squashes the following commits: 60dbdc0 [Josh Rosen] Account for int vs. long formatting change in Python 3 8b8d80a [Josh Rosen] Fix failing test. e6423f9 [Josh Rosen] Check return code for all uses of doctest.testmod().
Configuration menu - View commit details
-
Copy full SHA for 41afa16 - Browse repository at this point
Copy the full SHA 41afa16View commit details -
[SPARK-8662] SparkR Update SparkSQL Test
Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior has changed in R 3.2, the test became unpassable. JIRA here: https://issues.apache.org/jira/browse/SPARK-8662 Author: cafreeman <cfreeman@alteryx.com> Closes apache#7045 from cafreeman/R32_Test and squashes the following commits: b97cc52 [cafreeman] Add `checkStructField` utility 3381e5c [cafreeman] Update SparkSQL Test (cherry picked from commit 78b31a2) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Configuration menu - View commit details
-
Copy full SHA for a56516f - Browse repository at this point
Copy the full SHA a56516fView commit details
Commits on Jun 27, 2015
-
[SPARK-8607] SparkR -- jars not being added to application classpath …
…correctly Add `getStaticClass` method in SparkR's `RBackendHandler` This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185). cc shivaram Author: cafreeman <cfreeman@alteryx.com> Closes apache#7001 from cafreeman/branch-1.4 and squashes the following commits: 8f81194 [cafreeman] Add missing license 31aedcf [cafreeman] Refactor test to call an external R script 2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 0bea809 [cafreeman] Fixed relative path issue and added smaller JAR ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 9a5c362 [cafreeman] test for including JAR when launching sparkContext 9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 5a80844 [cafreeman] Fix style nits 7c6bd0c [cafreeman] [SPARK-8607] SparkR (cherry picked from commit 2579948) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Configuration menu - View commit details
-
Copy full SHA for 9d11817 - Browse repository at this point
Copy the full SHA 9d11817View commit details -
[SPARK-8639] [DOCS] Fixed Minor Typos in Documentation
Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639) fixed minor typos in docs/README.md and docs/api.md Author: Rosstin <asterazul@gmail.com> Closes apache#7046 from Rosstin/SPARK-8639 and squashes the following commits: 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
Configuration menu - View commit details
-
Copy full SHA for b5a6663 - Browse repository at this point
Copy the full SHA b5a6663View commit details -
[SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN…
…" document As per the description in the JIRA, I moved the contents of the page and added a few additional content. Author: Neelesh Srinivas Salian <nsalian@cloudera.com> Closes apache#6924 from nssalian/SPARK-3629 and squashes the following commits: 944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters 40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line 9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section 8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update 151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document
Configuration menu - View commit details
-
Copy full SHA for d48e789 - Browse repository at this point
Copy the full SHA d48e789View commit details -
[SPARK-8623] Hadoop RDDs fail to properly serialize configuration
Author: Sandy Ryza <sandy@cloudera.com> Closes apache#7050 from sryza/sandy-spark-8623 and squashes the following commits: 58a8079 [Sandy Ryza] SPARK-8623. Hadoop RDDs fail to properly serialize configuration
Configuration menu - View commit details
-
Copy full SHA for 4153776 - Browse repository at this point
Copy the full SHA 4153776View commit details -
[SPARK-8606] Prevent exceptions in RDD.getPreferredLocations() from c…
…rashing DAGScheduler If `RDD.getPreferredLocations()` throws an exception it may crash the DAGScheduler and SparkContext. This patch addresses this by adding a try-catch block. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7023 from JoshRosen/SPARK-8606 and squashes the following commits: 770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block. 44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method 19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash
Configuration menu - View commit details
-
Copy full SHA for 0b5abbf - Browse repository at this point
Copy the full SHA 0b5abbfView commit details
Commits on Jun 28, 2015
-
[SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integr…
…ate with dev/run-tests module system This patch refactors the `python/run-tests` script: - It's now written in Python instead of Bash. - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules. This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes. For example, we can now skip the PySpark Streaming test cases when only SQL files are changed. - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482): ``` Usage: run-tests [options] Options: -h, --help show this help message and exit --python-executables=PYTHON_EXECUTABLES A comma-separated list of Python executables to test against (default: python2.6,python3.4,pypy) --modules=MODULES A comma-separated list of Python modules to test (default: pyspark-core,pyspark-ml,pyspark-mllib ,pyspark-sql,pyspark-streaming) ``` - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6967 from JoshRosen/run-tests-python-modules and squashes the following commits: f578d6d [Josh Rosen] Fix print for Python 2.x 8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks 34c98d2 [Josh Rosen] Fix universal_newlines for Python 3 8f65ed0 [Josh Rosen] Fix handling of module in python/run-tests 37aff00 [Josh Rosen] Python 3 fix 27a389f [Josh Rosen] Skip MLLib tests for PyPy c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests 568a3fd [Josh Rosen] Fix hashbang 3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test) f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3 9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules 4f8902c [Josh Rosen] Python lint fixes. 8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3. f542ac5 [Josh Rosen] Fix lint check for Python 3 fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks 2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests caeb040 [Josh Rosen] Fixes to PySpark test module definitions d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests def2d8a [Josh Rosen] Two minor fixes aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly 04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script 4c97136 [Josh Rosen] PYTHONPATH fixes dcc9c09 [Josh Rosen] Fix time division 32660fc [Josh Rosen] Initial cut at Python test runner refactoring 311c6a9 [Josh Rosen] Move shell utility functions to own module. 1bdeb87 [Josh Rosen] Move module definitions to separate file.
Configuration menu - View commit details
-
Copy full SHA for 40648c5 - Browse repository at this point
Copy the full SHA 40648c5View commit details -
Configuration menu - View commit details
-
Copy full SHA for 42db3a1 - Browse repository at this point
Copy the full SHA 42db3a1View commit details -
[SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all
Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers. See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits: 70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
Configuration menu - View commit details
-
Copy full SHA for f510045 - Browse repository at this point
Copy the full SHA f510045View commit details -
[SPARK-8649] [BUILD] Mapr repository is not defined properly
The previous commiter on this part was pwendell The previous url gives 404, the new one seems to be OK. This patch is added under the Apache License 2.0. The JIRA link: https://issues.apache.org/jira/browse/SPARK-8649 Author: Thomas Szymanski <develop@tszymanski.com> Closes apache#7054 from tszym/SPARK-8649 and squashes the following commits: bfda9c4 [Thomas Szymanski] [SPARK-8649] [BUILD] Mapr repository is not defined properly
Configuration menu - View commit details
-
Copy full SHA for 52d1281 - Browse repository at this point
Copy the full SHA 52d1281View commit details -
[SPARK-8610] [SQL] Separate Row and InternalRow (part 2)
Currently, we use GenericRow both for Row and InternalRow, which is confusing because it could contain Scala type also Catalyst types. This PR changes to use GenericInternalRow for InternalRow (contains catalyst types), GenericRow for Row (contains Scala types). Also fixes some incorrect use of InternalRow or Row. Author: Davies Liu <davies@databricks.com> Closes apache#7003 from davies/internalrow and squashes the following commits: d05866c [Davies Liu] fix test: rollback changes for pyspark 72878dd [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow efd0b25 [Davies Liu] fix copy of MutableRow 87b13cf [Davies Liu] fix test d2ebd72 [Davies Liu] fix style eb4b473 [Davies Liu] mark expensive API as final bd4e99c [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow bdfb78f [Davies Liu] remove BaseMutableRow 6f99a97 [Davies Liu] fix catalyst test defe931 [Davies Liu] remove BaseRow 288b31f [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow 9d24350 [Davies Liu] separate Row and InternalRow (part 2)
Davies Liu committedJun 28, 2015 Configuration menu - View commit details
-
Copy full SHA for 77da5be - Browse repository at this point
Copy the full SHA 77da5beView commit details -
[SPARK-8686] [SQL] DataFrame should support
where
with expression r……epresented by String DataFrame supports `filter` function with two types of argument, `Column` and `String`. But `where` doesn't. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#7063 from sarutak/SPARK-8686 and squashes the following commits: 180f9a4 [Kousuke Saruta] Added test d61aec4 [Kousuke Saruta] Add "where" method with String argument to DataFrame
Configuration menu - View commit details
-
Copy full SHA for ec78438 - Browse repository at this point
Copy the full SHA ec78438View commit details -
[SPARK-8596] [EC2] Added port for Rstudio
This would otherwise need to be set manually by R users in AWS. https://issues.apache.org/jira/browse/SPARK-8596 Author: Vincent D. Warmerdam <vincentwarmerdam@gmail.com> Author: vincent <vincentwarmerdam@gmail.com> Closes apache#7068 from koaning/rstudio-port-number and squashes the following commits: ac8100d [vincent] Update spark_ec2.py ce6ad88 [Vincent D. Warmerdam] added port number for rstudio
Configuration menu - View commit details
-
Copy full SHA for 9ce78b4 - Browse repository at this point
Copy the full SHA 9ce78b4View commit details -
[SPARK-8677] [SQL] Fix non-terminating decimal expansion for decimal …
…divide operation JIRA: https://issues.apache.org/jira/browse/SPARK-8677 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#7056 from viirya/fix_decimal3 and squashes the following commits: 34d7419 [Liang-Chi Hsieh] Fix Non-terminating decimal expansion for decimal divide operation.
Configuration menu - View commit details
-
Copy full SHA for 24fda73 - Browse repository at this point
Copy the full SHA 24fda73View commit details
Commits on Jun 29, 2015
-
[SPARK-7845] [BUILD] Bumping default Hadoop version used in profile h…
…adoop-1 to 1.2.1 PR apache#5694 reverted PR apache#6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR apache#6384 didn't bump Hadoop 1 version defined in POM. Author: Cheng Lian <lian@databricks.com> Closes apache#7062 from liancheng/spark-7845 and squashes the following commits: c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
Configuration menu - View commit details
-
Copy full SHA for 00a9d22 - Browse repository at this point
Copy the full SHA 00a9d22View commit details -
[SPARK-7212] [MLLIB] Add sequence learning flag
Support mining of ordered frequent item sequences. Author: Feynman Liang <fliang@databricks.com> Closes apache#6997 from feynmanliang/fp-sequence and squashes the following commits: 7c14e15 [Feynman Liang] Improve scalatests with R code and Seq 0d3e4b6 [Feynman Liang] Fix python test ce987cb [Feynman Liang] Backwards compatibility aux constructor 34ef8f2 [Feynman Liang] Fix failing test due to reverse orderering f04bd50 [Feynman Liang] Naming, add ordered to FreqItemsets, test ordering using Seq 648d4d4 [Feynman Liang] Test case for frequent item sequences 252a36a [Feynman Liang] Add sequence learning flag
Configuration menu - View commit details
-
Copy full SHA for 25f574e - Browse repository at this point
Copy the full SHA 25f574eView commit details -
[SPARK-5962] [MLLIB] Python support for Power Iteration Clustering
Python support for Power Iteration Clustering https://issues.apache.org/jira/browse/SPARK-5962 Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#6992 from yanboliang/pyspark-pic and squashes the following commits: 6b03d82 [Yanbo Liang] address comments 4be4423 [Yanbo Liang] Python support for Power Iteration Clustering
Configuration menu - View commit details
-
Copy full SHA for dfde31d - Browse repository at this point
Copy the full SHA dfde31dView commit details -
[SPARK-8575] [SQL] Deprecate callUDF in favor of udf
Follow up of [SPARK-8356](https://issues.apache.org/jira/browse/SPARK-8356) and apache#6902. Removes the unit test for the now deprecated ```callUdf``` Unit test in SQLQuerySuite now uses ```udf``` instead of ```callUDF``` Replaced ```callUDF``` by ```udf``` where possible in mllib Author: BenFradet <benjamin.fradet@gmail.com> Closes apache#6993 from BenFradet/SPARK-8575 and squashes the following commits: 26f5a7a [BenFradet] 2 spaces instead of 1 1ddb452 [BenFradet] renamed initUDF in order to be consistent in OneVsRest 48ca15e [BenFradet] used vector type tag for udf call in VectorIndexer 0ebd0da [BenFradet] replace the now deprecated callUDF by udf in VectorIndexer 8013409 [BenFradet] replaced the now deprecated callUDF by udf in Predictor 94345b5 [BenFradet] unifomized udf calls in ProbabilisticClassifier 1305492 [BenFradet] uniformized udf calls in Classifier a672228 [BenFradet] uniformized udf calls in OneVsRest 49e4904 [BenFradet] Revert "removal of the unit test for the now deprecated callUdf" bbdeaf3 [BenFradet] fixed syntax for init udf in OneVsRest fe2a10b [BenFradet] callUDF => udf in ProbabilisticClassifier 0ea30b3 [BenFradet] callUDF => udf in Classifier where possible 197ec82 [BenFradet] callUDF => udf in OneVsRest 84d6780 [BenFradet] modified unit test in SQLQuerySuite to use udf instead of callUDF 477709f [BenFradet] removal of the unit test for the now deprecated callUdf
Configuration menu - View commit details
-
Copy full SHA for 0b10662 - Browse repository at this point
Copy the full SHA 0b10662View commit details -
[SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala
I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all match. I added `Option` to reader and writer and updated the `pyspark-sql` test. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes apache#7078 from piaozhexiu/SPARK-8355 and squashes the following commits: c63d419 [Cheolsoo Park] Fix version 524e0aa [Cheolsoo Park] Add option function to df reader and writer
Configuration menu - View commit details
-
Copy full SHA for ac2e17b - Browse repository at this point
Copy the full SHA ac2e17bView commit details -
[SPARK-8698] partitionBy in Python DataFrame reader/writer interface …
…should not default to empty tuple. Author: Reynold Xin <rxin@databricks.com> Closes apache#7079 from rxin/SPARK-8698 and squashes the following commits: 8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
Configuration menu - View commit details
-
Copy full SHA for 660c6ce - Browse repository at this point
Copy the full SHA 660c6ceView commit details -
[SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript
When there are massive tasks, such as `sc.parallelize(1 to 100000, 10000).count()`, the generated JS codes have a lot of string concatenations in the stage page, nearly 40 string concatenations for one task. We can generate the whole string for a task instead of execution string concatenations in the browser. Before this patch, the load time of the page is about 21 seconds. ![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png) After this patch, it reduces to about 17 seconds. ![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png) One disadvantage is that the generated JS codes become hard to read. Author: zsxwing <zsxwing@gmail.com> Closes apache#7082 from zsxwing/js-string and squashes the following commits: b29231d [zsxwing] Avoid massive concating strings in Javascript
Configuration menu - View commit details
-
Copy full SHA for 630bd5f - Browse repository at this point
Copy the full SHA 630bd5fView commit details -
[SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a …
…nice way Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like: ``` -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: streaming-kafka-assembly/assembly ``` Author: Brennon York <brennon.york@capitalone.com> Closes apache#7085 from brennonyork/SPARK-8693 and squashes the following commits: c5575f1 [Brennon York] added commas to end of print statements for proper printing
Configuration menu - View commit details
-
Copy full SHA for 5c796d5 - Browse repository at this point
Copy the full SHA 5c796d5View commit details -
[SPARK-8554] Add the SparkR document files to
.rat-excludes
for `./……dev/check-license` [[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8554) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6947 from yu-iskw/SPARK-8554 and squashes the following commits: 5ca240c [Yu ISHIKAWA] [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license`
Configuration menu - View commit details
-
Copy full SHA for 715f084 - Browse repository at this point
Copy the full SHA 715f084View commit details -
Revert "[SPARK-8372] History server shows incorrect information for a…
…pplication not started" This reverts commit 2837e06.
Andrew Or committedJun 29, 2015 Configuration menu - View commit details
-
Copy full SHA for ea88b1a - Browse repository at this point
Copy the full SHA ea88b1aView commit details -
[SPARK-8692] [SQL] re-order the case statements that handling catalys…
…t data types use same order: boolean, byte, short, int, date, long, timestamp, float, double, string, binary, decimal. Then we can easily check whether some data types are missing by just one glance, and make sure we handle data/timestamp just as int/long. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7073 from cloud-fan/fix-date and squashes the following commits: 463044d [Wenchen Fan] fix style 51cd347 [Wenchen Fan] refactor handling of date and timestmap
Configuration menu - View commit details
-
Copy full SHA for ed413bc - Browse repository at this point
Copy the full SHA ed413bcView commit details -
[SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.
Allow HiveContext to connect to metastores of those versions; some new shims had to be added to account for changing internal APIs. A new test was added to exercise the "reset()" path which now also requires a shim; and the test code was changed to use a directory under the build's target to store ivy dependencies. Without that, at least I consistently run into issues with Ivy messing up (or being confused) by my existing caches. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#7026 from vanzin/SPARK-8067 and squashes the following commits: 3e2e67b [Marcelo Vanzin] [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.
Configuration menu - View commit details
-
Copy full SHA for 3664ee2 - Browse repository at this point
Copy the full SHA 3664ee2View commit details -
[SPARK-8235] [SQL] misc function sha / sha1
Jira: https://issues.apache.org/jira/browse/SPARK-8235 I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they? Please take a close look on the Python part. This is adopted from apache#6934 Author: Tarek Auel <tarek.auel@gmail.com> Author: Tarek Auel <tarek.auel@googlemail.com> Closes apache#6963 from tarekauel/SPARK-8235 and squashes the following commits: f064563 [Tarek Auel] change to shaHex 7ce3cdc [Tarek Auel] rely on automatic cast a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235 68eb043 [Tarek Auel] added docstring be5aff1 [Tarek Auel] improved error message 7336c96 [Tarek Auel] added type check cf23a80 [Tarek Auel] simplified example ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala 6d6ff0d [Tarek Auel] [SPARK-8233] added docstring ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__ e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
Configuration menu - View commit details
-
Copy full SHA for a5c2961 - Browse repository at this point
Copy the full SHA a5c2961View commit details -
[SPARK-8528] Expose SparkContext.applicationId in PySpark
Use case - we want to log applicationId (YARN in hour case) to request help with troubleshooting from the DevOps Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com> Closes apache#6936 from smartkiwi/master and squashes the following commits: 870338b [Vladimir Vladimirov] this would make doctest to run in python3 0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals 14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring 223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string 3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala 2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark
Configuration menu - View commit details
-
Copy full SHA for 492dca3 - Browse repository at this point
Copy the full SHA 492dca3View commit details -
[SQL][DOCS] Remove wrong example from DataFrame.scala
In DataFrame.scala, there are examples like as follows. ``` * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) * peopleDf($"age" > 15) ``` But, I think the last example doesn't work. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#6977 from sarutak/fix-dataframe-example and squashes the following commits: 46efbd7 [Kousuke Saruta] Removed wrong example
Configuration menu - View commit details
-
Copy full SHA for 94e040d - Browse repository at this point
Copy the full SHA 94e040dView commit details -
[SPARK-8214] [SQL] Add function hex
cc chenghao-intel adrian-wang Author: zhichao.li <zhichao.li@intel.com> Closes apache#6976 from zhichao-li/hex and squashes the following commits: e218d1b [zhichao.li] turn off scalastyle for non-ascii de3f5ea [zhichao.li] non-ascii char cf9c936 [zhichao.li] give separated buffer for each hex method 967ec90 [zhichao.li] Make 'value' as a feild of Hex 3b2fa13 [zhichao.li] tiny fix a647641 [zhichao.li] remove duplicate null check 7cab020 [zhichao.li] tiny refactoring 35ecfe5 [zhichao.li] add function hex
Configuration menu - View commit details
-
Copy full SHA for 637b4ee - Browse repository at this point
Copy the full SHA 637b4eeView commit details -
[SPARK-7862] [SQL] Disable the error message redirect to stderr
This is a follow up of apache#6404, the ScriptTransformation prints the error msg into stderr directly, probably be a disaster for application log. Author: Cheng Hao <hao.cheng@intel.com> Closes apache#6882 from chenghao-intel/verbose and squashes the following commits: bfedd77 [Cheng Hao] revert the write 76ff46b [Cheng Hao] update the CircularBuffer 692b19e [Cheng Hao] check the process exitValue for ScriptTransform 47e0970 [Cheng Hao] Use the RedirectThread instead 1de771d [Cheng Hao] naming the threads in ScriptTransformation 8536e81 [Cheng Hao] disable the error message redirection for stderr
Configuration menu - View commit details
-
Copy full SHA for c6ba2ea - Browse repository at this point
Copy the full SHA c6ba2eaView commit details -
[SPARK-8681] fixed wrong ordering of columns in crosstab
I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that we hit a corner case or something. cc rxin marmbrus Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#7060 from brkyvz/crosstab-fixes and squashes the following commits: 0a65234 [Burak Yavuz] addressed comments v1 d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab
Configuration menu - View commit details
-
Copy full SHA for be7ef06 - Browse repository at this point
Copy the full SHA be7ef06View commit details -
[SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame
Avoid the unnecessary jobs when infer schema from list. cc yhuai mengxr Author: Davies Liu <davies@databricks.com> Closes apache#6606 from davies/improve_create and squashes the following commits: a5928bf [Davies Liu] Update MimaExcludes.scala 62da911 [Davies Liu] fix mima bab4d7d [Davies Liu] Merge branch 'improve_create' of github.com:davies/spark into improve_create eee44a8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create 8d9292d [Davies Liu] Update context.py eb24531 [Davies Liu] Update context.py c969997 [Davies Liu] bug fix d5a8ab0 [Davies Liu] fix tests 8c3f10d [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create 6ea5925 [Davies Liu] address comments 6ceaeff [Davies Liu] avoid spark jobs in createDataFrame
Configuration menu - View commit details
-
Copy full SHA for afae976 - Browse repository at this point
Copy the full SHA afae976View commit details -
[SPARK-8709] Exclude hadoop-client's mockito-all dependency
This patch excludes `hadoop-client`'s dependency on `mockito-all`. As of apache#7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7090 from JoshRosen/SPARK-8709 and squashes the following commits: e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
Configuration menu - View commit details
-
Copy full SHA for 27ef854 - Browse repository at this point
Copy the full SHA 27ef854View commit details -
[SPARK-8056][SQL] Design an easier way to construct schema for both S…
…cala and Python I've added functionality to create new StructType similar to how we add parameters to a new SparkContext. I've also added tests for this type of creation. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes apache#6686 from ilganeli/SPARK-8056B and squashes the following commits: 27c1de1 [Ilya Ganelin] Rename 467d836 [Ilya Ganelin] Removed from_string in favor of _parse_Datatype_json_value 5fef5a4 [Ilya Ganelin] Updates for type parsing 4085489 [Ilya Ganelin] Style errors 3670cf5 [Ilya Ganelin] added string to DataType conversion 8109e00 [Ilya Ganelin] Fixed error in tests 41ab686 [Ilya Ganelin] Fixed style errors e7ba7e0 [Ilya Ganelin] Moved some python tests to tests.py. Added cleaner handling of null data type and added test for correctness of input format 15868fa [Ilya Ganelin] Fixed python errors b79b992 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-8056B a3369fc [Ilya Ganelin] Fixing space errors e240040 [Ilya Ganelin] Style bab7823 [Ilya Ganelin] Constructor error 73d4677 [Ilya Ganelin] Style 4ed00d9 [Ilya Ganelin] Fixed default arg 67df57a [Ilya Ganelin] Removed Foo 04cbf0c [Ilya Ganelin] Added comments for single object 0484d7a [Ilya Ganelin] Restored second method 6aeb740 [Ilya Ganelin] Style 689e54d [Ilya Ganelin] Style f497e9e [Ilya Ganelin] Got rid of old code e3c7a88 [Ilya Ganelin] Fixed doctest failure a62ccde [Ilya Ganelin] Style 966ac06 [Ilya Ganelin] style checks dabb7e6 [Ilya Ganelin] Added Python tests a3f4152 [Ilya Ganelin] added python bindings and better comments e6e536c [Ilya Ganelin] Added extra space 7529a2e [Ilya Ganelin] Fixed formatting d388f86 [Ilya Ganelin] Fixed small bug c4e3bf5 [Ilya Ganelin] Reverted to using parse. Updated parse to support long d7634b6 [Ilya Ganelin] Reverted to fromString to properly support types 22c39d5 [Ilya Ganelin] replaced FromString with DataTypeParser.parse. Replaced empty constructor initializing a null to have it instead create a new array to allow appends to it. faca398 [Ilya Ganelin] [SPARK-8056] Replaced default argument usage. Updated usage and code for DataType.fromString 1acf76e [Ilya Ganelin] Scala style e31c674 [Ilya Ganelin] Fixed bug in test 8dc0795 [Ilya Ganelin] Added tests for creation of StructType object with new methods fdf7e9f [Ilya Ganelin] [SPARK-8056] Created add methods to facilitate building new StructType objects.
Configuration menu - View commit details
-
Copy full SHA for f6fc254 - Browse repository at this point
Copy the full SHA f6fc254View commit details -
[SPARK-7810] [PYSPARK] solve python rdd socket connection problem
Method "_load_from_socket" in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New modification should work around both two protocols. Author: Ai He <ai.he@ussuning.com> Author: AiHe <ai.he@ussuning.com> Closes apache#6338 from AiHe/pyspark-networking-issue and squashes the following commits: d4fc9c4 [Ai He] handle code review 2 e75c5c8 [Ai He] handle code review 5644953 [AiHe] solve python rdd socket connection problem to jvm
Ai He authored and Davies Liu committedJun 29, 2015 Configuration menu - View commit details
-
Copy full SHA for ecd3aac - Browse repository at this point
Copy the full SHA ecd3aacView commit details -
[SPARK-8660][ML] Convert JavaDoc style comments inLogisticRegressionS…
…uite.scala to regular multiline comments, to make copy-pasting R commands easier Converted JavaDoc style comments in mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier. Author: Rosstin <asterazul@gmail.com> Closes apache#7096 from Rosstin/SPARK-8660 and squashes the following commits: 242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala 2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
Configuration menu - View commit details
-
Copy full SHA for c8ae887 - Browse repository at this point
Copy the full SHA c8ae887View commit details -
[SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF in…
…stead of Udf Follow-up of apache#6902 for being coherent between ```Udf``` and ```UDF``` Author: BenFradet <benjamin.fradet@gmail.com> Closes apache#6920 from BenFradet/SPARK-8478 and squashes the following commits: c500f29 [BenFradet] renamed a few variables in functions to use UDF 8ab0f2d [BenFradet] renamed idUdf to idUDF in SQLQuerySuite 98696c2 [BenFradet] renamed originalUdfs in TestHive to originalUDFs 7738f74 [BenFradet] modified HiveUDFSuite to use only UDF c52608d [BenFradet] renamed HiveUdfSuite to HiveUDFSuite e51b9ac [BenFradet] renamed ExtractPythonUdfs to ExtractPythonUDFs 8c756f1 [BenFradet] renamed Hive UDF related code 2a1ca76 [BenFradet] renamed pythonUdfs to pythonUDFs 261e6fb [BenFradet] renamed ScalaUdf to ScalaUDF
Configuration menu - View commit details
-
Copy full SHA for 931da5c - Browse repository at this point
Copy the full SHA 931da5cView commit details -
[SPARK-8579] [SQL] support arbitrary object in UnsafeRow
This PR brings arbitrary object support in UnsafeRow (both in grouping key and aggregation buffer). Two object pools will be created to hold those non-primitive objects, and put the index of them into UnsafeRow. In order to compare the grouping key as bytes, the objects in key will be stored in a unique object pool, to make sure same objects will have same index (used as hashCode). For StringType and BinaryType, we still put them as var-length in UnsafeRow when initializing for better performance. But for update, they will be an object inside object pools (there will be some garbages left in the buffer). BTW: Will create a JIRA once issue.apache.org is available. cc JoshRosen rxin Author: Davies Liu <davies@databricks.com> Closes apache#6959 from davies/unsafe_obj and squashes the following commits: 5ce39da [Davies Liu] fix comment 5e797bf [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 5803d64 [Davies Liu] fix conflict 461d304 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 2f41c90 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj b04d69c [Davies Liu] address comments 4859b80 [Davies Liu] fix comments f38011c [Davies Liu] add a test for grouping by decimal d2cf7ab [Davies Liu] add more tests for null checking 71983c5 [Davies Liu] add test for timestamp e8a1649 [Davies Liu] reuse buffer for string 39f09ca [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 035501e [Davies Liu] fix style 236d6de [Davies Liu] support arbitrary object in UnsafeRow
Davies Liu committedJun 29, 2015 Configuration menu - View commit details
-
Copy full SHA for ed359de - Browse repository at this point
Copy the full SHA ed359deView commit details -
[SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-sty…
…le comments to regular multiline comments, to make copy-pasting R code more simple for mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple Author: Rosstin <asterazul@gmail.com> Closes apache#7098 from Rosstin/SPARK-8661 and squashes the following commits: 5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code. bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660 242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala 2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
Configuration menu - View commit details
-
Copy full SHA for 4e880cf - Browse repository at this point
Copy the full SHA 4e880cfView commit details -
[SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def.
jira: https://issues.apache.org/jira/browse/SPARK-8710 Author: Yin Huai <yhuai@databricks.com> Closes apache#7094 from yhuai/SPARK-8710 and squashes the following commits: c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def.
Configuration menu - View commit details
-
Copy full SHA for 4b497a7 - Browse repository at this point
Copy the full SHA 4b497a7View commit details -
[SPARK-8589] [SQL] cleanup DateTimeUtils
move date time related operations into `DateTimeUtils` and rename some methods to make it more clear. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6980 from cloud-fan/datetime and squashes the following commits: 9373a9d [Wenchen Fan] cleanup DateTimeUtil
Configuration menu - View commit details
-
Copy full SHA for 881662e - Browse repository at this point
Copy the full SHA 881662eView commit details
Commits on Jun 30, 2015
-
[SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuit…
…e "receiver info reporting" As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/ ``` 15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers 15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22 ... 15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping 15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping ``` it needs at least 4 seconds to receive all receiver events in this slow machine, but `timeout` for `eventually` is only 2 seconds. This PR increases `timeout` to make this test stable. Author: zsxwing <zsxwing@gmail.com> Closes apache#7017 from zsxwing/SPARK-8634 and squashes the following commits: 719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting"
Configuration menu - View commit details
-
Copy full SHA for cec9852 - Browse repository at this point
Copy the full SHA cec9852View commit details -
[SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in Spark…
…SubmitSuite Hopefully, this suite will not be flaky anymore. Author: Yin Huai <yhuai@databricks.com> Closes apache#7027 from yhuai/SPARK-8567 and squashes the following commits: c0167e2 [Yin Huai] Add sc.stop().
Configuration menu - View commit details
-
Copy full SHA for fbf7573 - Browse repository at this point
Copy the full SHA fbf7573View commit details -
[SPARK-8437] [DOCS] Using directory path without wildcard for filenam…
…e slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' Author: Sean Owen <sowen@cloudera.com> Closes apache#7036 from srowen/SPARK-8437 and squashes the following commits: 0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
Configuration menu - View commit details
-
Copy full SHA for 5d30eae - Browse repository at this point
Copy the full SHA 5d30eaeView commit details -
[SPARK-8410] [SPARK-8475] remove previous ivy resolution when using s…
…park-submit This PR also includes re-ordering the order that repositories are used when resolving packages. User provided repositories will be prioritized. cc andrewor14 Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits: a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
Configuration menu - View commit details
-
Copy full SHA for d7f796d - Browse repository at this point
Copy the full SHA d7f796dView commit details -
[SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with…
… a command other then Rscript This is a simple change to add a new environment variable "spark.sparkr.r.command" that specifies the command that SparkR will use when creating an R engine process. If this is not specified, "Rscript" will be used by default. I did not add any documentation, since I couldn't find any place where environment variables (such as "spark.sparkr.use.daemon") are documented. I also did not add a unit test. The only test that would work generally would be one starting SparkR with sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just using the default value. I think that this is a low-risk change. Likely committers: shivaram Author: Michael Sannella x268 <msannell@tibco.com> Closes apache#6557 from msannell/altR and squashes the following commits: 7eac142 [Michael Sannella x268] add spark.sparkr.r.command config parameter
Configuration menu - View commit details
-
Copy full SHA for 4a9e03f - Browse repository at this point
Copy the full SHA 4a9e03fView commit details -
Revert "[SPARK-8437] [DOCS] Using directory path without wildcard for…
… filename slow for large number of files with wholeTextFiles and binaryFiles" This reverts commit 5d30eae.
Andrew Or committedJun 30, 2015 Configuration menu - View commit details
-
Copy full SHA for 4c1808b - Browse repository at this point
Copy the full SHA 4c1808bView commit details -
[SPARK-8456] [ML] Ngram featurizer python
Python API for N-gram feature transformer Author: Feynman Liang <fliang@databricks.com> Closes apache#6960 from feynmanliang/ngram-featurizer-python and squashes the following commits: f9e37c9 [Feynman Liang] Remove debugging code 4dd81f4 [Feynman Liang] Fix typo and doctest 06c79ac [Feynman Liang] Style guide 26c1175 [Feynman Liang] Add python NGram API
Configuration menu - View commit details
-
Copy full SHA for 620605a - Browse repository at this point
Copy the full SHA 620605aView commit details -
[SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.c…
…rosstab cc yhuai Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#7100 from brkyvz/ct-flakiness-fix and squashes the following commits: abc299a [Burak Yavuz] change 'to' to until 7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab
Configuration menu - View commit details
-
Copy full SHA for ecacb1e - Browse repository at this point
Copy the full SHA ecacb1eView commit details -
[SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7
Patch to fix crash with BINARY fields with ENUM original types. Author: Steven She <steven@canopylabs.com> Closes apache#7048 from stevencanopy/SPARK-8669 and squashes the following commits: 2e72979 [Steven She] [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7
Configuration menu - View commit details
-
Copy full SHA for 4915e9e - Browse repository at this point
Copy the full SHA 4915e9eView commit details -
[SPARK-7667] [MLLIB] MLlib Python API consistency check
MLlib Python API consistency check Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#6856 from yanboliang/spark-7667 and squashes the following commits: 21bae35 [Yanbo Liang] remove duplicate code eb12f95 [Yanbo Liang] fix doc inherit problem 9e7ec3c [Yanbo Liang] address comments e763d32 [Yanbo Liang] MLlib Python API consistency check
Configuration menu - View commit details
-
Copy full SHA for f9b6bf2 - Browse repository at this point
Copy the full SHA f9b6bf2View commit details -
[SPARK-5161] Parallelize Python test execution
This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times. Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism). To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7031 from JoshRosen/parallelize-python-tests and squashes the following commits: feb3763 [Josh Rosen] Re-enable other tests f87ea81 [Josh Rosen] Only log output from failed tests d4ded73 [Josh Rosen] Logging improvements a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests 1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests 110cd9d [Josh Rosen] Fix universal_newlines for Python 3 cd13db8 [Josh Rosen] Also log python_implementation 9e31127 [Josh Rosen] Log Python --version output for each executable. a2b9094 [Josh Rosen] Bump up parallelism. 5552380 [Josh Rosen] Python 3 fix 866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks 87cb988 [Josh Rosen] Skip MLLib tests for PyPy 8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure 9129027 [Josh Rosen] Disable Spark UI in Python tests 037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins. af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
Configuration menu - View commit details
-
Copy full SHA for 7bbbe38 - Browse repository at this point
Copy the full SHA 7bbbe38View commit details -
MAINTENANCE: Automated closing of pull requests.
This commit exists to close the following pull requests on Github: Closes apache#1767 (close requested by 'andrewor14') Closes apache#6952 (close requested by 'andrewor14') Closes apache#7051 (close requested by 'andrewor14') Closes apache#5357 (close requested by 'marmbrus') Closes apache#5233 (close requested by 'andrewor14') Closes apache#6930 (close requested by 'JoshRosen') Closes apache#5502 (close requested by 'andrewor14') Closes apache#6778 (close requested by 'andrewor14') Closes apache#7006 (close requested by 'andrewor14')
Configuration menu - View commit details
-
Copy full SHA for ea775b0 - Browse repository at this point
Copy the full SHA ea775b0View commit details -
[SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.
Author: Reynold Xin <rxin@databricks.com> Closes apache#7109 from rxin/auto-cast and squashes the following commits: a914cc3 [Reynold Xin] [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.
Configuration menu - View commit details
-
Copy full SHA for f79410c - Browse repository at this point
Copy the full SHA f79410cView commit details -
[SPARK-8650] [SQL] Use the user-specified app name priority in SparkS…
…QLCLIDriver or HiveThriftServer2 When run `./bin/spark-sql --name query1.sql` [Before] ![before](https://cloud.githubusercontent.com/assets/1400819/8370336/fa20b75a-1bf8-11e5-9171-040049a53240.png) [After] ![after](https://cloud.githubusercontent.com/assets/1400819/8370189/dcc35cb4-1bf6-11e5-8796-a0694140bffb.png) Author: Yadong Qi <qiyadong2010@gmail.com> Closes apache#7030 from watermen/SPARK-8650 and squashes the following commits: 51b5134 [Yadong Qi] Improve code and add comment. e3d7647 [Yadong Qi] use spark.app.name priority.
Configuration menu - View commit details
-
Copy full SHA for e6c3f74 - Browse repository at this point
Copy the full SHA e6c3f74View commit details -
[SPARK-5161] [HOTFIX] Fix bug in Python test failure reporting
This patch fixes a bug introduced in apache#7031 which can cause Jenkins to incorrectly report a build with failed Python tests as passing if an error occurred while printing the test failure message. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7112 from JoshRosen/python-tests-hotfix and squashes the following commits: c3f2961 [Josh Rosen] Hotfix for bug in Python test failure reporting
Configuration menu - View commit details
-
Copy full SHA for 6c5a6db - Browse repository at this point
Copy the full SHA 6c5a6dbView commit details -
[SPARK-8434][SQL]Add a "pretty" parameter to the "show" method to dis…
…play long strings Sometimes the user may want to show the complete content of cells. Now `sql("set -v").show()` displays: ![screen shot 2015-06-18 at 4 34 51 pm](https://cloud.githubusercontent.com/assets/1000778/8227339/14d3c5ea-15d9-11e5-99b9-f00b7e93beef.png) The user needs to use something like `sql("set -v").collect().foreach(r => r.toSeq.mkString("\t"))` to show the complete content. This PR adds a `pretty` parameter to show. If `pretty` is false, `show` won't truncate strings or align cells right. ![screen shot 2015-06-18 at 4 21 44 pm](https://cloud.githubusercontent.com/assets/1000778/8227407/b6f8dcac-15d9-11e5-8219-8079280d76fc.png) Author: zsxwing <zsxwing@gmail.com> Closes apache#6877 from zsxwing/show and squashes the following commits: 22e28e9 [zsxwing] pretty -> truncate e582628 [zsxwing] Add pretty parameter to the show method in R a3cd55b [zsxwing] Fix calling showString in R 923cee4 [zsxwing] Add a "pretty" parameter to show to display long strings
Configuration menu - View commit details
-
Copy full SHA for 12671dd - Browse repository at this point
Copy the full SHA 12671ddView commit details -
[SPARK-8551] [ML] Elastic net python code example
Author: Shuo Xiang <shuoxiangpub@gmail.com> Closes apache#6946 from coderxiang/en-java-code-example and squashes the following commits: 7a4bdf8 [Shuo Xiang] address comments cddb02b [Shuo Xiang] add elastic net python example code f4fa534 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 6ad4865 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 98804c9 [Shuo Xiang] fix bug in topBykey and update test
Configuration menu - View commit details
-
Copy full SHA for 5452457 - Browse repository at this point
Copy the full SHA 5452457View commit details -
[SPARK-7756] [CORE] More robust SSL options processing.
Subset the enabled algorithms in an SSLOptions to the elements that are supported by the protocol provider. Update the list of ciphers in the sample config to include modern algorithms, and specify both Oracle and IBM names. In practice the user would either specify their own chosen cipher suites, or specify none, and delegate the decision to the provider. Author: Tim Ellison <t.p.ellison@gmail.com> Closes apache#7043 from tellison/SSLEnhancements and squashes the following commits: 034efa5 [Tim Ellison] Ensure Java imports are grouped and ordered by package. 3797f8b [Tim Ellison] Remove unnecessary use of Option to improve clarity, and fix import style ordering. 4b5c89f [Tim Ellison] More robust SSL options processing.
Configuration menu - View commit details
-
Copy full SHA for 2ed0c0a - Browse repository at this point
Copy the full SHA 2ed0c0aView commit details -
[SPARK-8590] [SQL] add code gen for ExtractValue
TODO: use array instead of Seq as internal representation for `ArrayType` Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6982 from cloud-fan/extract-value and squashes the following commits: e203bc1 [Wenchen Fan] address comments 4da0f0b [Wenchen Fan] some clean up f679969 [Wenchen Fan] fix bug e64f942 [Wenchen Fan] remove generic e3f8427 [Wenchen Fan] fix style and address comments fc694e8 [Wenchen Fan] add code gen for extract value
Configuration menu - View commit details
-
Copy full SHA for 08fab48 - Browse repository at this point
Copy the full SHA 08fab48View commit details -
[SPARK-8723] [SQL] improve divide and remainder code gen
We can avoid execution of both left and right expression by null and zero check. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7111 from cloud-fan/cg and squashes the following commits: d6b12ef [Wenchen Fan] improve divide and remainder code gen
Configuration menu - View commit details
-
Copy full SHA for 865a834 - Browse repository at this point
Copy the full SHA 865a834View commit details -
[SPARK-8680] [SQL] Slightly improve PropagateTypes
JIRA: https://issues.apache.org/jira/browse/SPARK-8680 This PR slightly improve `PropagateTypes` in `HiveTypeCoercion`. It moves `q.inputSet` outside `q transformExpressions` instead calling `inputSet` multiple times. It also builds a map of attributes for looking attribute easily. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#7087 from viirya/improve_propagatetypes and squashes the following commits: 5c314c1 [Liang-Chi Hsieh] For comments. 913f6ad [Liang-Chi Hsieh] Slightly improve PropagateTypes.
Configuration menu - View commit details
-
Copy full SHA for a48e619 - Browse repository at this point
Copy the full SHA a48e619View commit details -
[SPARK-8236] [SQL] misc functions: crc32
https://issues.apache.org/jira/browse/SPARK-8236 Author: Shilei <shilei.qian@intel.com> Closes apache#7108 from qiansl127/Crc32 and squashes the following commits: 5477352 [Shilei] Change to AutoCastInputTypes 5f16e5d [Shilei] Add misc function crc32
Configuration menu - View commit details
-
Copy full SHA for 722aa5f - Browse repository at this point
Copy the full SHA 722aa5fView commit details -
[SPARK-8592] [CORE] CoarseGrainedExecutorBackend: Cannot register wit…
…h driver => NPE Look detail of this issue at [SPARK-8592](https://issues.apache.org/jira/browse/SPARK-8592) **CoarseGrainedExecutorBackend** should exit when **RegisterExecutor** failed Author: xuchenCN <chenxu198511@gmail.com> Closes apache#7110 from xuchenCN/SPARK-8592 and squashes the following commits: 71e0077 [xuchenCN] [SPARK-8592] [CORE] CoarseGrainedExecutorBackend: Cannot register with driver => NPE
Configuration menu - View commit details
-
Copy full SHA for 689da28 - Browse repository at this point
Copy the full SHA 689da28View commit details -
[SPARK-8437] [DOCS] Corrected: Using directory path without wildcard …
…for filename slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *) Author: Sean Owen <sowen@cloudera.com> Closes apache#7126 from srowen/SPARK-8437.2 and squashes the following commits: 7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *)
Configuration menu - View commit details
-
Copy full SHA for ada384b - Browse repository at this point
Copy the full SHA ada384bView commit details -
[SPARK-4127] [MLLIB] [PYSPARK] Python bindings for StreamingLinearReg…
…ressionWithSGD Python bindings for StreamingLinearRegressionWithSGD Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#6744 from MechCoder/spark-4127 and squashes the following commits: d8f6457 [MechCoder] Moved StreamingLinearAlgorithm to pyspark.mllib.regression d47cc24 [MechCoder] Inherit from StreamingLinearAlgorithm 1b4ddd6 [MechCoder] minor 4de6c68 [MechCoder] Minor refactor 5e85a3b [MechCoder] Add tests for simultaneous training and prediction fb27889 [MechCoder] Add example and docs 505380b [MechCoder] Add tests d42bdae [MechCoder] [SPARK-4127] Python bindings for StreamingLinearRegressionWithSGD
Configuration menu - View commit details
-
Copy full SHA for 4528166 - Browse repository at this point
Copy the full SHA 4528166View commit details -
[SPARK-8679] [PYSPARK] [MLLIB] Default values in Pipeline API should …
…be immutable It might be dangerous to have a mutable as value for default param. (http://stackoverflow.com/a/11416002/1170730) e.g def func(example, f={}): f[example] = 1 return f func(2) {2: 1} func(3) {2:1, 3:1} mengxr Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#7058 from MechCoder/pipeline_api_playground and squashes the following commits: 40a5eb2 [MechCoder] copy 95f7ff2 [MechCoder] [SPARK-8679] [PySpark] [MLlib] Default values in Pipeline API should be immutable
Configuration menu - View commit details
-
Copy full SHA for 5fa0863 - Browse repository at this point
Copy the full SHA 5fa0863View commit details -
[SPARK-8713] Make codegen thread safe
Codegen takes three steps: 1. Take a list of expressions, convert them into Java source code and a list of expressions that don't not support codegen (fallback to interpret mode). 2. Compile the Java source into Java class (bytecode) 3. Using the Java class and the list of expression to build a Projection. Currently, we cache the whole three steps, the key is a list of expression, result is projection. Because some of expressions (which may not thread-safe, for example, Random) will be hold by the Projection, the projection maybe not thread safe. This PR change to only cache the second step, then we can build projection using codegen even some expressions are not thread-safe, because the cache will not hold any expression anymore. cc marmbrus rxin JoshRosen Author: Davies Liu <davies@databricks.com> Closes apache#7101 from davies/codegen_safe and squashes the following commits: 7dd41f1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into codegen_safe 847bd08 [Davies Liu] don't use scala.refect 4ddaaed [Davies Liu] Merge branch 'master' of github.com:apache/spark into codegen_safe 1793cf1 [Davies Liu] make codegen thread safe
Davies Liu committedJun 30, 2015 Configuration menu - View commit details
-
Copy full SHA for fbb267e - Browse repository at this point
Copy the full SHA fbb267eView commit details -
[SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code
Modified the deprecated jdbc api in the documentation. Author: Tijo Thomas <tijoparacka@gmail.com> Closes apache#7039 from tijoparacka/JIRA_8615 and squashes the following commits: 6e73b8a [Tijo Thomas] Reverted new lines 4042fcf [Tijo Thomas] updated to sql documentation a27949c [Tijo Thomas] Fixed Sample deprecated code
Configuration menu - View commit details
-
Copy full SHA for 9213f73 - Browse repository at this point
Copy the full SHA 9213f73View commit details -
[SPARK-7988] [STREAMING] Round-robin scheduling of receivers by default
Minimal PR for round-robin scheduling of receivers. Dense scheduling can be enabled by setting preferredLocation, so a new config parameter isn't really needed. Tested this on a cluster of 6 nodes and noticed 20-25% gain in throughput compared to random scheduling. tdas pwendell Author: nishkamravi2 <nishkamravi@gmail.com> Author: Nishkam Ravi <nravi@cloudera.com> Closes apache#6607 from nishkamravi2/master_nravi and squashes the following commits: 1918819 [Nishkam Ravi] Update ReceiverTrackerSuite.scala f747739 [Nishkam Ravi] Update ReceiverTrackerSuite.scala 6127e58 [Nishkam Ravi] Update ReceiverTracker and ReceiverTrackerSuite 9f1abc2 [nishkamravi2] Update ReceiverTrackerSuite.scala ae29152 [Nishkam Ravi] Update test suite with TD's suggestions 48a4a97 [nishkamravi2] Update ReceiverTracker.scala bc23907 [nishkamravi2] Update ReceiverTracker.scala 68e8540 [nishkamravi2] Update SchedulerSuite.scala 4604f28 [nishkamravi2] Update SchedulerSuite.scala 179b90f [nishkamravi2] Update ReceiverTracker.scala 242e677 [nishkamravi2] Update SchedulerSuite.scala 7f3e028 [Nishkam Ravi] Update ReceiverTracker.scala, add unit test cases in SchedulerSuite f8a3e05 [nishkamravi2] Update ReceiverTracker.scala 4cf97b6 [nishkamravi2] Update ReceiverTracker.scala 16e84ec [Nishkam Ravi] Update ReceiverTracker.scala 45e3a99 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi 02dbdb8 [Nishkam Ravi] Update ReceiverTracker.scala 07b9dfa [nishkamravi2] Update ReceiverTracker.scala 6caeefe [nishkamravi2] Update ReceiverTracker.scala 7888257 [nishkamravi2] Update ReceiverTracker.scala 6e3515c [Nishkam Ravi] Minor changes 975b8d8 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi 3cac21b [Nishkam Ravi] Generalize the scheduling algorithm b05ee2f [nishkamravi2] Update ReceiverTracker.scala bb5e09b [Nishkam Ravi] Add a new var in receiver to store location information for round-robin scheduling 41705de [nishkamravi2] Update ReceiverTracker.scala fff1b2e [Nishkam Ravi] Round-robin scheduling of streaming receivers
Configuration menu - View commit details
-
Copy full SHA for ca7e460 - Browse repository at this point
Copy the full SHA ca7e460View commit details -
[SPARK-8630] [STREAMING] Prevent from checkpointing QueueInputDStream
This PR throws an exception in `QueueInputDStream.writeObject` so that it can fail the application when calling `StreamingContext.start` rather than failing it during recovering QueueInputDStream. Author: zsxwing <zsxwing@gmail.com> Closes apache#7016 from zsxwing/queueStream-checkpoint and squashes the following commits: 89a3d73 [zsxwing] Fix JavaAPISuite.testQueueStream cc40fd7 [zsxwing] Prevent from checkpointing QueueInputDStream
Configuration menu - View commit details
-
Copy full SHA for 5726440 - Browse repository at this point
Copy the full SHA 5726440View commit details -
[SPARK-8619] [STREAMING] Don't recover keytab and principal configura…
…tion within Streaming checkpoint [Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786) will change these configurations, so this would cause the problem that the Streaming recover logic can't find the local keytab file(since configuration was changed) ```scala sparkConf.set("spark.yarn.keytab", keytabFileName) sparkConf.set("spark.yarn.principal", args.principal) ``` Problem described at [Jira](https://issues.apache.org/jira/browse/SPARK-8619) Author: huangzhaowei <carlmartinmax@gmail.com> Closes apache#7008 from SaintBacchus/SPARK-8619 and squashes the following commits: d50dbdf [huangzhaowei] Delect one blank space 9b8e92c [huangzhaowei] Fix code style and add a short comment. 0d8f800 [huangzhaowei] Don't recover keytab and principal configuration within Streaming checkpoint.
Configuration menu - View commit details
-
Copy full SHA for d16a944 - Browse repository at this point
Copy the full SHA d16a944View commit details -
[SPARK-6785] [SQL] fix DateTimeUtils for dates before 1970
Hi Michael, this Pull-Request is a follow-up to [PR-6242](apache#6242). I removed the two obsolete test cases from the HiveQuerySuite and deleted the corresponding golden answer files. Thanks for your review! Author: Christian Kadner <ckadner@us.ibm.com> Closes apache#6983 from ckadner/SPARK-6785 and squashes the following commits: ab1e79b [Christian Kadner] Merge remote-tracking branch 'origin/SPARK-6785' into SPARK-6785 1fed877 [Christian Kadner] [SPARK-6785][SQL] failed Scala style test, remove spaces on empty line DateTimeUtils.scala:61 9d8021d [Christian Kadner] [SPARK-6785][SQL] merge recent changes in DateTimeUtils & MiscFunctionsSuite b97c3fb [Christian Kadner] [SPARK-6785][SQL] move test case for DateTimeUtils to DateTimeUtilsSuite a451184 [Christian Kadner] [SPARK-6785][SQL] fix DateTimeUtils.fromJavaDate(java.util.Date) for Dates before 1970
Configuration menu - View commit details
-
Copy full SHA for 1e1f339 - Browse repository at this point
Copy the full SHA 1e1f339View commit details -
[SPARK-8664] [ML] Add PCA transformer
Add PCA transformer for ML pipeline Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#7065 from yanboliang/spark-8664 and squashes the following commits: 4afae45 [Yanbo Liang] address comments e9effd7 [Yanbo Liang] Add PCA transformer
Configuration menu - View commit details
-
Copy full SHA for c1befd7 - Browse repository at this point
Copy the full SHA c1befd7View commit details -
[SPARK-8628] [SQL] Race condition in AbstractSparkSQLParser.parse
Made lexical iniatialization as lazy val Author: Vinod K C <vinod.kc@huawei.com> Closes apache#7015 from vinodkc/handle_lexical_initialize_schronization and squashes the following commits: b6d1c74 [Vinod K C] Avoided repeated lexical initialization 5863cf7 [Vinod K C] Removed space e27c66c [Vinod K C] Avoid reinitialization of lexical in parse method ef4f60f [Vinod K C] Reverted import order e9fc49a [Vinod K C] handle synchronization in SqlLexical.initialize
Configuration menu - View commit details
-
Copy full SHA for b8e5bb6 - Browse repository at this point
Copy the full SHA b8e5bb6View commit details -
[SPARK-8471] [ML] Discrete Cosine Transform Feature Transformer
Implementation and tests for Discrete Cosine Transformer. Author: Feynman Liang <fliang@databricks.com> Closes apache#6894 from feynmanliang/dct-features and squashes the following commits: 433dbc7 [Feynman Liang] Test refactoring 91e9636 [Feynman Liang] Style guide and test helper refactor b5ac19c [Feynman Liang] Use Vector types, add Java test 530983a [Feynman Liang] Tests for other numeric datatypes 195d7aa [Feynman Liang] Implement support for arbitrary numeric types 95d4939 [Feynman Liang] Working DCT for 1D Doubles
Configuration menu - View commit details
-
Copy full SHA for 74cc16d - Browse repository at this point
Copy the full SHA 74cc16dView commit details -
[SPARK-7514] [MLLIB] Add MinMaxScaler to feature transformation
jira: https://issues.apache.org/jira/browse/SPARK-7514 Add a popular scaling method to feature component, which is commonly known as min-max normalization or Rescaling. Core function is, Normalized(x) = (x - min) / (max - min) * scale + newBase where `newBase` and `scale` are parameters (type Double) of the `VectorTransformer`. `newBase` is the new minimum number for the features, and `scale` controls the ranges after transformation. This is a little complicated than the basic MinMax normalization, yet it provides flexibility so that users can control the range more specifically. like [0.1, 0.9] in some NN application. For case that `max == min`, 0.5 is used as the raw value. (0.5 * scale + newBase) I'll add UT once the design got settled ( and this is not considered as too naive) reference: http://en.wikipedia.org/wiki/Feature_scaling http://stn.spotfire.com/spotfire_client_help/index.htm#norm/norm_scale_between_0_and_1.htm Author: Yuhao Yang <hhbyyh@gmail.com> Closes apache#6039 from hhbyyh/minMaxNorm and squashes the following commits: f942e9f [Yuhao Yang] add todo for metadata 8b37bbc [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 4894dbc [Yuhao Yang] add copy fa2989f [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 29db415 [Yuhao Yang] add clue and minor adjustment 5b8f7cc [Yuhao Yang] style fix 9b133d0 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 22f20f2 [Yuhao Yang] style change and bug fix 747c9bb [Yuhao Yang] add ut and remove mllib version a5ba0aa [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 585cc07 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 1c6dcb1 [Yuhao Yang] minor change 0f1bc80 [Yuhao Yang] add MinMaxScaler to ml 8e7436e [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 3663165 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into minMaxNorm 1247c27 [Yuhao Yang] some comments improvement d285a19 [Yuhao Yang] initial checkin for minMaxNorm
Configuration menu - View commit details
-
Copy full SHA for 61d7b53 - Browse repository at this point
Copy the full SHA 61d7b53View commit details -
[SPARK-8560] [UI] The Executors page will have negative if having res…
…ubmitted tasks when the ```taskEnd.reason``` is ```Resubmitted```, it shouldn't do statistics. Because this tasks has a ```SUCCESS``` taskEnd before. Author: xutingjun <xutingjun@huawei.com> Closes apache#6950 from XuTingjun/pageError and squashes the following commits: af35dc3 [xutingjun] When taskEnd is Resubmitted, don't do statistics
Configuration menu - View commit details
-
Copy full SHA for 79f0b37 - Browse repository at this point
Copy the full SHA 79f0b37View commit details -
[SPARK-2645] [CORE] Allow SparkEnv.stop() to be called multiple times…
… without side effects. Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects. Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes apache#6973 from rekhajoshm/SPARK-2645 and squashes the following commits: 277043e [Joshi] Fix for SparkContext stop behavior 446b0a4 [Joshi] Fix for SparkContext stop behavior 2ce5760 [Joshi] Fix for SparkContext stop behavior c97839a [Joshi] Fix for SparkContext stop behavior 1aff39c [Joshi] Fix for SparkContext stop behavior 12f66b5 [Joshi] Fix for SparkContext stop behavior 72bb484 [Joshi] Fix for SparkContext stop behavior a5a7d7f [Joshi] Fix for SparkContext stop behavior 9193a0c [Joshi] Fix for SparkContext stop behavior 58dba70 [Joshi] SPARK-2645: Fix for SparkContext stop behavior 380c5b0 [Joshi] SPARK-2645: Fix for SparkContext stop behavior b566b66 [Joshi] SPARK-2645: Fix for SparkContext stop behavior 0be142d [Rekha Joshi] Merge pull request #3 from apache/master 106fd8e [Rekha Joshi] Merge pull request #2 from apache/master e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
Configuration menu - View commit details
-
Copy full SHA for 7dda084 - Browse repository at this point
Copy the full SHA 7dda084View commit details -
[SPARK-8372] Do not show applications that haven't recorded their app…
… ID yet. Showing these applications may lead to weird behavior in the History Server. For old logs, if the app ID is recorded later, you may end up with a duplicate entry. For new logs, the app might be listed with a ".inprogress" suffix. So ignore those, but still allow old applications that don't record app IDs at all (1.0 and 1.1) to be shown. Author: Marcelo Vanzin <vanzin@cloudera.com> Author: Carson Wang <carson.wang@intel.com> Closes apache#7097 from vanzin/SPARK-8372 and squashes the following commits: a24eab2 [Marcelo Vanzin] Feedback. 112ae8f [Marcelo Vanzin] Merge branch 'master' into SPARK-8372 7b91b74 [Marcelo Vanzin] Handle logs generated by 1.0 and 1.1. 1eca3fe [Carson Wang] [SPARK-8372] History server shows incorrect information for application not started
Marcelo Vanzin authored and Andrew Or committedJun 30, 2015 Configuration menu - View commit details
-
Copy full SHA for 4bb8375 - Browse repository at this point
Copy the full SHA 4bb8375View commit details -
[SPARK-8736] [ML] GBTRegressor should not threshold prediction
Changed GBTRegressor so it does NOT threshold the prediction. Added test which fails with bug but works after fix. CC: feynmanliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#7134 from jkbradley/gbrt-fix and squashes the following commits: 613b90e [Joseph K. Bradley] Changed GBTRegressor so it does NOT threshold the prediction
Configuration menu - View commit details
-
Copy full SHA for 3ba23ff - Browse repository at this point
Copy the full SHA 3ba23ffView commit details -
[SPARK-8705] [WEBUI] Don't display rects when totalExecutionTime is 0
Because `System.currentTimeMillis()` is not accurate for tasks that only need several milliseconds, sometimes `totalExecutionTime` in `makeTimeline` will be 0. If `totalExecutionTime` is 0, there will the following error in the console. ![screen shot 2015-06-29 at 7 08 55 pm](https://cloud.githubusercontent.com/assets/1000778/8406776/5cd38e04-1e92-11e5-89f2-0c5134fe4b6b.png) This PR fixes it by using an empty svg tag when `totalExecutionTime` is 0. This is a screenshot for a task that its totalExecutionTime is 0 after fixing it. ![screen shot 2015-06-30 at 12 26 52 am](https://cloud.githubusercontent.com/assets/1000778/8412896/7b33b4be-1ebf-11e5-9100-d6d656af3747.png) Author: zsxwing <zsxwing@gmail.com> Closes apache#7088 from zsxwing/SPARK-8705 and squashes the following commits: 9ee4ef5 [zsxwing] Address comments ef2ecfa [zsxwing] Don't display rects when totalExecutionTime is 0
Configuration menu - View commit details
-
Copy full SHA for 8c89896 - Browse repository at this point
Copy the full SHA 8c89896View commit details -
[SPARK-8563] [MLLIB] Fixed a bug so that IndexedRowMatrix.computeSVD(…
…).U.numCols = k I'm sorry that I made apache#6949 closed by mistake. I pushed codes again. And, I added a test code. > There is a bug that `U.numCols() = self.nCols` in `IndexedRowMatrix.computeSVD()` It should have been `U.numCols() = k = svd.U.numCols()` > ``` self = U * sigma * V.transpose (m x n) = (m x n) * (k x k) * (k x n) //ASIS --> (m x n) = (m x k) * (k x k) * (k x n) //TOBE ``` Author: lee19 <lee19@live.co.kr> Closes apache#6953 from lee19/MLlibBugfix and squashes the following commits: c1812a0 [lee19] [SPARK-8563] [MLlib] Used nRows instead of numRows() to reduce a burden. 4b9803b [lee19] [SPARK-8563] [MLlib] Fixed a build error. c2ccd89 [lee19] Added a unit test that validates matrix sizes of svd for [SPARK-8563][MLlib] 8373424 [lee19] [SPARK-8563][MLlib] Fixed a bug so that IndexedRowMatrix.computeSVD().U.numCols = k
Configuration menu - View commit details
-
Copy full SHA for e725262 - Browse repository at this point
Copy the full SHA e725262View commit details -
[SPARK-8739] [WEB UI] [WINDOWS] A illegal character
\r
can be conta……ined in StagePage. This issue was reported by saurfang. Thanks! There is a following code in StagePage.scala. ``` |width="$serializationTimeProportion%"></rect> |<rect class="getting-result-time-proportion" |x="$gettingResultTimeProportionPos%" y="0px" height="26px" |width="$gettingResultTimeProportion%"></rect></svg>', |'start': new Date($launchTime), |'end': new Date($finishTime) |} |""".stripMargin.replaceAll("\n", " ") ``` The last `replaceAll("\n", "")` doesn't work when we checkout and build source code on Windows and deploy on Linux. It's because when we checkout the source code on Windows, new-line-code is replaced with `"\r\n"` and `replaceAll("\n", "")` replaces only `"\n"`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#7133 from sarutak/SPARK-8739 and squashes the following commits: 17fb044 [Kousuke Saruta] Fixed a new-line-code issue
Configuration menu - View commit details
-
Copy full SHA for d2495f7 - Browse repository at this point
Copy the full SHA d2495f7View commit details -
[SPARK-8738] [SQL] [PYSPARK] capture SQL AnalysisException in Python API
Capture the AnalysisException in SQL, hide the long java stack trace, only show the error message. cc rxin Author: Davies Liu <davies@databricks.com> Closes apache#7135 from davies/ananylis and squashes the following commits: dad7ae7 [Davies Liu] add comment ec0c0e8 [Davies Liu] Update utils.py cdd7edd [Davies Liu] add doc 7b044c2 [Davies Liu] fix python 3 f84d3bd [Davies Liu] capture SQL AnalysisException in Python API
Davies Liu committedJun 30, 2015 Configuration menu - View commit details
-
Copy full SHA for 58ee2a2 - Browse repository at this point
Copy the full SHA 58ee2a2View commit details -
[SPARK-7739] [MLLIB] Improve ChiSqSelector example code in user guide
Author: sethah <seth.hendrickson16@gmail.com> Closes apache#7029 from sethah/working_on_SPARK-7739 and squashes the following commits: ef96916 [sethah] Fixing some style issues efea1f8 [sethah] adding clarification to ChiSqSelector example
Configuration menu - View commit details
-
Copy full SHA for 8d23587 - Browse repository at this point
Copy the full SHA 8d23587View commit details -
[SPARK-8741] [SQL] Remove e and pi from DataFrame functions.
Author: Reynold Xin <rxin@databricks.com> Closes apache#7137 from rxin/SPARK-8741 and squashes the following commits: 32c7e75 [Reynold Xin] [SPARK-8741][SQL] Remove e and pi from DataFrame functions.
Configuration menu - View commit details
-
Copy full SHA for 8133125 - Browse repository at this point
Copy the full SHA 8133125View commit details
Commits on Jul 1, 2015
-
[SPARK-8727] [SQL] Missing python api; md5, log2
Jira: https://issues.apache.org/jira/browse/SPARK-8727 Author: Tarek Auel <tarek.auel@gmail.com> Author: Tarek Auel <tarek.auel@googlemail.com> Closes apache#7114 from tarekauel/missing-python and squashes the following commits: ef4c61b [Tarek Auel] [SPARK-8727] revert dataframe change 4029d4d [Tarek Auel] removed dataframe pi and e unit test 66f0d2b [Tarek Auel] removed pi and e from python api and dataframe api; added _to_java_column(col) for strlen 4d07318 [Tarek Auel] fixed python unit test 45f2bee [Tarek Auel] fixed result of pi and e c39f47b [Tarek Auel] add python api bd50a3a [Tarek Auel] add missing python functions
Configuration menu - View commit details
-
Copy full SHA for ccdb052 - Browse repository at this point
Copy the full SHA ccdb052View commit details -
[SPARK-6602][Core] Update Master, Worker, Client, AppClient and relat…
…ed classes to use RpcEndpoint This PR updates the rest Actors in core to RpcEndpoint. Because there is no `ActorSelection` in RpcEnv, I changes the logic of `registerWithMaster` in Worker and AppClient to avoid blocking the message loop. These changes need to be reviewed carefully. Author: zsxwing <zsxwing@gmail.com> Closes apache#5392 from zsxwing/rpc-rewrite-part3 and squashes the following commits: 2de7bed [zsxwing] Merge branch 'master' into rpc-rewrite-part3 f12d943 [zsxwing] Address comments 9137b82 [zsxwing] Fix the code style e734c71 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 2d24fb5 [zsxwing] Fix the code style 5a82374 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 fa47110 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 72304f0 [zsxwing] Update the error strategy for AkkaRpcEnv e56cb16 [zsxwing] Always send failure back to the sender a7b86e6 [zsxwing] Use JFuture for java.util.concurrent.Future aa34b9b [zsxwing] Fix the code style bd541e7 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 25a84d8 [zsxwing] Use ThreadUtils 060ff31 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 dbfc916 [zsxwing] Improve the docs and comments 837927e [zsxwing] Merge branch 'master' into rpc-rewrite-part3 5c27f97 [zsxwing] Merge branch 'master' into rpc-rewrite-part3 fadbb9e [zsxwing] Fix the code style 6637e3c [zsxwing] Merge remote-tracking branch 'origin/master' into rpc-rewrite-part3 7fdee0e [zsxwing] Fix the return type to ExecutorService and ScheduledExecutorService e8ad0a5 [zsxwing] Fix the code style 6b2a104 [zsxwing] Log error and use SparkExitCode.UNCAUGHT_EXCEPTION exit code fbf3194 [zsxwing] Add Utils.newDaemonSingleThreadExecutor and newDaemonSingleThreadScheduledExecutor b776817 [zsxwing] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint
Configuration menu - View commit details
-
Copy full SHA for 3bee0f1 - Browse repository at this point
Copy the full SHA 3bee0f1View commit details -
[SPARK-8471] [ML] Rename DiscreteCosineTransformer to DCT
Rename DiscreteCosineTransformer and related classes to DCT. Author: Feynman Liang <fliang@databricks.com> Closes apache#7138 from feynmanliang/dct-features and squashes the following commits: e547b3e [Feynman Liang] Fix renaming bug 9d5c9e4 [Feynman Liang] Lowercase JavaDCTSuite variable f9a8958 [Feynman Liang] Remove old files f8fe794 [Feynman Liang] Merge branch 'master' into dct-features 894d0b2 [Feynman Liang] Rename DiscreteCosineTransformer to DCT 433dbc7 [Feynman Liang] Test refactoring 91e9636 [Feynman Liang] Style guide and test helper refactor b5ac19c [Feynman Liang] Use Vector types, add Java test 530983a [Feynman Liang] Tests for other numeric datatypes 195d7aa [Feynman Liang] Implement support for arbitrary numeric types 95d4939 [Feynman Liang] Working DCT for 1D Doubles
Configuration menu - View commit details
-
Copy full SHA for f457569 - Browse repository at this point
Copy the full SHA f457569View commit details -
[SPARK-8535] [PYSPARK] PySpark : Can't create DataFrame from Pandas d…
…ataframe with no explicit column name Because implicit name of `pandas.columns` are Int, but `StructField` json expect `String`. So I think `pandas.columns` are should be convert to `String`. ### issue * [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535) Author: x1- <viva008@gmail.com> Closes apache#7124 from x1-/SPARK-8535 and squashes the following commits: d68fd38 [x1-] modify unit-test using pandas. ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.
Configuration menu - View commit details
-
Copy full SHA for b6e76ed - Browse repository at this point
Copy the full SHA b6e76edView commit details -
[SPARK-6602][Core]Remove unnecessary synchronized
A follow-up pr to address apache#5392 (comment) Author: zsxwing <zsxwing@gmail.com> Closes apache#7141 from zsxwing/pr5392-follow-up and squashes the following commits: fcf7b50 [zsxwing] Remove unnecessary synchronized
Configuration menu - View commit details
-
Copy full SHA for 64c1461 - Browse repository at this point
Copy the full SHA 64c1461View commit details -
[SPARK-8748][SQL] Move castability test out from Cast case class into…
… Cast object. This patch moved resolve function in Cast case class into the companion object, and renamed it canCast. We can then use this in the analyzer without a Cast expr. Author: Reynold Xin <rxin@databricks.com> Closes apache#7145 from rxin/cast and squashes the following commits: cd086a9 [Reynold Xin] Whitespace changes. 4d2d989 [Reynold Xin] [SPARK-8748][SQL] Move castability test out from Cast case class into Cast object.
Configuration menu - View commit details
-
Copy full SHA for 365c140 - Browse repository at this point
Copy the full SHA 365c140View commit details -
[SPARK-8749][SQL] Remove HiveTypeCoercion trait.
Moved all the rules into the companion object. Author: Reynold Xin <rxin@databricks.com> Closes apache#7147 from rxin/SPARK-8749 and squashes the following commits: c1c6dc0 [Reynold Xin] [SPARK-8749][SQL] Remove HiveTypeCoercion trait.
Configuration menu - View commit details
-
Copy full SHA for fc3a6fe - Browse repository at this point
Copy the full SHA fc3a6feView commit details -
[SQL] [MINOR] remove internalRowRDD in DataFrame
Developers have already familiar with `queryExecution.toRDD` as internal row RDD, and we should not add new concept. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7116 from cloud-fan/internal-rdd and squashes the following commits: 24756ca [Wenchen Fan] remove internalRowRDD
Configuration menu - View commit details
-
Copy full SHA for 0eee061 - Browse repository at this point
Copy the full SHA 0eee061View commit details -
[SPARK-8750][SQL] Remove the closure in functions.callUdf.
Author: Reynold Xin <rxin@databricks.com> Closes apache#7148 from rxin/calludf-closure and squashes the following commits: 00df372 [Reynold Xin] Fixed index out of bound exception. 4beba76 [Reynold Xin] [SPARK-8750][SQL] Remove the closure in functions.callUdf.
Configuration menu - View commit details
-
Copy full SHA for 9765241 - Browse repository at this point
Copy the full SHA 9765241View commit details -
[SPARK-8763] [PYSPARK] executing run-tests.py with Python 2.6 fails w…
…ith absence of subprocess.check_output function Running run-tests.py with Python 2.6 cause following error: ``` Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy'] Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] Traceback (most recent call last): File "./python/run-tests.py", line 196, in <module> main() File "./python/run-tests.py", line 159, in main python_implementation = subprocess.check_output( AttributeError: 'module' object has no attribute 'check_output' ... ``` The cause of this error is using subprocess.check_output function, which exists since Python 2.7. (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) Author: cocoatomo <cocoatomo77@gmail.com> Closes apache#7161 from cocoatomo/issues/8763-test-fails-py26 and squashes the following commits: cf4f901 [cocoatomo] [SPARK-8763] backport process.check_output function from Python 2.7
Configuration menu - View commit details
-
Copy full SHA for fdcad6e - Browse repository at this point
Copy the full SHA fdcad6eView commit details -
[SPARK-7714] [SPARKR] SparkR tests should use more specific expectati…
…ons than expect_true 1. Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'. 2. Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'. 3. Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'. Author: Sun Rui <rui.sun@intel.com> Closes apache#7152 from sun-rui/SPARK-7714 and squashes the following commits: 8ad2440 [Sun Rui] Fix test case errors. 8fe9f0c [Sun Rui] Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'. f1b8005 [Sun Rui] Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'. f631e94 [Sun Rui] Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
Configuration menu - View commit details
-
Copy full SHA for 69c5dee - Browse repository at this point
Copy the full SHA 69c5deeView commit details -
[SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected i…
…nput types. This patch doesn't actually introduce any code that uses the new ExpectsInputTypes. It just adds the trait so others can use it. Also renamed the old expectsInputTypes function to just inputTypes. We should add implicit type casting also in the future. Author: Reynold Xin <rxin@databricks.com> Closes apache#7151 from rxin/expects-input-types and squashes the following commits: 16cf07b [Reynold Xin] [SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected input types.
Configuration menu - View commit details
-
Copy full SHA for 4137f76 - Browse repository at this point
Copy the full SHA 4137f76View commit details -
[SPARK-8621] [SQL] support empty string as column name
improve the empty check in `parseAttributeName` so that we can allow empty string as column name. Close apache#7117 Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7149 from cloud-fan/8621 and squashes the following commits: efa9e3e [Wenchen Fan] support empty string
Configuration menu - View commit details
-
Copy full SHA for 31b4a3d - Browse repository at this point
Copy the full SHA 31b4a3dView commit details -
[SPARK-6263] [MLLIB] Python MLlib API missing items: Utils
Implement missing API in pyspark. MLUtils * appendBias * loadVectors `kFold` is also missing however I am not sure `ClassTag` can be passed or restored through python. Author: lewuathe <lewuathe@me.com> Closes apache#5707 from Lewuathe/SPARK-6263 and squashes the following commits: 16863ea [lewuathe] Merge master 3fc27e7 [lewuathe] Merge branch 'master' into SPARK-6263 6084e9c [lewuathe] Resolv conflict d2aa2a0 [lewuathe] Resolv conflict 9c329d8 [lewuathe] Fix efficiency 3a12a2d [lewuathe] Merge branch 'master' into SPARK-6263 1d4714b [lewuathe] Fix style b29e2bc [lewuathe] Remove scipy dependencies e32eb40 [lewuathe] Merge branch 'master' into SPARK-6263 25d3c9d [lewuathe] Remove unnecessary imports 7ec04db [lewuathe] Resolv conflict 1502d13 [lewuathe] Resolv conflict d6bd416 [lewuathe] Check existence of scipy.sparse 5d555b1 [lewuathe] Construct scipy.sparse matrix c345a44 [lewuathe] Merge branch 'master' into SPARK-6263 b8b5ef7 [lewuathe] Fix unnecessary sort method d254be7 [lewuathe] Merge branch 'master' into SPARK-6263 62a9c7e [lewuathe] Fix appendBias return type 454c73d [lewuathe] Merge branch 'master' into SPARK-6263 a353354 [lewuathe] Remove unnecessary appendBias implementation 44295c2 [lewuathe] Merge branch 'master' into SPARK-6263 64f72ad [lewuathe] Merge branch 'master' into SPARK-6263 c728046 [lewuathe] Fix style 2980569 [lewuathe] [SPARK-6263] Python MLlib API missing items: Utils
Configuration menu - View commit details
-
Copy full SHA for 184de91 - Browse repository at this point
Copy the full SHA 184de91View commit details -
[SPARK-8308] [MLLIB] add missing save load for python example
jira: https://issues.apache.org/jira/browse/SPARK-8308 1. add some missing save/load in python examples. , LogisticRegression, LinearRegression and NaiveBayes 2. tune down iterations for MatrixFactorization, since current number will trigger StackOverflow for default java configuration (>1M) Author: Yuhao Yang <hhbyyh@gmail.com> Closes apache#6760 from hhbyyh/docUpdate and squashes the following commits: 9bd3383 [Yuhao Yang] update scala example 8a44692 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate 077cbb8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate 3e948dc [Yuhao Yang] add missing save load for python example
Configuration menu - View commit details
-
Copy full SHA for 2012913 - Browse repository at this point
Copy the full SHA 2012913View commit details -
[SPARK-8765] [MLLIB] [PYTHON] removed flaky python PIC test
See failure: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] CC yanboliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#7164 from jkbradley/pic-python-test and squashes the following commits: 156d55b [Joseph K. Bradley] removed flaky python PIC test
Configuration menu - View commit details
-
Copy full SHA for b8faa32 - Browse repository at this point
Copy the full SHA b8faa32View commit details -
[SPARK-8378] [STREAMING] Add the Python API for Flume
Author: zsxwing <zsxwing@gmail.com> Closes apache#6830 from zsxwing/flume-python and squashes the following commits: 78dfdac [zsxwing] Fix the compile error in the test code f1bf3c0 [zsxwing] Address TD's comments 0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly e93736b [zsxwing] Fix the test case for determine_modules_to_test 9d5821e [zsxwing] Fix pyspark_core dependencies f9ee681 [zsxwing] Merge branch 'master' into flume-python 7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py b96b0de [zsxwing] Merge branch 'master' into flume-python ce85e83 [zsxwing] Fix incompatible issues for Python 3 01cbb3d [zsxwing] Add import sys 152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3 14ba0ff [zsxwing] Add flume-assembly for sbt building b8d5551 [zsxwing] Merge branch 'master' into flume-python 4762c34 [zsxwing] Fix the doc 0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API 9f33873 [zsxwing] Add the Python API for Flume
Configuration menu - View commit details
-
Copy full SHA for 75b9fe4 - Browse repository at this point
Copy the full SHA 75b9fe4View commit details -
[SPARK-7820] [BUILD] Fix Java8-tests suite compile and test error und…
…er sbt Author: jerryshao <saisai.shao@intel.com> Closes apache#7120 from jerryshao/SPARK-7820 and squashes the following commits: 6902439 [jerryshao] fix Java8-tests suite compile error under sbt
Configuration menu - View commit details
-
Copy full SHA for 9f7db34 - Browse repository at this point
Copy the full SHA 9f7db34View commit details -
[QUICKFIX] [SQL] fix copy of generated row
copy() of generated Row doesn't check nullability of columns Author: Davies Liu <davies@databricks.com> Closes apache#7163 from davies/fix_copy and squashes the following commits: 661a206 [Davies Liu] fix copy of generated row
Davies Liu committedJul 1, 2015 Configuration menu - View commit details
-
Copy full SHA for 3083e17 - Browse repository at this point
Copy the full SHA 3083e17View commit details -
[SPARK-3444] [CORE] Restore INFO level after log4j test.
Otherwise other tests don't log anything useful... Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#7140 from vanzin/SPARK-3444 and squashes the following commits: de14836 [Marcelo Vanzin] Better fix. 6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
Configuration menu - View commit details
-
Copy full SHA for 1ce6428 - Browse repository at this point
Copy the full SHA 1ce6428View commit details -
[SPARK-8766] support non-ascii character in column names
Use UTF-8 to encode the name of column in Python 2, or it may failed to encode with default encoding ('ascii'). This PR also fix a bug when there is Java exception without error message. Author: Davies Liu <davies@databricks.com> Closes apache#7165 from davies/non_ascii and squashes the following commits: 02cb61a [Davies Liu] fix tests 3b09d31 [Davies Liu] add encoding in header 867754a [Davies Liu] support non-ascii character in column names
Davies Liu committedJul 1, 2015 Configuration menu - View commit details
-
Copy full SHA for f958f27 - Browse repository at this point
Copy the full SHA f958f27View commit details -
[SPARK-8770][SQL] Create BinaryOperator abstract class.
Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression. This patch creates a new BinaryOperator abstract class, and update the analyzer o only apply type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression. Author: Reynold Xin <rxin@databricks.com> Closes apache#7170 from rxin/binaryoperator and squashes the following commits: 51264a5 [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
Configuration menu - View commit details
-
Copy full SHA for 2727789 - Browse repository at this point
Copy the full SHA 2727789View commit details -
Revert "[SPARK-8770][SQL] Create BinaryOperator abstract class."
This reverts commit 2727789.
Configuration menu - View commit details
-
Copy full SHA for 3a342de - Browse repository at this point
Copy the full SHA 3a342deView commit details
Commits on Jul 2, 2015
-
[SPARK-8770][SQL] Create BinaryOperator abstract class.
Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression. This patch creates a new BinaryOperator abstract class, and update the analyzer o only apply type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression. Author: Reynold Xin <rxin@databricks.com> Closes apache#7174 from rxin/binary-opterator and squashes the following commits: f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class. fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator d8518cf [Reynold Xin] Updated Python tests.
Configuration menu - View commit details
-
Copy full SHA for 9fd13d5 - Browse repository at this point
Copy the full SHA 9fd13d5View commit details -
[SPARK-8660] [MLLIB] removed > symbols from comments in LogisticRegre…
…ssionSuite.scala for ease of copypaste '>' symbols removed from comments in LogisticRegressionSuite.scala, for ease of copypaste also single-lined the multiline commands (is this desirable, or does it violate style?) Author: Rosstin <asterazul@gmail.com> Closes apache#7167 from Rosstin/SPARK-8660-2 and squashes the following commits: f4b9bc8 [Rosstin] SPARK-8660 restored character limit on multiline comments in LogisticRegressionSuite.scala fe6b112 [Rosstin] SPARK-8660 > symbols removed from LogisticRegressionSuite.scala for easy of copypaste 39ddd50 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8661 5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code. bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660 242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala 2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
Configuration menu - View commit details
-
Copy full SHA for 4e4f74b - Browse repository at this point
Copy the full SHA 4e4f74bView commit details -
[SPARK-8227] [SQL] Add function unhex
cc chenghao-intel adrian-wang Author: zhichao.li <zhichao.li@intel.com> Closes apache#7113 from zhichao-li/unhex and squashes the following commits: 379356e [zhichao.li] remove exception checking a4ae6dc [zhichao.li] add udf_unhex to whitelist fe5c14a [zhichao.li] add todigit 607d7a3 [zhichao.li] use checkInputTypes bffd37f [zhichao.li] change to use Hex in apache common package cde73f5 [zhichao.li] update to use AutoCastInputTypes 11945c7 [zhichao.li] style c852d46 [zhichao.li] Add function unhex
Configuration menu - View commit details
-
Copy full SHA for b285ac5 - Browse repository at this point
Copy the full SHA b285ac5View commit details -
[SPARK-8754] [YARN] YarnClientSchedulerBackend doesn't stop gracefull…
…y in failure conditions In YarnClientSchedulerBackend.stop(), added a check for monitorThread. Author: Devaraj K <devaraj@apache.org> Closes apache#7153 from devaraj-kavali/master and squashes the following commits: 66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
Devaraj K authored and Andrew Or committedJul 2, 2015 Configuration menu - View commit details
-
Copy full SHA for 792fcd8 - Browse repository at this point
Copy the full SHA 792fcd8View commit details -
[SPARK-8688] [YARN] Bug fix: disable the cache fs to gain the HDFS co…
…nnection. If `fs.hdfs.impl.disable.cache` was `false`(default), `FileSystem` will use the cached `DFSClient` which use old token. [AMDelegationTokenRenewer](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala#L196) ```scala val credentials = UserGroupInformation.getCurrentUser.getCredentials credentials.writeTokenStorageFile(tempTokenPath, discachedConfiguration) ``` Although the `credentials` had the new Token, but it still use the cached client and old token. So It's better to set the `fs.hdfs.impl.disable.cache` as `true` to avoid token expired. [Jira](https://issues.apache.org/jira/browse/SPARK-8688) Author: huangzhaowei <carlmartinmax@gmail.com> Closes apache#7069 from SaintBacchus/SPARK-8688 and squashes the following commits: f94cd0b [huangzhaowei] modify function parameter 8fb9eb9 [huangzhaowei] explicit the comment 0cd55c9 [huangzhaowei] Rename function name to be an accurate one cf776a1 [huangzhaowei] [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
Configuration menu - View commit details
-
Copy full SHA for 646366b - Browse repository at this point
Copy the full SHA 646366bView commit details -
[SPARK-8771] [TRIVIAL] Add a version to the deprecated annotation for…
… the actorSystem Author: Holden Karau <holden@pigscanfly.ca> Closes apache#7172 from holdenk/SPARK-8771-actor-system-deprecation-tag-uses-deprecated-deprecation-tag and squashes the following commits: 7f1455b [Holden Karau] Add .0s to the versions for the derpecated anotations in SparkEnv.scala ca13c9d [Holden Karau] Add a version to the deprecated annotation for the actorSystem in SparkEnv
Configuration menu - View commit details
-
Copy full SHA for d14338e - Browse repository at this point
Copy the full SHA d14338eView commit details -
[SPARK-8769] [TRIVIAL] [DOCS] toLocalIterator should mention it resul…
…ts in many jobs Author: Holden Karau <holden@pigscanfly.ca> Closes apache#7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits: 97ddd99 [Holden Karau] Add note
Configuration menu - View commit details
-
Copy full SHA for 15d41cc - Browse repository at this point
Copy the full SHA 15d41ccView commit details -
[SPARK-8740] [PROJECT INFRA] Support GitHub OAuth tokens in dev/merge…
…_spark_pr.py This commit allows `dev/merge_spark_pr.py` to use personal GitHub OAuth tokens in order to make authenticated requests. This is necessary to work around per-IP rate limiting issues. To use a token, just set the `GITHUB_OAUTH_KEY` environment variable. You can create a personal token at https://github.com/settings/tokens; we only require `public_repo` scope. If the script fails due to a rate-limit issue, it now logs a useful message directing the user to the OAuth token instructions. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7136 from JoshRosen/pr-merge-script-oauth-authentication and squashes the following commits: 4d011bd [Josh Rosen] Fix error message 23d92ff [Josh Rosen] Support GitHub OAuth tokens in dev/merge_spark_pr.py
Configuration menu - View commit details
-
Copy full SHA for 377ff4c - Browse repository at this point
Copy the full SHA 377ff4cView commit details -
[SPARK-3071] Increase default driver memory
I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place. Please let me know if I've missed anything. Will the spark-shell use the value within the command line builder during instantiation? Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes apache#7132 from ilganeli/SPARK-3071 and squashes the following commits: 4074164 [Ilya Ganelin] String fix 271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071 273b6e9 [Ilya Ganelin] Test fix fd67721 [Ilya Ganelin] Update JavaUtils.java 26cc177 [Ilya Ganelin] test fix e5db35d [Ilya Ganelin] Fixed test failure 39732a1 [Ilya Ganelin] merge fix a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each 09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala 19b6f25 [Ilya Ganelin] Missed one doc update 2698a3d [Ilya Ganelin] Updated default value for driver memory
Ilya Ganelin authored and Andrew Or committedJul 2, 2015 Configuration menu - View commit details
-
Copy full SHA for 3697232 - Browse repository at this point
Copy the full SHA 3697232View commit details -
[SPARK-8687] [YARN] Fix bug: Executor can't fetch the new set configu…
…ration in yarn-client Spark initi the properties CoarseGrainedSchedulerBackend.start ```scala // TODO (prashant) send conf instead of properties driverEndpoint = rpcEnv.setupEndpoint( CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties)) ``` Then the yarn logic will set some configuration but not update in this `properties`. So `Executor` won't gain the `properties`. [Jira](https://issues.apache.org/jira/browse/SPARK-8687) Author: huangzhaowei <carlmartinmax@gmail.com> Closes apache#7066 from SaintBacchus/SPARK-8687 and squashes the following commits: 1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
Configuration menu - View commit details
-
Copy full SHA for 1b0c8e6 - Browse repository at this point
Copy the full SHA 1b0c8e6View commit details -
[DOCS] Fix minor wrong lambda expression example.
It's a really minor issue but there is an example with wrong lambda-expression usage in `SQLContext.scala` like as follows. ``` sqlContext.udf().register("myUDF", (Integer arg1, String arg2) -> arg2 + arg1), <- We have an extra `)` here. DataTypes.StringType); ``` Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#7187 from sarutak/fix-minor-wrong-lambda-expression and squashes the following commits: a13196d [Kousuke Saruta] Fixed minor wrong lambda expression example.
Configuration menu - View commit details
-
Copy full SHA for 4158836 - Browse repository at this point
Copy the full SHA 4158836View commit details -
[SPARK-8787] [SQL] Changed parameter order of @deprecated in package …
…object sql Parameter order of deprecated annotation in package object sql is wrong >>deprecated("1.3.0", "use DataFrame") . This has to be changed to deprecated("use DataFrame", "1.3.0") Author: Vinod K C <vinod.kc@huawei.com> Closes apache#7183 from vinodkc/fix_deprecated_param_order and squashes the following commits: 1cbdbe8 [Vinod K C] Modified the message 700911c [Vinod K C] Changed order of parameters
Configuration menu - View commit details
-
Copy full SHA for c572e25 - Browse repository at this point
Copy the full SHA c572e25View commit details -
[SPARK-8746] [SQL] update download link for Hive 0.13.1
updated the [Hive 0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) download link in `sql/README.md` Author: Christian Kadner <ckadner@us.ibm.com> Closes apache#7144 from ckadner/SPARK-8746 and squashes the following commits: 65d80f7 [Christian Kadner] [SPARK-8746][SQL] update download link for Hive 0.13.1
Configuration menu - View commit details
-
Copy full SHA for 1bbdf9e - Browse repository at this point
Copy the full SHA 1bbdf9eView commit details -
[SPARK-8690] [SQL] Add a setting to disable SparkSQL parquet schema m…
…erge by using datasource API The detail problem story is in https://issues.apache.org/jira/browse/SPARK-8690 General speaking, I add a config spark.sql.parquet.mergeSchema to achieve the sqlContext.load("parquet" , Map( "path" -> "..." , "mergeSchema" -> "false" )) It will become a simple flag and without any side affect. Author: Wisely Chen <wiselychen@appier.com> Closes apache#7070 from thegiive/SPARK8690 and squashes the following commits: c6f3e86 [Wisely Chen] Refactor some code style and merge the test case to ParquetSchemaMergeConfigSuite 94c9307 [Wisely Chen] Remove some style problem db8ef1b [Wisely Chen] Change config to SQLConf and add test case b6806fb [Wisely Chen] remove text c0edb8c [Wisely Chen] [SPARK-8690] add a config spark.sql.parquet.mergeSchema to disable datasource API schema merge feature.
Configuration menu - View commit details
-
Copy full SHA for 246265f - Browse repository at this point
Copy the full SHA 246265fView commit details -
[SPARK-8647] [MLLIB] Potential issue with constant hashCode
I added the code, // see [SPARK-8647], this achieves the needed constant hash code without constant no. override def hashCode(): Int = this.getClass.getName.hashCode() does getting the constant hash code as per jira Author: Alok Singh <singhal@Aloks-MacBook-Pro.local> Closes apache#7146 from aloknsingh/aloknsingh_SPARK-8647 and squashes the following commits: e58bccf [Alok Singh] [SPARK-8647][MLlib] to avoid the class derivation issues, change the constant hashCode to override def hashCode(): Int = classOf[MatrixUDT].getName.hashCode() 43cdb89 [Alok Singh] [SPARK-8647][MLlib] Potential issue with constant hashCode
Configuration menu - View commit details
-
Copy full SHA for 99c40cd - Browse repository at this point
Copy the full SHA 99c40cdView commit details -
[SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering
Add Python user guide for PowerIterationClustering Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#7155 from yanboliang/spark-8758 and squashes the following commits: 18d803b [Yanbo Liang] address comments dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
Configuration menu - View commit details
-
Copy full SHA for 0a468a4 - Browse repository at this point
Copy the full SHA 0a468a4View commit details -
[SPARK-8223] [SPARK-8224] [SQL] shift left and shift right
Jira: https://issues.apache.org/jira/browse/SPARK-8223 https://issues.apache.org/jira/browse/SPARK-8224 ~~I am aware of apache#7174 and will update this pr, if it's merged.~~ Done I don't know if apache#7034 can simplify this, but we can have a look on it, if it gets merged rxin In the Jira ticket the function as no second argument. I added a `numBits` argument that allows to specify the number of bits. I guess this improves the usability. I wanted to add `shiftleft(value)` as well, but the `selectExpr` dataframe tests crashes, if I have both. I order to do this, I added the following to the functions.scala `def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)`, but as I mentioned this doesn't pass tests like `df.selectExpr("shiftRight(a)", ...` (not enough arguments exception). If we need the bitwise shift in order to be hive compatible, I suggest to add `shiftLeft` and something like `shiftLeftX` Author: Tarek Auel <tarek.auel@googlemail.com> Closes apache#7178 from tarekauel/8223 and squashes the following commits: 8023bb5 [Tarek Auel] [SPARK-8223][SPARK-8224] fixed test f3f64e6 [Tarek Auel] [SPARK-8223][SPARK-8224] Integer -> Int f628706 [Tarek Auel] [SPARK-8223][SPARK-8224] removed toString; updated function description 3b56f2a [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223 5189690 [Tarek Auel] [SPARK-8223][SPARK-8224] minor fix and style fix 9434a28 [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223 44ee324 [Tarek Auel] [SPARK-8223][SPARK-8224] docu fix ac7fe9d [Tarek Auel] [SPARK-8223][SPARK-8224] right and left bit shift
Configuration menu - View commit details
-
Copy full SHA for 5b33381 - Browse repository at this point
Copy the full SHA 5b33381View commit details -
[SPARK-8747] [SQL] fix EqualNullSafe for binary type
also improve tests for binary comparison. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7143 from cloud-fan/binary and squashes the following commits: 28a5b76 [Wenchen Fan] improve test 04ef4b0 [Wenchen Fan] fix equalNullSafe
Configuration menu - View commit details
-
Copy full SHA for afa021e - Browse repository at this point
Copy the full SHA afa021eView commit details -
[SPARK-8407] [SQL] complex type constructors: struct and named_struct
This is a follow up of [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](apache#6828)), to support both `struct` and `named_struct` in Spark SQL. After [apache#6725](apache#6828), the semantic of [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) methods have changed a little and do not limited to cols of `NamedExpressions`, it will name non-NamedExpression fields following the hive convention, col1, col2 ... This PR would both loosen [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and add `named_struct` support. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes apache#6874 from yijieshen/SPARK-8283 and squashes the following commits: 4cd3375 [Yijie Shen] change struct documentation d599d0b [Yijie Shen] rebase code 9a7039e [Yijie Shen] fix reviews and regenerate golden answers b487354 [Yijie Shen] replace assert using checkAnswer f07e114 [Yijie Shen] tiny fix 9613be9 [Yijie Shen] review fix 7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable 60812a7 [Yijie Shen] Fix type check 828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method fd3cd8e [Yijie Shen] remove type check from eval 7a71255 [Yijie Shen] tiny fix ccbbd86 [Yijie Shen] Fix reviews 47da332 [Yijie Shen] remove nameStruct API from DataFrame 917e680 [Yijie Shen] Fix reviews 4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children 0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
Configuration menu - View commit details
-
Copy full SHA for 52302a8 - Browse repository at this point
Copy the full SHA 52302a8View commit details -
[SPARK-8708] [MLLIB] Paritition ALS ratings based on both users and p…
…roducts JIRA: https://issues.apache.org/jira/browse/SPARK-8708 Previously the partitions of ratings are only based on the given products. So if the `usersProducts` given for prediction contains only few products or even one product, the generated ratings will be pushed into few or single partition and can't use high parallelism. The following codes are the example reported in the JIRA. Because it asks the predictions for users on product 2. There is only one partition in the result. >>> r1 = (1, 1, 1.0) >>> r2 = (1, 2, 2.0) >>> r3 = (2, 1, 2.0) >>> r4 = (2, 2, 2.0) >>> r5 = (3, 1, 1.0) >>> ratings = sc.parallelize([r1, r2, r3, r4, r5], 5) >>> users = ratings.map(itemgetter(0)).distinct() >>> model = ALS.trainImplicit(ratings, 1, seed=10) >>> predictions_for_2 = model.predictAll(users.map(lambda u: (u, 2))) >>> predictions_for_2.glom().map(len).collect() [0, 0, 3, 0, 0] This PR uses user and product instead of only product to partition the ratings. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Liang-Chi Hsieh <viirya@appier.com> Closes apache#7121 from viirya/mfm_fix_partition and squashes the following commits: 779946d [Liang-Chi Hsieh] Calculate approximate numbers of users and products in one pass. 4336dc2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into mfm_fix_partition 83e56c1 [Liang-Chi Hsieh] Instead of additional join, use the numbers of users and products to decide how to perform join. b534dc8 [Liang-Chi Hsieh] Paritition ratings based on both users and products.
Configuration menu - View commit details
-
Copy full SHA for 0e553a3 - Browse repository at this point
Copy the full SHA 0e553a3View commit details -
[SPARK-8581] [SPARK-8584] Simplify checkpointing code + better error …
…message This patch rewrites the old checkpointing code in a way that is easier to understand. It also adds a guard against an invalid specification of checkpoint directory to provide a clearer error message. Most of the changes here are relatively minor. Author: Andrew Or <andrew@databricks.com> Closes apache#6968 from andrewor14/checkpoint-cleanup and squashes the following commits: 4ef8263 [Andrew Or] Use global synchronized instead 6f6fd84 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup b1437ad [Andrew Or] Warn instead of throw 5484293 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup 7fb4af5 [Andrew Or] Guard against bad settings of checkpoint directory 691da98 [Andrew Or] Simplify checkpoint code / code style / comments
Andrew Or committedJul 2, 2015 Configuration menu - View commit details
-
Copy full SHA for 2e2f326 - Browse repository at this point
Copy the full SHA 2e2f326View commit details -
[SPARK-8479] [MLLIB] Add numNonzeros and numActives to linalg.Matrices
Matrices allow zeros to be stored in values. Sometimes a method is handy to check if the numNonZeros are same as number of Active values. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#6904 from MechCoder/nnz_matrix and squashes the following commits: 252c6b7 [MechCoder] Add to MiMa excludes e2390f5 [MechCoder] Use count instead of foreach 2f62b2f [MechCoder] Add to MiMa excludes d6e96ef [MechCoder] [SPARK-8479] Add numNonzeros and numActives to linalg.Matrices
Configuration menu - View commit details
-
Copy full SHA for 34d448d - Browse repository at this point
Copy the full SHA 34d448dView commit details -
[SPARK-8781] Fix variables in published pom.xml are not resolved
The issue is summarized in the JIRA and is caused by this commit: 984ad60. This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies. Author: Andrew Or <andrew@databricks.com> Closes apache#7193 from andrewor14/fix-kinesis-pom and squashes the following commits: ca3d5d4 [Andrew Or] Limit kinesis test dependencies f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"
Andrew Or committedJul 2, 2015 Configuration menu - View commit details
-
Copy full SHA for 82cf331 - Browse repository at this point
Copy the full SHA 82cf331View commit details -
[SPARK-1564] [DOCS] Added Javascript to Javadocs to create badges for…
… tags like :: Experimental :: Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags Author: Deron Eriksson <deron@us.ibm.com> Closes apache#7169 from deroneriksson/SPARK-1564_JavaDocs_badges and squashes the following commits: a8353db [Deron Eriksson] added license headers to api-docs.css and api-javadocs.css 07feb07 [Deron Eriksson] added linebreaks to make jquery more readable when adding html badge tags 65b4930 [Deron Eriksson] Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags
Configuration menu - View commit details
-
Copy full SHA for fcbcba6 - Browse repository at this point
Copy the full SHA fcbcba6View commit details -
[SPARK-7835] Refactor HeartbeatReceiverSuite for coverage + cleanup
The existing test suite has a lot of duplicate code and doesn't even cover the most fundamental feature of the HeartbeatReceiver, which is expiring hosts that have not responded in a while. This introduces manual clocks in `HeartbeatReceiver` and makes it respond to heartbeats only for registered executors. A few internal messages are moved to `receiveAndReply` to increase determinism of the tests so we don't have to rely on flaky constructs like `eventually`. Author: Andrew Or <andrew@databricks.com> Closes apache#7173 from andrewor14/heartbeat-receiver-tests and squashes the following commits: 4a903d6 [Andrew Or] Increase HeartReceiverSuite coverage and clean up
Andrew Or committedJul 2, 2015 Configuration menu - View commit details
-
Copy full SHA for cd20355 - Browse repository at this point
Copy the full SHA cd20355View commit details -
[SPARK-8772][SQL] Implement implicit type cast for expressions that d…
…efine input types. Author: Reynold Xin <rxin@databricks.com> Closes apache#7175 from rxin/implicitCast and squashes the following commits: 88080a2 [Reynold Xin] Clearer definition of implicit type cast. f0ff97f [Reynold Xin] Added missing file. c65e532 [Reynold Xin] [SPARK-8772][SQL] Implement implicit type cast for expressions that defines input types.
Configuration menu - View commit details
-
Copy full SHA for 52508be - Browse repository at this point
Copy the full SHA 52508beView commit details -
[SPARK-3382] [MLLIB] GradientDescent convergence tolerance
GrandientDescent can receive convergence tolerance value. Default value is 0.0. When loss value becomes less than the tolerance which is set by user, iteration is terminated. Author: lewuathe <lewuathe@me.com> Closes apache#3636 from Lewuathe/gd-convergence-tolerance and squashes the following commits: 0b8a9a8 [lewuathe] Update doc ce91b15 [lewuathe] Merge branch 'master' into gd-convergence-tolerance 4f22c2b [lewuathe] Modify based on SPARK-1503 5e47b82 [lewuathe] Merge branch 'master' into gd-convergence-tolerance abadb7e [lewuathe] Fix LassoSuite 8fadebd [lewuathe] Fix failed unit tests ee5de46 [lewuathe] Merge branch 'master' into gd-convergence-tolerance 8313ba2 [lewuathe] Fix styles 0ead94c [lewuathe] Merge branch 'master' into gd-convergence-tolerance a94cfd5 [lewuathe] Modify some styles 3aef0a2 [lewuathe] Modify converged logic to do relative comparison f7b19d5 [lewuathe] [SPARK-3382] Clarify comparison logic e6c9cd2 [lewuathe] [SPARK-3382] Compare with the diff of solution vector 4b125d2 [lewuathe] [SPARK3382] Fix scala style e7c10dd [lewuathe] [SPARK-3382] format improvements f867eea [lewuathe] [SPARK-3382] Modify warning message statements b9d5e61 [lewuathe] [SPARK-3382] should compare diff inside loss history and convergence tolerance 5433f71 [lewuathe] [SPARK-3382] GradientDescent convergence tolerance
Configuration menu - View commit details
-
Copy full SHA for 7d9cc96 - Browse repository at this point
Copy the full SHA 7d9cc96View commit details -
[SPARK-8784] [SQL] Add Python API for hex and unhex
Also improve the performance of hex/unhex Author: Davies Liu <davies@databricks.com> Closes apache#7181 from davies/hex and squashes the following commits: f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex 49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex b31fc9a [Davies Liu] Update math.scala 25156b7 [Davies Liu] address comments and fix test c3af78c [Davies Liu] address commments 1a24082 [Davies Liu] Add Python API for hex and unhex
Configuration menu - View commit details
-
Copy full SHA for fc7aebd - Browse repository at this point
Copy the full SHA fc7aebdView commit details -
[SPARK-7104] [MLLIB] Support model save/load in Python's Word2Vec
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#6821 from yu-iskw/SPARK-7104 and squashes the following commits: 975136b [Yu ISHIKAWA] Organize import 0ef58b6 [Yu ISHIKAWA] Use rmtree, instead of removedirs cb21653 [Yu ISHIKAWA] Add an explicit type for `Word2VecModelWrapper.save` 1d468ef [Yu ISHIKAWA] [SPARK-7104][MLlib] Support model save/load in Python's Word2Vec
Configuration menu - View commit details
-
Copy full SHA for 488bad3 - Browse repository at this point
Copy the full SHA 488bad3View commit details -
Revert "[SPARK-8784] [SQL] Add Python API for hex and unhex"
This reverts commit fc7aebd.
Configuration menu - View commit details
-
Copy full SHA for e589e71 - Browse repository at this point
Copy the full SHA e589e71View commit details
Commits on Jul 3, 2015
-
[SPARK-8782] [SQL] Fix code generation for ORDER BY NULL
This fixes code generation for queries containing `ORDER BY NULL`. Previously, the generated code would fail to compile. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7179 from JoshRosen/generate-order-fixes and squashes the following commits: 6ef49a6 [Josh Rosen] Fix ORDER BY NULL 0036696 [Josh Rosen] Add regression test for SPARK-8782 (ORDER BY NULL)
Configuration menu - View commit details
-
Copy full SHA for d983819 - Browse repository at this point
Copy the full SHA d983819View commit details -
[SPARK-6980] [CORE] Akka timeout exceptions indicate which conf contr…
…ols them (RPC Layer) Latest changes after refactoring to the RPC layer. I rebased against trunk to make sure to get any recent changes since it had been a while. I wasn't crazy about the name `ConfigureTimeout` and `RpcTimeout` seemed to fit better, but I'm open to suggestions! I ran most of the tests and they pass, but others would get stuck with "WARN TaskSchedulerImpl: Initial job has not accepted any resources". I think its just my machine, so I'd though I would push what I have anyway. Still left to do: * I only added a couple unit tests so far, there are probably some more cases to test * Make sure all uses require a `RpcTimeout` * Right now, both the `ask` and `Await.result` use the same timeout, should we differentiate between these in the TimeoutException message? * I wrapped `Await.result` in `RpcTimeout`, should we also wrap `Await.ready`? * Proper scoping of classes and methods hardmettle, feel free to help out with any of these! Author: Bryan Cutler <bjcutler@us.ibm.com> Author: Harsh Gupta <harsh@Harshs-MacBook-Pro.local> Author: BryanCutler <cutlerb@gmail.com> Closes apache#6205 from BryanCutler/configTimeout-6980 and squashes the following commits: 46c8d48 [Bryan Cutler] [SPARK-6980] Changed RpcEnvSuite test to never reply instead of just sleeping, to avoid possible sync issues 06afa53 [Bryan Cutler] [SPARK-6980] RpcTimeout class extends Serializable, was causing error in MasterSuite 7bb70f1 [Bryan Cutler] Merge branch 'master' into configTimeout-6980 dbd5f73 [Bryan Cutler] [SPARK-6980] Changed RpcUtils askRpcTimeout and lookupRpcTimeout scope to private[spark] and improved deprecation warning msg 4e89c75 [Bryan Cutler] [SPARK-6980] Missed one usage of deprecated RpcUtils.askTimeout in YarnSchedulerBackend although it is not being used, and fixed SparkConfSuite UT to not use deprecated RpcUtils functions 6a1c50d [Bryan Cutler] [SPARK-6980] Minor cleanup of test case 7f4d78e [Bryan Cutler] [SPARK-6980] Fixed scala style checks 287059a [Bryan Cutler] [SPARK-6980] Removed extra import in AkkaRpcEnvSuite 3d8b1ff [Bryan Cutler] [SPARK-6980] Cleaned up imports in AkkaRpcEnvSuite 3a168c7 [Bryan Cutler] [SPARK-6980] Rewrote Akka RpcTimeout UTs in RpcEnvSuite 7636189 [Bryan Cutler] [SPARK-6980] Fixed call to askWithReply in DAGScheduler to use RpcTimeout - this was being compiled by auto-tupling and changing the message type of BlockManagerHeartbeat be11c4e [Bryan Cutler] Merge branch 'master' into configTimeout-6980 039afed [Bryan Cutler] [SPARK-6980] Corrected import organization 218aa50 [Bryan Cutler] [SPARK-6980] Corrected issues from feedback fadaf6f [Bryan Cutler] [SPARK-6980] Put back in deprecated RpcUtils askTimeout and lookupTimout to fix MiMa errors fa6ed82 [Bryan Cutler] [SPARK-6980] Had to increase timeout on positive test case because a processor slowdown could trigger an Future TimeoutException b05d449 [Bryan Cutler] [SPARK-6980] Changed constructor to use val duration instead of getter function, changed name of string property from conf to timeoutProp for consistency c6cfd33 [Bryan Cutler] [SPARK-6980] Changed UT ask message timeout to explicitly intercept a SparkException 1394de6 [Bryan Cutler] [SPARK-6980] Moved MessagePrefix to createRpcTimeoutException directly 1517721 [Bryan Cutler] [SPARK-6980] RpcTimeout object scope should be private[spark] 2206b4d [Bryan Cutler] [SPARK-6980] Added unit test for ask then immediat awaitReply 1b9beab [Bryan Cutler] [SPARK-6980] Cleaned up import ordering 08f5afc [Bryan Cutler] [SPARK-6980] Added UT for constructing RpcTimeout with default value d3754d1 [Bryan Cutler] [SPARK-6980] Added akkaConf to prevent dead letter logging 995d196 [Bryan Cutler] [SPARK-6980] Cleaned up import ordering, comments, spacing from PR feedback 7774d56 [Bryan Cutler] [SPARK-6980] Cleaned up UT imports 4351c48 [Bryan Cutler] [SPARK-6980] Added UT for addMessageIfTimeout, cleaned up UTs 1607a5f [Bryan Cutler] [SPARK-6980] Changed addMessageIfTimeout to PartialFunction, cleanup from PR comments 2f94095 [Bryan Cutler] [SPARK-6980] Added addMessageIfTimeout for when a Future is completed with TimeoutException 235919b [Bryan Cutler] [SPARK-6980] Resolved conflicts after master merge c07d05c [Bryan Cutler] Merge branch 'master' into configTimeout-6980-tmp b7fb99f [BryanCutler] Merge pull request #2 from hardmettle/configTimeoutUpdates_6980 4be3a8d [Harsh Gupta] Modifying loop condition to find property match 0ee5642 [Harsh Gupta] Changing the loop condition to halt at the first match in the property list for RpcEnv exception catch f74064d [Harsh Gupta] Retrieving properties from property list using iterator and while loop instead of chained functions a294569 [Bryan Cutler] [SPARK-6980] Added creation of RpcTimeout with Seq of property keys 23d2f26 [Bryan Cutler] [SPARK-6980] Fixed await result not being handled by RpcTimeout 49f9f04 [Bryan Cutler] [SPARK-6980] Minor cleanup and scala style fix 5b59a44 [Bryan Cutler] [SPARK-6980] Added some RpcTimeout unit tests 78a2c0a [Bryan Cutler] [SPARK-6980] Using RpcTimeout.awaitResult for future in AppClient now 97523e0 [Bryan Cutler] [SPARK-6980] Akka ask timeout description refactored to RPC layer
Configuration menu - View commit details
-
Copy full SHA for aa7bbc1 - Browse repository at this point
Copy the full SHA aa7bbc1View commit details -
[SPARK-8213][SQL]Add function factorial
Author: zhichao.li <zhichao.li@intel.com> Closes apache#6822 from zhichao-li/factorial and squashes the following commits: 26edf4f [zhichao.li] add factorial
Configuration menu - View commit details
-
Copy full SHA for 1a7a7d7 - Browse repository at this point
Copy the full SHA 1a7a7d7View commit details -
Configuration menu - View commit details
-
Copy full SHA for dfd8bac - Browse repository at this point
Copy the full SHA dfd8bacView commit details -
[SPARK-8501] [SQL] Avoids reading schema from empty ORC files
ORC writes empty schema (`struct<>`) to ORC files containing zero rows. This is OK for Hive since the table schema is managed by the metastore. But it causes trouble when reading raw ORC files via Spark SQL since we have to discover the schema from the files. Notice that the ORC data source always avoids writing empty ORC files, but it's still problematic when reading Hive tables which contain empty part-files. Author: Cheng Lian <lian@databricks.com> Closes apache#7199 from liancheng/spark-8501 and squashes the following commits: bb8cd95 [Cheng Lian] Addresses comments a290221 [Cheng Lian] Avoids reading schema from empty ORC files
Configuration menu - View commit details
-
Copy full SHA for 20a4d7d - Browse repository at this point
Copy the full SHA 20a4d7dView commit details -
[SPARK-8801][SQL] Support TypeCollection in ExpectsInputTypes
This patch adds a new TypeCollection AbstractDataType that can be used by expressions to specify more than one expected input types. Author: Reynold Xin <rxin@databricks.com> Closes apache#7202 from rxin/type-collection and squashes the following commits: c714ca1 [Reynold Xin] Fixed style. a0c0d12 [Reynold Xin] Fixed bugs and unit tests. d8b8ae7 [Reynold Xin] Added TypeCollection.
Configuration menu - View commit details
-
Copy full SHA for a59d14f - Browse repository at this point
Copy the full SHA a59d14fView commit details -
[SPARK-8776] Increase the default MaxPermSize
I am increasing the perm gen size to 256m. https://issues.apache.org/jira/browse/SPARK-8776 Author: Yin Huai <yhuai@databricks.com> Closes apache#7196 from yhuai/SPARK-8776 and squashes the following commits: 60901b4 [Yin Huai] Fix test. d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size. 30aaf8e [Yin Huai] Increase the default PermGen size to 256m.
Configuration menu - View commit details
-
Copy full SHA for f743c79 - Browse repository at this point
Copy the full SHA f743c79View commit details -
[SPARK-8803] handle special characters in elements in crosstab
cc rxin Having back ticks or null as elements causes problems. Since elements become column names, we have to drop them from the element as back ticks are special characters. Having null throws exceptions, we could replace them with empty strings. Handling back ticks should be improved for 1.5 Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#7201 from brkyvz/weird-ct-elements and squashes the following commits: e06b840 [Burak Yavuz] fix scalastyle 93a0d3f [Burak Yavuz] added tests for NaN and Infinity 9dba6ce [Burak Yavuz] address cr1 db71dbd [Burak Yavuz] handle special characters in elements in crosstab
Configuration menu - View commit details
-
Copy full SHA for 9b23e92 - Browse repository at this point
Copy the full SHA 9b23e92View commit details -
[SPARK-8809][SQL] Remove ConvertNaNs analyzer rule.
"NaN" from string to double is already handled by Cast expression itself. Author: Reynold Xin <rxin@databricks.com> Closes apache#7206 from rxin/convertnans and squashes the following commits: 3d99c33 [Reynold Xin] [SPARK-8809][SQL] Remove ConvertNaNs analyzer rule.
Configuration menu - View commit details
-
Copy full SHA for 2848f4d - Browse repository at this point
Copy the full SHA 2848f4dView commit details -
[SPARK-8226] [SQL] Add function shiftrightunsigned
Author: zhichao.li <zhichao.li@intel.com> Closes apache#7035 from zhichao-li/shiftRightUnsigned and squashes the following commits: 6bcca5a [zhichao.li] change coding style 3e9f5ae [zhichao.li] python style d85ae0b [zhichao.li] add shiftrightunsigned
Configuration menu - View commit details
-
Copy full SHA for ab535b9 - Browse repository at this point
Copy the full SHA ab535b9View commit details -
[SPARK-7401] [MLLIB] [PYSPARK] Vectorize dot product and sq_dist betw…
…een SparseVector and DenseVector Currently we iterate over indices which can be vectorized. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#5946 from MechCoder/spark-7203 and squashes the following commits: 034d086 [MechCoder] Vectorize dot calculation for numpy arrays for ndim=2 bce2b07 [MechCoder] fix doctest fcad0a3 [MechCoder] Remove type checks for list, pyarray etc 0ee5dd4 [MechCoder] Add tests and other isinstance changes e5f1de0 [MechCoder] [SPARK-7401] Vectorize dot product and sq_dist
Configuration menu - View commit details
-
Copy full SHA for f0fac2a - Browse repository at this point
Copy the full SHA f0fac2aView commit details
Commits on Jul 4, 2015
-
[SPARK-8810] [SQL] Added several UDF unit tests for Spark SQL
One test for each of the GROUP BY, WHERE and HAVING clauses, and one that combines all three with an additional UDF in the SELECT. (Since this is my first attempt at contributing to SPARK, meta-level guidance on anything I've screwed up would be greatly appreciated, whether important or minor.) Author: Spiro Michaylov <spiro@michaylov.com> Closes apache#7207 from spirom/udf-test-branch and squashes the following commits: 6bbba9e [Spiro Michaylov] Responded to review comments on UDF unit tests 1a3c5ff [Spiro Michaylov] Added several UDF unit tests for Spark SQL
Configuration menu - View commit details
-
Copy full SHA for e92c24d - Browse repository at this point
Copy the full SHA e92c24dView commit details -
[SPARK-8572] [SQL] Type coercion for ScalaUDFs
Implemented type coercion for udf arguments in Scala. The changes include- * Add `with ExpectsInputTypes ` to `ScalaUDF` class. * Pass down argument types info from `UDFRegistration` and `functions`. With this patch, the example query in [SPARK-8572](https://issues.apache.org/jira/browse/SPARK-8572) no longer throws a type cast error at runtime. Also added a unit test to `UDFSuite` in which a decimal type is passed to a udf that expects an int. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes apache#7203 from piaozhexiu/SPARK-8572 and squashes the following commits: 2d0ed15 [Cheolsoo Park] Incorporate comments dce1efd [Cheolsoo Park] Fix unit tests and update the codegen script 066deed [Cheolsoo Park] Type coercion for udf inputs
Configuration menu - View commit details
-
Copy full SHA for 4a22bce - Browse repository at this point
Copy the full SHA 4a22bceView commit details -
[SPARK-8192] [SPARK-8193] [SQL] udf current_date, current_timestamp
Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes apache#6985 from adrian-wang/udfcurrent and squashes the following commits: 6a20b64 [Daoyuan Wang] remove codegen and add lazy in testsuite 27c9f95 [Daoyuan Wang] refine tests.. e11ae75 [Daoyuan Wang] refine tests 61ed3d5 [Daoyuan Wang] add in functions 98e8550 [Daoyuan Wang] fix sytle 427d9dc [Daoyuan Wang] add tests and codegen 0b69a1f [Daoyuan Wang] udf current
Configuration menu - View commit details
-
Copy full SHA for 9fb6b83 - Browse repository at this point
Copy the full SHA 9fb6b83View commit details -
[SPARK-8777] [SQL] Add random data generator test utilities to Spark SQL
This commit adds a set of random data generation utilities to Spark SQL, for use in its own unit tests. - `RandomDataGenerator.forType(DataType)` returns an `Option[() => Any]` that, if defined, contains a function for generating random values for the given DataType. The random values use the external representations for the given DataType (for example, for DateType we return `java.sql.Date` instances instead of longs). - `DateTypeTestUtilities` defines some convenience fields for looping over instances of data types. For example, `numericTypes` holds `DataType` instances for all supported numeric types. These constants will help us to raise the level of abstraction in our tests. For example, it's now very easy to write a test which is parameterized by all common data types. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#7176 from JoshRosen/sql-random-data-generators and squashes the following commits: f71634d [Josh Rosen] Roll back ScalaCheck usage e0d7d49 [Josh Rosen] Bump ScalaCheck version in LICENSE 89d86b1 [Josh Rosen] Bump ScalaCheck version. 0c20905 [Josh Rosen] Initial attempt at using ScalaCheck. b55875a [Josh Rosen] Generate doubles and floats over entire possible range. 5acdd5c [Josh Rosen] Infinity and NaN are interesting. ab76cbd [Josh Rosen] Move code to Catalyst package. d2b4a4a [Josh Rosen] Add random data generator test utilities to Spark SQL.
Configuration menu - View commit details
-
Copy full SHA for f32487b - Browse repository at this point
Copy the full SHA f32487bView commit details -
[SPARK-8238][SPARK-8239][SPARK-8242][SPARK-8243][SPARK-8268][SQL]Add …
…ascii/base64/unbase64/encode/decode functions Add `ascii`,`base64`,`unbase64`,`encode` and `decode` expressions. Author: Cheng Hao <hao.cheng@intel.com> Closes apache#6843 from chenghao-intel/str_funcs2 and squashes the following commits: 78dee7d [Cheng Hao] base 64 -> base64 9d6f9f4 [Cheng Hao] remove the toString method for expressions ed5c19c [Cheng Hao] update code as comments 96170fc [Cheng Hao] scalastyle issues e2df768 [Cheng Hao] remove the unused import 491ce7b [Cheng Hao] add ascii/base64/unbase64/encode/decode functions
Configuration menu - View commit details
-
Copy full SHA for f35b0c3 - Browse repository at this point
Copy the full SHA f35b0c3View commit details -
[SPARK-8270][SQL] levenshtein distance
Jira: https://issues.apache.org/jira/browse/SPARK-8270 Info: I can not build the latest master, it stucks during the build process: `[INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml` Author: Tarek Auel <tarek.auel@googlemail.com> Closes apache#7214 from tarekauel/SPARK-8270 and squashes the following commits: ab348b9 [Tarek Auel] Merge branch 'master' into SPARK-8270 a2ad318 [Tarek Auel] [SPARK-8270] changed order of fields d91b12c [Tarek Auel] [SPARK-8270] python fix adbd075 [Tarek Auel] [SPARK-8270] fixed typo 23185c9 [Tarek Auel] [SPARK-8270] levenshtein distance
Configuration menu - View commit details
-
Copy full SHA for 6b3574e - Browse repository at this point
Copy the full SHA 6b3574eView commit details -
Configuration menu - View commit details
-
Copy full SHA for 48f7aed - Browse repository at this point
Copy the full SHA 48f7aedView commit details -
[SQL] More unit tests for implicit type cast & add simpleString to Ab…
…stractDataType. Author: Reynold Xin <rxin@databricks.com> Closes apache#7221 from rxin/implicit-cast-tests and squashes the following commits: 64b13bd [Reynold Xin] Fixed a bug .. 489b732 [Reynold Xin] [SQL] More unit tests for implicit type cast & add simpleString to AbstractDataType.
Configuration menu - View commit details
-
Copy full SHA for 347cab8 - Browse repository at this point
Copy the full SHA 347cab8View commit details -
[SPARK-8822][SQL] clean up type checking in math.scala.
Author: Reynold Xin <rxin@databricks.com> Closes apache#7220 from rxin/SPARK-8822 and squashes the following commits: 0cda076 [Reynold Xin] Test cases. 22d0463 [Reynold Xin] Fixed type precedence. beb2a97 [Reynold Xin] [SPARK-8822][SQL] clean up type checking in math.scala.
Configuration menu - View commit details
-
Copy full SHA for c991ef5 - Browse repository at this point
Copy the full SHA c991ef5View commit details
Commits on Jul 5, 2015
-
[MINOR] [SQL] Minor fix for CatalystSchemaConverter
ping liancheng Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#7224 from viirya/few_fix_catalystschema and squashes the following commits: d994330 [Liang-Chi Hsieh] Minor fix for CatalystSchemaConverter.
Configuration menu - View commit details
-
Copy full SHA for 2b820f2 - Browse repository at this point
Copy the full SHA 2b820f2View commit details -
[SPARK-7137] [ML] Update SchemaUtils checkInputColumn to print more i…
…nfo if needed Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes apache#5992 from rekhajoshm/fix/SPARK-7137 and squashes the following commits: 8c42b57 [Joshi] update checkInputColumn to print more info if needed 33ddd2e [Joshi] update checkInputColumn to print more info if needed acf3e17 [Joshi] update checkInputColumn to print more info if needed 8993c0e [Joshi] SPARK-7137: Add checkInputColumn back to Params and print more info e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
Configuration menu - View commit details
-
Copy full SHA for f9c448d - Browse repository at this point
Copy the full SHA f9c448dView commit details
Commits on Jul 6, 2015
-
[SPARK-8549] [SPARKR] Fix the line length of SparkR
[[SPARK-8549] Fix the line length of SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8549) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes apache#7204 from yu-iskw/SPARK-8549 and squashes the following commits: 6fb131a [Yu ISHIKAWA] Fix the typo 1737598 [Yu ISHIKAWA] [SPARK-8549][SparkR] Fix the line length of SparkR
Configuration menu - View commit details
-
Copy full SHA for a0cb111 - Browse repository at this point
Copy the full SHA a0cb111View commit details -
[SQL][Minor] Update the DataFrame API for encode/decode
This is a the follow up of apache#6843. Author: Cheng Hao <hao.cheng@intel.com> Closes apache#7230 from chenghao-intel/str_funcs2_followup and squashes the following commits: 52cc553 [Cheng Hao] update the code as comment
Configuration menu - View commit details
-
Copy full SHA for 6d0411b - Browse repository at this point
Copy the full SHA 6d0411bView commit details -
[SPARK-8831][SQL] Support AbstractDataType in TypeCollection.
Otherwise it is impossible to declare an expression supporting DecimalType. Author: Reynold Xin <rxin@databricks.com> Closes apache#7232 from rxin/typecollection-adt and squashes the following commits: 934d3d1 [Reynold Xin] [SPARK-8831][SQL] Support AbstractDataType in TypeCollection.
Configuration menu - View commit details
-
Copy full SHA for 86768b7 - Browse repository at this point
Copy the full SHA 86768b7View commit details -
[SPARK-8841] [SQL] Fix partition pruning percentage log message
When pruning partitions for a query plan, a message is logged indicating what how many partitions were selected based on predicate criteria, and what percent were pruned. The current release erroneously uses `1 - total/selected` to compute this quantity, leading to nonsense messages like "pruned -1000% partitions". The fix is simple and obvious. Author: Steve Lindemann <steve.lindemann@engineersgatelp.com> Closes apache#7227 from srlindemann/master and squashes the following commits: c788061 [Steve Lindemann] fix percentPruned log message
Configuration menu - View commit details
-
Copy full SHA for 39e4e7e - Browse repository at this point
Copy the full SHA 39e4e7eView commit details -
[SPARK-8124] [SPARKR] Created more examples on SparkR DataFrames
Here are more examples on SparkR DataFrames including creating a Spark Contect and a SQL context, loading data and simple data manipulation. Author: Daniel Emaasit (PhD Student) <daniel.emaasit@gmail.com> Closes apache#6668 from Emaasit/dan-dev and squashes the following commits: 3a97867 [Daniel Emaasit (PhD Student)] Used fewer rows for createDataFrame f7227f9 [Daniel Emaasit (PhD Student)] Using command line arguments a550f70 [Daniel Emaasit (PhD Student)] Used base R functions 33f9882 [Daniel Emaasit (PhD Student)] Renamed file b6603e3 [Daniel Emaasit (PhD Student)] changed "Describe" function to "describe" 90565dd [Daniel Emaasit (PhD Student)] Deleted the getting-started file b95a103 [Daniel Emaasit (PhD Student)] Deleted this file cc55cd8 [Daniel Emaasit (PhD Student)] combined all the code into one .R file c6933af [Daniel Emaasit (PhD Student)] changed variable name to SQLContext 8e0fe14 [Daniel Emaasit (PhD Student)] provided two options for creating DataFrames 2653573 [Daniel Emaasit (PhD Student)] Updates to a comment and variable name 275b787 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file 2e8f724 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file 486f44e [Daniel Emaasit (PhD Student)] Added the Apache License at the file d705112 [Daniel Emaasit (PhD Student)] Created more examples on SparkR DataFrames
Configuration menu - View commit details
-
Copy full SHA for 293225e - Browse repository at this point
Copy the full SHA 293225eView commit details -
[SPARK-8837][SPARK-7114][SQL] support using keyword in column name
Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#7237 from cloud-fan/parser and squashes the following commits: e7b49bb [Wenchen Fan] support using keyword in column name
Configuration menu - View commit details
-
Copy full SHA for 0e19464 - Browse repository at this point
Copy the full SHA 0e19464View commit details -
Small update in the readme file
Just change the attribute from -PsparkR to -Psparkr Author: Dirceu Semighini Filho <dirceu.semighini@gmail.com> Closes apache#7242 from dirceusemighini/patch-1 and squashes the following commits: fad5991 [Dirceu Semighini Filho] Small update in the readme file
Configuration menu - View commit details
-
Copy full SHA for 57c72fc - Browse repository at this point
Copy the full SHA 57c72fcView commit details -
[SPARK-8784] [SQL] Add Python API for hex and unhex
Add Python API for hex/unhex, also cleanup Hex/Unhex Author: Davies Liu <davies@databricks.com> Closes apache#7223 from davies/hex and squashes the following commits: 6f1249d [Davies Liu] no explicit rule to cast string into binary 711a6ed [Davies Liu] fix test f9fe5a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex 49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex b31fc9a [Davies Liu] Update math.scala 25156b7 [Davies Liu] address comments and fix test c3af78c [Davies Liu] address commments 1a24082 [Davies Liu] Add Python API for hex and unhex
Configuration menu - View commit details
-
Copy full SHA for 37e4d92 - Browse repository at this point
Copy the full SHA 37e4d92View commit details -
[SPARK-4485] [SQL] 1) Add broadcast hash outer join, (2) Fix SparkPla…
…nTest This pull request (1) extracts common functions used by hash outer joins and put it in interface HashOuterJoin (2) adds ShuffledHashOuterJoin and BroadcastHashOuterJoin (3) adds test cases for shuffled and broadcast hash outer join (3) makes SparkPlanTest to support binary or more complex operators, and fixes bugs in plan composition in SparkPlanTest Author: kai <kaizeng@eecs.berkeley.edu> Closes apache#7162 from kai-zeng/outer and squashes the following commits: 3742359 [kai] Fix not-serializable exception for code-generated keys in broadcasted relations 14e4bf8 [kai] Use CanBroadcast in broadcast outer join planning dc5127e [kai] code style fixes b5a4efa [kai] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest
Configuration menu - View commit details
-
Copy full SHA for 2471c0b - Browse repository at this point
Copy the full SHA 2471c0bView commit details -
[MINOR] [SQL] remove unused code in Exchange
Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes apache#7234 from adrian-wang/exchangeclean and squashes the following commits: b093ec9 [Daoyuan Wang] remove unused code
Configuration menu - View commit details
-
Copy full SHA for 132e7fc - Browse repository at this point
Copy the full SHA 132e7fcView commit details -
[SPARK-8656] [WEBUI] Fix the webUI and JSON API number is not synced
Spark standalone master web UI show "Alive Workers" total core, total used cores and "Alive workers" total memory, memory used. But the JSON API page "http://MASTERURL:8088/json" shows "ALL workers" core, memory number. This webUI data is not sync with the JSON API. The proper way is to sync the number with webUI and JSON API. Author: Wisely Chen <wiselychen@appier.com> Closes apache#7038 from thegiive/SPARK-8656 and squashes the following commits: 9e54bf0 [Wisely Chen] Change variable name to camel case 2c8ea89 [Wisely Chen] Change some styling and add local variable 431d2b0 [Wisely Chen] Worker List should contain DEAD node also 8b3b8e8 [Wisely Chen] [SPARK-8656] Fix the webUI and JSON API number is not synced
Wisely Chen authored and Andrew Or committedJul 6, 2015 Configuration menu - View commit details
-
Copy full SHA for 9ff2033 - Browse repository at this point
Copy the full SHA 9ff2033View commit details -
[SPARK-6707] [CORE] [MESOS] Mesos Scheduler should allow the user to …
…specify constraints based on slave attributes Currently, the mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usablility of a resource offer from a mesos slave node. It may be preferable for the user to be able to ensure that the spark jobs are only started on a certain set of nodes (based on attributes). For example, If the user sets a property, let's say `spark.mesos.constraints` is set to `tachyon=true;us-east-1=false`, then the resource offers will be checked to see if they meet both these constraints and only then will be accepted to start new executors. Author: Ankur Chauhan <achauhan@brightcove.com> Closes apache#5563 from ankurcha/mesos_attribs and squashes the following commits: 902535b [Ankur Chauhan] Fix line length d83801c [Ankur Chauhan] Update code as per code review comments 8b73f2d [Ankur Chauhan] Fix imports c3523e7 [Ankur Chauhan] Added docs 1a24d0b [Ankur Chauhan] Expand scope of attributes matching to include all data types 482fd71 [Ankur Chauhan] Update access modifier to private[this] for offer constraints 5ccc32d [Ankur Chauhan] Fix nit pick whitespace 1bce782 [Ankur Chauhan] Fix nit pick whitespace c0cbc75 [Ankur Chauhan] Use offer id value for debug message 7fee0ea [Ankur Chauhan] Add debug statements fc7eb5b [Ankur Chauhan] Fix import codestyle 00be252 [Ankur Chauhan] Style changes as per code review comments 662535f [Ankur Chauhan] Incorporate code review comments + use SparkFunSuite fdc0937 [Ankur Chauhan] Decline offers that did not meet criteria 67b58a0 [Ankur Chauhan] Add documentation for spark.mesos.constraints 63f53f4 [Ankur Chauhan] Update codestyle - uniform style for config values 02031e4 [Ankur Chauhan] Fix scalastyle warnings in tests c09ed84 [Ankur Chauhan] Fixed the access modifier on offerConstraints val to private[mesos] 0c64df6 [Ankur Chauhan] Rename overhead fractions to memory_*, fix spacing 8cc1e8f [Ankur Chauhan] Make exception message more explicit about the source of the error addedba [Ankur Chauhan] Added test case for malformed constraint string ec9d9a6 [Ankur Chauhan] Add tests for parse constraint string 72fe88a [Ankur Chauhan] Fix up tests + remove redundant method override, combine utility class into new mesos scheduler util trait 92b47fd [Ankur Chauhan] Add attributes based constraints support to MesosScheduler
Ankur Chauhan authored and Andrew Or committedJul 6, 2015 Configuration menu - View commit details
-
Copy full SHA for 1165b17 - Browse repository at this point
Copy the full SHA 1165b17View commit details -
Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"
This reverts commit 25f574e. After speaking to some users and developers, we realized that FP-growth doesn't meet the requirement for frequent sequence mining. PrefixSpan (SPARK-6487) would be the correct algorithm for it. feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes apache#7240 from mengxr/SPARK-7212.revert and squashes the following commits: 2b3d66b [Xiangrui Meng] Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"
Configuration menu - View commit details
-
Copy full SHA for 96c5eee - Browse repository at this point
Copy the full SHA 96c5eeeView commit details -
Configuration menu - View commit details
-
Copy full SHA for ee232db - Browse repository at this point
Copy the full SHA ee232dbView commit details -
Configuration menu - View commit details
-
Copy full SHA for 93e3d4e - Browse repository at this point
Copy the full SHA 93e3d4eView commit details -
Configuration menu - View commit details
-
Copy full SHA for 6984bf4 - Browse repository at this point
Copy the full SHA 6984bf4View commit details -
Configuration menu - View commit details
-
Copy full SHA for 7f812fd - Browse repository at this point
Copy the full SHA 7f812fdView commit details -
Configuration menu - View commit details
-
Copy full SHA for af61f2e - Browse repository at this point
Copy the full SHA af61f2eView commit details -
Configuration menu - View commit details
-
Copy full SHA for fdb2ae4 - Browse repository at this point
Copy the full SHA fdb2ae4View commit details -
Configuration menu - View commit details
-
Copy full SHA for 7114a47 - Browse repository at this point
Copy the full SHA 7114a47View commit details -
Configuration menu - View commit details
-
Copy full SHA for 2844a8e - Browse repository at this point
Copy the full SHA 2844a8eView commit details -
Configuration menu - View commit details
-
Copy full SHA for 92ed7a6 - Browse repository at this point
Copy the full SHA 92ed7a6View commit details -
Configuration menu - View commit details
-
Copy full SHA for feb1129 - Browse repository at this point
Copy the full SHA feb1129View commit details