Master #8767
Commits on Aug 24, 2015
[SPARK-9791] [PACKAGE] Change private class to private[package] class to prevent unnecessary classes from showing up in the docs
In addition, some random cleanup of import ordering. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8387 from tdas/SPARK-9791 and squashes the following commits: 67f3ee9 [Tathagata Das] Change private class to private[package] class to prevent them from showing up in the docs
SHA: 7478c8b
[SPARK-7710] [SPARK-7998] [DOCS] Docs for DataFrameStatFunctions
This PR contains examples on how to use some of the Stat Functions available for DataFrames under `df.stat`. rxin Author: Burak Yavuz <brkyvz@gmail.com> Closes #8378 from brkyvz/update-sql-docs.
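For orientation, a minimal sketch of the kind of `df.stat` calls these docs cover (the column names "a" and "b" and the 0.4 support threshold are illustrative, not taken from the PR):
```scala
import org.apache.spark.sql.DataFrame

// A few of the DataFrameStatFunctions reachable under df.stat:
def statExamples(df: DataFrame): Unit = {
  df.stat.freqItems(Seq("a", "b"), 0.4).show() // frequent items per column
  df.stat.crosstab("a", "b").show()            // contingency table of a vs. b
  println(df.stat.corr("a", "b"))              // Pearson correlation
  println(df.stat.cov("a", "b"))               // sample covariance
}
```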
SHA: 9ce0c7a
[SPARK-10144] [UI] Actually show peak execution memory by default
The peak execution memory metric was introduced in SPARK-8735. That was before Tungsten was enabled by default, so it assumed that `spark.sql.unsafe.enabled` must be explicitly set to true. The result is that the memory is not displayed by default. Author: Andrew Or <andrew@databricks.com> Closes #8345 from andrewor14/show-memory-default.
SHA: 662bb96
[SPARK-8580] [SQL] Refactors ParquetHiveCompatibilitySuite and adds more test cases
This PR refactors `ParquetHiveCompatibilitySuite` so that it's easier to add new test cases. Hit two bugs, SPARK-10177 and HIVE-11625, while working on this; added test cases for them and marked as ignored for now. SPARK-10177 will be addressed in a separate PR. Author: Cheng Lian <lian@databricks.com> Closes #8392 from liancheng/spark-8580/parquet-hive-compat-tests.
SHA: a2f4cdc
[SPARK-9758] [TEST] [SQL] Compilation issue for hive test / wrong package?
Move `test.org.apache.spark.sql.hive` package tests to the apparently intended `org.apache.spark.sql.hive`, as they don't intend to test behavior from outside org.apache.spark.*. Alternate take, per discussion at #8051. I think this is what vanzin and I had in mind, but also CC rxin to cross-check, as this does indeed depend on whether these tests were accidentally in this package or not. Testing from a `test.org.apache.spark` package is legitimate but didn't seem to be the intent here. Author: Sean Owen <sowen@cloudera.com> Closes #8307 from srowen/SPARK-9758.
SHA: cb2d2e1
[SPARK-10061] [DOC] ML ensemble docs
User guide for spark.ml GBTs and Random Forests. The examples are copied from the decision tree guide and modified to run. I caught some issues I had somehow missed in the tree guide as well. I have run all examples, including Java ones. (Of course, I thought I had previously as well...) CC: mengxr manishamde yanboliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8369 from jkbradley/ml-ensemble-docs.
SHA: 13db11c
[SPARK-10190] Fix NPE in CatalystTypeConverters Decimal toScala converter
This adds a missing null check to the Decimal `toScala` converter in `CatalystTypeConverters`, fixing an NPE. Author: Josh Rosen <joshrosen@databricks.com> Closes #8401 from JoshRosen/SPARK-10190.
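A minimal sketch of the shape of such a guard (the helper name is hypothetical; only `Decimal.toJavaBigDecimal` is an actual API):
```scala
import java.math.{BigDecimal => JavaBigDecimal}
import org.apache.spark.sql.types.Decimal

// Without the null check, a null Decimal field coming out of an internal
// row would be dereferenced and throw a NullPointerException.
def decimalToScala(catalystValue: Decimal): JavaBigDecimal =
  if (catalystValue == null) null else catalystValue.toJavaBigDecimal
```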
SHA: d7b4c09
Commits on Aug 25, 2015
[SPARK-10165] [SQL] Await child resolution in ResolveFunctions
Currently, we eagerly attempt to resolve functions, even before their children are resolved. However, this is not valid in cases where we need to know the types of the input arguments (i.e. when resolving Hive UDFs). As a fix, this PR delays function resolution until the functions children are resolved. This change also necessitates a change to the way we resolve aggregate expressions that are not in aggregate operators (e.g., in `HAVING` or `ORDER BY` clauses). Specifically, we can't assume that these misplaced functions will be resolved, allowing us to differentiate aggregate functions from normal functions. To compensate for this change we now attempt to resolve these unresolved expressions in the context of the aggregate operator, before checking to see if any aggregate expressions are present. Author: Michael Armbrust <michael@databricks.com> Closes #8371 from marmbrus/hiveUDFResolution.
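As a rough illustration of the second half of the change (table and column names assumed), the aggregate in the `HAVING` clause below sits outside the aggregate operator's select list and is now resolved against that operator before the misplaced-aggregate check runs:
```scala
// Hypothetical query over a Hive table src(key, value):
sqlContext.sql(
  "SELECT key FROM src GROUP BY key HAVING max(value) > 'val_255'")
```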
SHA: 2bf338c
[SPARK-10118] [SPARKR] [DOCS] Improve SparkR API docs for 1.5 release
cc: shivaram ## Summary - Modify `rdname` of expression functions, i.e. `ascii`: `rdname functions` => `rdname ascii` - Replace the dynamic function definitions with static ones for the sake of their documentation. ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLX2t6ZjRoRnBTSEU/view?usp=sharing ## JIRA [[SPARK-10118] Improve SparkR API docs for 1.5 release - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10118) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8386 from yu-iskw/SPARK-10118.
SHA: 6511bf5
[SQL] [MINOR] [DOC] Clarify docs for inferring DataFrame from RDD of Products
* Makes `SQLImplicits.rddToDataFrameHolder` scaladoc consistent with `SQLContext.createDataFrame[A <: Product](rdd: RDD[A])`, since the former is essentially a wrapper for the latter * Clarifies `createDataFrame[A <: Product]` scaladoc to apply for any `RDD[Product]`, not just case classes Author: Feynman Liang <fliang@databricks.com> Closes #8406 from feynmanliang/sql-doc-fixes.
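To make the "not just case classes" point concrete, a small sketch assuming a live `sc` and `sqlContext`; tuples are `Product`s too, so both paths infer a schema:
```scala
import sqlContext.implicits._

case class Person(name: String, age: Int)

// Via the implicit wrapper (SQLImplicits.rddToDataFrameHolder):
val viaImplicits = sc.parallelize(Seq(Person("Smith", 29))).toDF()

// Directly, with a tuple RDD -- any RDD[Product] works:
val viaTuples = sqlContext.createDataFrame(sc.parallelize(Seq(("Smith", 29))))
```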
SHA: 642c43c
[SPARK-10121] [SQL] Thrift server always use the latest class loader provided by the conf of executionHive's state
https://issues.apache.org/jira/browse/SPARK-10121 Looks like the problem is that if we add a jar through another thread, the thread handling the JDBC session will not get the latest classloader. Author: Yin Huai <yhuai@databricks.com> Closes #8368 from yhuai/SPARK-10121.
SHA: a0c0aae
[SPARK-10178] [SQL] HiveComparisionTest should print out dependent tables
In `HiveComparisionTest`s it is possible to fail a query of the form `SELECT * FROM dest1`, where `dest1` is the query that is actually computing the incorrect results. To aid debugging, this patch improves the harness to also print these query plans and their results. Author: Michael Armbrust <michael@databricks.com> Closes #8388 from marmbrus/generatedTables.
SHA: 5175ca0
[SPARK-9786] [STREAMING] [KAFKA] fix backpressure so it works with default maxRatePerPartition setting of 0
Author: cody koeninger <cody@koeninger.org> Closes #8413 from koeninger/backpressure-testing-master.
SHA: d9c25de
[SPARK-10137] [STREAMING] Avoid restarting receivers if scheduleReceivers returns balanced results
This PR fixes the following cases for `ReceiverSchedulingPolicy`.
1) Assume there are 4 executors: host1, host2, host3, host4, and 5 receivers: r1, r2, r3, r4, r5. Then `ReceiverSchedulingPolicy.scheduleReceivers` will return (r1 -> host1, r2 -> host2, r3 -> host3, r4 -> host4, r5 -> host1). Let's assume r1 starts first on `host1`, as `scheduleReceivers` suggested, and tries to register with ReceiverTracker. But the previous `ReceiverSchedulingPolicy.rescheduleReceiver` will return (host2, host3, host4) according to the current executor weights (host1 -> 1.0, host2 -> 0.5, host3 -> 0.5, host4 -> 0.5), so ReceiverTracker will reject `r1`. This is unexpected, since r1 is starting exactly where `scheduleReceivers` suggested. This case can be fixed by ignoring the information of the receiver that is rescheduling in `receiverTrackingInfoMap`.
2) Assume there are 3 executors (host1, host2, host3), each with 3 cores, and 3 receivers: r1, r2, r3. Assume r1 is running on host1. Now r2 is restarting; the previous `ReceiverSchedulingPolicy.rescheduleReceiver` will always return (host1, host2, host3), so it's possible that r2 will be scheduled to host1 by TaskScheduler, and r3 is similar. In the end, it's possible that all 3 receivers are running on host1 while host2 and host3 are idle. This issue can be fixed by returning only executors that have the minimum weight, rather than returning at least 3 executors.
Author: zsxwing <zsxwing@gmail.com> Closes #8340 from zsxwing/fix-receiver-scheduling.
SHA: f023aa2
[SPARK-10196] [SQL] Correctly saving decimals in internal rows to JSON.
https://issues.apache.org/jira/browse/SPARK-10196 Author: Yin Huai <yhuai@databricks.com> Closes #8408 from yhuai/DecimalJsonSPARK-10196.
SHA: df7041d
[SPARK-10136] [SQL] A more robust fix for SPARK-10136
PR #8341 is a valid fix for SPARK-10136, but it didn't catch the real root cause. The real problem can be rather tricky to explain, and requires the audience to be pretty familiar with the parquet-format spec, especially details of the `LIST` backwards-compatibility rules. Let me have a try at an explanation here. The structure of the problematic Parquet schema generated by parquet-avro is something like this:
```
message m {
  <repetition> group f (LIST) {         // Level 1
    repeated group array (LIST) {       // Level 2
      repeated <primitive-type> array;  // Level 3
    }
  }
}
```
(The schema generated by parquet-thrift is structurally similar; just replace the `array` at level 2 with `f_tuple`, and the one at level 3 with `f_tuple_tuple`.) This structure consists of two nested legacy 2-level `LIST`-like structures:
1. The repeated group type at level 2 is the element type of the outer array defined at level 1. This group should map to a `CatalystArrayConverter.ElementConverter` when building converters.
2. The repeated primitive type at level 3 is the element type of the inner array defined at level 2. This should also map to a `CatalystArrayConverter.ElementConverter`.
The root cause of SPARK-10136 is that the group at level 2 isn't properly recognized as the element type of level 1. Thus, according to the parquet-format spec, the repeated primitive at level 3 is left as a so-called "unannotated repeated primitive type", and is recognized as a required list of required primitive type, so a `RepeatedPrimitiveConverter` instead of a `CatalystArrayConverter.ElementConverter` is created for it. According to the parquet-format spec, an unannotated repeated type shouldn't appear in a `LIST`- or `MAP`-annotated group. PR #8341 fixed this issue by allowing such unannotated repeated types to appear in `LIST`-annotated groups, which is a non-standard, hacky, but valid fix. (I didn't realize this when authoring #8341, though.) As for the reason why level 2 isn't recognized as a list element type, it's because of the following `LIST` backwards-compatibility rule defined in the parquet-format spec:
> If the repeated field is a group with one field and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended then the repeated type is the element type and elements are required.
(The `array` part is for parquet-avro compatibility, while the `_tuple` part is for parquet-thrift.) This rule is implemented in [`CatalystSchemaConverter.isElementType`] [1], but neglected in [`CatalystRowConverter.isElementType`] [2]. This PR delivers a more robust fix by adding this rule to the latter method. Note that parquet-avro 1.7.0 also suffers from this issue; details can be found at [PARQUET-364] [3]. [1]: https://github.com/apache/spark/blob/85f9a61357994da5023b08b0a8a2eb09388ce7f8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystSchemaConverter.scala#L259-L305 [2]: https://github.com/apache/spark/blob/85f9a61357994da5023b08b0a8a2eb09388ce7f8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala#L456-L463 [3]: https://issues.apache.org/jira/browse/PARQUET-364 Author: Cheng Lian <lian@databricks.com> Closes #8361 from liancheng/spark-10136/proper-version.
SHA: bf03fe6
[SPARK-9293] [SPARK-9813] Analysis should check that set operations are only performed on tables with equal numbers of columns
This patch adds an analyzer rule to ensure that set operations (union, intersect, and except) are only applied to tables with the same number of columns. Without this rule, there are scenarios where invalid queries can return incorrect results instead of failing with error messages; SPARK-9813 provides one example of this problem. In other cases, the invalid query can crash at runtime with extremely confusing exceptions. I also performed a bit of cleanup to refactor some of those logical operators' code into a common `SetOperation` base class. Author: Josh Rosen <joshrosen@databricks.com> Closes #7631 from JoshRosen/SPARK-9293.
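A quick sketch of the failure mode being closed off (column names assumed; needs a live `sc` and `sqlContext`):
```scala
import sqlContext.implicits._

val left  = sc.parallelize(Seq((1, "a"))).toDF("i", "s")
val right = sc.parallelize(Seq(2)).toDF("i")

// With the new analyzer rule this fails analysis with a clear error
// instead of returning incorrect results or crashing at runtime.
left.unionAll(right)
```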
SHA: 82268f0
[SPARK-10214] [SPARKR] [DOCS] Improve SparkR Column, DataFrame API docs
cc: shivaram ## Summary - Add name tags to each methods in DataFrame.R and column.R - Replace `rdname column` with `rdname {each_func}`. i.e. alias method : `rdname column` => `rdname alias` ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLNHN2aFpnQXlSeGs/view?usp=sharing ## JIRA [[SPARK-10214] Improve SparkR Column, DataFrame API docs - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10214) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8414 from yu-iskw/SPARK-10214.
SHA: d4549fe
[SPARK-6196] [BUILD] Remove MapR profiles in favor of hadoop-provided
Follow up to #7047 pwendell mentioned that MapR should use `hadoop-provided` now, and indeed the new build script does not produce `mapr3`/`mapr4` artifacts anymore. Hence the action seems to be to remove the profiles, which are now not used. CC trystanleftwich Author: Sean Owen <sowen@cloudera.com> Closes #8338 from srowen/SPARK-6196.
SHA: 57b960b
[SPARK-10210] [STREAMING] Filter out non-existent blocks before creating BlockRDD
When the write ahead log is not enabled, a recovered streaming driver still tries to run jobs using pre-failure block ids, and fails as the blocks do not exist in memory any more (and cannot be recovered, as the receiver WAL is not enabled). This occurs because the driver-side WAL of ReceivedBlockTracker recovers that past block information, and ReceiverInputDStream creates BlockRDDs even if those blocks do not exist. The solution in this PR is to filter out block ids that do not exist before creating the BlockRDD. In addition, it adds unit tests to verify other logic in ReceiverInputDStream. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8405 from tdas/SPARK-10210.
SHA: 1fc3758
[SPARK-10177] [SQL] fix reading Timestamp in parquet from Hive
We misunderstood the Julian days and nanoseconds-of-the-day in parquet (as TimestampType) from Hive/Impala: they overlap, so they can't simply be added together. To avoid confusing rounding when doing the conversion, we use `2440588` as the Julian day of the epoch of the unix timestamp (which would astronomically be 2440587.5). Author: Davies Liu <davies@databricks.com> Author: Cheng Lian <lian@databricks.com> Closes #8400 from davies/timestamp_parquet.
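A sketch of the resulting conversion under that convention (constant and helper names are illustrative, not the exact Spark code): only whole Julian days are combined with the separately stored nanoseconds-in-day, so the two fields are never double-counted:
```scala
// 2440588 is the Julian day number whose midnight-aligned start is
// 1970-01-01 00:00:00 UTC (the astronomical epoch would be 2440587.5).
val JulianDayOfEpoch = 2440588L
val MicrosPerDay     = 24L * 60 * 60 * 1000 * 1000

// Parquet INT96 timestamps carry (julianDay, nanosOfDay) as separate fields.
def toMicrosSinceEpoch(julianDay: Int, nanosOfDay: Long): Long =
  (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000
```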
SHA: 2f493f7
[SPARK-10195] [SQL] Data sources Filter should not expose internal types
Spark SQL's data sources API exposes Catalyst's internal types through its Filter interfaces. This is a problem because types like UTF8String are not stable developer APIs and should not be exposed to third-parties. This issue caused incompatibilities when upgrading our `spark-redshift` library to work against Spark 1.5.0. To avoid these issues in the future we should only expose public types through these Filter objects. This patch accomplishes this by using CatalystTypeConverters to add the appropriate conversions. Author: Josh Rosen <joshrosen@databricks.com> Closes #8403 from JoshRosen/datasources-internal-vs-external-types.
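For context, a sketch of what a third-party data source now receives through the public `Filter` classes (the column name is assumed):
```scala
import org.apache.spark.sql.sources.{EqualTo, Filter}

// After this patch the value inside a pushed-down filter is a public type
// (java.lang.String here), never Catalyst's internal UTF8String.
val pushed: Filter = EqualTo("name", "Smith")
```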
SHA: 7bc9a8c
[SPARK-10197] [SQL] Add null check in wrapperFor (inside HiveInspectors).
https://issues.apache.org/jira/browse/SPARK-10197 Author: Yin Huai <yhuai@databricks.com> Closes #8407 from yhuai/ORCSPARK-10197.
SHA: 0e6368f
[DOC] add missing parameters in SparkContext.scala for scala doc
Author: Zhang, Liye <liye.zhang@intel.com> Closes #8412 from liyezhang556520/minorDoc.
SHA: 5c14890
Author: ehnalis <zoltan.zvara@gmail.com> Closes #8308 from ehnalis/master.
SHA: 7f1e507
[SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses to JavaConverters
Replace `JavaConversions` implicits with `JavaConverters`. Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
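The difference in one small sketch: `JavaConverters` makes every conversion explicit at the call site instead of applying it implicitly:
```scala
// Before (now banned): invisible implicit conversions.
// import scala.collection.JavaConversions._
// val javaList: java.util.List[Int] = Seq(1, 2, 3)

// After: explicit .asJava / .asScala at each conversion point.
import scala.collection.JavaConverters._

val javaList: java.util.List[Int] = Seq(1, 2, 3).asJava
val scalaBuffer = javaList.asScala
```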
SHA: 69c9c17
[SPARK-10198] [SQL] Turn off partition verification by default
Author: Michael Armbrust <michael@databricks.com> Closes #8404 from marmbrus/turnOffPartitionVerification.
SHA: 5c08c86
[SPARK-8531] [ML] Update ML user guide for MinMaxScaler
jira: https://issues.apache.org/jira/browse/SPARK-8531 Update ML user guide for MinMaxScaler Author: Yuhao Yang <hhbyyh@gmail.com> Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com> Closes #7211 from hhbyyh/minmaxdoc.
SHA: b37f0cc
[SPARK-10230] [MLLIB] Rename optimizeAlpha to optimizeDocConcentration
See [discussion](#8254 (comment)) CC jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8422 from feynmanliang/SPARK-10230.
SHA: 881208a
[SPARK-10231] [MLLIB] update @Since annotation for mllib.classification
Update `Since` annotation in `mllib.classification`: 1. add version to classes, objects, constructors, and public variables declared in constructors 2. correct some versions 3. remove `Since` on `toString` MechCoder dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8421 from mengxr/SPARK-10231 and squashes the following commits: b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification
SHA: 16a2be1
[SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde.
This PR: 1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe; 2. based on 1, improves the collect() implementation. Now it can support collecting data of complex types from a DataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8276 from sun-rui/SPARK-10048.
SHA: 71a138c
[SPARK-9800] Adds docs for GradientDescent$.runMiniBatchSGD alias
* Adds doc for the alias of runMiniBatchSGD documenting the default value for convergenceTol * Cleans up a note in code Author: Feynman Liang <fliang@databricks.com> Closes #8425 from feynmanliang/SPARK-9800.
SHA: c0e9ff1
[SPARK-10237] [MLLIB] update since versions in mllib.fpm
Same as #8421 but for `mllib.fpm`. cc feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8429 from mengxr/SPARK-10237.
SHA: c619c75
[SPARK-9797] [MLLIB] [DOC] StreamingLinearRegressionWithSGD.setConvergenceTol default value
Adds default convergence tolerance (0.001, set in `GradientDescent.convergenceTol`) to `setConvergenceTol`'s scaladoc Author: Feynman Liang <fliang@databricks.com> Closes #8424 from feynmanliang/SPARK-9797.
SHA: 9205907
[SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util
Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8430 from mengxr/SPARK-10239 and squashes the following commits: a189acf [Xiangrui Meng] update since versions in mllib.pmml and mllib.util
SHA: 00ae4be
[SPARK-10245] [SQL] Fix decimal literals with precision < scale
In BigDecimal or java.math.BigDecimal, the precision could be smaller than scale, for example, BigDecimal("0.001") has precision = 1 and scale = 3. But DecimalType require that the precision should be larger than scale, so we should use the maximum of precision and scale when inferring the schema from decimal literal. Author: Davies Liu <davies@databricks.com> Closes #8428 from davies/smaller_decimal.
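A concrete sketch of the inference rule described above:
```scala
val d = BigDecimal("0.001")
d.precision // 1
d.scale     // 3

// DecimalType requires precision >= scale, so schema inference should use
// DecimalType(max(precision, scale), scale), i.e. DecimalType(3, 3) here.
val inferredPrecision = math.max(d.precision, d.scale)
```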
SHA: ec89bd8
[SPARK-10215] [SQL] Fix precision of division (follow the rule in Hive)
Follow the rule in Hive for decimal division. see https://github.com/apache/hive/blob/ac755ebe26361a4647d53db2a28500f71697b276/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java#L113 cc chenghao-intel Author: Davies Liu <davies@databricks.com> Closes #8415 from davies/decimal_div2.
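For reference, a sketch of the Hive rule being followed, as transcribed from the linked `GenericUDFOPDivide` (treat the exact formula as an assumption rather than the Spark code itself):
```scala
// Dividing Decimal(p1, s1) by Decimal(p2, s2), Hive picks:
//   scale     = max(6, s1 + p2 + 1)
//   precision = p1 - s1 + s2 + scale
def divideResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(6, s1 + p2 + 1)
  (p1 - s1 + s2 + scale, scale)
}
```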
SHA: 7467b52
Commits on Aug 26, 2015
[SPARK-9888] [MLLIB] User guide for new LDA features
* Adds two new sections to LDA's user guide, one for each optimizer/model * Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization) * Cleans up a TODO and sets a default parameter in LDA code jkbradley hhbyyh Author: Feynman Liang <fliang@databricks.com> Closes #8254 from feynmanliang/SPARK-9888.
SHA: 125205c
[SPARK-10233] [MLLIB] update since version in mllib.evaluation
Same as #8421 but for `mllib.evaluation`. cc avulanov Author: Xiangrui Meng <meng@databricks.com> Closes #8423 from mengxr/SPARK-10233.
SHA: 8668ead
[SPARK-10238] [MLLIB] update since versions in mllib.linalg
Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8440 from mengxr/SPARK-10238 and squashes the following commits: b38437e [Xiangrui Meng] update since versions in mllib.linalg
SHA: ab431f8
[SPARK-10240] [SPARK-10242] [MLLIB] update since versions in mllib.random and mllib.stat
The same as #8241 but for `mllib.stat` and `mllib.random`. cc feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8439 from mengxr/SPARK-10242.
SHA: c3a5484
[SPARK-10234] [MLLIB] update since version in mllib.clustering
Same as #8421 but for `mllib.clustering`. cc feynmanliang yu-iskw Author: Xiangrui Meng <meng@databricks.com> Closes #8435 from mengxr/SPARK-10234.
SHA: d703372
[SPARK-10243] [MLLIB] update since versions in mllib.tree
Same as #8421 but for `mllib.tree`. cc jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #8442 from mengxr/SPARK-10236.
SHA: fb7e12f
[SPARK-10235] [MLLIB] update since versions in mllib.regression
Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8426 from mengxr/SPARK-10235 and squashes the following commits: 6cd28e4 [Xiangrui Meng] update since versions in mllib.regression
SHA: 4657fa1
[SPARK-10236] [MLLIB] update since versions in mllib.feature
Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits: 0e8d658 [Xiangrui Meng] remove unnecessary comment ad70b03 [Xiangrui Meng] update since versions in mllib.feature
SHA: 321d775
[SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select)
Add support for
```
df[df$name == "Smith", c(1,2)]
df[df$age %in% c(19, 30), 1:2]
```
shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #8394 from felixcheung/rsubset.
SHA: 75d4773
SHA: bb16405
[SPARK-9665] [MLLIB] audit MLlib API annotations
I only found `ml.NaiveBayes` missing `Experimental` annotation. This PR doesn't cover Python APIs. cc jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #8452 from mengxr/SPARK-9665.
SHA: 6519fd0
SHA: de7209c
[SPARK-10241] [MLLIB] update since versions in mllib.recommendation
Same as #8421 but for `mllib.recommendation`. cc srowen coderxiang Author: Xiangrui Meng <meng@databricks.com> Closes #8432 from mengxr/SPARK-10241.
SHA: 086d468
[SPARK-10305] [SQL] fix create DataFrame from Python class
cc jkbradley Author: Davies Liu <davies@databricks.com> Closes #8470 from davies/fix_create_df.
SHA: d41d6c4
Commits on Aug 27, 2015
[SPARK-10308] [SPARKR] Add %in% to the exported namespace
I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8473 from shivaram/in-namespace.
SHA: ad7f0f1
[MINOR] [SPARKR] Fix some validation problems in SparkR
Getting rid of some validation problems in SparkR #7883 cc shivaram
```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
    expect_equal(class(x), "character")
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
    df <- data.frame(row.names = 1 : nrow)
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
    df[[names[colIndex]]] <- vec
```
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8474 from yu-iskw/minor-fix-sparkr.
SHA: 773ca03
[SPARK-9424] [SQL] Parquet programming guide updates for 1.5
Author: Cheng Lian <lian@databricks.com> Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
SHA: 0fac144
[SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON
PySpark DataFrameReader should accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this PR is merged, it should be duplicated to cover the other input types (not just JSON). Author: Yanbo Liang <ybliang8@gmail.com> Closes #8444 from yanboliang/spark-9964.
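For reference, a sketch of the Scala-side behavior being mirrored (assuming a live `sc` and `sqlContext`):
```scala
// Scala's DataFrameReader already accepts an RDD[String] of JSON documents,
// not only a path; this PR brings the PySpark reader to parity.
val jsonLines = sc.parallelize(Seq("""{"name":"Smith","age":29}"""))
val df = sqlContext.read.json(jsonLines)
```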
SHA: ce97834
[SPARK-10219] [SPARKR] Fix varargsToEnv and add test case
cc sun-rui davies Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8475 from shivaram/varargs-fix.
SHA: e936cf8
[SPARK-10251] [CORE] some common types are not registered for Kryo Serialization by default
Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #8465 from harsha2010/SPARK-10251.
SHA: de02782
[DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics
Fix Typo in exactly once semantics [Semantics of output operations] link Author: Moussa Taifi <moutai10@gmail.com> Closes #8468 from moutai/patch-3.
SHA: 9625d13
[SPARK-10254] [ML] Removes Guava dependencies in spark.ml.feature JavaTests
* Replaces `com.google.common` dependencies with `java.util.Arrays` * Small clean up in `JavaNormalizerSuite` Author: Feynman Liang <fliang@databricks.com> Closes #8445 from feynmanliang/SPARK-10254.
SHA: 1650f6f
[SPARK-10255] [ML] Removes Guava dependencies from spark.ml.param JavaTests
Author: Feynman Liang <fliang@databricks.com> Closes #8446 from feynmanliang/SPARK-10255.
SHA: 75d6230
[SPARK-10256] [ML] Removes guava dependency from spark.ml.classification JavaTests
Author: Feynman Liang <fliang@databricks.com> Closes #8447 from feynmanliang/SPARK-10256.
SHA: 1a446f7
[SPARK-9613] [HOTFIX] Fix usage of JavaConverters removed in Scala 2.11
Fix for [JavaConverters.asJavaListConverter](http://www.scala-lang.org/api/2.10.5/index.html#scala.collection.JavaConverters$) being removed in 2.11.7 and hence the build fails with the 2.11 profile enabled. Tested with the default 2.10 and 2.11 profiles. BUILD SUCCESS in both cases. Build for 2.10: ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -DskipTests clean install and 2.11: ./dev/change-scala-version.sh 2.11 ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests clean install Author: Jacek Laskowski <jacek@japila.pl> Closes #8479 from jaceklaskowski/SPARK-9613-hotfix.
SHA: b02e818
[SPARK-10257] [MLLIB] Removes Guava from all spark.mllib Java tests
* Replaces instances of `Lists.newArrayList` with `Arrays.asList` * Replaces `com.google.collections.Strings` with `commons.lang.StringUtils` * Uses the `List` interface over `ArrayList` implementations This PR, along with #8445, #8446, and #8447, completely removes all `com.google.collections.Lists` dependencies within mllib's Java tests. Author: Feynman Liang <fliang@databricks.com> Closes #8451 from feynmanliang/SPARK-10257.
SHA: e1f4de4
[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data
`GeneralizedLinearModel` creates a cached RDD when building a model. This is inconvenient, since these RDDs flood the memory when building several models in a row, so useful data might get evicted from the cache. The proposed solution is to always cache the dataset & remove the warning. There's a caveat though: the input dataset gets evaluated twice, once in line 270 when fitting `StandardScaler` and a second time when running the optimizer. So it might be worth restoring the removed warning. Another possible solution is to disable caching entirely & restore the removed warning. I don't really know which approach is better. Author: Vyacheslav Baranov <slavik.baranov@gmail.com> Closes #8395 from SlavikBaranov/SPARK-10182.
SHA: fdd466b
[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide
Author: Michael Armbrust <michael@databricks.com> Closes #8441 from marmbrus/documentation.
SHA: dc86a22
[SPARK-10315] remove document on spark.akka.failure-detector.threshold
https://issues.apache.org/jira/browse/SPARK-10315 This parameter is no longer used, and there is a mistake in the current document: it should be 'akka.remote.watch-failure-detector.threshold'. Author: CodingCat <zhunansjtu@gmail.com> Closes #8483 from CodingCat/SPARK_10315.
SHA: 84baa5e
[SPARK-9901] User guide for RowMatrix Tall-and-skinny QR
jira: https://issues.apache.org/jira/browse/SPARK-9901 The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8462 from hhbyyh/qrDoc.
SHA: 6185cdd
[SPARK-9906] [ML] User guide for LogisticRegressionSummary
User guide for LogisticRegression summaries Author: MechCoder <manojkumarsivaraj334@gmail.com> Author: Manoj Kumar <mks542@nyu.edu> Author: Feynman Liang <fliang@databricks.com> Closes #8197 from MechCoder/log_summary_user_guide.
SHA: c94ecdf
[SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test
* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine * Cleans up scaladocs for public methods * Adds test for Java compatibility * Follow-up Python user guide code example is tracked by SPARK-10249 Author: Feynman Liang <fliang@databricks.com> Closes #8436 from feynmanliang/SPARK-10230.
SHA: 5bfe9e1
[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path
https://issues.apache.org/jira/browse/SPARK-10287 After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.
SHA: b3dd569
[SPARK-10321] sizeInBytes in HadoopFsRelation
Having sizeInBytes in HadoopFsRelation to enable broadcast join. cc marmbrus Author: Davies Liu <davies@databricks.com> Closes #8490 from davies/sizeInByte.
SHA: 54cda0d
Commits on Aug 28, 2015
[SPARK-8505] [SPARKR] Add settings to kick `lint-r` from `./dev/run-test.py`
JoshRosen we'd like to check the SparkR source code with the `dev/lint-r` script on Jenkins. I tried to incorporate the script into `dev/run-test.py`. Could you review it when you have time? shivaram I modified `dev/lint-r` and `dev/lint-r.R` to install the lintr package into a local directory (`R/lib/`) and to exit with a lint status. Could you review it? - [[SPARK-8505] Add settings to kick `lint-r` from `./dev/run-test.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8505) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7883 from yu-iskw/SPARK-8505.
SHA: 1f90c5e
[SPARK-9911] [DOC] [ML] Update Userguide for Evaluator
I added a small note about the different types of evaluator and the metrics used. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #8304 from MechCoder/multiclass_evaluator.
SHA: 30734d4
[SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide
* Adds user guide for `LinearRegressionSummary` * Fixes unresolved issues in #8197 CC jkbradley mengxr Author: Feynman Liang <fliang@databricks.com> Closes #8491 from feynmanliang/SPARK-9905.
SHA: af0e124
[SPARK-SQL] [MINOR] Fixes some typos in HiveContext
Author: Cheng Lian <lian@databricks.com> Closes #8481 from liancheng/hive-context-typo.
SHA: 89b9434
[SPARK-10188] [PYSPARK] Pyspark CrossValidator with RMSE selects incorrect model
* Added isLargerBetter() method to Pyspark Evaluator to match the Scala version. * JavaEvaluator delegates isLargerBetter() to the underlying Scala object. * Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax. * Added test cases for where smaller is better (RMSE) and larger is better (R-Squared). (This contribution is my original work and I license the work to the project under Spark's open source license.) Author: noelsmith <mail@noelsmith.com> Closes #8399 from noel-smith/pyspark-rmse-xval-fix.
SHA: 7583681
[SPARK-10328] [SPARKR] Fix generic for na.omit
S3 function is at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com> Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8495 from shivaram/na-omit-fix.
SHA: 2f99c37
[SPARK-10260] [ML] Add @Since annotation to ml.clustering
### JIRA [[SPARK-10260] Add Since annotation to ml.clustering - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10260) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8455 from yu-iskw/SPARK-10260.
SHA: 4eeda8d
[SPARK-10295] [CORE] Dynamic allocation in Mesos does not release when RDDs are cached
Remove obsolete warning about dynamic allocation not working with cached RDDs. See discussion in https://issues.apache.org/jira/browse/SPARK-10295 Author: Sean Owen <sowen@cloudera.com> Closes #8489 from srowen/SPARK-10295.
SHA: cc39803
Fix DynamodDB/DynamoDB typo in Kinesis Integration doc
Fix DynamodDB/DynamoDB typo in Kinesis Integration doc Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8501 from yosssi/patch-1.
SHA: 18294cd
Author: Dharmesh Kakadia <dharmeshkakadia@users.noreply.github.com> Closes #8497 from dharmeshkakadia/patch-2.
SHA: 71a077f
[YARN] [MINOR] Avoid hard code port number in YarnShuffleService test
The current port number is fixed to the default (7337) in the test; this can introduce port contention exceptions, so it's better to use a random port in the unit test. squito, you seem to be the author of this unit test; mind taking a look at this fix? Thanks a lot.
```
[info] - executor state kept across NM restart *** FAILED *** (597 milliseconds)
[info]   org.apache.hadoop.service.ServiceStateException: java.net.BindException: Address already in use
[info]   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
[info]   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply$mcV$sp(YarnShuffleServiceSuite.scala:72)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply(YarnShuffleServiceSuite.scala:70)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply(YarnShuffleServiceSuite.scala:70)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
...
```
Author: jerryshao <sshao@hortonworks.com> Closes #8502 from jerryshao/avoid-hardcode-port.
SHA: 1502a0f
[SPARK-9890] [DOC] [ML] User guide for CountVectorizer
jira: https://issues.apache.org/jira/browse/SPARK-9890 document with Scala and java examples Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8487 from hhbyyh/cvDoc.
SHA: e2a8430
[SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarnings
This is based on davies comment on SPARK-8952 which suggests to only call normalizePath() when path starts with '~' Author: Luciano Resende <lresende@apache.org> Closes #8343 from lresende/SPARK-8952.
SHA: 499e8e1
[SPARK-10325] Override hashCode() for public Row
This commit fixes an issue where the public SQL `Row` class did not override `hashCode`, causing it to violate the hashCode() + equals() contract. To fix this, I simply ported the `hashCode` implementation from the 1.4.x version of `Row`. Author: Josh Rosen <joshrosen@databricks.com> Closes #8500 from JoshRosen/SPARK-10325 and squashes the following commits: 51ffea1 [Josh Rosen] Override hashCode() for public Row.
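A small sketch of the contract being restored (the values are assumed): structurally equal `Row`s must produce equal hash codes, or they misbehave as keys in hash-based collections:
```scala
import org.apache.spark.sql.Row

val r1 = Row("Smith", 29)
val r2 = Row("Smith", 29)

// equals() was already structural; hashCode() must agree with it.
assert(r1 == r2)
assert(r1.hashCode == r2.hashCode)
```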
SHA: d3f87dc
[SPARK-9284] [TESTS] Allow all tests to run without an assembly.
This change aims at speeding up the dev cycle a little bit, by making sure that all tests behave the same w.r.t. where the code to be tested is loaded from. Namely, that means that tests don't rely on the assembly anymore, rather loading all needed classes from the build directories. The main change is to make sure all build directories (classes and test-classes) are added to the classpath of child processes when running tests. YarnClusterSuite required some custom code since the executors are run differently (i.e. not through the launcher library, like standalone and Mesos do). I also found a couple of tests that could leak a SparkContext on failure, and added code to handle those. With this patch, it's possible to run the following command from a clean source directory and have all tests pass: mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7629 from vanzin/SPARK-9284.
Marcelo Vanzin committed on Aug 28, 2015
SHA: c53c902
[SPARK-10336][example] fix not being able to set intercept in LR example
`fitIntercept` is a command line option but not set in the main program. dbtsai Author: Shuo Xiang <sxiang@pinterest.com> Closes #8510 from coderxiang/intercept and squashes the following commits: 57c9b7d [Shuo Xiang] fix not being able to set intercept in LR example
Shuo Xiang authored and DB Tsai committed on Aug 28, 2015
SHA: 4572321
[SPARK-9671] [MLLIB] re-org user guide and add migration guide
This PR updates the MLlib user guide and adds migration guide for 1.4->1.5. * merge migration guide for `spark.mllib` and `spark.ml` packages * remove dependency section from `spark.ml` guide * move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml` * move Sam's talk to footnote to make the section focus on dependencies Minor changes to code examples and other wording will be in a separate PR. jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8498 from mengxr/SPARK-9671.
SHA: 88032ec
[SPARK-10323] [SQL] fix nullability of In/InSet/ArrayContain
After this PR, In/InSet/ArrayContain will return null if value is null, instead of false. They also will return null even if there is a null in the set/array. Author: Davies Liu <davies@databricks.com> Closes #8492 from davies/fix_in.
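Concretely, under SQL's three-valued logic the change looks like this (queries assumed for illustration):
```scala
// Before this PR these returned false; now they return null, since a
// comparison involving NULL is unknown rather than false.
sqlContext.sql("SELECT 1 IN (null)").collect()       // Row(null)
sqlContext.sql("SELECT null IN (1, 2)").collect()    // Row(null)
sqlContext.sql("SELECT 3 IN (1, 2, null)").collect() // Row(null), not Row(false)
```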
SHA: bb7f352
Commits on Aug 29, 2015
[SPARK-9803] [SPARKR] Add subset and transform + tests
Add subset and transform Also reorganize `[` & `[[` to subset instead of select Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace existing column with the name in mutate (ie. `mutate(df, age = df$age + 2)` - returned DataFrame has 2 columns with the same name 'age'), so therefore not doing that for now in transform. Though it is clearly stated it should replace column with matching name (should I open a JIRA for mutate/transform?) Author: felixcheung <felixcheung_m@hotmail.com> Closes #8503 from felixcheung/rsubset_transform.
SHA: 2a4e00c
[SPARK-9910] [ML] User guide for train validation split
Author: martinzapletal <zapletal-martin@email.cz> Closes #8377 from zapletal-martin/SPARK-9910.
SHA: e8ea5ba
[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide
Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.
SHA: 5369be8
[SPARK-10289] [SQL] A direct write API for testing Parquet
This PR introduces a direct write API for testing Parquet. It's a DSL flavored version of the [`writeDirect` method] [1] comes with parquet-avro testing code. With this API, it's much easier to construct arbitrary Parquet structures. It's especially useful when adding regression tests for various compatibility corner cases. Sample usage of this API can be found in the new test case added in `ParquetThriftCompatibilitySuite`. [1]: https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.1/parquet-avro/src/test/java/org/apache/parquet/avro/TestArrayCompatibility.java#L945-L972 Author: Cheng Lian <lian@databricks.com> Closes #8454 from liancheng/spark-10289/parquet-testing-direct-write-api.
SHA: 24ffa85
[SPARK-10344] [SQL] Add tests for extraStrategies
Actually using this API requires access to a lot of classes that we might make private by accident. I've added some tests to prevent this. Author: Michael Armbrust <michael@databricks.com> Closes #8516 from marmbrus/extraStrategiesTests.
SHA: 5c3d16a
[SPARK-10226] [SQL] Fix exclamation mark issue in SparkSQL
When I tested the latest version of spark with an exclamation mark, I got some errors. Then I reset the spark version and found that commit id "a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda" introduced the bug. With the jline version changing from 0.9.94 to 2.12 after this commit, the exclamation mark is treated as a special character in ConsoleReader. Author: wangwei <wangwei82@huawei.com> Closes #8420 from small-wang/jline-SPARK-10226.
SHA: 277148b
[SPARK-10330] Use SparkHadoopUtil TaskAttemptContext reflection methods in more places
SparkHadoopUtil contains methods that use reflection to work around TaskAttemptContext binary incompatibilities between Hadoop 1.x and 2.x. We should use these methods in more places. Author: Josh Rosen <joshrosen@databricks.com> Closes #8499 from JoshRosen/use-hadoop-reflection-in-more-places.
SHA: 6a6f3c9
[SPARK-10339] [SPARK-10334] [SPARK-10301] [SQL] Partitioned table scan can OOM driver and throw a better error message when users need to enable parquet schema merging
This fixes the problem that scanning a partitioned table causes high memory pressure on the driver and can take down the cluster. Also, with this fix, we will be able to correctly show the query plan of a query consuming partitioned tables. https://issues.apache.org/jira/browse/SPARK-10339 https://issues.apache.org/jira/browse/SPARK-10334 Finally, this PR squeezes in a "quick fix" for SPARK-10301. It is not a real fix; it just throws a better error message to let the user know what to do. Author: Yin Huai <yhuai@databricks.com> Closes #8515 from yhuai/partitionedTableScan.
SHA: 097a7e3
Commits on Aug 30, 2015
[SPARK-9986] [SPARK-9991] [SPARK-9993] [SQL] Create a simple test framework for local operators
This PR includes the following changes: - Add `LocalNodeTest` for local operator tests and add unit tests for FilterNode and ProjectNode. - Add `LimitNode` and `UnionNode` and their unit tests to show how to use `LocalNodeTest`. (SPARK-9991, SPARK-9993) Author: zsxwing <zsxwing@gmail.com> Closes #8464 from zsxwing/local-execution.
SHA: 13f5f8e
[SPARK-10348] [MLLIB] updates ml-guide
* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section title` to be consistent with other section titles in the MLlib guide
* modified lines to break at 100 chars or at periods

jkbradley feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8517 from mengxr/SPARK-10348.
Commit: 905fbe4
-
[SPARK-10331] [MLLIB] Update example code in ml-guide
* The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean. * assume `sqlContext` is available * fix some minor issues from previous code review jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8518 from mengxr/SPARK-10331.
Commit: ca69fc8
-
[SPARK-10184] [CORE] Optimization for bounds determination in RangePa…
…rtitioner JIRA Issue: https://issues.apache.org/jira/browse/SPARK-10184 Change `cumWeight > target` to `cumWeight >= target` in `RangePartitioner.determineBounds` method to make the output partitions more balanced. Author: ihainan <ihainan72@gmail.com> Closes #8397 from ihainan/opt_for_rangepartitioner.
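A minimal sketch of the comparison being changed, assuming bounds are chosen from weighted candidate keys; this is a simplification for illustration, not the actual `determineBounds` code:

```scala
// Simplified sketch: pick partition bounds from (key, weight) candidates so
// that each partition holds roughly `step` total weight. The fix uses >=
// instead of >, so a candidate landing exactly on the target still closes
// the current partition, producing more balanced output partitions.
def determineBoundsSketch(candidates: Seq[(Int, Float)], partitions: Int): Seq[Int] = {
  val sorted = candidates.sortBy(_._1)
  val totalWeight = sorted.map(_._2.toDouble).sum
  val step = totalWeight / partitions
  var cumWeight = 0.0
  var target = step
  val bounds = scala.collection.mutable.ArrayBuffer.empty[Int]
  for ((key, weight) <- sorted if bounds.size < partitions - 1) {
    cumWeight += weight
    if (cumWeight >= target) { // was: cumWeight > target
      bounds += key
      target += step
    }
  }
  bounds.toSeq
}
```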
Commit: 1bfd934
-
[SPARK-10353] [MLLIB] BLAS gemm not scaling when beta = 0.0 for some …
…subset of matrix multiplications mengxr jkbradley rxin It would be great if this fix made it into RC3! Author: Burak Yavuz <brkyvz@gmail.com> Closes #8525 from brkyvz/blas-scaling.
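For context, `gemm` computes C := alpha * A * B + beta * C. When beta = 0.0, C must be overwritten rather than scaled, since its prior contents may be garbage or NaN. A naive column-major sketch of that contract (my own illustration, not MLlib's implementation):

```scala
// Naive column-major gemm sketch illustrating the beta == 0.0 contract.
// A is m x k, B is k x n, C is m x n, all stored column-major.
def gemm(alpha: Double, a: Array[Double], m: Int, k: Int,
         b: Array[Double], n: Int, beta: Double, c: Array[Double]): Unit = {
  var j = 0
  while (j < n) {
    var i = 0
    while (i < m) {
      var sum = 0.0
      var l = 0
      while (l < k) { sum += a(i + l * m) * b(l + j * k); l += 1 }
      val idx = i + j * m
      // When beta == 0.0, ignore whatever is in C instead of multiplying it,
      // which would propagate garbage (or NaN) from uninitialized memory.
      c(idx) = if (beta == 0.0) alpha * sum else alpha * sum + beta * c(idx)
      i += 1
    }
    j += 1
  }
}
```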
Commit: 8d2ab75
Commits on Aug 31, 2015
-
SPARK-9545, SPARK-9547: Use Maven in PRB if title contains "[test-mav…
…en]" This is just some small glue code to actually make use of the AMPLAB_JENKINS_BUILD_TOOL switch. As far as I can tell, we actually don't currently use the Maven support in the tool even though it exists. This patch switches to Maven when the PR title contains "test-maven". There are a few small other pieces of cleanup in the patch as well. Author: Patrick Wendell <patrick@databricks.com> Closes #7878 from pwendell/maven-tests.
Commit: 35e896a
-
[SPARK-10351] [SQL] Fixes UTF8String.fromAddress to handle off-heap m…
…emory CC rxin marmbrus Author: Feynman Liang <fliang@databricks.com> Closes #8523 from feynmanliang/SPARK-10351.
Commit: 8694c3a
-
[SPARK-10354] [MLLIB] fix some apparent memory issues in k-means|| i…
…nitialization
* do not cache the first cost RDD
* change the following cost RDD cache level to MEMORY_AND_DISK
* remove the Vector wrapper to save an object per instance

Further improvements will be addressed in SPARK-10329. cc: yu-iskw HuJiayin Author: Xiangrui Meng <meng@databricks.com> Closes #8526 from mengxr/SPARK-10354.
Commit: f0f563a
-
[SPARK-8730] Fixes - Deser objects containing a primitive class attri…
…bute Author: EugenCepoi <cepoi.eugen@gmail.com> Closes #7122 from EugenCepoi/master.
Commit: 72f6dbf
-
[SPARK-10369] [STREAMING] Don't remove ReceiverTrackingInfo when dere…
…gisterReceiver since we may reuse it later `deregisterReceiver` should not remove `ReceiverTrackingInfo`; otherwise, it will throw `java.util.NoSuchElementException: key not found` when restarting the receiver. Author: zsxwing <zsxwing@gmail.com> Closes #8538 from zsxwing/SPARK-10369.
Commit: 4a5fe09
-
[SPARK-10170] [SQL] Add DB2 JDBC dialect support.
Data frame write to DB2 database is failing because by default JDBC data source implementation is generating a table schema with DB2 unsupported data types TEXT for String, and BIT1(1) for Boolean. This patch registers DB2 JDBC Dialect that maps String, Boolean to valid DB2 data types. Author: sureshthalamati <suresh.thalamati@gmail.com> Closes #8393 from sureshthalamati/db2_dialect_spark-10170.
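A sketch of what registering such a dialect can look like with Spark's `JdbcDialects` API; the exact DB2 type names chosen here (`CLOB`, `CHAR(1)`) are illustrative assumptions rather than necessarily the mappings this patch uses:

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Hypothetical DB2 dialect: remap Catalyst types whose default JDBC
// definitions (TEXT, BIT(1)) DB2 does not accept.
object DB2DialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("CLOB", Types.CLOB))    // instead of TEXT
    case BooleanType => Some(JdbcType("CHAR(1)", Types.CHAR)) // instead of BIT(1)
    case _           => None
  }
}

JdbcDialects.registerDialect(DB2DialectSketch)
```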
Commit: a2d5c72
-
[SPARK-9954] [MLLIB] use first 128 nonzeros to compute Vector.hashCode
This could help reduce hash collisions, e.g., in `RDD[Vector].repartition`. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #8182 from mengxr/SPARK-9954.
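The idea, as a sketch: hash only a bounded prefix of the nonzero entries, skipping explicit zeros so that dense and sparse representations of the same vector hash alike at bounded cost. This illustrates the approach, not MLlib's exact implementation:

```scala
// Hash the first `maxNonzeros` nonzero (index, value) pairs of a vector.
// Skipping explicit zeros keeps dense and sparse representations consistent.
def vectorHash(activeEntries: Iterator[(Int, Double)], maxNonzeros: Int = 128): Int = {
  var result = 17
  var nnz = 0
  while (activeEntries.hasNext && nnz < maxNonzeros) {
    val (i, v) = activeEntries.next()
    if (v != 0.0) {
      result = 31 * result + i
      val bits = java.lang.Double.doubleToLongBits(v)
      result = 31 * result + (bits ^ (bits >>> 32)).toInt
      nnz += 1
    }
  }
  result
}
```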
Commit: 23e39cc
-
[SPARK-8472] [ML] [PySpark] Python API for DCT
Add Python API for ml.feature.DCT. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8485 from yanboliang/spark-8472.
Commit: 5b3245d
-
[SPARK-10341] [SQL] fix memory starving in unsafe SMJ
In SMJ, the first ExternalSorter could consume all the memory before spilling, and then the second cannot even acquire its first page. Until we have a better memory allocator, SMJ should call prepare() before calling compute() on any of its children. cc rxin JoshRosen Author: Davies Liu <davies@databricks.com> Closes #8511 from davies/smj_memory.
Commit: 540bdee
-
[SPARK-10349] [ML] OneVsRest use 'when ... otherwise' not UDF to gene…
…rate new label at binary reduction Currently OneVsRest uses a UDF to generate the new binary label during training. Considering that [SPARK-7321](https://issues.apache.org/jira/browse/SPARK-7321) has been merged, we can use ```when ... otherwise``` instead, which is more efficient. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8519 from yanboliang/spark-10349.
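A sketch of the substitution: generating the binary label with column expressions instead of a UDF, so the optimizer and code generation can handle it. The helper and column names are assumptions for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical helper: relabel a multiclass `label` column into {0.0, 1.0}
// for the one-vs-rest binary problem of class `classIndex`, with no UDF.
def binarize(df: DataFrame, classIndex: Double): DataFrame =
  df.withColumn("binaryLabel",
    when(col("label") === lit(classIndex), 1.0).otherwise(0.0))
```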
Commit: fe16fd0
-
[SPARK-10355] [ML] [PySpark] Add Python API for SQLTransformer
Add Python API for SQLTransformer Author: Yanbo Liang <ybliang8@gmail.com> Closes #8527 from yanboliang/spark-10355.
Commit: 52ea399
Commits on Sep 1, 2015
-
[SPARK-10378][SQL][Test] Remove HashJoinCompatibilitySuite.
They don't bring much value since we now have better unit test coverage for hash joins. This will also help reduce the test time. Author: Reynold Xin <rxin@databricks.com> Closes #8542 from rxin/SPARK-10378.
Commit: d65656c
-
[SPARK-10301] [SQL] Fixes schema merging for nested structs
This PR can be quite challenging to review. I'm trying to give a detailed description of the problem as well as its solution here.

When reading Parquet files, we need to specify a potentially nested Parquet schema (of type `MessageType`) as the requested schema for column pruning. This Parquet schema is translated from a Catalyst schema (of type `StructType`), which is generated by the query planner and represents all requested columns. However, this translation can be fairly complicated for several reasons:

1. Requested schema must conform to the real schema of the physical file to be read. This means we have to tailor the actual file schema of every individual physical Parquet file to be read according to the given Catalyst schema. Fortunately we are already doing this in Spark 1.5 by pushing requested schema conversion to the executor side in PR #7231.
1. Support for schema merging. A single Parquet dataset may consist of multiple physical Parquet files that come with different but compatible schemas. This means we may request a column path that doesn't exist in a physical Parquet file. All requested column paths can be nested. For example, for a Parquet file schema

   ```
   message root {
     required group f0 {
       required group f00 {
         required int32 f000;
         required binary f001 (UTF8);
       }
     }
   }
   ```

   we may request column paths defined in the following schema:

   ```
   message root {
     required group f0 {
       required group f00 {
         required binary f001 (UTF8);
         required float f002;
       }
     }
     optional double f1;
   }
   ```

   Notice that we pruned column path `f0.f00.f000`, but added `f0.f00.f002` and `f1`. The good news is that Parquet handles non-existing column paths properly and always returns null for them.
1. The map from `StructType` to `MessageType` is a one-to-many map. This is the most unfortunate part. Due to historical reasons (dark histories!), schemas of Parquet files generated by different libraries have different "flavors". For example, to handle a schema with a single non-nullable column, whose type is an array of non-nullable integers, parquet-protobuf generates the following Parquet schema:

   ```
   message m0 {
     repeated int32 f;
   }
   ```

   while parquet-avro generates another version:

   ```
   message m1 {
     required group f (LIST) {
       repeated int32 array;
     }
   }
   ```

   and parquet-thrift emits this:

   ```
   message m2 {
     required group f (LIST) {
       repeated int32 f_tuple;
     }
   }
   ```

   All of them can be mapped to the following _unique_ Catalyst schema:

   ```
   StructType(
     StructField(
       "f",
       ArrayType(IntegerType, containsNull = false),
       nullable = false))
   ```

This greatly complicates Parquet requested schema construction, since the path of a given column varies in different cases. To read the array elements from files with the above schemas, we must use `f` for `m0`, `f.array` for `m1`, and `f.f_tuple` for `m2`. In earlier Spark versions, we didn't try to fix this issue properly. Spark 1.4 and prior versions simply translate the Catalyst schema in a way more or less compatible with parquet-hive and parquet-avro, but this is broken in many other cases. Earlier revisions of Spark 1.5 only try to tailor the Parquet file schema at the first level, and ignore nested ones. This caused [SPARK-10301][spark-10301] as well as [SPARK-10005][spark-10005]. In PR #8228, I tried to avoid the hard part of the problem and made a minimum change in `CatalystRowConverter` to fix SPARK-10005. However, when taking SPARK-10301 into consideration, continuing to hack `CatalystRowConverter` doesn't seem to be a good idea. So this PR is an attempt to fix the problem in a proper way.
For a given physical Parquet file with schema `ps` and a compatible Catalyst requested schema `cs`, we use the following algorithm to tailor `ps` to get the result Parquet requested schema `ps'`. For a leaf column path `c` in `cs`:

- if `c` exists in `cs` and a corresponding Parquet column path `c'` can be found in `ps`, `c'` should be included in `ps'`;
- otherwise, we convert `c` to a Parquet column path `c"` using `CatalystSchemaConverter`, and include `c"` in `ps'`;
- no other column paths should exist in `ps'`.

Then comes the most tedious part:

> Given `cs`, `ps`, and `c`, how to locate `c'` in `ps`?

Unfortunately, there's no quick answer, and we have to enumerate all possible structures defined in the parquet-format spec. They are:

1. the standard structure of nested types, and
1. cases defined in all backwards-compatibility rules for `LIST` and `MAP`.

The core part of this PR is `CatalystReadSupport.clipParquetType()`, which tailors a given Parquet file schema according to a requested schema in its Catalyst form. Backwards-compatibility rules of `LIST` and `MAP` are covered in `clipParquetListType()` and `clipParquetMapType()` respectively. The column path selection algorithm is implemented in `clipParquetGroupFields()`. With this PR, we no longer need to do schema tailoring in `CatalystReadSupport` and `CatalystRowConverter`. Another benefit is that we can now also read Parquet datasets consisting of files with different physical Parquet schemas that share the same logical schema, for example, files generated by different Parquet libraries. This situation is illustrated by [this test case][test-case].

[spark-10301]: https://issues.apache.org/jira/browse/SPARK-10301
[spark-10005]: https://issues.apache.org/jira/browse/SPARK-10005
[test-case]: liancheng@38644d8#diff-a9b98e28ce3ae30641829dffd1173be2R26

Author: Cheng Lian <lian@databricks.com> Closes #8509 from liancheng/spark-10301/fix-parquet-requested-schema.
Commit: 391e6be
-
[SPARK-9679] [ML] [PYSPARK] Add Python API for Stop Words Remover
Add a python API for the Stop Words Remover. Author: Holden Karau <holden@pigscanfly.ca> Closes #8118 from holdenk/SPARK-9679-python-StopWordsRemover.
Commit: e6e483c
-
[SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirro…
…ring scripts Migrate Apache download closer.cgi refs to new closer.lua This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately. Author: Sean Owen <sowen@cloudera.com> Closes #8557 from srowen/SPARK-10398.
Commit: 3f63bd6
-
[SPARK-4223] [CORE] Support * in acls.
SPARK-4223. Currently we support setting view and modify acls, but you have to specify a list of users. It would be nice to support `*`, meaning all users have access. Manual tests verified that "*" works for any user in: a. the Spark UI: view and kill stage. Done. b. the Spark history server. Done. c. YARN application killing. Done. Author: zhuol <zhuol@yahoo-inc.com> Closes #8398 from zhuoliu/4223.
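A configuration sketch using the standard acl keys; the user names are placeholders:

```scala
import org.apache.spark.SparkConf

// Grant every user view access, restrict modify (kill) access to two admins.
val conf = new SparkConf()
  .set("spark.acls.enable", "true")
  .set("spark.ui.view.acls", "*")
  .set("spark.modify.acls", "alice,bob")
```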
Commit: ec01280
-
[SPARK-10162] [SQL] Fix the timezone omitting for PySpark Dataframe f…
…ilter function This PR addresses [SPARK-10162](https://issues.apache.org/jira/browse/SPARK-10162). The issue is with the DataFrame filter() function when a datetime.datetime is passed to it:

* Timezone information of this datetime is ignored
* This datetime is assumed to be in the local timezone, which depends on the OS timezone setting

The fix includes both a code change and a regression test. Problem reproduction code on master:

```python
import pytz
from datetime import datetime
from pyspark.sql import *
from pyspark.sql.types import *
sqc = SQLContext(sc)
df = sqc.createDataFrame([], StructType([StructField("dt", TimestampType())]))
m1 = pytz.timezone('UTC')
m2 = pytz.timezone('Etc/GMT+3')
df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
```

It gives the same timestamp, ignoring the time zone:

```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946713600000000)
 Scan PhysicalRDD[dt#0]
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946713600000000)
 Scan PhysicalRDD[dt#0]
```

After the fix:

```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946684800000000)
 Scan PhysicalRDD[dt#0]
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946695600000000)
 Scan PhysicalRDD[dt#0]
```

PR [8536](#8536) was accidentally closed by me when dropping the repo. Author: 0x0FFF <programmerag@gmail.com> Closes #8555 from 0x0FFF/SPARK-10162.
Commit: bf550a4
-
[SPARK-10392] [SQL] Pyspark - Wrong DateType support on JDBC connection
This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392). The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a `datetime.date`, due to the implementation of its return statement. Issue reproduction on master:

```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```

Author: 0x0FFF <programmerag@gmail.com> Closes #8556 from 0x0FFF/SPARK-10392.
Commit: 00d9af5
Commits on Sep 2, 2015
-
[SPARK-7336] [HISTORYSERVER] Fix bug that applications status incorre…
…ct on JobHistory UI. Author: ArcherShao <shaochuan@huawei.com> Closes #5886 from ArcherShao/SPARK-7336.
Commit: c3b881a
-
[SPARK-10034] [SQL] add regression test for Sort on Aggregate
Before #8371, there was a bug for `Sort` on `Aggregate`: we couldn't use aggregate expressions named `_aggOrdering`, and couldn't use more than one ordering expression containing aggregate functions. The reason for this bug: the aggregate expression in `SortOrder` never gets resolved; we alias it with `_aggOrdering` and call `toAttribute`, which gives us an `UnresolvedAttribute`. So we are actually referencing the aggregate expression by name, not by exprId as we thought. And if there is already an aggregate expression named `_aggOrdering`, or there is more than one ordering expression having aggregate functions, we get conflicting names and can't search by name. However, after #8371 was merged, the `SortOrder`s are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug doesn't exist anymore, and this PR adds regression tests for it. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8231 from cloud-fan/sort-agg.
Commit: 56c4c17
-
[SPARK-10389] [SQL] support order by non-attribute grouping expressio…
…n on Aggregate For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8548 from cloud-fan/support-order-by-non-attribute.
Commit: fc48307
-
[SPARK-10004] [SHUFFLE] Perform auth checks when clients read shuffle…
… data. To correctly isolate applications, when requests to read shuffle data arrive at the shuffle service, proper authorization checks need to be performed. This change makes sure that only the application that created the shuffle data can read from it. Such checks are only enabled when "spark.authenticate" is enabled, otherwise there's no secure way to make sure that the client is really who it says it is. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8218 from vanzin/SPARK-10004.
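The checks only take effect with authentication enabled; a minimal configuration sketch (the secret value is a placeholder):

```scala
import org.apache.spark.SparkConf

// Shuffle-read authorization is only meaningful with authentication enabled.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "change-me") // shared secret outside YARN
```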
Commit: 2da3a9e (Marcelo Vanzin committed Sep 2, 2015)
-
[SPARK-10417] [SQL] Iterating through Column results in infinite loop
`pyspark.sql.column.Column` objects have a `__getitem__` method, which makes them iterable in Python. `__getitem__` exists to address the case where the column might be a list or dict, so that you can access a certain element of it in the DataFrame API. The ability to iterate over it is just a side effect that might confuse people getting familiar with Spark DataFrames (as you might iterate this way over a Pandas DF, for instance). Issue reproduction:

```
df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]: print i
```

Author: 0x0FFF <programmerag@gmail.com> Closes #8574 from 0x0FFF/SPARK-10417.
Commit: 6cd98c1
Commits on Sep 3, 2015
-
[SPARK-10422] [SQL] String column in InMemoryColumnarCache needs to o…
…verride clone method https://issues.apache.org/jira/browse/SPARK-10422 Author: Yin Huai <yhuai@databricks.com> Closes #8578 from yhuai/SPARK-10422.
Commit: 03f3e91
-
[SPARK-9723] [ML] params getordefault should throw more useful error
Params.getOrDefault should throw a more meaningful exception than what you get from a bad key lookup. Author: Holden Karau <holden@pigscanfly.ca> Closes #8567 from holdenk/SPARK-9723-params-getordefault-should-throw-more-useful-error.
Commit: 44948a2
-
[SPARK-5945] Spark should not retry a stage infinitely on a FetchFail…
…edException The ```Stage``` class now tracks whether there were a sufficient number of consecutive failures of that stage to trigger an abort. To avoid an infinite loop of stage retries, we abort the job completely after 4 consecutive stage failures for one stage. We still allow more than 4 consecutive stage failures if there is an intervening successful attempt for the stage, so that in very long-lived applications, where a stage may get reused many times, we don't abort the job after failures that have been recovered from successfully. I've added test cases to exercise the most obvious scenarios. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #5636 from ilganeli/SPARK-5945.
Commit: 4bd85d0 (authored by Ilya Ganelin, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-8707] RDD#toDebugString fails if any cached RDD has invalid pa…
…rtitions Added numPartitions(evaluate: Boolean) to RDD. With "evaluate=true" the method is the same as "partitions.length". With "evaluate=false", it checks checked-out or already evaluated partitions in the RDD to get the number of partitions. If neither applies, it returns -1. RDDInfo.partitionNum calls numPartitions only when it's accessed. Author: navis.ryu <navis@apache.org> Closes #7127 from navis/SPARK-8707.
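A hedged sketch of the described contract, detached from RDD itself; all names here are hypothetical:

```scala
// Hypothetical helper mirroring the described numPartitions(evaluate) contract:
// evaluate = true  -> force evaluation, same as partitions.length
// evaluate = false -> answer only from already-computed state, else -1
def numPartitionsSketch(evaluate: Boolean,
                        alreadyEvaluated: Option[Int],
                        computeLength: () => Int): Int = {
  if (evaluate) computeLength()
  else alreadyEvaluated.getOrElse(-1)
}
```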
Commit: 0985d2c
-
Removed code duplication in ShuffleBlockFetcherIterator
Added fetchUpToMaxBytes() to prevent having to update both code blocks when a change is made. Author: Evan Racah <ejracah@gmail.com> Closes #8514 from eracah/master.
Commit: f6c447f
-
[SPARK-10247] [CORE] improve readability of a test case in DAGSchedul…
…erSuite This is pretty minor, just trying to improve the readability of `DAGSchedulerSuite`, I figure every bit helps. Before whenever I read this test, I never knew what "should work" and "should be ignored" really meant -- this adds some asserts & updates comments to make it more clear. Also some reformatting per a suggestion from markhamstra on #7699 Author: Imran Rashid <irashid@cloudera.com> Closes #8434 from squito/SPARK-10247.
Commit: 3ddb9b3
-
[SPARK-10379] preserve first page in UnsafeShuffleExternalSorter
Author: Davies Liu <davies@databricks.com> Closes #8543 from davies/preserve_page.
Commit: 62b4690 (authored by Davies Liu, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-10411] [SQL] Move visualization above explain output and hide …
…explain by default New screenshots after this fix show the visualization above the explain output, the explain output hidden by default, and the explain output expanded after clicking `+details` (screenshots omitted here). Author: zsxwing <zsxwing@gmail.com> Closes #8570 from zsxwing/SPARK-10411.
Commit: 0349b5b
-
[SPARK-10332] [CORE] Fix yarn spark executor validation
From JIRA: Running spark-submit with YARN with num-executors equal to 0 when not using dynamic allocation should error out. In Spark 1.5.0 it continues and ends up hanging. yarn.ClientArguments still has the check, so something else must have changed. spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 0 .... Spark 1.4.1 errors with: java.lang.IllegalArgumentException: Number of executors was 0, but must be at least 1 (or 0 if dynamic executor allocation is enabled). Author: Holden Karau <holden@pigscanfly.ca> Closes #8580 from holdenk/SPARK-10332-spark-submit-to-yarn-executors-0-message.
Commit: 67580f1
-
[SPARK-9596] [SQL] treat hadoop classes as shared one in IsolatedClie…
…ntLoader https://issues.apache.org/jira/browse/SPARK-9596 Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #7931 from WangTaoTheTonic/SPARK-9596.
Commit: 3abc0d5
-
[SPARK-8951] [SPARKR] support Unicode characters in collect()
Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R. Author: CHOIJAEHONG <redrock07@naver.com> Closes #7494 from CHOIJAEHONG1/SPARK-8951.
Commit: af0e312
-
[SPARK-10432] spark.port.maxRetries documentation is unclear
Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.
Commit: 49aff7b (authored by Tom Graves, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-10431] [CORE] Fix intermittent test failure. Wait for event qu…
…eue to be clear Author: robbins <robbins@uk.ibm.com> Closes #8582 from robbinspg/InputOutputMetricsSuite.
Commit: d911c68 (authored by robbins, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-9869] [STREAMING] Wait for all event notifications before asse…
…rting results Author: robbins <robbins@uk.ibm.com> Closes #8589 from robbinspg/InputStreamSuite-fix.
Commit: 754f853 (authored by robbins, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-9672] [MESOS] Don’t include SPARK_ENV_LOADED when passing env …
…vars This contribution is my original work and I license the work to the project under the project's open source license. Author: Pat Shields <yeoldefortran@gmail.com> Closes #7979 from pashields/env-loading-on-driver.
Commit: e62f4a4
-
[SPARK-10430] [CORE] Added hashCode methods in AccumulableInfo and RD…
…DOperationScope Author: Vinod K C <vinod.kc@huawei.com> Closes #8581 from vinodkc/fix_RDDOperationScope_Hashcode.
Commit: 11ef32c (authored by Vinod K C, committed by Andrew Or, Sep 3, 2015)
-
[SPARK-9591] [CORE] Job may fail for exception during getting remote …
…block [SPARK-9591](https://issues.apache.org/jira/browse/SPARK-9591) When getting a broadcast variable, we can fetch the block from several locations, but connecting to a lost BlockManager (for example, one that was idle long enough to be removed by the driver when dynamic resource allocation is used) currently causes the task to fail, and in the worst case the whole job fails. Author: jeanlyn <jeanlyn92@gmail.com> Closes #7927 from jeanlyn/catch_exception.
Commit: db4c130
-
[SPARK-10435] Spark submit should fail fast for Mesos cluster mode wi…
…th R It's not supported yet so we should error with a clear message. Author: Andrew Or <andrew@databricks.com> Closes #8590 from andrewor14/mesos-cluster-r-guard.
Commit: 08b0750 (Andrew Or committed Sep 3, 2015)
-
[SPARK-10421] [BUILD] Exclude curator artifacts from tachyon dependen…
…cies. This avoids them being mistakenly pulled instead of the newer ones that Spark actually uses. Spark only depends on these artifacts transitively, so sometimes maven just decides to pick tachyon's version of the dependency for whatever reason. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8577 from vanzin/SPARK-10421.
Commit: 208fbca (Marcelo Vanzin committed Sep 3, 2015)
Commits on Sep 4, 2015
-
[SPARK-10003] Improve readability of DAGScheduler
Note: this is not intended to be in Spark 1.5! This patch rewrites some code in the `DAGScheduler` to make it more readable. In particular - there were blocks of code that are unnecessary and removed for simplicity - there were abstractions that are unnecessary and made the code hard to navigate - other minor changes Author: Andrew Or <andrew@databricks.com> Closes #8217 from andrewor14/dag-scheduler-readability and squashes the following commits: 57abca3 [Andrew Or] Move comment back into if case 574fb1e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-scheduler-readability 64a9ed2 [Andrew Or] Remove unnecessary code + minor code rewrites
Commit: cf42138
-
[MINOR] Minor style fix in SparkR
`dev/lintr-r` passes on my machine now Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8601 from shivaram/sparkr-style-fix.
Commit: 143e521
-
MAINTENANCE: Automated closing of pull requests.
This commit exists to close the following pull requests on Github: Closes #1890 (requested by andrewor14, JoshRosen) Closes #3558 (requested by JoshRosen, marmbrus) Closes #3890 (requested by marmbrus) Closes #3895 (requested by andrewor14, marmbrus) Closes #4055 (requested by andrewor14) Closes #4105 (requested by andrewor14) Closes #4812 (requested by marmbrus) Closes #5109 (requested by andrewor14) Closes #5178 (requested by andrewor14) Closes #5298 (requested by marmbrus) Closes #5393 (requested by marmbrus) Closes #5449 (requested by andrewor14) Closes #5468 (requested by marmbrus) Closes #5715 (requested by marmbrus) Closes #6192 (requested by marmbrus) Closes #6319 (requested by marmbrus) Closes #6326 (requested by marmbrus) Closes #6349 (requested by marmbrus) Closes #6380 (requested by andrewor14) Closes #6554 (requested by marmbrus) Closes #6696 (requested by marmbrus) Closes #6868 (requested by marmbrus) Closes #6951 (requested by marmbrus) Closes #7129 (requested by marmbrus) Closes #7188 (requested by marmbrus) Closes #7358 (requested by marmbrus) Closes #7379 (requested by marmbrus) Closes #7628 (requested by marmbrus) Closes #7715 (requested by marmbrus) Closes #7782 (requested by marmbrus) Closes #7914 (requested by andrewor14) Closes #8051 (requested by andrewor14) Closes #8269 (requested by andrewor14) Closes #8448 (requested by andrewor14) Closes #8576 (requested by andrewor14)
Commit: 804a012
-
[SPARK-10176] [SQL] Show partially analyzed plans when checkAnswer fa…
…ils to analyze This PR takes over #8389. This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user friendly error message, in order to aid debugging failing tests. In doing so, I ran into a conflict with the various ways that we bring a SQLContext into the tests. Depending on the trait we refer to the current context as `sqlContext`, `_sqlContext`, `ctx` or `hiveContext` with access modifiers `public`, `protected` and `private` depending on the defining class. I propose we refactor as follows:

1. All tests should only refer to a `protected sqlContext` when testing general features, and `protected hiveContext` when it is a method that only exists on a `HiveContext`.
2. All tests should only import `testImplicits._` (i.e., don't import `TestHive.implicits._`)

Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8584 from cloud-fan/cleanupTests.
Commit: c3c0e43
-
[SPARK-10450] [SQL] Minor improvements to readability / style / typos…
… etc. Author: Andrew Or <andrew@databricks.com> Closes #8603 from andrewor14/minor-sql-changes.
Commit: 3339e6f (Andrew Or committed Sep 4, 2015)
-
[SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode.
Support running PySpark with cluster mode on Mesos! This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script via an accessible URI. Author: Timothy Chen <tnachen@gmail.com> Closes #8349 from tnachen/mesos_python.
Commit: b087d23
-
[SPARK-10454] [SPARK CORE] wait for empty event queue
Author: robbins <robbins@uk.ibm.com> Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
Commit: 2e1c175 (authored by robbins, committed by Andrew Or, Sep 4, 2015)
-
[SPARK-10311] [STREAMING] Reload appId and attemptId when app starts …
…with checkpoint file in cluster mode Author: xutingjun <xutingjun@huawei.com> Closes #8477 from XuTingjun/streaming-attempt.
Commit: eafe372
Commits on Sep 5, 2015
-
[SPARK-10402] [DOCS] [ML] Add defaults to the scaladoc for params in ml/
We should make sure the scaladoc for params includes their default values through the models in ml/ Author: Holden Karau <holden@pigscanfly.ca> Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
Commit: 22eab70
-
[SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE_PARTITIONS.key correct…
…ly for tests This PR fixes the failed test and the conflict for #8155 https://issues.apache.org/jira/browse/SPARK-9925 Closes #8155 Author: Yin Huai <yhuai@databricks.com> Author: Davies Liu <davies@databricks.com> Closes #8602 from davies/shuffle_partitions.
Commit: 47058ca
-
Commit: 6c75194
-
[SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the progr…
…amming guides and python docs - Fixed information around Python API tags in streaming programming guides - Added missing stuff in python docs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8595 from tdas/SPARK-10440.
Commit: 7a4f326
-
[SPARK-10434] [SQL] Fixes Parquet schema of arrays that may contain null
To keep full compatibility of Parquet write path with Spark 1.4, we should rename the innermost field name of arrays that may contain null from "array_element" to "array". Please refer to [SPARK-10434] [1] for more details. [1]: https://issues.apache.org/jira/browse/SPARK-10434 Author: Cheng Lian <lian@databricks.com> Closes #8586 from liancheng/spark-10434/fix-parquet-array-type.
Commit: bca8c07
-
[SPARK-10013] [ML] [JAVA] [TEST] remove java assert from java unit tests
From Jira: We should use assertTrue, etc. instead to make sure the asserts are not ignored in tests. Author: Holden Karau <holden@pigscanfly.ca> Closes #8607 from holdenk/SPARK-10013-remove-java-assert-from-java-unit-tests.
Commit: 871764c
Commits on Sep 7, 2015
-
[SPARK-9767] Remove ConnectionManager.
We introduced the Netty network module for shuffle in Spark 1.2, and it has been on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, ConnectionManager will have been off by default for about a year. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.
Commit: 5ffe752
Commits on Sep 8, 2015
-
[DOC] Added R to the list of languages with "high-level API" support …
…in the main README. Author: Stephen Hopper <shopper@shopper-osx.local> Closes #8646 from enragedginger/master.
Commit: 9d8e838
-
Author: Jacek Laskowski <jacek@japila.pl> Closes #8629 from jaceklaskowski/docs-fixes.
Commit: 6ceed85
-
[SPARK-9170] [SQL] Use OrcStructInspector to be case preserving when …
…writing ORC files JIRA: https://issues.apache.org/jira/browse/SPARK-9170 `StandardStructObjectInspector` implicitly lowercases column names, but the ORC format doesn't have such a requirement. In fact, there is an `OrcStructInspector` specified for the ORC format. We should use it when serializing rows to ORC files, so that writing ORC files is case preserving. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7520 from viirya/use_orcstruct.
Commit: 990c9f7
-
[SPARK-10480] [ML] Fix ML.LinearRegressionModel.copy()
This PR fixes two model ```copy()``` related issues: [SPARK-10480](https://issues.apache.org/jira/browse/SPARK-10480) ```ML.LinearRegressionModel.copy()``` ignored the argument ```extra```, so it did not take effect when users set this parameter. [SPARK-10479](https://issues.apache.org/jira/browse/SPARK-10479) ```ML.LogisticRegressionModel.copy()``` should copy the model summary if available. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8641 from yanboliang/linear-regression-copy.
Commit: 5b2192e
-
[SPARK-10316] [SQL] respect nondeterministic expressions in PhysicalO…
…peration We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all Projects and Filters and messes up the operator order. We should respect the operator order caused by non-deterministic expressions in `PhysicalOperation`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8486 from cloud-fan/fix.
Commit: 5fd5795
-
[SPARK-10470] [ML] ml.IsotonicRegressionModel.copy should set parent
A copied model must have the same parent, but ml.IsotonicRegressionModel.copy did not set the parent. This fixes it and adds a test case. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8637 from yanboliang/spark-10470.
Commit: f7b55db
-
[SPARK-10441] [SQL] Save data correctly to json.
https://issues.apache.org/jira/browse/SPARK-10441 Author: Yin Huai <yhuai@databricks.com> Closes #8597 from yhuai/timestampJson.
Commit: 7a9dcbc
-
[SPARK-10468] [ MLLIB ] Verify schema before Dataframe select API call
Loader.checkSchema was called to verify the schema after dataframe.select(...). Schema verification should be done before dataframe.select(...) Author: Vinod K C <vinod.kc@huawei.com> Closes #8636 from vinodkc/fix_GaussianMixtureModel_load_verification.
Commit: e6f8d36
-
[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentat…
…ion about rate limiting and backpressure Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure
Commit: 52b24a6
-
[SPARK-10327] [SQL] Cache Table is not working while subquery has ali…
…as in its project list
```scala
import org.apache.spark.sql.hive.execution.HiveTableScan
sql("select key, value, key + 1 from src").registerTempTable("abc")
cacheTable("abc")
val sparkPlan = sql(
  """select a.key, b.key, c.key from
    |abc a join abc b on a.key=b.key
    |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan
assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
```
The actual plan is:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
 'Join Inner, Some(('a.key = 'c.key))
  'Join Inner, Some(('a.key = 'b.key))
   'UnresolvedRelation [abc], Some(a)
   'UnresolvedRelation [abc], Some(b)
  'UnresolvedRelation [abc], Some(c)

== Analyzed Logical Plan ==
key: int, key: int, key: int
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Join Inner, Some((key#14 = key#61))
   Subquery a
    Subquery abc
     Project [key#14,value#15,(key#14 + 1) AS _c2#16]
      MetastoreRelation default, src, None
   Subquery b
    Subquery abc
     Project [key#61,value#62,(key#61 + 1) AS _c2#58]
      MetastoreRelation default, src, None
  Subquery c
   Subquery abc
    Project [key#66,value#67,(key#66 + 1) AS _c2#63]
     MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Project [key#14,key#61]
   Join Inner, Some((key#14 = key#61))
    Project [key#14]
     InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
    Project [key#61]
     MetastoreRelation default, src, None
  Project [key#66]
   MetastoreRelation default, src, None

== Physical Plan ==
TungstenProject [key#14,key#61,key#66]
 BroadcastHashJoin [key#14], [key#66], BuildRight
  TungstenProject [key#14,key#61]
   BroadcastHashJoin [key#14], [key#61], BuildRight
    ConvertToUnsafe
     InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
    ConvertToUnsafe
     HiveTableScan [key#61], (MetastoreRelation default, src, None)
  ConvertToUnsafe
   HiveTableScan [key#66], (MetastoreRelation default, src, None)
```
Author: Cheng Hao <hao.cheng@intel.com> Closes #8494 from chenghao-intel/weird_cache.
Commit: d637a66
-
[HOTFIX] Fix build break caused by #8494
Author: Michael Armbrust <michael@databricks.com> Closes #8659 from marmbrus/testBuildBreak.
Commit: 2143d59
Commits on Sep 9, 2015
-
[RELEASE] Add more contributors & only show names in release notes.
Author: Reynold Xin <rxin@databricks.com> Closes #8660 from rxin/contrib.
Commit: ae74c3f
-
[SPARK-10071] [STREAMING] Output a warning when writing QueueInputDSt…
…ream and throw a better exception when reading QueueInputDStream Output a warning when serializing QueueInputDStream rather than throwing an exception, to allow unit tests to use it. Moreover, this PR also throws a better exception when deserializing QueueInputDStream, so that users can find the problem easily. The previous exception was hard to understand: https://issues.apache.org/jira/browse/SPARK-8553 Author: zsxwing <zsxwing@gmail.com> Closes #8624 from zsxwing/SPARK-10071 and squashes the following commits: 847cfa8 [zsxwing] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream
Commit: 820913f
-
[SPARK-9834] [MLLIB] implement weighted least squares via normal equa…
…tion The goal of this PR is to have a weighted least squares implementation that takes the normal equation approach, and hence is able to provide R-like summary statistics and support IRLS (used by GLMs). The tests match R's lm and glmnet. There are a couple of TODOs that can be addressed in future PRs:
* consolidate summary statistics aggregators
* move `dspr` to `BLAS`
* etc.

It would be nice to have this merged first because it blocks a couple of other features. dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8588 from mengxr/SPARK-9834.
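For reference, the normal-equation approach minimizes the weighted squared error sum_i w_i * (y_i - x_i^T b)^2 by solving (X^T W X) b = X^T W y. A small dense sketch using Breeze (which Spark depends on); this is the textbook formula, not the PR's aggregator-based implementation:

```scala
import breeze.linalg.{diag, DenseMatrix, DenseVector}

// Solve (X^T W X) b = X^T W y for a small dense problem.
def weightedLeastSquares(x: DenseMatrix[Double],
                         y: DenseVector[Double],
                         w: DenseVector[Double]): DenseVector[Double] = {
  val wMat = diag(w)     // n x n diagonal weight matrix
  val xtw = x.t * wMat   // k x n
  (xtw * x) \ (xtw * y)  // solve the k x k normal equations
}
```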
Commit: 52fe32f
-
[SPARK-10464] [MLLIB] Add WeibullGenerator for RandomDataGenerator
Add WeibullGenerator for RandomDataGenerator. #8611 needs WeibullGenerator to generate random data based on the Weibull distribution. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8622 from yanboliang/spark-10464.
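A Weibull sample can be drawn by inverting the CDF F(x) = 1 - exp(-(x/scale)^shape); a minimal sketch of that technique, independent of whatever library the actual generator delegates to:

```scala
import scala.util.Random

// Inverse-CDF sampling: X = scale * (-ln(1 - U))^(1 / shape), U ~ Uniform(0, 1).
def sampleWeibull(shape: Double, scale: Double, rng: Random): Double = {
  val u = rng.nextDouble()
  scale * math.pow(-math.log1p(-u), 1.0 / shape) // log1p(-u) = ln(1 - u)
}
```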
Commit: a157348
-
[SPARK-10373] [PYSPARK] move @since into pyspark from sql
cc mengxr Author: Davies Liu <davies@databricks.com> Closes #8657 from davies/move_since.
Commit: 3a11e50
-
[SPARK-10094] Pyspark ML Feature transformers marked as experimental
Modified class-level docstrings to mark all feature transformers in pyspark.ml as experimental. Author: noelsmith <mail@noelsmith.com> Closes #8623 from noel-smith/SPARK-10094-mark-pyspark-ml-trans-exp.
Commit: 0e2f216
-
[SPARK-9654] [ML] [PYSPARK] Add IndexToString to PySpark
Adds IndexToString to PySpark. Author: Holden Karau <holden@pigscanfly.ca> Closes #7976 from holdenk/SPARK-9654-add-string-indexer-inverse-in-pyspark.
Commit: 2f6fd52
-
[SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover …
…User Guide jira: https://issues.apache.org/jira/browse/SPARK-10249 update user guide since python support added. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8620 from hhbyyh/swPyDocExample.
Commit: 91a577d
-
[SPARK-10227] fatal warnings with sbt on Scala 2.11
The bulk of the changes concern the `transient` annotation on class parameters. Often the compiler doesn't generate a field for these parameters, so the transient annotation would be unnecessary. But if the class parameters are used in methods, then fields are created. So it is safer to keep the annotations. The remainder are some potential bugs, and deprecated syntax. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #8433 from skyluc/issue/sbt-2.11.
Commit: c1bc4f4
-
[SPARK-10117] [MLLIB] Implement SQL data source API for reading LIBSV…
…M data It is convenient to implement a data source API for the LIBSVM format to have a better integration with DataFrames and the ML pipeline API. Two options are implemented:
* `numFeatures`: specify the dimension of the features vector
* `featuresType`: specify the type of the output vector; `sparse` is the default.

Author: lewuathe <lewuathe@me.com> Closes #8537 from Lewuathe/SPARK-10117 and squashes the following commits:
986999d [lewuathe] Change unit test phrase
11d513f [lewuathe] Fix some reviews
21600a4 [lewuathe] Merge branch 'master' into SPARK-10117
9ce63c7 [lewuathe] Rewrite service loader file
1fdd2df [lewuathe] Merge branch 'SPARK-10117' of github.com:Lewuathe/spark into SPARK-10117
ba3657c [lewuathe] Merge branch 'master' into SPARK-10117
0ea1c1c [lewuathe] LibSVMRelation is registered into META-INF
4f40891 [lewuathe] Improve test suites
5ab62ab [lewuathe] Merge branch 'master' into SPARK-10117
8660d0e [lewuathe] Fix Java unit test
b56a948 [lewuathe] Merge branch 'master' into SPARK-10117
2c12894 [lewuathe] Remove unnecessary tag
7d693c2 [lewuathe] Resolv conflict
62010af [lewuathe] Merge branch 'master' into SPARK-10117
a97ee97 [lewuathe] Fix some points
aef9564 [lewuathe] Fix
70ee4dd [lewuathe] Add Java test
3fd8dce [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
40d3027 [lewuathe] Add Java test
7056d4a [lewuathe] Merge branch 'master' into SPARK-10117
99accaa [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
Commit: 2ddeb63
-
[SPARK-10481] [YARN] SPARK_PREPEND_CLASSES make spark-yarn related ja…
…r could n… Throw a more readable exception. Please help review. Thanks Author: Jeff Zhang <zjffdu@apache.org> Closes #8649 from zjffdu/SPARK-10481.
Commit: c0052d8
-
[SPARK-10461] [SQL] make sure `input.primitive` is always a variable name, not code, at `GenerateUnsafeProjection`
When we generate unsafe code inside `createCodeForXXX`, we always assign the `input.primitive` to a temp variable in case `input.primitive` is expression code. This PR did some refactoring to make sure `input.primitive` is always a variable name, and includes some other typo and style fixes. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8613 from cloud-fan/minor.
Commit: 71da163
-
[SPARK-9730] [SQL] Add Full Outer Join support for SortMergeJoin
This PR is based on #8383 , thanks to viirya JIRA: https://issues.apache.org/jira/browse/SPARK-9730 This patch adds the Full Outer Join support for SortMergeJoin. A new class SortMergeFullJoinScanner is added to scan rows from left and right iterators. FullOuterIterator is simply a wrapper of type RowIterator to consume joined rows from SortMergeFullJoinScanner. Closes #8383 Author: Liang-Chi Hsieh <viirya@appier.com> Author: Davies Liu <davies@databricks.com> Closes #8579 from davies/smj_fullouter.
Commit: 45de518
Commits on Sep 10, 2015
-
[SPARK-9772] [PYSPARK] [ML] Add Python API for ml.feature.VectorSlicer
Add Python API for ml.feature.VectorSlicer. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8102 from yanboliang/SPARK-9772.
Commit: 56a0fe5
-
[MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result shou…
…ld be 0.0 (original: 1.0) Small typo in the example for `LabeledPoint` in the MLlib docs. Author: Sean Paradiso <seanparadiso@gmail.com> Closes #8680 from sparadiso/docs_mllib_smalltypo.
Commit: 1dc7548
-
[SPARK-10497] [BUILD] [TRIVIAL] Handle both locations for JIRAError w…
…ith python-jira Location of JIRAError has moved between old and new versions of python-jira package. Longer term it probably makes sense to pin to specific versions (as mentioned in https://issues.apache.org/jira/browse/SPARK-10498 ) but for now, making release tools works with both new and old versions of python-jira. Author: Holden Karau <holden@pigscanfly.ca> Closes #8661 from holdenk/SPARK-10497-release-utils-does-not-work-with-new-jira-python.
Commit: 48817cc
-
[SPARK-10065] [SQL] avoid the extra copy when generate unsafe array
The reason for this extra copy is that we iterate the array twice: once to calculate the elements' data size and once to copy the elements to the array buffer. A simple solution is to follow `createCodeForStruct`: we can dynamically grow the buffer when needed and thus don't need to know the data size ahead of time. This PR also includes some typo and style fixes, and some minor refactoring to make sure `input.primitive` is always a variable name, not code, when generating unsafe code. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8496 from cloud-fan/avoid-copy.
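The single-pass alternative is the usual grow-on-demand buffer; a generic sketch of that pattern (assumed names, not the generated Tungsten code):

```scala
// Grow-on-demand byte buffer: append without knowing the total size up front,
// doubling capacity as needed, so the data is written exactly once.
final class GrowableBuffer(initialCapacity: Int = 64) {
  private var buf = new Array[Byte](initialCapacity)
  private var len = 0

  private def ensureCapacity(extra: Int): Unit = {
    if (len + extra > buf.length) {
      val newCap = math.max(buf.length * 2, len + extra)
      buf = java.util.Arrays.copyOf(buf, newCap)
    }
  }

  def append(bytes: Array[Byte]): Unit = {
    ensureCapacity(bytes.length)
    System.arraycopy(bytes, 0, buf, len, bytes.length)
    len += bytes.length
  }

  def result(): Array[Byte] = java.util.Arrays.copyOf(buf, len)
}
```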
Commit: 4f1daa1
-
[SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule
Use these in the optimizer as well:
- A and (not(A) or B) => A and B
- not(A and B) => not(A) or not(B)
- not(A or B) => not(A) and not(B)
Author: Yash Datta <Yash.Datta@guavus.com> Closes #5700 from saucam/bool_simp.
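One way to observe the first rule from user code is through `explain()`. A hedged sketch, assuming a `sqlContext` session; the exact optimized-plan text may differ by version:
```python
from pyspark.sql.functions import col

df = sqlContext.range(10).withColumn("flag", (col("id") % 2) == 0)
# With the rule applied, id > 3 AND (NOT(id > 3) OR flag) should reduce to id > 3 AND flag
# in the optimized logical plan printed by explain.
df.filter((col("id") > 3) & (~(col("id") > 3) | col("flag"))).explain(True)
```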
Commit: f892d92
-
[SPARK-10301] [SPARK-10428] [SQL] Addresses comments of PR #8583 and #8509 for master
Author: Cheng Lian <lian@databricks.com> Closes #8670 from liancheng/spark-10301/address-pr-comments.
Commit: 49da38e
-
[SPARK-10466] [SQL] UnsafeRow SerDe exception with data spill
Data Spill with UnsafeRow causes assert failure.
```
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
```
To reproduce that with code (thanks andrewor14):
```scala
bin/spark-shell --master local --conf spark.shuffle.memoryFraction=0.005 --conf spark.shuffle.sort.bypassMergeThreshold=0

sc.parallelize(1 to 2 * 1000 * 1000, 10)
  .map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
```
Author: Cheng Hao <hao.cheng@intel.com> Closes #8635 from chenghao-intel/unsafe_spill.
Commit: e048111
-
[SPARK-10469] [DOC] Try and document the three options
From JIRA: Add documentation for tungsten-sort. From the mailing list: "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but its corresponding description cannot be found in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only the two options 'sort' and 'hash')." Author: Holden Karau <holden@pigscanfly.ca> Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
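A minimal sketch of selecting a shuffle manager explicitly (the app name is a placeholder):
```python
from pyspark import SparkConf, SparkContext

# The three documented values in this release are "sort" (default), "hash"
# and "tungsten-sort".
conf = SparkConf().setAppName("shuffle-demo").set("spark.shuffle.manager", "tungsten-sort")
sc = SparkContext(conf=conf)
```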
Commit: a76bde9
-
[SPARK-8167] Make tasks that fail from YARN preemption not fail job
The architecture is that, in YARN mode, if the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster is aware that the executor died because of preemption, all tasks associated with that executor are not marked as failed. The executor is still removed from the driver's list of available executors, however. There are a few open questions:
1. Should standalone mode have a similar "get executor loss reason" as well? I localized this change as much as possible to affect only YARN, but there could be a valid case to differentiate executor losses in standalone mode as well.
2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear if I'm being overly zealous to save space, however.
cc vanzin specifically for review because it collided with some earlier YARN scheduling work. cc JoshRosen because it's similar to the output commit coordination we did in the past. cc andrewor14 for our discussion on how to get executor exit codes and loss reasons. Author: mcheah <mcheah@palantir.com> Closes #8007 from mccheah/feature/preemption-handling.
Commit: af3bc59
-
[SPARK-6350] [MESOS] Fine-grained mode scheduler respects mesosExecut…
Commit: f0562e8
-
[SPARK-10514] [MESOS] waiting for min no of total cores acquired by Spark by implementing the sufficientResourcesRegistered method
The spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos coarse-grained mode. If the parameter is not specified, the default value of 0 is set for spark.scheduler.minRegisteredResourcesRatio in the base class, and this method will always return true. There are no existing tests for YARN mode either, hence no test was added for this. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8672 from SleepyThread/master.
Commit: a5ef2d0
-
[SPARK-9990] [SQL] Create local hash join operator
This PR includes the following changes:
- Add SQLConf to LocalNode
- Add HashJoinNode
- Add ConvertToUnsafeNode and ConvertToSafeNode.scala to test unsafe hash join.
Author: zsxwing <zsxwing@gmail.com> Closes #8535 from zsxwing/SPARK-9990.
Commit: d88abb7
-
[SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame.
This PR:
1. Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.
2. Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.
3. Supports ArrayType in createDataFrame().
Author: Sun Rui <rui.sun@intel.com> Closes #8458 from sun-rui/SPARK-10049.
Commit: 45e3be5
-
[SPARK-10443] [SQL] Refactor SortMergeOuterJoin to reduce duplication
`LeftOutputIterator` and `RightOutputIterator` are symmetrically identical and can share a lot of code. If someone makes a change in one but forgets to do the same thing in the other we'll end up with inconsistent behavior. This patch also adds inline comments to clarify the intention of the code. Author: Andrew Or <andrew@databricks.com> Closes #8596 from andrewor14/smoj-cleanup.
Commit: 3db7255
-
Add 1.5 to master branch EC2 scripts
This change brings it up to par with `branch-1.5` (and the 1.5.0 release). Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8704 from shivaram/ec2-1.5-update.
Commit: 4204757
-
[SPARK-7544] [SQL] [PySpark] pyspark.sql.types.Row implements __getitem__
pyspark.sql.types.Row now implements ```__getitem__```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8333 from yanboliang/spark-7544.
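A minimal sketch of what this enables:
```python
from pyspark.sql import Row

r = Row(name="Alice", age=11)
r["age"]   # key-based access, enabled by __getitem__
r.name     # attribute access still works
r[0]       # positional access still works; note kwargs fields are sorted by name
```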
Commit: 89562a1
Commits on Sep 11, 2015
-
[SPARK-9043] Serialize key, value and combiner classes in ShuffleDependency
ShuffleManager implementations are currently not given type information for the key, value and combiner classes. Serialization of shuffle objects relies on objects being JavaSerializable, with methods defined for reading/writing the object or, alternatively, serialization via Kryo which uses reflection. Serialization systems like Avro, Thrift and Protobuf generate classes with zero-argument constructors and explicit schema information (e.g. IndexedRecords in Avro have get, put and getSchema methods). By serializing the key, value and combiner class names in ShuffleDependency, shuffle implementations will have access to schema information when registerShuffle() is called. Author: Matt Massie <massie@cs.berkeley.edu> Closes #7403 from massie/shuffle-classtags.
Commit: 0eabea8
-
[SPARK-10023] [ML] [PySpark] Unified DecisionTreeParams checkpointInterval between Scala and Python API.
"checkpointInterval" is a member of DecisionTreeParams in the Scala API, which is inconsistent with the Python API; we should unify them.
```
member of DecisionTreeParams <-> Scala API
shared param for all ML Transformer/Estimator <-> Python API
```
Proposal: "checkpointInterval" is also used by ALS, so we make it a shared param in Scala. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8528 from yanboliang/spark-10023.
Commit: 339a527
-
[SPARK-10027] [ML] [PySpark] Add Python API missing methods for ml.feature
Missing methods of ml.feature are listed here:
- ```StringIndexer``` lacks the parameter ```handleInvalid```.
- ```StringIndexerModel``` lacks the method ```labels```.
- ```VectorIndexerModel``` lacks the methods ```numFeatures``` and ```categoryMaps```.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #8313 from yanboliang/spark-10027.
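A minimal sketch of the newly exposed pieces, assuming a `sqlContext` session and toy data:
```python
from pyspark.ml.feature import StringIndexer

df = sqlContext.createDataFrame([(0, "a"), (1, "b"), (2, "a")], ["id", "category"])
indexer = StringIndexer(inputCol="category", outputCol="categoryIndex",
                        handleInvalid="error")  # newly exposed parameter
model = indexer.fit(df)
model.labels  # newly exposed method; ['a', 'b'] here, most frequent label first
```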
Commit: a140dd7
-
[SPARK-10472] [SQL] Fixes DataType.typeName for UDT
Before this fix, `MyDenseVectorUDT.typeName` gives `mydensevecto`, which is not desirable. Author: Cheng Lian <lian@databricks.com> Closes #8640 from liancheng/spark-10472/udt-type-name.
Commit: e1d7f64
-
[SPARK-10556] Remove explicit Scala version for sbt project build files
Previously, project/plugins.sbt explicitly set scalaVersion to 2.10.4. This can cause issues when using a version of sbt that is compiled against a different version of Scala (for example sbt 0.13.9 uses 2.10.5). Removing this explicit setting will cause build files to be compiled and run against the same version of Scala that sbt is compiled against. Note that this only applies to the project build files (items in project/), it is distinct from the version of Scala we target for the actual spark compilation. Author: Ahir Reddy <ahirreddy@gmail.com> Closes #8709 from ahirreddy/sbt-scala-version-fix.
Commit: 9bbe33f
-
[SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils
I fixed the example code in spark.ml to use the LIBSVM data source instead of MLUtils. Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8697 from y-shimizu/SPARK-10518.
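A hedged sketch of the data-source style the guide switches to; the path is a placeholder:
```python
# The "libsvm" source yields a DataFrame with "label" and "features" columns.
df = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
df.printSchema()
```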
Commit: c268ca4
-
[SPARK-10026] [ML] [PySpark] Implement some common Params for regression in PySpark
LinearRegression and LogisticRegression lack some Params on the Python side, and some Params are not shared classes, which means we need to write them for each class. These kinds of Params are listed here:
```scala
HasElasticNetParam
HasFitIntercept
HasStandardization
HasThresholds
```
Here we implement them in shared params on the Python side and make the LinearRegression/LogisticRegression parameters peers with the Scala ones. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8508 from yanboliang/spark-10026.
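A minimal sketch of the Python-side params mirroring the shared Scala params listed above:
```python
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=10, regParam=0.1, elasticNetParam=0.5,
                        fitIntercept=True, standardization=True)
```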
Commit: b656e61
-
[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier
Add Python API for ```MultilayerPerceptronClassifier```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8067 from yanboliang/SPARK-9773.
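A hedged sketch of the new API; `train` is an assumed DataFrame of (label, features):
```python
from pyspark.ml.classification import MultilayerPerceptronClassifier

# layers = input size, one hidden layer of 5 units, and output size (class count).
mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[4, 5, 3],
                                     blockSize=128, seed=123)
model = mlp.fit(train)
```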
Commit: b01b262
-
[SPARK-10537] [ML] document LIBSVM source options in public API doc and some minor improvements
We should document options in the public API doc. Otherwise, it is hard to find the options without looking at the code. I tried to make `DefaultSource` private and put the documentation in the package doc. However, since there would then exist no public class under `source.libsvm`, the Java package doc doesn't show up in the generated html file (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4492654). So I put the doc on `DefaultSource` instead. There are several minor updates in this PR:
1. Do `vectorType == "sparse"` only once.
2. Update `hashCode` and `equals`.
3. Remove inherited doc.
4. Delete temp dir in `afterAll`.
Lewuathe Author: Xiangrui Meng <meng@databricks.com> Closes #8699 from mengxr/SPARK-10537.
Commit: 960d2d0
-
[MINOR] [MLLIB] [ML] [DOC] Minor doc fixes for StringIndexer and MetadataUtils
Changes:
- Make Scala doc for StringIndexerInverse clearer. Also remove Scala doc from transformSchema, so that the doc is inherited.
- MetadataUtils.scala: "Helper utilities for tree-based algorithms" -> not just trees anymore
CC: holdenk mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #8679 from jkbradley/doc-fixes-1.5.
Commit: 2e3a280
-
[SPARK-10540] [SQL] Ignore HadoopFsRelationTest's "test all data types" if it is too flaky
If HadoopFsRelationTest's "test all data types" is too flaky, we can disable it for now. https://issues.apache.org/jira/browse/SPARK-10540 Author: Yin Huai <yhuai@databricks.com> Closes #8705 from yhuai/SPARK-10540-ignore.
Commit: 6ce0886
-
[SPARK-8530] [ML] add python API for MinMaxScaler
jira: https://issues.apache.org/jira/browse/SPARK-8530 add python API for MinMaxScaler jira for MinMaxScaler: https://issues.apache.org/jira/browse/SPARK-7514 Author: Yuhao Yang <hhbyyh@gmail.com> Closes #7150 from hhbyyh/pythonMinMax.
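A minimal sketch of the new Python API, assuming a `sqlContext` session and toy data:
```python
from pyspark.ml.feature import MinMaxScaler
from pyspark.mllib.linalg import Vectors

df = sqlContext.createDataFrame(
    [(Vectors.dense([1.0, 2.0]),), (Vectors.dense([3.0, 4.0]),)], ["features"])
scaler = MinMaxScaler(inputCol="features", outputCol="scaled", min=0.0, max=1.0)
scaler.fit(df).transform(df).show()  # each feature rescaled to [0, 1]
```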
Commit: 5f46444
-
[SPARK-10546] Check partitionId's range in ExternalSorter#spill()
See this thread for background: http://search-hadoop.com/m/q3RTt0rWvIkHAE81 We should check the range of the partition Id and provide a meaningful message through an exception. Alternatively, we could use abs() and modulo to force the partition Id into a legitimate range; however, the expectation is that the user should correct the logic error in his or her code. Author: tedyu <yuzhihong@gmail.com> Closes #8703 from tedyu/master.
Commit: b231ab8
-
[PYTHON] Fixed typo in exception message
Just fixing a typo in the exception message raised when attempting to pickle SparkContext. Author: Icaro Medeiros <icaro.medeiros@gmail.com> Closes #8724 from icaromedeiros/master.
Commit: c373866
-
[SPARK-10442] [SQL] fix string to boolean cast
When we cast string to boolean in Hive, it returns `true` if the length of the string is > 0, and Spark SQL follows this behavior. However, this behavior is very different from other SQL systems:
1. [presto](https://github.com/facebook/presto/blob/master/presto-main/src/main/java/com/facebook/presto/type/VarcharOperators.java#L89-L118) will return `true` for 't' 'true' '1', `false` for 'f' 'false' '0', and throw an exception for others.
2. [redshift](http://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html) will return `true` for 't' 'true' 'y' 'yes' '1', `false` for 'f' 'false' 'n' 'no' '0', and null for others.
3. [postgresql](http://www.postgresql.org/docs/devel/static/datatype-boolean.html) will return `true` for 't' 'true' 'y' 'yes' 'on' '1', `false` for 'f' 'false' 'n' 'no' 'off' '0', and throw an exception for others.
4. [vertica](https://my.vertica.com/docs/5.0/HTML/Master/2983.htm) will return `true` for 't' 'true' 'y' 'yes' '1', `false` for 'f' 'false' 'n' 'no' '0', and null for others.
5. [impala](http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_boolean.html) throws an exception when trying to cast string to boolean.
6. mysql, oracle and sqlserver don't have a boolean type.
Whether we should change the cast behavior to match other SQL systems is not decided yet; this PR is a test to see, if we changed it, how many compatibility tests would fail. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8698 from cloud-fan/string2boolean.
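A small probe of the pre-change, Hive-compatible behavior described above, assuming a `sqlContext` with Hive semantics:
```python
# Under the "non-empty string is true" rule, even 'false' casts to true,
# because it is a non-empty string; '' casts to false.
sqlContext.sql(
    "SELECT CAST('t' AS BOOLEAN), CAST('false' AS BOOLEAN), CAST('' AS BOOLEAN)"
).show()
```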
Commit: d5d6473
-
[SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimiz…
Commit: 1eede3b
-
[SPARK-9992] [SPARK-9994] [SPARK-9998] [SQL] Implement the local TopK…
Commit: e626ac5
-
[SPARK-9990] [SQL] Local hash join follow-ups
1. Hide `LocalNodeIterator` behind the `LocalNode#asIterator` method
2. Add tests for this
Author: Andrew Or <andrew@databricks.com> Closes #8708 from andrewor14/local-hash-join-follow-up.
Commit: c2af42b (committed by Andrew Or, Sep 11, 2015)
-
[SPARK-10564] ThreadingSuite: assertion failures in threads don't fail the test
This commit ensures that if an assertion fails within a thread, it will ultimately fail the test. Otherwise we end up potentially masking real bugs by not propagating assertion failures properly. Author: Andrew Or <andrew@databricks.com> Closes #8723 from andrewor14/fix-threading-suite.
Commit: d74c6a1 (committed by Andrew Or, Sep 11, 2015)
-
[SPARK-9014] [SQL] Allow Python spark API to use built-in exponential operator
This PR addresses [SPARK-9014](https://issues.apache.org/jira/browse/SPARK-9014). Added functionality: the `Column` object in Python now supports the exponential operator `**`. Example:
```python
from pyspark.sql import *
df = sqlContext.createDataFrame([Row(a=2)])
df.select(3**df.a, df.a**3, df.a**df.a).collect()
```
Outputs:
```
[Row(POWER(3.0, a)=9.0, POWER(a, 3.0)=8.0, POWER(a, a)=4.0)]
```
Author: 0x0FFF <programmerag@gmail.com> Closes #8658 from 0x0FFF/SPARK-9014.
Commit: c34fc19
Commits on Sep 12, 2015
-
[SPARK-10566] [CORE] SnappyCompressionCodec init exception handling masks important error information
When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user. Manual testing shows the exception chained properly, and the test suite still looks fine as well. This contribution is my original work and I license the work to the project under the project's open source license. Author: Daniel Imfeld <daniel@danielimfeld.com> Closes #8725 from dimfeld/dimfeld-patch-1.
Commit: 6d83678
-
[SPARK-10554] [CORE] Fix NPE with ShutdownHook
https://issues.apache.org/jira/browse/SPARK-10554 Fixes an NPE when ShutdownHook tries to clean up temporary folders. Author: Nithin Asokan <Nithin.Asokan@Cerner.com> Closes #8720 from nasokan/SPARK-10554.
Commit: 8285e3b
-
[SPARK-10547] [TEST] Streamline / improve style of Java API tests
Fix a few Java API test style issues: unused generic types, exceptions, wrong assert argument order Author: Sean Owen <sowen@cloudera.com> Closes #8706 from srowen/SPARK-10547.
Commit: 22730ad
-
[SPARK-6548] Adding stddev to DataFrame functions
Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change. Author: JihongMa <linlin200605@gmail.com> Author: Jihong MA <linlin200605@gmail.com> Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com> Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local> Closes #6297 from JihongMA/SPARK-SQL.
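A minimal sketch of the user-facing function, assuming it is exposed as `stddev` in `pyspark.sql.functions` and a `sqlContext` session exists:
```python
from pyspark.sql.functions import stddev

df = sqlContext.createDataFrame([(1.0,), (2.0,), (3.0,)], ["v"])
df.agg(stddev("v")).show()  # sample standard deviation; 1.0 for this data
```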
Commit: f4a2280
-
[SPARK-10330] Add Scalastyle rule to require use of SparkHadoopUtil JobContext methods
This is a followup to #8499 which adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations. Author: Josh Rosen <joshrosen@databricks.com> Closes #8521 from JoshRosen/SPARK-10330-part2.
Commit: b3a7480
Commits on Sep 13, 2015
-
[SPARK-10222] [GRAPHX] [DOCS] More thoroughly deprecate Bagel in favor of GraphX
Finish deprecating Bagel; remove reference to nonexistent example. Author: Sean Owen <sowen@cloudera.com> Closes #8731 from srowen/SPARK-10222.
Commit: 1dc614b
Commits on Sep 14, 2015
-
[SPARK-9720] [ML] Identifiable types need UID in toString methods
A few Identifiable types did override their toString method but without using the parent implementation. As a consequence, the uid was not present anymore in the toString result, though including it is the default behaviour. This patch is a quick fix; the question of enforcement is still open. No tests have been written to verify the toString behaviour, since that would take long: all types would need to be tested, not only those which have a regression now. It is possible to enforce the condition at compile time by making the toString method final, but that would introduce unwanted, potentially API-breaking changes (see jira). Author: Bertrand Dechoux <BertrandDechoux@users.noreply.github.com> Closes #8062 from BertrandDechoux/SPARK-9720.
Commit: d815654
-
[SPARK-9899] [SQL] log warning for direct output committer with speculation enabled
This is a follow-up of #8317. When speculation is enabled, there may be multiple tasks writing to the same path. Generally it's OK, as we write to a temporary directory first and only one task can commit the temporary directory to the target path. However, when we use a direct output committer, tasks write data to the target path directly without a temporary directory. This causes problems like corrupted data. Please see [PR comment](#8191 (comment)) for more details. Unfortunately, we don't have a simple flag to tell if an output committer will write to a temporary directory or not, so for safety, we have to disable any customized output committer when `speculation` is true. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8687 from cloud-fan/direct-committer.
Commit: 32407bf
-
[SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong.
The default hive metastore version is 1.2.1, but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.
Commit: cf2821e
-
[SPARK-10194] [MLLIB] [PYSPARK] SGD algorithms need convergenceTol parameter in Python
[SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382) added a ```convergenceTol``` parameter for GradientDescent-based methods in Scala. We need that parameter in Python; otherwise, Python users will not be able to adjust that behavior (or even reproduce behavior from previous releases since the default changed). Author: Yanbo Liang <ybliang8@gmail.com> Closes #8457 from yanboliang/spark-10194.
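A hedged sketch of the new keyword; `points` is an assumed RDD of LabeledPoint:
```python
from pyspark.mllib.classification import LogisticRegressionWithSGD

# convergenceTol lets SGD stop early once the change in the solution
# falls below the tolerance, mirroring the Scala parameter from SPARK-3382.
model = LogisticRegressionWithSGD.train(points, iterations=100, convergenceTol=0.001)
```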
Commit: ce6f3f1
-
[SPARK-10573] [ML] IndexToString output schema should be StringType
Fixes a bug where the IndexToString output schema was DoubleType. Correct me if I'm wrong, but it doesn't seem like the output needs to have any "ML Attribute" metadata. Author: Nick Pritchard <nicholas.pritchard@falkonry.com> Closes #8751 from pnpritchard/SPARK-10573.
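A hedged sketch of the behavior being fixed; `indexed` is an assumed DataFrame with a double column "categoryIndex":
```python
from pyspark.ml.feature import IndexToString

converter = IndexToString(inputCol="categoryIndex", outputCol="originalCategory",
                          labels=["a", "b"])
# After the fix, the output column should be StringType, not DoubleType.
converter.transform(indexed).printSchema()
```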
Commit: 8a634e9
-
[SPARK-10522] [SQL] Nanoseconds of Timestamp in Parquet should be positive
Otherwise Hive can't read it back correctly. Thanks vanzin for reporting this. Author: Davies Liu <davies@databricks.com> Closes #8674 from davies/positive_nano.
Commit: 7e32387
-
[SPARK-6981] [SQL] Factor out SparkPlanner and QueryExecution from SQLContext
Alternative to PR #6122; in this case the refactored-out classes are replaced by inner classes with the same name for backwards binary compatibility, a lighter-weight, backwards-compatible approach. Author: Edoardo Vacchi <uncommonnonsense@gmail.com> Closes #6356 from evacchi/sqlctx-refactoring-lite.
Commit: 64f0415
-
[SPARK-9996] [SPARK-9997] [SQL] Add local expand and NestedLoopJoin o…
Commit: 217e496
-
[SPARK-10594] [YARN] Remove reference to --num-executors, add --properties-file
`ApplicationMaster` no longer has the `--num-executors` flag, and had an undocumented `--properties-file` configuration option. cc srowen Author: Erick Tryzelaar <erick.tryzelaar@gmail.com> Closes #8754 from erickt/master.
Commit: 16b6d18
-
[SPARK-10576] [BUILD] Move .java files out of src/main/scala
Move .java files in `src/main/scala` to `src/main/java` root, except for `package-info.java` (to stay next to package.scala) Author: Sean Owen <sowen@cloudera.com> Closes #8736 from srowen/SPARK-10576.
Commit: 4e2242b
-
[SPARK-10549] scala 2.11 spark on yarn with security - Repl doesn't work
Make this lazy so that it can set the yarn mode before creating the securityManager. Author: Tom Graves <tgraves@yahoo-inc.com> Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #8719 from tgravescs/SPARK-10549.
Commit: ffbbc2c (authored by Tom Graves, committed by Andrew Or, Sep 14, 2015)
-
[SPARK-10543] [CORE] Peak Execution Memory Quantile should be Per-task Basis
Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value. I tested with this workload:
```scala
val size = 1000
val repetitions = 10
val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).toDF("key", "value")
val res = data.toDF.groupBy("key").agg(sum("value")).count
```
Before: ![image](https://cloud.githubusercontent.com/assets/4317392/9828197/07dd6874-58b8-11e5-9bd9-6ba927c38b26.png)
After: ![image](https://cloud.githubusercontent.com/assets/4317392/9828151/a5ddff30-58b7-11e5-8d31-eda5dc4eae79.png)
Tasks view: ![image](https://cloud.githubusercontent.com/assets/4317392/9828199/17dc2b84-58b8-11e5-92a8-be89ce4d29d1.png)
cc andrewor14 I would appreciate feedback on this, since I think you introduced the display of this metric. Author: Forest Fang <forest.fang@outlook.com> Closes #8726 from saurfang/stagepage.
Commit: fd1e8cd
-
[SPARK-10564] ThreadingSuite: assertion failures in threads don't fail the test (round 2)
This is a follow-up patch to #8723. I missed one case there. Author: Andrew Or <andrew@databricks.com> Closes #8727 from andrewor14/fix-threading-suite.
Commit: 7b6c856 (committed by Andrew Or, Sep 14, 2015)
Commits on Sep 15, 2015
-
[SPARK-9851] Support submitting map stages individually in DAGScheduler
This patch adds support for submitting map stages in a DAG individually so that we can make downstream decisions after seeing statistics about their output, as part of SPARK-9850. I also added more comments to many of the key classes in DAGScheduler. By itself, the patch is not super useful except maybe to switch between a shuffle and broadcast join, but with the other subtasks of SPARK-9850 we'll be able to do more interesting decisions. The main entry point is SparkContext.submitMapStage, which lets you run a map stage and see stats about the map output sizes. Other stats could also be collected through accumulators. See AdaptiveSchedulingSuite for a short example. Author: Matei Zaharia <matei@databricks.com> Closes #8180 from mateiz/spark-9851.
Commit: 1a09552
-
[SPARK-10542] [PYSPARK] fix serialize namedtuple
Author: Davies Liu <davies@databricks.com> Closes #8707 from davies/fix_namedtuple.
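A minimal sketch of the pattern this fix keeps working; namedtuple instances must survive the pickling that ships them to executors:
```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
sc.parallelize([Point(1, 2), Point(3, 4)]).map(lambda p: p.x + p.y).collect()  # [3, 7]
```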
Commit: 5520418
-
[SPARK-9793] [MLLIB] [PYSPARK] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly
The PySpark DenseVector and SparseVector ```__eq__``` methods should use semantic equality, and a DenseVector can be compared with a SparseVector. Implement the PySpark DenseVector and SparseVector ```__hash__``` methods based on the first 16 entries. That makes PySpark Vector objects usable in collections. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8166 from yanboliang/spark-9793.
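A minimal sketch of the resulting semantics:
```python
from pyspark.mllib.linalg import DenseVector, SparseVector

dv = DenseVector([1.0, 0.0, 3.0])
sv = SparseVector(3, {0: 1.0, 2: 3.0})
dv == sv       # True: equality is semantic, not representation-based
len({dv, sv})  # 1: equal vectors hash identically, so sets and dicts work
```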
Commit: 4ae4d54
-
[SPARK-10273] Add @since annotation to pyspark.mllib.feature
Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to methods + "versionadded::" to classes (derived from the git file history in pyspark). Author: noelsmith <mail@noelsmith.com> Closes #8633 from noel-smith/SPARK-10273-since-mllib-feature.
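A minimal sketch of what such a decorator can look like; the real helper in pyspark may differ in details:
```python
def since(version):
    """Append a Sphinx versionadded directive to the decorated function's docstring."""
    def deco(f):
        f.__doc__ = (f.__doc__ or "") + "\n\n.. versionadded:: %s" % version
        return f
    return deco

@since("1.6.0")
def normalize(vector):
    """Scale a vector to unit norm."""
    ...
```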
Commit: 610971e
-
[SPARK-10275] [MLLIB] Add @since annotation to pyspark.mllib.random
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8666 from yu-iskw/SPARK-10275.
Commit: a224935
-
Links now work properly + consistent use of *Spark standalone cluster* (Spark uppercase + lowercase the rest -- this seems agreed upon in the other places in the docs). Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8759 from jaceklaskowski/docs-submitting-apps.
Commit: 833be73
-
Comments preceding the toMessage method state: "The edge partition is encoded in the lower 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int." References to bytes should be changed to bits. This contribution is my original work and I license the work to the Spark project under its open source license. Author: Robin East <robin.east@xense.co.uk> Closes #8756 from insidedctm/master.
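A sketch of the layout the corrected comment describes; the helper names below are mine, not GraphX's:
```python
# Lower 30 bits hold the edge partition id, upper 2 bits hold the position,
# packed into one 32-bit Int.
def to_message(position, partition):
    assert 0 <= position < (1 << 2) and 0 <= partition < (1 << 30)
    return (position << 30) | partition

def edge_partition(msg):
    return msg & ((1 << 30) - 1)

def position(msg):
    return (msg >> 30) & 0x3
```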
Commit: 6503c4b
-
Update version to 1.6.0-SNAPSHOT.
Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.
Commit: 09b7e7c
-
[SPARK-10491] [MLLIB] move RowMatrix.dspr to BLAS
jira: https://issues.apache.org/jira/browse/SPARK-10491 We implemented dspr with sparse vector support in `RowMatrix`. This method is also used in WeightedLeastSquares and other places, so it would be useful to move it to `linalg.BLAS`. Let me know if a new UT is needed. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8663 from hhbyyh/movedspr.
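For reference, a sketch of the BLAS dspr semantics being moved (not Spark's implementation): A += alpha * x * x^T on a symmetric matrix whose upper triangle is stored as a packed array, column by column:
```python
import numpy as np

def dspr(alpha, x, U):
    """U is the packed upper triangle of an n x n symmetric matrix,
    stored column by column as a flat array of n*(n+1)/2 entries."""
    n = len(x)
    pos = 0
    for j in range(n):
        for i in range(j + 1):
            U[pos] += alpha * x[i] * x[j]
            pos += 1

x = np.array([1.0, 2.0])
U = np.zeros(3)   # packed 2x2 upper triangle
dspr(0.5, x, U)   # U == [0.5, 1.0, 2.0]
```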
Commit: c35fdcb
-
[SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.
This change does two things:
- tag a few tests and add the mechanism in the build to be able to disable those tags, both in maven and sbt, for both junit and scalatest suites.
- add some logic to run-tests.py to disable some tags depending on what files have changed; that's used to disable expensive tests when a module hasn't explicitly been changed, to speed up testing for changes that don't directly affect those modules.
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8437 from vanzin/test-tags.
Commit: 8abef21 (committed by Marcelo Vanzin, Sep 15, 2015)
-
[PYSPARK] [MLLIB] [DOCS] Replaced addversion with versionadded in mllib.random
Missed this when reviewing `pyspark.mllib.random` for SPARK-10275. Author: noelsmith <mail@noelsmith.com> Closes #8773 from noel-smith/mllib-random-versionadded-fix.
Commit: 7ca30b5
-
Commit: 0d9ab01