
Spark 3175: Branch-1.1 SBT build failed for Yarn-Alpha #2085

Closed
chesterxgchen wants to merge 231 commits into apache:master from AlpineNow:SPARK-3175

Conversation

@chesterxgchen

The issue is that yarn/alpha/pom.xml uses version 1.1.0 instead of 1.1.1-SNAPSHOT.
Update the pom.xml to 1.1.1-SNAPSHOT (the same as yarn/stable/pom.xml).

andrewor14 and others added 30 commits August 2, 2014 12:46
`spark.test.home` was introduced in #1734. This is fine for SBT but is failing maven tests. Either way it shouldn't throw an NPE.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1739 from andrewor14/fix-spark-test-home and squashes the following commits:

ce2624c [Andrew Or] Do not throw NPE if spark.test.home is not set
Added experimental Python API for Decision Trees.

API:
* class DecisionTreeModel
** predict() for single examples and RDDs, taking both feature vectors and LabeledPoints
** numNodes()
** depth()
** __str__()
* class DecisionTree
** trainClassifier()
** trainRegressor()
** train()

Examples and testing:
* Added example testing classification and regression with batch prediction: examples/src/main/python/mllib/tree.py
* Also tested the example usage in the doc of python/pyspark/mllib/tree.py, which tests single-example prediction with dense and sparse vectors

Also: Small bug fix in python/pyspark/mllib/_common.py: In _linear_predictor_typecheck, changed check for RDD to use isinstance() instead of type() in order to catch RDD subclasses.

CC mengxr manishamde

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #1727 from jkbradley/decisiontree-python-new and squashes the following commits:

3744488 [Joseph K. Bradley] Renamed test tree.py to decision_tree_runner.py Small updates based on github review.
6b86a9d [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
affceb9 [Joseph K. Bradley] * Fixed bug in doc tests in pyspark/mllib/util.py caused by change in loadLibSVMFile behavior.  (It used to threshold labels at 0 to make them 0/1, but it now leaves them as they are.) * Fixed small bug in loadLibSVMFile: If a data file had no features, then loadLibSVMFile would create a single all-zero feature.
67a29bc [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
cf46ad7 [Joseph K. Bradley] Python DecisionTreeModel * predict(empty RDD) returns an empty RDD instead of an error. * Removed support for calling predict() on LabeledPoint and RDD[LabeledPoint] * predict() does not cache serialized RDD any more.
aa29873 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
bf21be4 [Joseph K. Bradley] removed old run() func from DecisionTree
fa10ea7 [Joseph K. Bradley] Small style update
7968692 [Joseph K. Bradley] small braces typo fix
e34c263 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
4801b40 [Joseph K. Bradley] Small style update to DecisionTreeSuite
db0eab2 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix2' into decisiontree-python-new
6873fa9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
225822f [Joseph K. Bradley] Bug: In DecisionTree, the method sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is the bound for unordered categorical features, not ordered ones. The upper bound should be the arity (i.e., max value) of the feature.
93953f1 [Joseph K. Bradley] Likely done with Python API.
6df89a9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
4562c08 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
665ba78 [Joseph K. Bradley] Small updates towards Python DecisionTree API
188cb0d [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
6622247 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
b8fac57 [Joseph K. Bradley] Finished Python DecisionTree API and example but need to test a bit more.
2b20c61 [Joseph K. Bradley] Small doc and style updates
1b29c13 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
584449a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
dab0b67 [Joseph K. Bradley] Added documentation for DecisionTree internals
8bb8aa0 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-bugfix
978cfcf [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-bugfix
6eed482 [Joseph K. Bradley] In DecisionTree: Changed from using procedural syntax for functions returning Unit to explicitly writing Unit return type.
376dca2 [Joseph K. Bradley] Updated meaning of maxDepth by 1 to fit scikit-learn and rpart. * In code, replaced usages of maxDepth <-- maxDepth + 1 * In params, replace settings of maxDepth <-- maxDepth - 1
e06e423 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
bab3f19 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
59750f8 [Joseph K. Bradley] * Updated Strategy to check numClassesForClassification only if algo=Classification. * Updates based on comments: ** DecisionTreeRunner *** Made dataFormat arg default to libsvm ** Small cleanups ** tree.Node: Made recursive helper methods private, and renamed them.
52e17c5 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-bugfix
f5a036c [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
da50db7 [Joseph K. Bradley] Added one more test to DecisionTreeSuite: stump with 2 continuous variables for binary classification.  Caused problems in past, but fixed now.
8e227ea [Joseph K. Bradley] Changed Strategy so it only requires numClassesForClassification >= 2 for classification
cd1d933 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
8ea8750 [Joseph K. Bradley] Bug fix: Off-by-1 when finding thresholds for splits for continuous features.
8a758db [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into decisiontree-python-new
5fe44ed [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-python-new
2283df8 [Joseph K. Bradley] 2 bug fixes.
73fbea2 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into decisiontree-bugfix
5f920a1 [Joseph K. Bradley] Demonstration of bug before submitting fix: Updated DecisionTreeSuite so that 3 tests fail.  Will describe bug in next commit.
f825352 [Joseph K. Bradley] Wrote Python API and example for DecisionTree.  Also added toString, depth, and numNodes methods to DecisionTreeModel.

(cherry picked from commit 3f67382)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
I think we will not generate a plan that triggers this bug at the moment. But let me explain it...

Right now, we are using `left.outputPartitioning` as the `outputPartitioning` of a `BroadcastHashJoin`. We may have a wrong physical plan for cases like...
```sql
SELECT l.key, count(*)
FROM (SELECT key, count(*) as cnt
      FROM src
      GROUP BY key) l // This is buildPlan
JOIN r // This is the streamedPlan
ON (l.cnt = r.value)
GROUP BY l.key
```
Let's say we have a `BroadcastHashJoin` on `l` and `r`. For this case, we will pick `l`'s `outputPartitioning` for the `outputPartitioning` of the `BroadcastHashJoin` on `l` and `r`. Also, because the last `GROUP BY` is using `l.key` as the key, we will not introduce an `Exchange` for this aggregation. However, `r`'s `outputPartitioning` may not match the required distribution of the last `GROUP BY`, and we fail to group data correctly.
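The fix is to report the streamed side's partitioning instead. A self-contained Scala sketch of the idea, using simplified stand-in types rather than Spark's actual planner classes:

```scala
// Simplified stand-in types (not Spark's actual planner classes).
sealed trait Partitioning
case class HashPartitioning(expressions: Seq[String]) extends Partitioning

trait PlanSketch { def outputPartitioning: Partitioning }

case class BroadcastHashJoinSketch(
    buildPlan: PlanSketch,      // the broadcast side (`l` in the query above)
    streamedPlan: PlanSketch)   // the side whose rows actually flow through
  extends PlanSketch {
  // The fix: report the streamed side's partitioning. The join's output rows
  // keep the streamed plan's physical distribution, not the build plan's.
  override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning
}
```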

JIRA is being reindexed. I will create a JIRA ticket once it is back online.

Author: Yin Huai <huai@cse.ohio-state.edu>

Closes #1735 from yhuai/BroadcastHashJoin and squashes the following commits:

96d9cb3 [Yin Huai] Set outputPartitioning correctly.

(cherry picked from commit 67bd8e3)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Chris Fregly <chris@fregly.com>

Closes #1434 from cfregly/master and squashes the following commits:

4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
338997e [Chris Fregly] improve build docs for kinesis
828f8ae [Chris Fregly] more cleanup
e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
cd68c0d [Chris Fregly] fixed typos and backward compatibility
d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support

(cherry picked from commit 91f9504)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This also Closes #1701.

Author: GuoQiang Li <witgo@qq.com>

Closes #1208 from witgo/SPARK-1470 and squashes the following commits:

422646b [GuoQiang Li] Remove scalalogging-slf4j dependency
This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```

Author: Michael Armbrust <michael@databricks.com>

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.

(cherry picked from commit 158ad0b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…rted Hive commands.

Author: Michael Armbrust <michael@databricks.com>

Closes #1742 from marmbrus/asserts and squashes the following commits:

5182d54 [Michael Armbrust] Remove assertions that throw when users try unsupported Hive commands.

(cherry picked from commit 198df11)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This is a follow up of #1636.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1738 from liancheng/test-for-spark-2729 and squashes the following commits:

b13692a [Cheng Lian] Added test case for SPARK-2729

(cherry picked from commit 866cf1f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
The cause is explained in https://issues.apache.org/jira/browse/SPARK-2797.

Author: Yin Huai <huai@cse.ohio-state.edu>

Closes #1745 from yhuai/SPARK-2797 and squashes the following commits:

7b1627d [Yin Huai] The unpersist method of the Scala RDD cannot be called without the input parameter (blocking) from PySpark.

(cherry picked from commit d210022)
Signed-off-by: Michael Armbrust <michael@databricks.com>
There have been user complaints that the difference between `registerAsTable` and `saveAsTable` is too subtle.  This PR addresses this by renaming `registerAsTable` to `registerTempTable`, which more clearly reflects what is happening.  `registerAsTable` remains, but will cause a deprecation warning.
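A hedged usage sketch of the rename against the 1.1-era API, assuming a `SparkContext` `sc` and a `SQLContext` `sqlContext` are already in scope (as in `spark-shell`):

```scala
import sqlContext.createSchemaRDD   // implicit RDD -> SchemaRDD conversion

case class Person(name: String, age: Int)
val people = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 17)))

people.registerTempTable("people")   // new, clearer name
// people.registerAsTable("people")  // still works, but emits a deprecation warning
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
```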

Author: Michael Armbrust <michael@databricks.com>

Closes #1743 from marmbrus/registerTempTable and squashes the following commits:

d031348 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
4dff086 [Michael Armbrust] Fix .java files too
89a2f12 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
0b7b71e [Michael Armbrust] Rename registerAsTable to registerTempTable

(cherry picked from commit 1a80437)
Signed-off-by: Michael Armbrust <michael@databricks.com>
As per https://issues.apache.org/jira/browse/SPARK-2602, this may be resolved for Java 6 with the java.awt.headless system property, which never hurt anyone running a command line app. I tested it, and it seemed to get rid of focus stealing.
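The patch sets the property for the test JVMs in the build; for reference, a minimal sketch of the equivalent programmatic setting:

```scala
// Must run before any AWT class is first touched; prevents the JVM from
// initializing a GUI environment (and stealing focus on OS X).
System.setProperty("java.awt.headless", "true")
```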

Author: Sean Owen <srowen@gmail.com>

Closes #1747 from srowen/SPARK-2602 and squashes the following commits:

b141018 [Sean Owen] Set java.awt.headless during tests
(cherry picked from commit 33f167d)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
The JIRA concerned removing jquery, and this does not remove jquery. While it is distributed by Spark, strictly speaking it should have an accompanying line in LICENSE, as per http://www.apache.org/dev/licensing-howto.html

Author: Sean Owen <srowen@gmail.com>

Closes #1748 from srowen/SPARK-2414 and squashes the following commits:

2fdb03c [Sean Owen] Add LICENSE entry for jquery
(cherry picked from commit 9cf429a)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Minor fixes on top of #1679.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1736 from andrewor14/amend-#1679 and squashes the following commits:

3b46f5e [Andrew Or] Minor fixes
(cherry picked from commit 3dc55fd)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
…en ...

Per request by Reynold, adding a small note about the proper sequencing of build, then test.

Author: Stephen Boesch <javadba@gmail.com>

Closes #1615 from javadba/docs and squashes the following commits:

6c3183e [Stephen Boesch] Moved updated testing blurb per PWendell
5764757 [Stephen Boesch] SPARK-2712 - Add a small note to maven doc that mvn package must happen before test
(cherry picked from commit f8cd143)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Author: Allan Douglas R. de Oliveira <allan@chaordicsystems.com>

Closes #1186 from douglaz/spark_ec2_user_data and squashes the following commits:

94a36f9 [Allan Douglas R. de Oliveira] Added user data option to EC2 script
(cherry picked from commit a0bcbc1)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Bug fix: Before, when an RDD was created in Java and passed to DecisionTree.train(), the fake class tag caused problems.
* Fix: DecisionTree: Used new RDD.retag() method to allow passing RDDs from Java.

Other improvements to Decision Trees for ease of use with Java (see the sketch after this list):
* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor
--> Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently.  I suspect we will redo the API before the other options are included.
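
A hedged sketch of where `RDD.retag()` fits. The helper below is illustrative, not from the patch; `retag()` is `private[spark]`, so the sketch declares itself inside Spark's package, as the real call site (DecisionTree) is:

```scala
package org.apache.spark.mllib.tree

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object JavaTrainSketch {
  def trainFromJava(input: RDD[LabeledPoint]): Unit = {
    // An RDD handed over from the Java API carries a fake ClassTag; retag()
    // re-applies the real element type before Scala code relies on it.
    val retagged = input.retag(classOf[LabeledPoint])
    // DecisionTree.train(retagged, strategy) would consume `retagged` here.
  }
}
```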

CC: mengxr

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #1740 from jkbradley/dt-java-new and squashes the following commits:

0805dc6 [Joseph K. Bradley] Changed Strategy to use JavaConverters instead of JavaConversions
519b1b7 [Joseph K. Bradley] * Organized imports in JavaDecisionTreeSuite.java * Using JavaConverters instead of JavaConversions in DecisionTreeSuite.scala
f7b5ca1 [Joseph K. Bradley] Improvements to make it easier to run DecisionTree from Java. * DecisionTree: Used new RDD.retag() method to allow passing RDDs from Java. * impurity classes: Added instance() methods to help with Java interface. * Strategy: Added Java-friendly constructor ** Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently.  I suspect we will redo the API before the other options are included.
d78ada6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-java
320853f [Joseph K. Bradley] Added JavaDecisionTreeSuite, partly written
13a585e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-java
f1a8283 [Joseph K. Bradley] Added old JavaDecisionTreeSuite, to be updated later
225822f [Joseph K. Bradley] Bug: In DecisionTree, the method sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is the bound for unordered categorical features, not ordered ones. The upper bound should be the arity (i.e., max value) of the feature.

(cherry picked from commit 2998e38)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
… 'spark.sql.dialect'

Many users have reported being confused by the distinction between the `sql` and `hql` methods.  Specifically, many users think that `sql(...)` cannot be used to read hive tables.  In this PR I introduce a new configuration option `spark.sql.dialect` that picks which dialect will be used for parsing.  For SQLContext this must be set to `sql`.  In `HiveContext` it defaults to `hiveql` but can also be set to `sql`.

The `hql` and `hiveql` methods continue to act the same but are now marked as deprecated.

**This is a possibly breaking change for some users unless they set the dialect manually, though this is unlikely.**

For example: `hiveContext.sql("SELECT 1")` will now throw a parsing exception by default.
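
A minimal sketch of the new behavior, assuming a `HiveContext` named `hiveContext` is in scope:

```scala
hiveContext.sql("SELECT * FROM src")             // parsed as HiveQL (the default)
hiveContext.setConf("spark.sql.dialect", "sql")  // switch to the plain SQL parser
hiveContext.sql("SELECT 1")                      // now parsed as SQL
hiveContext.hql("SELECT * FROM src")             // still works, but deprecated
```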

Author: Michael Armbrust <michael@databricks.com>

Closes #1746 from marmbrus/sqlLanguageConf and squashes the following commits:

ad375cc [Michael Armbrust] Merge remote-tracking branch 'apache/master' into sqlLanguageConf
20c43f8 [Michael Armbrust] override function instead of just setting the value
7e4ae93 [Michael Armbrust] Deprecate hql() method in favor of a config option, 'spark.sql.dialect'

(cherry picked from commit 236dfac)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…commands

JIRA issue: [SPARK-2814](https://issues.apache.org/jira/browse/SPARK-2814)

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1753 from liancheng/spark-2814 and squashes the following commits:

c74a3b2 [Cheng Lian] Fixed SPARK-2814

(cherry picked from commit ac33cbb)
Signed-off-by: Michael Armbrust <michael@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-2783

Author: Yin Huai <huai@cse.ohio-state.edu>

Closes #1741 from yhuai/analyzeTable and squashes the following commits:

7bb5f02 [Yin Huai] Use sql instead of hql.
4d09325 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable
e3ebcd4 [Yin Huai] Renaming.
c170f4e [Yin Huai] Do not use getContentSummary.
62393b6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable
db233a6 [Yin Huai] Trying to debug jenkins...
fee84f0 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable
f0501f3 [Yin Huai] Fix compilation error.
24ad391 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable
8918140 [Yin Huai] Wording.
23df227 [Yin Huai] Add a simple analyze method to get the size of a table and update the "totalSize" property of this table in the Hive metastore.

(cherry picked from commit e139e2b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Kill only the python worker related to cancelled tasks.

The daemon starts a background thread to monitor the open sockets for all workers. If a socket is closed by the JVM, this thread kills the worker.

When a task is cancelled, the socket to the worker is closed, and the worker is then killed by the daemon.

Author: Davies Liu <davies.liu@gmail.com>

Closes #1643 from davies/kill and squashes the following commits:

8ffe9f3 [Davies Liu] kill worker by deamon, because runtime.exec() is too heavy
46ca150 [Davies Liu] address comment
acd751c [Davies Liu] kill the worker when task is canceled

(cherry picked from commit 55349f9)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
Needed for Scala 2.11 compiler-interface

Signed-off-by: Anand Avati <avati@redhat.com>

Author: Anand Avati <avati@redhat.com>

Closes #1711 from avati/SPARK-1812-scala-maven-plugin and squashes the following commits:

9a22fc8 [Anand Avati] SPARK-1812: upgrade to scala-maven-plugin 3.2.0
Any time you use the directory name (`FWDIR`) it needs to be surrounded
in quotes. If you're also using wildcards, you can safely put the quotes
around just `$FWDIR`.

Author: Sarah Gerweck <sarah.a180@gmail.com>

Closes #1756 from sarahgerweck/folderSpaces and squashes the following commits:

732629d [Sarah Gerweck] Fix some bugs with spaces in directory name.
(cherry picked from commit 5507dd8)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
…dependent variables or features of data

Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is generally performed during the data preprocessing step.

In this work, a trait called `VectorTransformer` is defined for generic transformation on a vector. It contains one method to be implemented, `transform`, which applies the transformation to a vector.

There are currently two implementations of `VectorTransformer`, both of which can easily be extended with PMML transformation support; a usage sketch follows the list below.

1) `StandardScaler` - Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.

2) `Normalizer` - Normalizes samples individually to unit L^n norm
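
A hedged usage sketch of both transformers against the MLlib 1.1 API, assuming a `SparkContext` named `sc`:

```scala
import org.apache.spark.mllib.feature.{Normalizer, StandardScaler}
import org.apache.spark.mllib.linalg.Vectors

val data = sc.parallelize(Seq(Vectors.dense(1.0, 10.0), Vectors.dense(3.0, 30.0)))

// StandardScaler is fit on training-set column statistics, then applied.
val scalerModel = new StandardScaler(withMean = true, withStd = true).fit(data)
val standardized = scalerModel.transform(data)  // zero mean, unit variance per column

// Normalizer is stateless: each sample is scaled to unit L^p norm (here p = 2).
val normalized = new Normalizer(p = 2.0).transform(data)
```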

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #1207 from dbtsai/dbtsai-feature-scaling and squashes the following commits:

78c15d3 [DB Tsai] Alpine Data Labs

(cherry picked from commit ae58aea)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This is a pull request regarding SPARK-2510 at https://issues.apache.org/jira/browse/SPARK-2510. Word2Vec creates vector representation of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns vector representation of words in the vocabulary. The vector representation can be used as features in natural language processing and machine learning algorithms.

To make our implementation more scalable, we train each partition separately and merge the model of each partition after each iteration. To make the model more accurate, multiple iterations may be needed.
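
A hedged sketch of the API this adds ("text8" is a placeholder corpus path; `fit` expects an RDD of token sequences):

```scala
import org.apache.spark.mllib.feature.Word2Vec

val corpus = sc.textFile("text8").map(_.split(" ").toSeq)  // tokenized lines
val model = new Word2Vec().fit(corpus)
// Nearest neighbors in the learned vector space, as in the tables below:
model.findSynonyms("china", 20).foreach { case (word, similarity) =>
  println(s"$word $similarity")
}
```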

One way to investigate the vector representations is to find the closest words to a query word. For example, with 1 partition and 1 iteration, the top 20 closest words to "china" are:

taiwan 0.8077646146334014
korea 0.740913304563621
japan 0.7240667798885471
republic 0.7107151279078352
thailand 0.6953217332072862
tibet 0.6916782118129544
mongolia 0.6800858715972612
macau 0.6794925677480378
singapore 0.6594048695593799
manchuria 0.658989931844148
laos 0.6512978726001666
nepal 0.6380792327845325
mainland 0.6365469459587788
myanmar 0.6358614338840394
macedonia 0.6322366180313249
xinjiang 0.6285291551708028
russia 0.6279951236068411
india 0.6272874944023487
shanghai 0.6234544135576999
macao 0.6220588462925876

The result with 10 partitions and 5 iterations is:
taiwan 0.8310495079388313
india 0.7737171315919039
japan 0.756777901233668
korea 0.7429767187102452
indonesia 0.7407557427278356
pakistan 0.712883426985585
mainland 0.7053379963140822
thailand 0.696298191073948
mongolia 0.693690656871415
laos 0.6913069680735292
macau 0.6903427690029617
republic 0.6766381604813666
malaysia 0.676460699141784
singapore 0.6728790997360923
malaya 0.672345232966194
manchuria 0.6703732292753156
macedonia 0.6637955686322028
myanmar 0.6589462882439646
kazakhstan 0.657017801081494
cambodia 0.6542383836451932

Author: Liquan Pei <lpei@gopivotal.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Liquan Pei <liquanpei@gmail.com>

Closes #1719 from Ishiihara/master and squashes the following commits:

2ba9483 [Liquan Pei] minor fix for Word2Vec test
e248441 [Liquan Pei] minor style change
26a948d [Liquan Pei] Merge pull request #1 from mengxr/Ishiihara-master
c14da41 [Xiangrui Meng] fix styles
384c771 [Xiangrui Meng] remove minCount and window from constructor change model to use float instead of double
e93e726 [Liquan Pei] use treeAggregate instead of aggregate
1a8fb41 [Liquan Pei] use weighted sum in combOp
7efbb6f [Liquan Pei] use broadcast version of vocab in aggregate
6bcc8be [Liquan Pei] add multiple iteration support
720b5a3 [Liquan Pei] Add test for Word2Vec algorithm, minor fixes
2e92b59 [Liquan Pei] modify according to feedback
57dc50d [Liquan Pei] code formatting
e4a04d3 [Liquan Pei] minor fix
0aafb1b [Liquan Pei] Add comments, minor fixes
8d6befe [Liquan Pei] initial commit

(cherry picked from commit e053c55)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Add a hook to replace the original namedtuple with a picklable one, so namedtuple can then be used in RDDs.

PS: pyspark should be imported BEFORE "from collections import namedtuple"

Author: Davies Liu <davies.liu@gmail.com>

Closes #1623 from davies/namedtuple and squashes the following commits:

045dad8 [Davies Liu] remove unrelated code changes
4132f32 [Davies Liu] address comment
55b1c1a [Davies Liu] fix tests
61f86eb [Davies Liu] replace all the reference of namedtuple to new hacked one
98df6c6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into namedtuple
f7b1bde [Davies Liu] add hack for CloudPickleSerializer
0c5c849 [Davies Liu] Merge branch 'master' of github.com:apache/spark into namedtuple
21991e6 [Davies Liu] hack namedtuple in __main__ module, make it picklable.
93b03b8 [Davies Liu] pickable namedtuple

(cherry picked from commit 59f84a9)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
…in ExternalMap / Sorter

All these changes are from mridulm's work in #1609, but extracted here to fix this specific issue and make it easier to merge into 1.1. This particular set of changes is to make sure that we read exactly the right range of bytes from each spill file in EAOM: some serializers can write bytes after the last object (e.g. the TC_RESET flag in Java serialization) and that would confuse the previous code into reading it as part of the next batch. There are also improvements to cleanup to make sure files are closed.

In addition to bringing in the changes to ExternalAppendOnlyMap, I also copied them to the corresponding code in ExternalSorter and updated its test suite to test for the same issues.
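
A self-contained sketch of the batch-bounding idea (simplified, not the actual ExternalAppendOnlyMap code): record each batch's byte length at write time, then cap reads so trailing serializer bytes can never bleed into the next batch.

```scala
import java.io.InputStream

// Exposes at most `remaining` bytes of one spill-file batch to the
// deserializer; trailer bytes a serializer writes after the last object
// (e.g. Java serialization's TC_RESET) stay outside the visible window.
class BatchBoundedStream(in: InputStream, private var remaining: Long)
    extends InputStream {
  override def read(): Int = {
    if (remaining <= 0) return -1
    val b = in.read()
    if (b >= 0) remaining -= 1
    b
  }
}
```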

Author: Matei Zaharia <matei@databricks.com>

Closes #1722 from mateiz/spark-2792 and squashes the following commits:

5d4bfb5 [Matei Zaharia] Make objectStreamReset counter count the last object written too
18fe865 [Matei Zaharia] Update docs on objectStreamReset
576ee83 [Matei Zaharia] Allow objectStreamReset to be 0
0374217 [Matei Zaharia] Remove super paranoid code to close file handles
bda37bb [Matei Zaharia] Implement Mridul's ExternalAppendOnlyMap fixes in ExternalSorter too
0d6dad7 [Matei Zaharia] Added Mridul's test changes for ExternalAppendOnlyMap
9a78e4b [Matei Zaharia] Add @mridulm's fixes to ExternalAppendOnlyMap for batch sizes
serializer is imported multiple times during doctests, so it's better to make _hijack_namedtuple() safe to be called multiple times.

Author: Davies Liu <davies.liu@gmail.com>

Closes #1771 from davies/fix and squashes the following commits:

1a9e336 [Davies Liu] fix unit tests

(cherry picked from commit 9fd82db)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
…duler & SparkContext

Author: Reynold Xin <rxin@apache.org>

Closes #1772 from rxin/accumulator-dagscheduler and squashes the following commits:

6a58520 [Reynold Xin] [SPARK-2323] Exception in accumulator update should not crash DAGScheduler & SparkContext.

(cherry picked from commit 05bf4e4)
Signed-off-by: Reynold Xin <rxin@apache.org>
Replaces this with an O(1) operation that does not have to shift over
the whole tail of the array into the gap produced by the element removed.
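
A self-contained sketch of the replacement operation (not Spark's actual StreamBuffer code): swap the target with the last element and drop the tail, trading element order for O(1) removal.

```scala
import scala.collection.mutable.ArrayBuffer

// buf.remove(i) shifts every later element left: O(n). Overwriting slot i
// with the last element and removing only the tail touches two positions: O(1).
def removeSwapLast[T](buf: ArrayBuffer[T], i: Int): T = {
  val removed = buf(i)
  buf(i) = buf(buf.length - 1)
  buf.remove(buf.length - 1)  // removing the final element shifts nothing
  removed
}
```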

Author: Matei Zaharia <matei@databricks.com>

Closes #1773 from mateiz/SPARK-2685 and squashes the following commits:

1ea028a [Matei Zaharia] Update comments in StreamBuffer and EAOM, and reuse ArrayBuffers
eb1abfd [Matei Zaharia] Update ExternalAppendOnlyMap to avoid buffer.remove()

(cherry picked from commit 066765d)
Signed-off-by: Matei Zaharia <matei@databricks.com>
…lling collections

This tracks memory properly if there are multiple spilling collections in the same task (which was a problem before), and also implements an algorithm that lets each thread grow up to 1 / 2N of the memory pool (where N is the number of threads) before spilling, which avoids an inefficiency with small spills we had before (some threads would spill many times at 0-1 MB because the pool was allocated elsewhere).
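
A simplified sketch of the 1/(2N) policy described above (not the actual ShuffleMemoryManager, which also blocks and re-checks as threads come and go):

```scala
// With N active threads sharing one pool, a thread holding less than
// pool / (2 * N) bytes keeps growing rather than spilling; this avoids the
// old pattern of many tiny 0-1 MB spills when the pool was allocated elsewhere.
def shouldSpill(poolBytes: Long, activeThreads: Int, heldBytes: Long): Boolean =
  heldBytes >= poolBytes / (2L * activeThreads)
```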

Author: Matei Zaharia <matei@databricks.com>

Closes #1707 from mateiz/spark-2711 and squashes the following commits:

debf75b [Matei Zaharia] Review comments
24f28f3 [Matei Zaharia] Small rename
c8f3a8b [Matei Zaharia] Update ShuffleMemoryManager to be able to partially grant requests
315e3a5 [Matei Zaharia] Some review comments
b810120 [Matei Zaharia] Create central manager to track memory for all spilling collections

(cherry picked from commit 4fde28c)
Signed-off-by: Matei Zaharia <matei@databricks.com>
davies and others added 25 commits August 19, 2014 22:44
Fix sortByKey() with take()

The function `f` used in mapPartitions should always return an iterator.

Author: Davies Liu <davies.liu@gmail.com>

Closes #2045 from davies/fix_sortbykey and squashes the following commits:

1160f59 [Davies Liu] fix sortByKey() with take()
(cherry picked from commit 0a7ef63)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
This patch adds unit tests for Spark Sink.

It also removes the private[flume] modifier from Spark Sink, since the sink is instantiated from the Flume configuration (this appears to be ignored by the reflection Flume uses, but we should still remove it anyway).

Author: Hari Shreedharan <hshreedharan@apache.org>
Author: Hari Shreedharan <hshreedharan@cloudera.com>

Closes #1958 from harishreedharan/spark-sink-test and squashes the following commits:

e3110b9 [Hari Shreedharan] Add a sleep to allow sink to commit the transactions
120b81e [Hari Shreedharan] Fix complexity in threading model in test
4df5be6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
c9190d1 [Hari Shreedharan] Indentation and spaces changes
7fedc5a [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
abc20cb [Hari Shreedharan] Minor test changes
7b9b649 [Hari Shreedharan] Merge branch 'master' into spark-sink-test
f2c56c9 [Hari Shreedharan] Update SparkSinkSuite.scala
a24aac8 [Hari Shreedharan] Remove unused var
c86d615 [Hari Shreedharan] [SPARK-3054][STREAMING] Add unit tests for Spark Sink.

(cherry picked from commit 8c5a222)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
…led.

Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.

This has been suggested in the past by liancheng.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #2006 from pwendell/hiveserver and squashes the following commits:

742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
(cherry picked from commit f2f26c2)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
This PR fixes two issues:

1. Fixes a wrongly quoted command line option in `HiveThriftServer2Suite` that makes test cases hang until timeout.
2. Asks `dev/run-tests` to run Spark SQL tests when `bin/spark-sql` and/or `sbin/start-thriftserver.sh` are modified.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2036 from liancheng/fix-thriftserver-test and squashes the following commits:

f38c4eb [Cheng Lian] Fixed the same quotation issue in CliSuite
26b82a0 [Cheng Lian] Run SQL tests when dff contains bin/spark-sql and/or sbin/start-thriftserver.sh
a87f83d [Cheng Lian] Extended timeout
e5aa31a [Cheng Lian] Fixed metastore JDBC URI quotation

(cherry picked from commit cf46e72)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…n when EventLogging is enabled

#1891 was to avoid IOException when EventLogging is enabled.
The solution used ShutdownHookManager, but that class is defined only in Hadoop 2.x; Hadoop 1.x doesn't have ShutdownHookManager, so #1891 doesn't compile on Hadoop 1.x.

Now I have a compromise solution that works on both Hadoop 1.x and 2.x: a unique FileSystem object is created for FileLogger only.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #1970 from sarutak/SPARK-2970 and squashes the following commits:

240c91e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2970
0e7b45d [Kousuke Saruta] Revert "[SPARK-2970] [SQL] spark-sql script ends with IOException when EventLogging is enabled"
e1262ec [Kousuke Saruta] Modified Filelogger to use unique FileSystem instance
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2060 from sarutak/SPARK-3149 and squashes the following commits:

1cc89af [Kousuke Saruta] Modified log message of accepting connection

(cherry picked from commit c1ba4cd)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore these configs in client mode altogether.
```
spark.driver.memory
spark.driver.extraJavaOptions
spark.driver.extraClassPath
spark.driver.extraLibraryPath
```
This PR handles these properties before launching the driver JVM. It achieves this by spawning a separate JVM that runs a new class called `SparkSubmitDriverBootstrapper`, which spawns `SparkSubmit` as a sub-process with the appropriate classpath, library paths, java opts and memory.
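
A heavily simplified sketch of the bootstrapping idea (names and values are illustrative; the real class parses spark-defaults.conf and forwards the user's full argument list):

```scala
import scala.sys.process._

// Read driver-side configs first, then launch SparkSubmit in a child JVM
// whose memory, classpath, and java opts already reflect them.
val driverMemory = "2g"  // would come from spark.driver.memory in the properties file
val command = Seq(
  "java", s"-Xmx$driverMemory",
  "-cp", sys.props("java.class.path"),
  "org.apache.spark.deploy.SparkSubmit"
)  // ++ the user's original spark-submit arguments
// command.!  // spawn the sub-process (left commented in this sketch)
```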

Author: Andrew Or <andrewor14@gmail.com>

Closes #1845 from andrewor14/handle-configs-bash and squashes the following commits:

bed4bdf [Andrew Or] Change a few comments / messages (minor)
24dba60 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
08fd788 [Andrew Or] Warn against external usages of SparkSubmitDriverBootstrapper
ff34728 [Andrew Or] Minor comments
51aeb01 [Andrew Or] Filter out JVM memory in Scala rather than Bash (minor)
9a778f6 [Andrew Or] Fix PySpark: actually kill driver on termination
d0f20db [Andrew Or] Don't pass empty library paths, classpath, java opts etc.
a78cb26 [Andrew Or] Revert a few changes in utils.sh (minor)
9ba37e2 [Andrew Or] Don't barf when the properties file does not exist
8867a09 [Andrew Or] A few more naming things (minor)
19464ad [Andrew Or] SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
d6488f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
1ea6bbe [Andrew Or] SparkClassLauncher -> SparkSubmitDriverBootstrapper
a91ea19 [Andrew Or] Fix precedence of library paths, classpath, java opts and memory
158f813 [Andrew Or] Remove "client mode" boolean argument
c84f5c8 [Andrew Or] Remove debug print statement (minor)
b71f52b [Andrew Or] Revert a few more changes (minor)
7d94a8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
3a8235d [Andrew Or] Only parse the properties file if special configs exist
c37e08d [Andrew Or] Revert a few more changes
a396eda [Andrew Or] Nullify my own hard work to simplify bash
0effa1e [Andrew Or] Add code in Scala that handles special configs
c886568 [Andrew Or] Fix lines too long + a few comments / style (minor)
7a4190a [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
7396be2 [Andrew Or] Explicitly comment that multi-line properties are not supported
fa11ef8 [Andrew Or] Parse the properties file only if the special configs exist
371cac4 [Andrew Or] Add function prefix (minor)
be99eb3 [Andrew Or] Fix tests to not include multi-line configs
bd0d468 [Andrew Or] Simplify parsing config file by ignoring multi-line arguments
56ac247 [Andrew Or] Use eval and set to simplify splitting
8d4614c [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
aeb79c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
2732ac0 [Andrew Or] Integrate BASH tests into dev/run-tests + log error properly
8d26a5c [Andrew Or] Add tests for bash/utils.sh
4ae24c3 [Andrew Or] Fix bug: escape properly in quote_java_property
b3c4cd5 [Andrew Or] Fix bug: count the number of quotes instead of detecting presence
c2273fc [Andrew Or] Fix typo (minor)
e793e5f [Andrew Or] Handle multi-line arguments
5d8f8c4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
c7b9926 [Andrew Or] Minor changes to spark-defaults.conf.template
a992ae2 [Andrew Or] Escape spark.*.extraJavaOptions correctly
aabfc7e [Andrew Or] escape -> split (minor)
45a1eb9 [Andrew Or] Fix bug: escape escaped backslashes and quotes properly...
1cdc6b1 [Andrew Or] Fix bug: escape escaped double quotes properly
c854859 [Andrew Or] Add small comment
c13a2cb [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
8e552b7 [Andrew Or] Include an example of spark.*.extraJavaOptions
de765c9 [Andrew Or] Print spark-class command properly
a4df3c4 [Andrew Or] Move parsing and escaping logic to utils.sh
dec2343 [Andrew Or] Only export variables if they exist
fa2136e [Andrew Or] Escape Java options + parse java properties files properly
ef12f74 [Andrew Or] Minor formatting
4ec22a1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
e5cfb46 [Andrew Or] Collapse duplicate code + fix potential whitespace issues
4edcaa8 [Andrew Or] Redirect stdout to stderr for python
130f295 [Andrew Or] Handle spark.driver.memory too
98dd8e3 [Andrew Or] Add warning if properties file does not exist
8843562 [Andrew Or] Fix compilation issues...
75ee6b4 [Andrew Or] Remove accidentally added file
63ed2e9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
0025474 [Andrew Or] Revert SparkSubmit handling of --driver-* options for only cluster mode
a2ab1b0 [Andrew Or] Parse spark.driver.extra* in bash
250cb95 [Andrew Or] Do not ignore spark.driver.extra* for client mode
(cherry picked from commit b3ec51b)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Add explicit row copies when sort-based shuffle is on.

Author: Michael Armbrust <michael@databricks.com>

Closes #2066 from marmbrus/sortShuffle and squashes the following commits:

fcd7bb2 [Michael Armbrust] Fix sort based shuffle for spark sql.

(cherry picked from commit a2e658d)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei_hello@126.com>

Closes #2057 from scwf/patch-7 and squashes the following commits:

1b7b9a5 [wangfei] SPARK_LOGFILE and SPARK_ROOT_LOGGER no longer need in spark-daemon.sh

(cherry picked from commit a1e8b1b)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
… initialization of job conf

...al job conf

Author: Alex Liu <alex_liu68@yahoo.com>

Closes #1927 from alexliu68/SPARK-SQL-2846 and squashes the following commits:

e4bdc4c [Alex Liu] SPARK-SQL-2846 add configureInputJobPropertiesForStorageHandler to initial job conf

(cherry picked from commit d9e9414)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Instead of keeping copies in all pages, just reference the values
kept in the base SparkUI instance (by making them available via
getters).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #1252 from vanzin/SPARK-2169 and squashes the following commits:

4412fc6 [Marcelo Vanzin] Simplify UIUtils.headerSparkPage signature.
4e5d35a [Marcelo Vanzin] [SPARK-2169] Don't copy appName / basePath everywhere.
Simple way to reproduce this in the UI:

```scala
val f = new java.io.File("/tmp/test")
f.delete()
sc.parallelize(1 to 2, 2).map(x => (x,x )).repartition(3).mapPartitionsWithContext { case (context, iter) =>
  if (context.partitionId == 0) {
    val f = new java.io.File("/tmp/test")
    if (!f.exists) {
      f.mkdir()
      System.exit(0);
    }
  }
  iter
}.count()
```

Author: Reynold Xin <rxin@apache.org>

Closes #1545 from rxin/stage-attempt and squashes the following commits:

3ee1d2a [Reynold Xin] - Rename attempt to retry in UI. - Properly report stage failure in FetchFailed.
40a6bd5 [Reynold Xin] Updated test suites.
c414c36 [Reynold Xin] Fixed the hanging in JobCancellationSuite.
b3e2eed [Reynold Xin] Oops previous code didn't compile.
0f36075 [Reynold Xin] Mark unknown stage attempt with id -1 and drop that in JobProgressListener.
6c08b07 [Reynold Xin] Addressed code review feedback.
4e5faa2 [Reynold Xin] [SPARK-2298] Encode stage attempt in SparkListener & UI.
We read the py4j port from the stdout of the `bin/spark-submit` subprocess. If there is interference in stdout (e.g. a random echo in `spark-submit`), we throw an exception with a warning message. We do not, however, distinguish this case from the case where no stdout is produced at all.

I wasted a non-trivial amount of time being baffled by this exception in search of places where I print random whitespace (in vain, of course). A clearer exception message that distinguishes between these cases will prevent similar headaches that I have gone through.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2067 from andrewor14/python-exception and squashes the following commits:

742f823 [Andrew Or] Further clarify warning messages
e96a7a0 [Andrew Or] Distinguish between unexpected output and no output at all

(cherry picked from commit ba3c730)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
Moved TF-IDF before Word2Vec because the former is more basic. I also added a link for Word2Vec. atalwalkar

Author: Xiangrui Meng <meng@databricks.com>

Closes #2061 from mengxr/tfidf-doc and squashes the following commits:

ca04c70 [Xiangrui Meng] address comments
a5ea4b4 [Xiangrui Meng] add tf-idf user guide

(cherry picked from commit e157187)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
atalwalkar srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #2064 from mengxr/als-doc and squashes the following commits:

b2e20ab [Xiangrui Meng] introduced -> discussed
98abdd7 [Xiangrui Meng] add reference
339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS

(cherry picked from commit e0f9462)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This is failing my tests in #1777. @tdas

Author: Andrew Or <andrewor14@gmail.com>

Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:

ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
664095c [Andrew Or] Tone down bind exception message
af3ddc9 [Andrew Or] Handle port collisions in flume polling test
…to in Flume Polling Tests.

Author: Hari Shreedharan <harishreedharan@gmail.com>

Closes #1820 from harishreedharan/use-free-ports and squashes the following commits:

b939067 [Hari Shreedharan] Remove unused import.
67856a8 [Hari Shreedharan] Remove findFreePort.
0ea51d1 [Hari Shreedharan] Make some changes to getPort to use map on the serverOpt.
1fb0283 [Hari Shreedharan] Merge branch 'master' of https://github.com/apache/spark into use-free-ports
b351651 [Hari Shreedharan] Allow Netty to choose port, and query it to decide the port to bind to. Leaving findFreePort as is, if other tests want to use it at some point.
e6c9620 [Hari Shreedharan] Making sure the second sink uses the correct port.
11c340d [Hari Shreedharan] Add info about race condition to scaladoc.
e89d135 [Hari Shreedharan] Adding Scaladoc.
6013bb0 [Hari Shreedharan] [STREAMING] Find free ports to use before attempting to create Flume Sink in Flume Polling Suite
Also, minor modifications to the MapR profile.
Updated DecisionTree documentation, with examples for Java, Python.
Added same Java example to code as well.
CC: @mengxr  @manishamde @atalwalkar

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #2063 from jkbradley/dt-docs and squashes the following commits:

2dd2c19 [Joseph K. Bradley] Last updates based on github review.
9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text.
b9bee04 [Joseph K. Bradley] Updated DT examples
57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed.
d939a92 [Joseph K. Bradley] Updated DecisionTree documentation.  Added Java, Python examples.

(cherry picked from commit 050f8d0)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
The issue is that yarn/alpha/pom.xml uses version 1.1.0 instead of 1.1.1-SNAPSHOT.

Update the pom.xml to 1.1.1-SNAPSHOT (the same as yarn/stable/pom.xml).
@AmplabJenkins

Can one of the admins verify this patch?

@vanzin
Contributor

vanzin commented Aug 22, 2014

@chesterxgchen could you check how you submitted the PR? You seem to be merging a lot of unrelated things here.

@chesterxgchen
Author

I only changed one line of code in each PR

That's strange, let me take a look

Sent from my iPhone

On Aug 21, 2014, at 5:56 PM, Marcelo Vanzin notifications@github.com wrote:

@chesterxgchen could you check how you submitted the PR? You seem to be merging a lot of unrelated things here.



@chesterxgchen
Author

Let me close it and re-generate this
