SPARK-1102: Create a saveAsNewAPIHadoopDataset method #12

Closed
wants to merge 5 commits

Conversation

CodingCat
Contributor

https://spark-project.atlassian.net/browse/SPARK-1102

Create a saveAsNewAPIHadoopDataset method

By @mateiz: "Right now RDDs can only be saved as files using the new Hadoop API, not as "datasets" with no filename and just a JobConf. See http://codeforhire.com/2014/02/18/using-spark-with-mongodb/ for an example of how you have to give a bogus filename. For the old Hadoop API, we have saveAsHadoopDataset."
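For context, here is a minimal sketch of what the new method enables (my illustration, not code from this patch; the format below is the standard Hadoop TextOutputFormat, whereas the linked post uses the MongoDB connector's output format, which ignores paths entirely):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("dataset-demo"))
val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
  .map { case (k, v) => (new Text(k), new IntWritable(v)) }

// Every output setting travels through the Configuration, so output
// formats that don't write to files need no bogus filename argument.
val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[IntWritable])
job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
FileOutputFormat.setOutputPath(job, new Path("/tmp/dataset-demo")) // needed only because TextOutputFormat writes files

rdd.saveAsNewAPIHadoopDataset(job.getConfiguration)
```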

@AmplabJenkins

Can one of the admins verify this patch?

@CodingCat
Contributor Author

This is a re-opened PR; in the old PR, https://github.com/apache/incubator-spark/pull/636, all test cases passed.

Can anyone verify that and continue the review?

@CodingCat
Contributor Author

I changed the parameter type of the new method back to Configuration to keep it consistent with the other APIs; whether Job should be the parameter type is still under discussion:

https://spark-project.atlassian.net/browse/SPARK-1139
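For reference, a sketch of the two signatures in question (the old-API method already exists; the new-API one is what this PR adds):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Signature sketch, not the actual source file:
trait PairRDDSaveOps {
  // old mapred API, already present
  def saveAsHadoopDataset(conf: JobConf): Unit

  // new mapreduce API added here; a plain Configuration keeps it
  // consistent with saveAsNewAPIHadoopFile, while a Job-typed parameter
  // is the alternative still being discussed in SPARK-1139
  def saveAsNewAPIHadoopDataset(conf: Configuration): Unit
}
```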

@CodingCat
Contributor Author

I rebased the code after #11 was merged and tested it locally; I think it is ready for further review/testing.

@mateiz
Contributor

mateiz commented Mar 2, 2014

Jenkins, test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12954/

@CodingCat
Contributor Author

A line exceeded the length limit by 5 characters... sorry... fixed.

@mateiz
Contributor

mateiz commented Mar 3, 2014

Jenkins, this is ok to test

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12962/

@CodingCat
Contributor Author

Is this ready to merge?

@CodingCat
Contributor Author

Any further comments?

@CodingCat
Contributor Author

ping

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13073/

@CodingCat
Contributor Author

@mateiz I have rebased the code; any further comments?

@CodingCat
Contributor Author

ping

@mateiz
Contributor

mateiz commented Mar 10, 2014

Sorry, haven't had time to look at this lately, but will do soon.

@CodingCat
Contributor Author

No problem, thanks

@mateiz
Contributor

mateiz commented Mar 17, 2014

Hey, sorry for the super late reply. This looks good, but it would be worth adding a test that exercises saveAsNewAPIHadoopDataset directly (passing it the file output format and such through a Conf). Once you do that, I think this is ready to merge.

@CodingCat
Contributor Author

Hi @mateiz, thank you for reviewing this.

I just added test cases for both the old and new API-based saveAsHadoopDataset methods.
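A test along the lines suggested above might look roughly like this (a sketch assuming a ScalaTest suite with a running local SparkContext `sc`, in the style of Spark's FileSuite; not necessarily the code added in this PR):

```scala
import java.nio.file.Files
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}

test("saveAsNewAPIHadoopDataset with the output format passed through a Configuration") {
  val outDir = Files.createTempDirectory("hadoop-dataset-test").resolve("out").toString
  val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    .map { case (k, v) => (new IntWritable(k), new Text(v)) }

  // No filename is passed to the save call itself; everything, including
  // the output path, travels through the Configuration.
  val job = Job.getInstance(sc.hadoopConfiguration)
  job.setOutputKeyClass(classOf[IntWritable])
  job.setOutputValueClass(classOf[Text])
  job.setOutputFormatClass(classOf[TextOutputFormat[IntWritable, Text]])
  FileOutputFormat.setOutputPath(job, new Path(outDir))

  pairs.saveAsNewAPIHadoopDataset(job.getConfiguration)

  // TextOutputFormat writes tab-separated key/value lines; read them back.
  assert(sc.textFile(outDir).collect().toSet === Set("1\ta", "2\tb"))
}
```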

JasonMWhite pushed a commit to JasonMWhite/spark that referenced this pull request Dec 2, 2015
…askTracker to reduce the chance of communication problems

Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of communication problems

Author: YanTangZhai <hakeemzhai@tencent.com>
Author: yantangzhai <tyz0303@163.com>

Closes apache#3785 from YanTangZhai/SPARK-4946 and squashes the following commits:

9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
e4c2c0a [YanTangZhai] Merge pull request apache#15 from apache/master
718afeb [YanTangZhai] Merge pull request apache#12 from apache/master
6e643f8 [YanTangZhai] Merge pull request apache#11 from apache/master
e249846 [YanTangZhai] Merge pull request apache#10 from apache/master
d26d982 [YanTangZhai] Merge pull request apache#9 from apache/master
76d4027 [YanTangZhai] Merge pull request apache#8 from apache/master
03b62b0 [YanTangZhai] Merge pull request apache#7 from apache/master
8a00106 [YanTangZhai] Merge pull request apache#6 from apache/master
cbcba66 [YanTangZhai] Merge pull request apache#3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
JasonMWhite pushed a commit to JasonMWhite/spark that referenced this pull request Dec 2, 2015
Support the ! boolean logic operator, like NOT, in SQL, as follows:
select * from for_test where !(col1 > col2)

Author: YanTangZhai <hakeemzhai@tencent.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3555 from YanTangZhai/SPARK-4692 and squashes the following commits:

1a9f605 [YanTangZhai] Update HiveQuerySuite.scala
7c03c68 [YanTangZhai] Merge pull request apache#23 from apache/master
992046e [YanTangZhai] Update HiveQuerySuite.scala
ea618f4 [YanTangZhai] Update HiveQuerySuite.scala
192411d [YanTangZhai] Merge pull request apache#17 from YanTangZhai/master
e4c2c0a [YanTangZhai] Merge pull request apache#15 from apache/master
1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala
efc4210 [YanTangZhai] Update HiveQuerySuite.scala
bd2c444 [YanTangZhai] Update HiveQuerySuite.scala
1893956 [YanTangZhai] Merge pull request apache#14 from marmbrus/pr/3555
59e4de9 [Michael Armbrust] make hive test
718afeb [YanTangZhai] Merge pull request apache#12 from apache/master
950b21e [YanTangZhai] Update HiveQuerySuite.scala
74175b4 [YanTangZhai] Update HiveQuerySuite.scala
92242c7 [YanTangZhai] Update HiveQl.scala
6e643f8 [YanTangZhai] Merge pull request apache#11 from apache/master
e249846 [YanTangZhai] Merge pull request apache#10 from apache/master
d26d982 [YanTangZhai] Merge pull request apache#9 from apache/master
76d4027 [YanTangZhai] Merge pull request apache#8 from apache/master
03b62b0 [YanTangZhai] Merge pull request apache#7 from apache/master
8a00106 [YanTangZhai] Merge pull request apache#6 from apache/master
cbcba66 [YanTangZhai] Merge pull request apache#3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
WenboZhao added a commit to WenboZhao/spark that referenced this pull request May 17, 2017
* Support simple rsync schema in the executor uri
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
curtishoward pushed a commit to curtishoward/spark that referenced this pull request Jan 29, 2018
* Support simple rsync schema in the executor uri

(cherry picked from commit b043fdb)
curtishoward pushed a commit to curtishoward/spark that referenced this pull request Feb 15, 2018
* Support simple rsync schema in the executor uri

(cherry picked from commit b043fdb)
(cherry picked from commit 6d59ab1)
curtishoward pushed a commit to curtishoward/spark that referenced this pull request Mar 23, 2018
* Support simple rsync schema in the executor uri

(cherry picked from commit b043fdb)
(cherry picked from commit 6d59ab1)
(cherry picked from commit 8a7a336)
(cherry picked from commit e978875)
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
curtishoward pushed a commit to curtishoward/spark that referenced this pull request Oct 8, 2018
* Support simple rsync schema in the executor uri

(cherry picked from commit b043fdb)
(cherry picked from commit 6d59ab1)
(cherry picked from commit 8a7a336)
(cherry picked from commit e978875)
(cherry picked from commit cdbef9b)
HyukjinKwon referenced this pull request in HyukjinKwon/spark Oct 16, 2018
yifeih pushed a commit to yifeih/spark that referenced this pull request Jan 23, 2019
SirOibaf added a commit to SirOibaf/spark that referenced this pull request Dec 11, 2019
ringtail added a commit to ringtail/spark that referenced this pull request May 9, 2020
…r-permission

remove script vars in entrypoint
HyukjinKwon pushed a commit that referenced this pull request Jun 10, 2020
### What changes were proposed in this pull request?

This PR proposes to make `PythonFunction` hold a `Seq[Byte]` instead of an `Array[Byte]`, so that the cache manager can tell whether two byte arrays have the same values.

### Why are the changes needed?

Currently the cache manager doesn't use the cache for a `udf` if the `udf` is created again, even if the function is the same.

```py
>>> func = lambda x: x

>>> df = spark.range(1)
>>> df.select(udf(func)("id")).cache()
```
```py
>>> df.select(udf(func)("id")).explain()
== Physical Plan ==
*(2) Project [pythonUDF0#14 AS <lambda>(id)#12]
+- BatchEvalPython [<lambda>(id#0L)], [pythonUDF0#14]
 +- *(1) Range (0, 1, step=1, splits=12)
```

This is because `PythonFunction` holds an `Array[Byte]`, and the `equals` method of an array returns true only when both arrays are the same instance.

### Does this PR introduce _any_ user-facing change?

Yes: if the user reuses the Python function for the UDF, the cache manager will detect the same function and use the cache.

### How was this patch tested?

I added a test case and also tested manually.

```py
>>> df.select(udf(func)("id")).explain()
== Physical Plan ==
InMemoryTableScan [<lambda>(id)#12]
   +- InMemoryRelation [<lambda>(id)#12], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(2) Project [pythonUDF0#5 AS <lambda>(id)#3]
            +- BatchEvalPython [<lambda>(id#0L)], [pythonUDF0#5]
               +- *(1) Range (0, 1, step=1, splits=12)
```

Closes #28774 from ueshin/issues/SPARK-31945/udf_cache.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
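For what it's worth, the JVM behavior driving this change is easy to see in plain Scala (a generic illustration, not code from the patch):

```scala
val a = Array[Byte](1, 2, 3)
val b = Array[Byte](1, 2, 3)

// Arrays inherit reference equality from java.lang.Object, so two
// identical serialized command payloads never compare equal, and the
// cache manager misses.
assert(!(a == b))

// A Seq compares element-wise, so logically identical functions match.
assert(a.toSeq == b.toSeq)
```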
redsanket pushed a commit to redsanket/spark that referenced this pull request Feb 16, 2021
Align the dependency versions with current grid
dongjoon-hyun pushed a commit that referenced this pull request Feb 23, 2024
…n properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

### Why are the changes needed?
Bug fix for Spark Connect; it won't affect classic Spark SQL.

before this PR:
```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

That is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```

### Does this PR introduce _any_ user-facing change?
yes, bug fix

### How was this patch tested?
added ut

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes #45214 from zhengruifeng/connect_fix_read_join.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
ericm-db referenced this pull request in ericm-db/spark Mar 5, 2024
zhengruifeng added a commit that referenced this pull request Apr 30, 2024
Make `ResolveRelations` handle plan id properly; cherry-pick of the bugfix in #45214 to 3.5.

Closes #46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng added a commit that referenced this pull request Apr 30, 2024
Make `ResolveRelations` handle plan id properly; cherry-pick of the bugfix in #45214 to 3.4.

Closes #46290 from zhengruifeng/connect_fix_read_join_34.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
Make `ResolveRelations` handle plan id properly; cherry-pick of the bugfix in apache#45214 to 3.4.

Closes apache#46290 from zhengruifeng/connect_fix_read_join_34.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit 5f58fa7)