[SPARK-8532][SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode #6937

Closed
wants to merge 8 commits into from


yhuai
Contributor

@yhuai yhuai commented Jun 22, 2015

https://issues.apache.org/jira/browse/SPARK-8532

This PR makes two changes. First, it fixes a bug where the save actions (`save`, `saveAsTable`, `json`, `parquet`, `jdbc`) always override a mode that was set earlier through `mode()`. Second, it adds a `partitionBy` argument to `save`, `saveAsTable`, and `parquet`.
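
A minimal sketch of the mode-override bug, using a hypothetical stand-in class rather than the real `DataFrameWriter` (the names `SketchWriter`, `save_buggy`, and `save_fixed` are assumptions made for illustration only). It shows why a keyword default of "error" clobbers a mode set earlier in the call chain, and how a `None` default avoids that:

```python
class SketchWriter(object):
    """Hypothetical stand-in for a writer with a chained mode() setter."""

    def __init__(self):
        # Stands in for the default save mode held on the JVM side.
        self._mode = "error"

    def mode(self, save_mode):
        # Only touch the stored mode when the caller actually supplied one.
        if save_mode is not None:
            self._mode = save_mode
        return self

    def save_buggy(self, path, mode="error"):
        # The default "error" is always applied, overriding any earlier mode().
        self.mode(mode)
        return (path, self._mode)

    def save_fixed(self, path, mode=None):
        # With a None default, an earlier mode() call survives.
        self.mode(mode)
        return (path, self._mode)


buggy = SketchWriter().mode("append").save_buggy("/tmp/data")
fixed = SketchWriter().mode("append").save_fixed("/tmp/data")
print(buggy)  # ('/tmp/data', 'error')
print(fixed)  # ('/tmp/data', 'append')
```

An explicit keyword still wins in the fixed variant, e.g. `save_fixed(path, mode="overwrite")`.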

@yhuai
Contributor Author

yhuai commented Jun 22, 2015

@davies @marmbrus @mengxr

@SparkQA

SparkQA commented Jun 22, 2015

Test build #35462 has finished for PR 6937 at commit 88eb6c4.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -276,7 +276,15 @@ def save(self, path=None, format=None, mode="error", **options):

>>> df.write.mode('append').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
"""
self.mode(mode).options(**options)
if mode is not "error":

we should use "==" here
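
To illustrate the point of this comment: `is` / `is not` compare object identity, so two equal strings can still be reported as different objects depending on how they were created. A small sketch of a value-based check follows; the helper name `is_non_default_mode` is hypothetical and not part of the patch:

```python
def is_non_default_mode(mode, default="error"):
    """Hypothetical helper: True when mode differs from the default by value."""
    return mode != default


# Build "error" at runtime so it is a separate object from the default literal.
runtime_mode = "".join(["err", "or"])

print(is_non_default_mode(runtime_mode))  # False: equal by value
print(is_non_default_mode("append"))      # True
```

An identity test such as `mode is not "error"` could treat `runtime_mode` as a non-default value even though the strings are equal, which is why value comparison is the safer choice here.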

@SparkQA

SparkQA commented Jun 22, 2015

Test build #35465 has finished for PR 6937 at commit d696dff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -346,13 +352,14 @@ def parquet(self, path, mode="error"):
* ``overwrite``: Overwrite existing data.
* ``ignore``: Silently ignore this operation if data already exists.
* ``error`` (default case): Throw an exception if data already exists.

:param partitionBy: names of partitioning columns
>>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))

new line here

@davies
Contributor

davies commented Jun 22, 2015

LGTM

@SparkQA

SparkQA commented Jun 22, 2015

Test build #35475 has finished for PR 6937 at commit d37abd2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -336,7 +342,7 @@ def json(self, path, mode="error"):
self._jwrite.mode(mode).json(path)

self.mode(mode)._jwrite.json(path)

@SparkQA

SparkQA commented Jun 22, 2015

Test build #35470 has finished for PR 6937 at commit 7fbc24b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor Author

yhuai commented Jun 22, 2015

Oh, this one hits the following Jenkins error:

Archiving artifacts
WARN: No artifacts found that match the file pattern "**/target/unit-tests.log". Configuration error?
WARN: java.lang.InterruptedException: no matches found within 10000
Recording test results
ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Finished: FAILURE

@yhuai
Contributor Author

yhuai commented Jun 22, 2015

I will merge it into master and branch-1.4 once SparkQA is happy.

@SparkQA

SparkQA commented Jun 22, 2015

Test build #35477 has finished for PR 6937 at commit f972d5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PCAModel(JavaVectorTransformer):
    • class PCA(object):

@AmplabJenkins

Merged build finished. Test FAILed.

asfgit pushed a commit that referenced this pull request Jun 22, 2015
…/parquet/jdbc always override mode

https://issues.apache.org/jira/browse/SPARK-8532

This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.

Author: Yin Huai <yhuai@databricks.com>

Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:

f972d5d [Yin Huai] davies's comment.
d37abd2 [Yin Huai] style.
d21290a [Yin Huai] Python doc.
889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
d696dff [Yin Huai] Python style.
88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
c40c461 [Yin Huai] Regression test.

(cherry picked from commit 5ab9fcf)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 5ab9fcf Jun 22, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015
animeshbaranawal pushed a commit to animeshbaranawal/spark that referenced this pull request Jun 25, 2015