
ks.read_spark_io / DataFrame.to_spark_io #447

Merged · 7 commits into databricks:master on Jun 9, 2019

Conversation

@rxin (Contributor) commented on Jun 8, 2019

Resolves #446

@codecov-io commented on Jun 8, 2019

Codecov Report

Merging #447 into master will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #447      +/-   ##
==========================================
+ Coverage   93.06%   93.07%   +0.01%     
==========================================
  Files          27       27              
  Lines        3344     3349       +5     
==========================================
+ Hits         3112     3117       +5     
  Misses        232      232
Impacted Files                   Coverage Δ
databricks/koalas/namespace.py   90.3% <100%> (+0.12%) ⬆️
databricks/koalas/frame.py       94.74% <100%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 103f1a6...e2726df.

@rxin changed the title from "Generic Spark I/O functions" to "ks.read_spark_io / DataFrame.to_spark_io" on Jun 8, 2019
@rxin (Contributor, Author) commented on Jun 8, 2019

@floscha want to take a look at this?

@floscha (Collaborator) left a comment:

This is some really useful functionality to have since, in practice, I read/write most data from/to HDFS rather than a local file system.

I left some comments regarding the documentation. Since the implementation itself is pretty straightforward, I don't have any complaints there 😉
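
A minimal usage sketch of the two functions under review, assuming a running Spark session; the HDFS path is a hypothetical placeholder, and the parameter names follow the test quoted further down in this review:

import databricks.koalas as ks

# Hypothetical HDFS location; any Spark-supported path works.
path = "hdfs://namenode:8020/data/events"

kdf = ks.DataFrame({"i32": [1, 2, 3], "value": ["a", "b", "c"]})

# Write through a generic Spark data source (here Parquet),
# replacing any existing output at the path.
kdf.to_spark_io(path, format="parquet", mode="overwrite")

# Read it back through the same generic interface.
result = ks.read_spark_io(path, format="parquet")

The inline comments below quote the docstring text under discussion.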

path : string, optional
Path to the data source.
format : string, optional
Name of the data source in Spark.
@floscha (Collaborator) commented:

"Name" sounds more like a file name, if you ask me. Why not instead go with "Specifies the input data source format.", as the Spark docs describe it?

Also, as you did for mode, it would be great to provide a list of supported formats, namely: CSV, JDBC, JSON, ORC, and Parquet.

@rxin (Contributor, Author) replied:

Thanks. Good point. Will do the changes.
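
For illustration only, a sketch of how the format description might read once the suggestion is applied; the signature is abbreviated, and the wording that actually landed in the merged commit may differ:

def read_spark_io(path=None, format=None, **options):
    """Load a DataFrame from a Spark data source.

    Parameters
    ----------
    path : string, optional
        Path to the data source.
    format : string, optional
        Specifies the input data source format. Supported formats
        include 'csv', 'jdbc', 'json', 'orc', and 'parquet'.
    """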

path : string, optional
Path to the data source.
format : string, optional
Name of the data source in Spark.
@floscha (Collaborator) commented:

See my comment on format above.


# Write out partitioned by one column
expected.to_spark_io(tmp, format='json', mode='overwrite', partition_cols='i32')
# reset column order, as once the data is written out, Spark rearranges partition
@floscha (Collaborator) commented:

This line and 107 could start with a capital letter and end with a period, but that's rather cosmetic 😉
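
A sketch of the round trip the quoted test performs, assuming the merged API; tmp is a hypothetical output directory, and the reselect at the end restores the original column order:

import databricks.koalas as ks

tmp = "/tmp/koalas_spark_io_demo"  # hypothetical output directory

expected = ks.DataFrame({"i32": [0, 1, 2], "f": [0.0, 0.5, 1.0]})

# Write out partitioned by one column.
expected.to_spark_io(tmp, format="json", mode="overwrite", partition_cols="i32")

# On read, Spark places the partition column at the end of the
# schema, so reselect the original column order before comparing.
actual = ks.read_spark_io(tmp, format="json")[list(expected.columns)]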

@softagram-bot commented:

Softagram Impact Report for pull/447 (head commit: e2726df)


@rxin merged commit 4af1d34 into databricks:master on Jun 9, 2019
Linked issue #446: Implement generic functionality to read / write Spark data source tables