Move compressionCodec to a string parameter #174

msperlich · 2015-10-23T20:58:35Z

This change allows Python (and presumably Java) users to use compressed
output by passing the compressionCodec as a string option, instead of an
argument to a function only visible in the implicit class.

This is an API-breaking change, so I'm definitely open to other
approaches.
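To show why a string option reaches every language, here is a minimal Scala sketch. It assumes the writer receives its options as a plain `Map[String, String]` and that the codec arrives under the key `codec` as a fully qualified class name. The object and method names are hypothetical.

```scala
// Sketch only: writer options modeled as the plain Map[String, String] that
// Scala, Java and Python callers can all supply. The key "codec" is assumed.
object CodecOption {
  // Returns the codec class name if one was supplied and is non-blank.
  def codecClassName(options: Map[String, String]): Option[String] =
    options.get("codec").map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val withCodec = Map(
      "header" -> "true",
      "codec" -> "org.apache.hadoop.io.compress.GzipCodec")
    println(codecClassName(withCodec))               // Some(org.apache.hadoop.io.compress.GzipCodec)
    println(codecClassName(Map("header" -> "true"))) // None
  }
}
```

Because the codec is just another string entry, no Scala-only implicit class is needed to reach it.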

JoshRosen · 2015-10-23T21:37:01Z

Instead of changing saveAsCsvFile's signature, you could add an overloaded method and preserve binary compatibility in that way.
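The overload approach can be sketched as follows. The existing signature stays in place and forwards to a new string-based overload. This is a hypothetical illustration: `Codec` stands in for Hadoop's `CompressionCodec`, `CsvWriter` stands in for the implicit class, and "saving" just returns a description so the sketch runs without Spark or Hadoop.

```scala
// Hypothetical stand-ins: Codec replaces Hadoop's CompressionCodec so the
// sketch runs without Hadoop, and "saving" just returns a description.
object OverloadSketch {
  trait Codec
  final class FakeGzipCodec extends Codec

  final class CsvWriter(rows: Seq[String]) {
    // Existing signature, left unchanged so previously compiled callers
    // still link; it now forwards to the string-based overload.
    def saveAsCsvFile(
        path: String,
        parameters: Map[String, String],
        compressionCodec: Class[_ <: Codec]): String =
      saveAsCsvFile(path, parameters, compressionCodec.getName)

    // New overload: the codec arrives as a class-name string, which any
    // language binding can pass.
    def saveAsCsvFile(
        path: String,
        parameters: Map[String, String],
        codec: String): String =
      s"${rows.size} rows -> $path (codec: $codec, options: ${parameters.size})"
  }
}
```

There is one Scala-specific constraint on this approach. At most one overloaded alternative may declare default arguments, so any defaults on the existing method cannot simply be copied onto the new overload.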

codecov-io · 2015-10-24T15:51:22Z

Current coverage is `85.68%`

Merging #174 into master will increase coverage by +0.20% as of 10704ff

@@            master    #174   diff @@
======================================
  Files           10      10       
  Stmts          489     496     +7
  Branches       147     146     -1
  Methods          0       0       
======================================
+ Hit            418     425     +7
  Partial          0       0       
  Missed          71      71

falaki · 2015-10-26T04:34:11Z

@msperlich would you please add a method named withCompression(codec: String) to CsvParser? Also please add this new parameter to the README file.
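The requested builder method can be sketched roughly as follows. This is a hypothetical, cut-down builder in the style of `CsvParser`, not its actual definition. Apart from `withCompression(codec: String)`, which the review asks for, the field and method names are assumptions.

```scala
// Hypothetical, cut-down builder in the style of a CsvParser. Each with*
// method returns an updated copy, so settings can be chained.
final case class CsvParserSketch(
    useHeader: Boolean = false,
    delimiter: Char = ',',
    codec: Option[String] = None) {

  def withUseHeader(flag: Boolean): CsvParserSketch = copy(useHeader = flag)

  def withDelimiter(d: Char): CsvParserSketch = copy(delimiter = d)

  // Takes the codec as a fully qualified class name; null means "no compression".
  def withCompression(codec: String): CsvParserSketch = copy(codec = Option(codec))
}
```

For example, `CsvParserSketch().withUseHeader(true).withCompression("org.apache.hadoop.io.compress.GzipCodec")` chains two settings. Because each call returns a new value, one base configuration can be reused with different codecs.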

falaki · 2015-10-26T04:35:41Z

src/main/scala/com/databricks/spark/csv/package.scala

+     * org.apache.hadoop.io.compress.CompressionCodec then the output will be
+     * compressed.
+     */
+    def saveAsCsvFile(path: String, parameters: Map[String, String]): Unit = {


I prefer not to add this method. `saveAsCsvFile` is a convenience helper; this functionality should be implemented in `CsvParser`.

rxin · 2015-10-26T08:23:54Z

@falaki can we please deprecate CsvContext? We should go through the unified reader interface, which is consistent across all languages. CsvContext should only be used for backward compatibility.
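The deprecation pattern described here can be sketched as a thin wrapper. The legacy helper is marked `@deprecated` and forwards to the shared, options-based path. All names are hypothetical stand-ins: `Reader` models the unified reader interface, and the `since` string is a placeholder rather than a real release.

```scala
// Hypothetical stand-ins: Reader models a unified, options-based entry point
// shared by all languages; SqlContextSketch models the context being extended.
object DeprecationSketch {
  final class SqlContextSketch

  final case class Reader(format: String, options: Map[String, String] = Map.empty) {
    def option(key: String, value: String): Reader = copy(options = options + (key -> value))
    // Returns a description instead of loading data, to keep the sketch runnable.
    def load(path: String): String = s"$format:$path:${options.toSeq.sorted.mkString(",")}"
  }

  // Legacy Scala-only helper, kept for backward compatibility. It adds no
  // behavior of its own and simply forwards to the unified path.
  implicit class CsvContextSketch(val ctx: SqlContextSketch) {
    @deprecated("Use the unified reader interface instead", "hypothetical")
    def csvFile(path: String, useHeader: Boolean = true): String =
      Reader("csv").option("header", useHeader.toString).load(path)
  }
}
```

Keeping the legacy method as a pure forwarder means new options, such as a codec, only need to be added in one place.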

msperlich · 2015-10-26T19:31:03Z

@falaki I've tried to rework this PR to match what you've described. Let me know how it looks now. I'm new to both Scala and Spark, so thanks for your patience!

msperlich · 2015-11-16T16:21:58Z

Any follow-up on this one?

falaki · 2015-11-19T23:04:31Z

src/main/scala/com/databricks/spark/csv/package.scala

@@ -26,6 +26,16 @@ package object csv {
  val defaultCsvFormat =
    CSVFormat.DEFAULT.withRecordSeparator(System.getProperty("line.separator", "\n"))

+  private[csv] def compresionCodecClass(className: String): Class[_ <: CompressionCodec] = {
+    className match {
+    case null => null


Nit: indent two spaces.
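For reference, here is a minimal sketch of the requested indentation, generalized so it runs on a plain JVM without Hadoop. The name `codecClass` and the `base` parameter are hypothetical, and `base` plays the role of `CompressionCodec`.

```scala
object CodecLookup {
  // Mirrors the shape of the helper under review, with the `case` arms
  // indented two spaces inside `match`. `base` stands in for Hadoop's
  // CompressionCodec so the sketch runs on a plain JVM.
  def codecClass[T](className: String, base: Class[T]): Class[_ <: T] = {
    className match {
      case null => null
      case name =>
        // Throws ClassNotFoundException for unknown names and
        // ClassCastException if the class does not extend `base`.
        Class.forName(name).asSubclass(base)
    }
  }
}
```

An `Option`-returning variant would avoid the `null`. The sketch keeps `null` only to match the shape of the code under review.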

falaki · 2015-11-20T19:52:53Z

Thanks. Merging this now for release 1.3.0

msperlich force-pushed the master branch from 872c9e2 to 0b713fd on October 24, 2015 15:43

falaki reviewed Oct 26, 2015

msperlich force-pushed the master branch from fe9189e to 0bac3a1 on October 26, 2015 19:18

falaki reviewed Nov 19, 2015

msperlich added 3 commits November 20, 2015 10:45

Fix API breaking changes (2ada347)

Add overloaded versions of saveAsCsvFile to preserve API compatibility. Fixed checkstyle violations.

Rework codec parameter (4b8433e)

Got rid of overloaded methods, added codec as a parameter to CsvParser. Updated README with some examples.

msperlich force-pushed the master branch from 0bac3a1 to 1e49839 on November 20, 2015 15:45

Style fixups (54fff71)

Remove extra spaces, fix indentation.

msperlich force-pushed the master branch from 1e49839 to 54fff71 on November 20, 2015 16:28

falaki closed this in 84858ce Nov 20, 2015

HyukjinKwon mentioned this pull request on Dec 8, 2015: Add a support to select codec as an option. databricks/spark-xml#24 (Closed)