Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move compressionCodec to a string parameter #174

Closed
wants to merge 4 commits into from

Conversation

msperlich
Copy link
Contributor

This change allows Python (and presumably Java) users to use compressed
output by passing the compressionCodec as a string option, instead of an
argument to a function only visible in the implicit class.

This is an API breaking change, so I'm definitely open to other
approaches.

@JoshRosen
Copy link
Contributor

Instead of changing saveAsCsvFile's signature, you could add an overloaded method and preserve binary compatibility in that way.

@codecov-io
Copy link

Current coverage is 85.68%

Merging #174 into master will increase coverage by +0.20% as of 10704ff

@@            master    #174   diff @@
======================================
  Files           10      10       
  Stmts          489     496     +7
  Branches       147     146     -1
  Methods          0       0       
======================================
+ Hit            418     425     +7
  Partial          0       0       
  Missed          71      71       

Review entire Coverage Diff as of 10704ff

Powered by Codecov. Updated on successful CI builds.

@falaki
Copy link
Member

falaki commented Oct 26, 2015

@msperlich would you please add a method named withCompression(codec: String) to CsvParser? Also please add this new parameter to the README file.

* org.apache.hadoop.io.compress.CompressionCodec then the output will be
* compressed.
*/
def saveAsCsvFile(path: String, parameters: Map[String, String]): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to add this method. saveAsCsfFile is a helper method for convenience. This functionality should be implemented in `CsvParser``

@rxin
Copy link
Contributor

rxin commented Oct 26, 2015

@falaki can we please deprecate CsvContext? We should go through the unified reader interface, which is consistent across all languages. CsvContext should only be used for backward compatibility.

@msperlich
Copy link
Contributor Author

@falaki I've tried to rework this PR to match what you've described. Let me know how it looks now. I'm new to both Scala and Spark, so thanks for your patience!

@msperlich
Copy link
Contributor Author

any followup on this one?

@@ -26,6 +26,16 @@ package object csv {
val defaultCsvFormat =
CSVFormat.DEFAULT.withRecordSeparator(System.getProperty("line.separator", "\n"))

private[csv] def compresionCodecClass(className: String): Class[_ <: CompressionCodec] = {
className match {
case null => null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: indent two spaces.

This change allows Python (and presumably Java) users to use compressed
output by passing the compressionCodec as a string option, instead of an
argument to a function only visible in the implicit class.

This is an API breaking change, so I'm definitely open to other
approaches.
Add overloaded versions of saveAsCsvFile to preserve API compatibilty.
Fixed checkstyle violations.
Got rid of overloaded methods, added codec as a parameter to CsvParser.
Updated README with some examples.
Remove extra spaces, fix indentation.
@falaki
Copy link
Member

falaki commented Nov 20, 2015

Thanks. Merging this now for release 1.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants