Skip to content

[SPARK-16104][SQL] Do not creaate CSV writer object for every flush when writing#13809

Closed
HyukjinKwon wants to merge 7 commits intoapache:masterfrom
HyukjinKwon:write-perf
Closed

[SPARK-16104][SQL] Do not creaate CSV writer object for every flush when writing#13809
HyukjinKwon wants to merge 7 commits intoapache:masterfrom
HyukjinKwon:write-perf

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 21, 2016

What changes were proposed in this pull request?

This PR let CsvWriter object is not created for each time but able to be reused. This way was taken after from JSON data source.

Original CsvWriter was being created for each row but it was enhanced in #13229. However, it still creates CsvWriter object for each flush() in LineCsvWriter. It seems it does not have to close the object and re-create this for every flush.

It follows the original logic as it is but CsvWriter is reused by reseting CharArrayWriter.

How was this patch tested?

Existing tests should cover this.

writerSettings.setHeaders(headers: _*)
writerSettings.setQuoteEscapingEnabled(options.escapeQuotes)

new CsvWriter(writer, writerSettings)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me organise this in a separate class at the PR for refactoring this.

@SparkQA
Copy link

SparkQA commented Jun 21, 2016

Test build #60932 has finished for PR 13809 at commit a62a26f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 21, 2016

Hm.. I will try to minimise the changes here.

@HyukjinKwon
Copy link
Member Author

Hi @davies, it seems the past change in your PR is related with this. Could you please take a look?

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #60986 has finished for PR 13809 at commit a96d547.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #60985 has finished for PR 13809 at commit 61dfef8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #60988 has finished for PR 13809 at commit 882a24f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Jun 22, 2016

LGTM,
Merging this into master, thanks!

@asfgit asfgit closed this in 7580f30 Jun 22, 2016
@HyukjinKwon HyukjinKwon deleted the write-perf branch January 2, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments