Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5683] [SQL] Avoid multiple json generator created #4468

Closed
wants to merge 1 commit into from

Conversation

chenghao-intel
Copy link
Contributor

No description provided.

@SparkQA
Copy link

SparkQA commented Feb 9, 2015

Test build #27092 has started for PR 4468 at commit e556aba.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 9, 2015

Test build #27093 has started for PR 4468 at commit aeb7801.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 9, 2015

Test build #27092 has finished for PR 4468 at commit e556aba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27092/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Feb 9, 2015

Test build #27093 has finished for PR 4468 at commit aeb7801.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27093/
Test PASSed.

@chenghao-intel
Copy link
Contributor Author

cc @marmbrus @yhuai

iter.map(JsonRDD.rowToJSON(rowSchema, jsonFactory))
val writer = new CharArrayWriter()
// create the Generator without separator inserted between 2 records
val gen = new JsonFactory().createGenerator(writer).setRootValueSeparator(null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expect any test result change after using setRootValueSeparator(null)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A (single space) will be inserted between 2 json records (in string) by default;
Previously, we create thegeneratorobject for each of record in serialization, hence it's OK with default object separator.
However, in this PR, we are trying to using a singlegenerator for all of the records, that's why we need to set it as null, which means nothing will be inserted between records.

@yhuai
Copy link
Contributor

yhuai commented Feb 10, 2015

@chenghao-intel Thank you for the PR. It will be great if you can also attach some performance results.

@chenghao-intel
Copy link
Contributor Author

@yhuai, thanks for reviewing, in a local testing, it shows about 10-15% performance gain.

@yhuai
Copy link
Contributor

yhuai commented Feb 10, 2015

LGTM

@chenghao-intel
Copy link
Contributor Author

@marmbrus any more comment on this?

asfgit pushed a commit that referenced this pull request Feb 11, 2015
Author: Cheng Hao <hao.cheng@intel.com>

Closes #4468 from chenghao-intel/json and squashes the following commits:

aeb7801 [Cheng Hao] avoid multiple json generator created

(cherry picked from commit a60aea8)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus
Copy link
Contributor

Thanks, merged to master and 1.3

@asfgit asfgit closed this in a60aea8 Feb 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants