Write output directly to disk with -o #85

Merged: 1 commit, Aug 24, 2020

Collaborator

@szeiger szeiger commented Aug 21, 2020

Writing the output directly to the file given with -o avoids holding the entire output string in memory. In my test, this reduced heap usage for our largest output file (7.2 MB) from 101 MB to 93 MB.

Only JSON is supported for now. YAML requires more work because the YamlRenderer is more closely tied to the underlying StringWriter, so it still has to buffer.
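
For illustration, the difference between buffering and writing directly can be sketched as follows. This is a minimal sketch, not the code from this PR: `DirectOutputSketch`, `render`, `writeBuffered` and `writeDirect` are hypothetical names, and `render` only stands in for whatever emits the JSON text piece by piece.

```scala
import java.io.{BufferedOutputStream, OutputStreamWriter, StringWriter, Writer}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object DirectOutputSketch {
  // Hypothetical renderer: emits a small JSON array in several write calls.
  def render(wr: Writer): Unit = {
    wr.write("[")
    for (i <- 1 to 3) {
      if (i > 1) wr.write(",")
      wr.write(i.toString)
    }
    wr.write("]")
  }

  // Buffered variant: the whole output exists as a String (and then as a
  // byte array) in memory before anything is written to disk.
  def writeBuffered(path: Path): Unit = {
    val sw = new StringWriter()
    render(sw)
    Files.write(path, sw.toString.getBytes(StandardCharsets.UTF_8))
  }

  // Direct variant: text is encoded and written in small chunks, so only
  // the fixed-size stream buffers are held in memory.
  def writeDirect(path: Path): Unit = {
    val out = Files.newOutputStream(path)
    try {
      val wr = new OutputStreamWriter(new BufferedOutputStream(out), StandardCharsets.UTF_8)
      render(wr)
      wr.flush()
    } finally out.close()
  }
}
```

Both variants should produce the same file; only the peak memory needed to produce it differs.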

val buf = new BufferedOutputStream(out)
val wr = new OutputStreamWriter(buf, StandardCharsets.UTF_8)
val u = contents(wr)
wr.flush()
Contributor

Do we need to call buf.flush() here?

Collaborator Author

No. Flushing, like closing, is transitive: you only need to call it once, on the first stream/writer in the pipeline (the outermost wrapper, here wr), and it propagates to the streams it wraps.
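
A small sketch of that behavior, assuming a `ByteArrayOutputStream` as a stand-in for the real file stream so the result can be inspected (`FlushDemo`, `run` and `sink` are hypothetical names, not part of this PR):

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

object FlushDemo {
  // Returns (bytes in the sink before flush, sink contents after flush).
  def run(): (Int, String) = {
    val sink = new ByteArrayOutputStream() // stand-in for the file stream
    val buf = new BufferedOutputStream(sink)
    val wr = new OutputStreamWriter(buf, StandardCharsets.UTF_8)

    wr.write("""{"a": 1}""")
    // The few bytes written so far are still held in the wrappers' buffers.
    val before = sink.size()

    // Only the outermost writer is flushed; it flushes buf in turn.
    wr.flush()
    val after = sink.toString("UTF-8")
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"bytes in sink before flush: $before")
    println(s"sink contents after flush: $after")
  }
}
```

No separate `buf.flush()` call appears, yet the sink holds the full text after `wr.flush()`.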

@lihaoyi-databricks
Contributor

One comment, looks good to me otherwise. Feel free to merge once that's resolved!

@szeiger szeiger merged commit a7e217b into databricks:master Aug 24, 2020