[Fix] Do not buffer files in memory when downloading #1599
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes
This PR fixes a performance bug that led downloaded files (e.g. with
databricks fs cp dbfs:/Volumes/.../somefile .
) to be buffered in memory before being written.Results from profiling the download of a ~100MB file:
Before:
After:
Note that this fix is temporary. A longer term solution should be to use the API provided by the Go SDK rather than making an HTTP request directly from the CLI.
fix #1575
Tests
Verified that the CLI properly downloads the file when doing the profiling.