
Closes #3054: Dynamically switch to batching for larger csv writes #3061

Merged
merged 2 commits into Bears-R-Us:master from 3054_batch_csv_write on Apr 3, 2024

Conversation

stress-tess (Member) commented:

This PR (closes #3054) adds the ability to write chunks of data when writing CSV files that would otherwise cause us to run out of memory with the new optimization. We use the arkouda memory-management functions to determine approximately how much memory is available on each locale, then divide the data to be written on that locale into slices of that size. This keeps the Chapel-native strings array as big as possible without running out of memory, so hopefully we'll keep the performance bump for the small cases.
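
To illustrate the idea, here is a minimal single-locale sketch of the batching logic in Chapel. The names (`writeCsvBatched`, `availBytes`, `bytesPerRow`) are illustrative assumptions, not arkouda's actual API; the real implementation lives in src/CSVMsg.chpl and queries the arkouda memory-management functions per locale rather than using a config constant.

```chapel
use IO;

// Assumptions for this sketch: availBytes approximates the free memory on the
// locale, and bytesPerRow approximates the cost of one rendered CSV row.
config const availBytes = 2**30;
config const bytesPerRow = 64;

// Write `rows` to `filename`, building the in-memory string in bounded slices
// instead of concatenating every row at once.
proc writeCsvBatched(rows: [] string, filename: string) throws {
  const rowsPerBatch = max(1, availBytes / bytesPerRow);
  var f = open(filename, ioMode.cw);
  var w = f.writer();
  var lo = rows.domain.low;
  while lo <= rows.domain.high {
    const hi = min(lo + rowsPerBatch - 1, rows.domain.high);
    var chunk: string;                          // temporary stays roughly availBytes in size
    for i in lo..hi do chunk += rows[i] + "\n";
    w.write(chunk);
    lo = hi + 1;
  }
  w.close();
  f.close();
}

proc main() throws {
  const rows = ["a,1", "b,2", "c,3"];
  writeCsvBatched(rows, "out.csv");
}
```

Per the PR title, the batching only kicks in for larger writes, so small writes that already fit in memory keep the single-slice fast path and its performance bump.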

@jaketrookman (Contributor) left a comment:

Great improvements, looks good to me

@ajpotts (Contributor) left a comment:

Looks good other than the questions I marked.

Review threads on src/CSVMsg.chpl: 3 (all outdated and resolved)
@stress-tess requested a review from @ajpotts on April 2, 2024 21:26
@stress-tess merged commit 13d344a into Bears-R-Us:master on Apr 3, 2024
13 checks passed
@stress-tess deleted the 3054_batch_csv_write branch on April 3, 2024 15:15