Stream json data to a file to save 30% of the memory. #2510
I noticed that using `--no-data-dir` saved more than 300 MiB when working with the multiqc/test-data set, which seemed odd. It turns out the internal data is serialized into one big string before it is written; streaming the JSON to the file instead avoids holding that string in memory.
Before:
After:
I would love to reduce memory further, but much of the memory profile shows allocations that are already addressed by other PRs. As stated before, the LZ string compression is a major culprit, using 200 MiB; if it could be replaced with gzip, that usage would simply vanish. Another culprit is reading and storing lines from files, but that is already covered by #2505, which I suspect will have much better memory use since lines are not kept permanently. I have attached the profile report.
memray-flamegraph-multiqc.9194.html.zip
This change is pretty small. While I do not like that the same method can dump to a file (returning None) or dump to a string (returning str), it is good that the replace_nan function is still used.