Out of memory on Windows #167
Comments
Our conda-build update, which correctly passes level 16 from a config file instead of the (too high?) default of 22, should also reduce memory usage. I am having trouble figuring out how much from the zstd manual or man pages.
A quick memory test:

```python
import zstandard
import sys
import io

compressor = zstandard.ZstdCompressor(level=int(sys.argv[1]))
writer = compressor.stream_writer(io.BytesIO())
writer.write(b"hello")
writer.close()
```

OSX's numbers: we go from 36MB, 44MB, all the way up to 691MB. The conda-build update should help - we were experiencing a bug where conda-forge wanted -16 but got the default of -22. I think it is possible to pass the input size into `ZstdCompressor`, if we use the API more smartly. This would help when packages are smaller than e.g. the 691MB buffer seen at -22. I would be open to changing our default to -19, which is the highest the command line tool lets you use without passing --ultra...
(It would be tricky but very nice to figure out a good "give me successive objects that can stream data into the .conda zip archive" API, all in memory, to go into conda-package-streaming.)
I highly recommend going for
For compression, the higher levels additionally use bigger search tables, which is why the memory increases more. You get slightly higher resource usage for
We're also probably going for -19 as the default on the conda-forge side, see conda-forge/conda-forge-ci-setup-feedstock#217. (Yay, I could just copy/paste the comment from here 😁.)
Hopefully our parallel decompression will be okay... |
If https://github.com/conda/conda/blob/22.11.0/conda/core/package_cache_data.py#L72 sets the max. parallelism, then you're most definitely fine :). |
I suspect that cph 2.x is much more likely to encounter OOM situations than the 1.x ones -- likely due to the fact that (if I understand the comment above correctly) no size hints are provided.
(because we saw the OOM issue in the last few hours after the 2.x was published on conda-forge quite frequently) |
I did a simple test after using
On Linux I get about twice as much memory usage:

```shell
(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version    Build            Channel
conda-package-handling    1.9.0      py310h5764c6d_1  conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda
706156
(base) rm /tmp/python-3.10.8-h4a9ceb5_0_cpython.conda
(base) mamba install conda-package-handling\>=2 -y >/dev/null 2>&1
(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version    Build            Channel
conda-package-handling    2.0.1      pyh38be061_0     conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda
1394704
```
#169 calls compressor() once and should halve the memory usage, making it comparable to the libarchive-based 1.9. The size hints are easy to get at https://python-zstandard.readthedocs.io/en/latest/compressor.html?highlight=stream_writer#zstandard.ZstdCompressor.stream_writer, but I'm not currently prepared to reintroduce temporary files to be able to use size hints. It would be a good experiment to measure their impact. It might be easy to add them on extract in conda-package-streaming.
I wonder about the practicality of using
I'd suggest that we pull the
https://github.com/conda/conda-package-handling/blob/memory/tests/memory_test.py uses the same memory with times=1 or =156.
Interesting… Any idea why we saw different behavior for
Some nuance of reference counting surely. Do we have an excuse to use memray? |
If it's any comfort, the extract part of
2.0.2 released |
It'd still be good to have a ticket open to track the questions from #167 (comment) (i.e., until it's ensured that it doesn't cause memory leaks if
Checklist
What happened?
zstd appears to be using a surprising amount of memory on Windows, even if the package is small. It shouldn't actually use more memory than the total size of the uncompressed data. It looks like it might be trying to allocate that memory anyway?
Conda Info
No response
Conda Config
No response
Conda list
No response
Additional Context
No response