Out of memory on Windows #167

Closed
2 tasks done
dholth opened this issue Nov 30, 2022 · 21 comments
Labels
in-progress (issue is actively being worked on) · locked [bot] (locked due to inactivity) · type::bug (describes erroneous operation, use severity::* to classify the type)

Comments

@dholth (Contributor) commented Nov 30, 2022

Checklist

  • I added a descriptive title
  • I searched open reports and couldn't find a duplicate

What happened?

zstd appears to use a surprising amount of memory on Windows, even when the package is small. Compression shouldn't actually need more memory than the total size of the uncompressed data, but it looks like zstd might be trying to allocate its full level-sized buffers anyway?

Traceback (most recent call last):
  File "C:\Miniforge\Scripts\conda-mambabuild-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Miniforge\lib\site-packages\boa\cli\mambabuild.py", line 256, in main
    call_conda_build(action, config)
  File "C:\Miniforge\lib\site-packages\boa\cli\mambabuild.py", line 228, in call_conda_build
    result = api.build(
  File "C:\Miniforge\lib\site-packages\conda_build\api.py", line 186, in build
    return build_tree(
  File "C:\Miniforge\lib\site-packages\conda_build\build.py", line 3091, in build_tree
    packages_from_this = build(metadata, stats,
  File "C:\Miniforge\lib\site-packages\conda_build\build.py", line 2374, in build
    newly_built_packages = bundlers[pkg_type](output_d, m, env, stats)
  File "C:\Miniforge\lib\site-packages\conda_build\build.py", line 1694, in bundle_conda
    conda_package_handling.api.create(
  File "C:\Miniforge\lib\site-packages\conda_package_handling\api.py", line 117, in create
    raise err
  File "C:\Miniforge\lib\site-packages\conda_package_handling\api.py", line 110, in create
    out = format.create(prefix, file_list, out_fn, out_folder, **kw)
  File "C:\Miniforge\lib\site-packages\conda_package_handling\conda_fmt.py", line 111, in create
    component_stream.close()
zstd.ZstdError: zstd compress error: Allocation error : not enough memory

Conda Info

No response

Conda Config

No response

Conda list

No response

Additional Context

No response

@dholth dholth added the type::bug describes erroneous operation, use severity::* to classify the type label Nov 30, 2022
@jezdez (Member) commented Nov 30, 2022

@dholth (Contributor, Author) commented Nov 30, 2022

Our conda-build update, which correctly passes level 16 from a config file instead of the (too high?) default of 22, should also reduce memory usage.

I'm having trouble figuring out from the zstd manual or man pages exactly how much memory each level needs.

@dholth (Contributor, Author) commented Nov 30, 2022

A quick memory test.

import io
import sys

import zstandard

# Compress a trivial payload at the level given on the command line;
# the number of interest is the process's peak RSS afterwards.
compressor = zstandard.ZstdCompressor(level=int(sys.argv[1]))
writer = compressor.stream_writer(io.BytesIO())
writer.write(b"hello")
writer.close()

macOS's /usr/bin/time -l shows the peak memory needed:

% /usr/bin/time -l python memtest.py 12
36225024 maximum resident set size
% /usr/bin/time -l python memtest.py 16
44367872 maximum resident set size
% /usr/bin/time -l python memtest.py 19
95436800 maximum resident set size
% /usr/bin/time -l python memtest.py 22
691208192 maximum resident set size

So we go from 36 MB at -12 and 44 MB at -16 all the way up to 691 MB at -22. The conda-build update should help: we were hitting a bug where conda-forge asked for -16 but got the default of -22.

I think it is possible to pass the input size to ZstdCompressor if we use the API more smartly. This would help whenever a package is smaller than the level's buffer, e.g. the ~691 MB seen at -22.

I would be open to changing our default to -19, which is the highest the zstd command line tool allows without passing --ultra...

@dholth (Contributor, Author) commented Nov 30, 2022

conda-package-handling 2.0's .conda creation code is in the style of conda-package-streaming but independent of it: https://github.com/conda/conda-package-handling/blob/main/src/conda_package_handling/conda_fmt.py#L104

(It would be tricky, but very nice, to figure out a good "give me successive objects that can stream data into the .conda zip archive" API, all in memory, to go into conda-package-streaming.)

@mbargull (Member) commented Nov 30, 2022

I highly recommend going for -19 at max. Not just because of the resources during compression, but also for decompression on the users' side.
For decompression you need some sub-10M base memory plus the window size (which is capped by the input size, i.e., it's most relevant for big packages), e.g.:

  • level 19: window size: 2**23 = 8M
  • level 20: window size: 2**25 = 32M
  • level 21: window size: 2**26 = 64M
  • level 22: window size: 2**27 = 128M

For compression, the higher levels additionally use bigger search tables, which is why the memory increases more.
See https://github.com/facebook/zstd/blob/v1.5.2/lib/compress/clevels.h#L44-L50 for the actual (logarithmic) values used for each level and https://manpages.ubuntu.com/manpages/focal/man1/zstd.1.html for a short description of those.

You get slightly higher resource usage with -19 compared to -18 during compression (and the same low max of x+8M for decompression). I'd go for one of those: try -19 first, and lower to -18 if we encounter cases that demand it.

@mbargull (Member):
We're also probably going for -19 as the default on the conda-forge side, see conda-forge/conda-forge-ci-setup-feedstock#217 . (Yay, I could just copy/paste the comment from here 😁.)

@dholth (Contributor, Author) commented Nov 30, 2022

Hopefully our parallel decompression will be okay...

@mbargull (Member):

> Hopefully our parallel decompression will be okay...

If https://github.com/conda/conda/blob/22.11.0/conda/core/package_cache_data.py#L72 sets the max parallelism, then you're most definitely fine :).
Plus, you'd need to extract a bunch of >=128M tarballs in parallel to hit peak memory usage. And only the current conda-forge builds from the last two weeks are affected (AFAICT Anaconda's are transmuted with a custom lower level, i.e., not affected).
No need to worry there.

@mbargull (Member):

> I think it is possible to pass the input size into zstdcompressor, if we use the API more smartly. This would help when packages are less than e.g. the 691MB buffer seen in -22.

I suspect that cph 2.x is much more likely to encounter OOM situations than the 1.x releases were, quite possibly because (if I understand the comment above correctly) no size hints are provided.

@mbargull (Member) commented Nov 30, 2022

(We saw the OOM issue quite frequently in the few hours after 2.x was published on conda-forge.)

@dholth (Contributor, Author) commented Dec 1, 2022

I did a simple test: after using cph x to extract a package, I recreated it, under macOS's peak-memory-showing /usr/bin/time. This can be run with both 1.x and 2.x since the CLI is the same. create (c) should be closer to what conda-build does than transmute. The memory usage looks consistent with cph 2.0 keeping two zstd compressors in memory at the same time (for the pkg- and info- components), though libarchive, as used in 1.9, definitely uses a different version of the zstd library.

/usr/bin/time -l cph c python-3.11.0-h93c2e33_0_cpython --out-folder /tmp/ python-3.11.0-h93c2e33_0_cpython.conda

@mbargull (Member) commented Dec 1, 2022

On Linux I get about twice as much memory usage:

(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
conda-package-handling    1.9.0           py310h5764c6d_1    conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda 
706156
(base) rm /tmp/python-3.10.8-h4a9ceb5_0_cpython.conda 
(base) mamba install conda-package-handling\>=2 -y >/dev/null 2>&1
(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
conda-package-handling    2.0.1              pyh38be061_0    conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda 
1394704

@dholth (Contributor, Author) commented Dec 1, 2022

#169 calls compressor() once and should halve the memory usage, making it comparable to the libarchive-using 1.9.

The size hints are easy to pass (see https://python-zstandard.readthedocs.io/en/latest/compressor.html?highlight=stream_writer#zstandard.ZstdCompressor.stream_writer), but I'm not currently prepared to reintroduce temporary files just to be able to use them. It would be a good experiment to measure their impact. It might be easy to add them on extract in conda-package-streaming.

@mbargull (Member) commented Dec 1, 2022

I wonder about the practicality of using python-zstandard, though. From a first glance I could not see an exposed method on the compressor instance to free its memory, and del compressor also didn't seem to free anything.
For a CLI invocation of cph, or a single package build, gh-169 would be enough.
But what about recipes with multiple outputs? What about the conda-build test suite?
And most importantly: does the decompressor behave the same? Then we'd have a problem with package extraction in conda itself, no matter whether parallel or serial...

I'd suggest that we pull the conda-forge::conda-package-handling=2.0.1 packages until those questions are resolved.

@dholth (Contributor, Author) commented Dec 1, 2022

https://github.com/conda/conda-package-handling/blob/memory/tests/memory_test.py uses the same memory with times=1 or times=156.

@mbargull (Member) commented Dec 1, 2022

Interesting... Any idea why we saw different behavior for cph?

@dholth (Contributor, Author) commented Dec 1, 2022

Some nuance of reference counting, surely.

Do we have an excuse to use memray?

@dholth (Contributor, Author) commented Dec 1, 2022

If it's any comfort, the extract part of conda-package-streaming used by this package was originally written for a system that extracts every single package in defaults in a loop, to do some analysis.

@dholth (Contributor, Author) commented Dec 1, 2022

2.0.2 released

@dholth dholth closed this as completed Dec 1, 2022
@dholth dholth added the in-progress issue is actively being worked on label Dec 1, 2022
@mbargull (Member) commented Dec 1, 2022

It'd still be good to have a ticket open to track the questions from #167 (comment) (i.e., until it's ensured that CondaFormat_v2.create doesn't leak memory when run multiple times in the same Python process).

@dholth (Contributor, Author) commented Dec 13, 2022

#176

@dholth dholth closed this as completed Dec 13, 2022
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Dec 14, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 14, 2023