Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Environment variable for --memory #4028

Open
catleeball opened this issue Apr 17, 2024 · 2 comments
Open

Environment variable for --memory #4028

catleeball opened this issue Apr 17, 2024 · 2 comments

Comments

@catleeball
Copy link

catleeball commented Apr 17, 2024

Is your feature request related to a problem? Please describe.

Sometimes I use tools that utilize zstd, but they don't allow me to pass arbitrary flags to zstd (in this case rg -z). In these cases, the tool won't work in cases where the archive requires additional flags to decompress (such as needing --long=32 or --memory=2048MB)

Describe the solution you'd like

It would be great if there was an environment variable like ZSTD_MEMORY or similar to specify window size for compression/decompression.

This would allow me to use tools where I'm not able to pass options to zstd to decompress files that would need a flag like --long=32 or --memory=2048MB.

Describe alternatives you've considered

In this case, I wrote a shell script using xargs that ran zstdcat --long=31 on files and piped them to rg.

Additional context

In this case, I was using rg to gather statistics on a large compressed text corpora. Usually rg is nice for search a lot of compressed files in a directory, but these had been compressed with --long, so I got an error[1].

I also opened a bug in the ripgrep repo to support passing flags to the compression libraries it uses. That said, this would be a really nice general feature of zstd.

Thank you! :)

Footnotes

[1] Error message from rg, bubbling up an error from zstd:

-------------------------------------------------------------------------------                                                                                                               
./submissions/RS_2018-01.zst:                                                                                                                                                                 
-------------------------------------------------------------------------------                                                                                                               
sions/RS_2018-01.zst : Decoding error (36) : Frame requires too much memory for decoding                                                                                                      
sions/RS_2018-01.zst : Window size larger than maximum : 2147483648 > 134217728                                                                                                               
sions/RS_2018-01.zst : Use --long=31 or --memory=2048MB                                                                                                                                       
------------------------------------------------------------------------------- 
@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 17, 2024

We could easily solve this issue by removing any limitation (or selecting a larger limit) for the Window Size at decompression time.

While I'm sure it would better fit your use case, the problem is, in other environments, this could be considered a DoS attack vector: a crafted input could instruct the decompressor to allocate a vast amount of memory, with potentially some nasty consequences (we mitigate the worst of it by only reserving the memory address space, and the physical memory itself is normally not used until this amount of data is actually produced, but still).
More security conscious environments will not like such a possibility.

Our proposed solution is to offer a flag so that users can explicitly "opt-in" to employ more memory on their system when they want to.

But it seems this solution is not appropriate for your use case.

Now your proposition would be to change the default behavior, but as you can guess, this is always going to be in tension with other environments which would prefer other defaults.

The next solution that comes to mind is to provide some control through environment variables.
https://github.com/facebook/zstd/tree/v1.5.6/programs#passing-parameters-through-environment-variables
This is used precisely in these situations where the zstd CLI is indirectly invoked and there is no way to pass parameters via the intermediate caller.
There are already a few variables defined there, one idea could be to add another one, for example ZSTD_MEMORY, that would then change the default window size that the zstd decompressor accepts to allocate without requiring an explicit authorization. It could be extended or restricted, depending on your locally preferred default. And of course, the command line could still be used to step off those boundaries on demand.

@catleeball
Copy link
Author

An environment variable for memory would be great too! I didn't know about the DoS attack vector, thanks for noting that.

I'll go ahead and edit the tile and comment to be an FR to be for an environment variable instead.

@catleeball catleeball changed the title Auto-detect when --long is needed to decompress Environment variable for --memory Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants