Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using default in HashBuffers::reset #147

Closed
wants to merge 3 commits into from

Conversation

Lonami
Copy link

@Lonami Lonami commented Mar 1, 2024

My own library indirectly depends on miniz via use flate2::write::GzDecoder;. A user reported to me a stack overflow, which seems to come from the default implementation. I don't know if they're using Windows, but it might be similar to #21.

I don't actually know for sure if reset is the problem, but it seemed like it doesn't crash immediately, so it might be the reset as opposed to initializing the buffers the first time around.

I have kept the #[inline], but I don't know if it still makes sense (hopefully the optimizer can emit a single memset; I'm having trouble running the benchmarks on Windows).

@Lonami Lonami force-pushed the reset-hashbuffers-inplace branch from 7a5a136 to 5e8c3df Compare March 1, 2024 21:15
@oyvindln
Copy link
Collaborator

oyvindln commented Mar 2, 2024

The compiler ought to be able to optimize that yeah but I'll have to check. I will be away for a week so may not be able to check properly until then.

@Lonami
Copy link
Author

Lonami commented Mar 2, 2024

Unfortunately "ought to optimize it" is not a good thing to rely on when it can abort the entire process (not my screenshot):
stack overflow

At first I wanted to make all buffers a Vec, before I realized they were already Boxed.

Initialization seems fine, as we create the Box directly, and that's probably as safe as we can be it will be done directly on the heap.

But this may run on the stack and then be copied depending on optimization level.

@Lonami
Copy link
Author

Lonami commented Mar 2, 2024

The person I mentioned sent me an update, with a screenshot too:

The default was called in new function of DictOxide
debugger stack trace

So that's quite unfortunate. I am not sure if we have ways to initialize the structure on the heap directly in a guaranteed way without unsafe.

There's certainly been discussion on this before https://users.rust-lang.org/t/how-to-create-large-objects-directly-in-heap/26405.

@Lonami Lonami force-pushed the reset-hashbuffers-inplace branch from 5e8c3df to 60ed8cb Compare March 2, 2024 21:47
@Lonami
Copy link
Author

Lonami commented Mar 2, 2024

Instead of simply changing the reset function to replace self, I've changed the entire structure with three Vec and removed the Box.

I don't know if we can do much better without unsafe (but I saw this crate bans unsafe).

@Lonami Lonami force-pushed the reset-hashbuffers-inplace branch from 60ed8cb to 4a4e63c Compare March 2, 2024 21:50
@Lonami
Copy link
Author

Lonami commented Mar 2, 2024

I guess three boxed slices is better.

@Lonami
Copy link
Author

Lonami commented Mar 2, 2024

I asked them to try with my fork:

[patch.crates-io]
miniz_oxide = { git = "https://github.com/Lonami/miniz_oxide.git", branch = "reset-hashbuffers-inplace" }

and:

Yep can confirm, no more stack overflow
in both debug and release mode

@Lonami
Copy link
Author

Lonami commented Mar 3, 2024

Let me know if I should reset the last commit. 85KB is quite a bit to allocate on the stack in case the optimization fails again. So while this makes the object another usize large (to store the slice's length), I think it might be good to do it anyway for consistency.

@oyvindln
Copy link
Collaborator

oyvindln commented Mar 3, 2024

I don't think we can use the method in those commits without introducing unsafe or drastically worsening performance. I can't easily profile until next week as I'm not at home but losing the length as part of the type will likely prevent the compiler from optimizing several bounds checks in the performance critical sections of the code.

Is the person having stack overflow issues having them in release mode?

@oyvindln
Copy link
Collaborator

oyvindln commented Mar 3, 2024

Anyhow, we used Box::default as a workaround as Box::new didn't optimize out the stack copy in the past, but that's probably quite outdated now with stuff like this change, and the current rust sdl lib implementation of box::default calls box::new so I think that needs to be re-checked any maybe that would help here. (the reason why it helped in the past was that box::default used the old special box keyword directly on the structs default trait implementation previously)

@Lonami
Copy link
Author

Lonami commented Mar 3, 2024

without introducing unsafe

Yep, it's quite unfortunate. But until Rust stabilizes a solution, I think we should choose either safety or performance.

Maybe this change should be under #[cfg(target_os = "windows")]? As I think Windows stack size might be smaller by default.

But, aborting the process on overflow isn't really better than a performance hit.

or drastically worsening performance

Let's wait on the benchmarks. Perhaps we can add assert_eq!(buffer.len(), EXPECTED) at the start of the functions and hope that helps.

Is the person having stack overflow issues having them in release mode?

From what I understood, yes.

@DCNick3
Copy link

DCNick3 commented Mar 6, 2024

This approach can potentially be helpful, as it presumably allows to construct a boxed array without blowing the stack: rust-lang/rust#63291 (comment)

@oyvindln
Copy link
Collaborator

oyvindln commented Mar 6, 2024

As noted I will look into whether using new instead of default helps solve this and whether I can reproduce the stack overflow in release on windows but if not that method seems reasonable as it seems to avoid the other issues and just requires bumping the minimun rust version to 1.56.0.

@oyvindln
Copy link
Collaborator

Are you sure this is happening in release mode/with optimizations turned on? When looking at the generated assembly output in release mode I can only see the stack pointer being incremented by 28 bytes in the DictOxide::new() function, which ends up being optimized to just a alloc and memset calls with the underlying object functions inlined. So while the current init code isn't completely optimal and can be reworked a bit now that rust properly seem to optimize Box::new() properly which it hasn't always done, it already seem to be optimizing out the stack copies fine.
Screenshot_20240312_192904

I also couldn't get it to trigger a stack overflow when running the unit tests on windows. that said with optimizations off it may end up using a bit more stack space which of course will add up in a larger application using stack space elsewhere which I guess might be the issue here. It looks like it's running in debug mode (which normally doesn't have optimization active) at least in the screenshots.

@Lonami
Copy link
Author

Lonami commented Mar 12, 2024

I haven't tried reproducing the problem. All I know is the person that had the problem before, does not have it after this PR.

@Lonami
Copy link
Author

Lonami commented Mar 12, 2024

Some more info.

The person was testing in a Windows server. It did not have the latest version of rustc. After update they no longer get the stack overflow in release mode, but they still get it in debug mode.

Unfortunately they didn't check the rustc version prior to upgrading.

@DCNick3
Copy link

DCNick3 commented Mar 12, 2024

It would still be nice to have library work in debug mode too...

@oyvindln
Copy link
Collaborator

oyvindln commented Mar 30, 2024

Yeah it works fine in debug mode normally. It's more an issue with how Rust works than this library in particular that compiling with no optimizations results in a lot of stack usage so a few tens of kilobytes of stack usage in many different libraries can add up in a large project.

I will have to look into whether it's an issue in release in older versions.

@oyvindln oyvindln closed this Mar 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants