Don't panic on dropped frames #18

JCapucho · 2022-09-13T21:10:43Z

On some lower end systems it's possible in some high pressure moments hit an edge case where a frame is dropped whilst buffers are still waiting to be mapped, causing wgpu to signal an error on the mapping callback and the profiler to panic.

This is fixed by tracking wether the current frame was dropped or not and ignoring buffer mappings if it was dropped.

I'm opening this PR to gather opinions on how this should be done, right now it's implemented using an AtomicBool, but it could be possibly be done trough the already existing mapped_buffers using a sentinel value like usize::MAX or another method.

On some lower end systems it's possible in some high pressure moments hit an edge case where a frame is dropped whilst buffers are still waiting to be mapped, causing wgpu to signal an error on the mapping callback and the profiler to panic. This is fixed by tracking wether the current frame was dropped or not and ignoring buffer mappings if it was dropped.

Wumpf · 2022-09-15T17:00:21Z

excellent catch, thank you for reporting and coming up with a first fix! :)

I'm wondering if the deeper problem isn't that we're doing an unwrap here without handling the error

            pool.resolved_buffer_slice().map_async(wgpu::MapMode::Read, move |res| {
                res.unwrap();

.. presuming I understood correctly and that unwrap is where we bail out with wgpu telling us things went sideways? Sadly wgpu doesn't distinguish any errors there so far :/. If so, I think it would be reasonable to assume that the respective error there means that we already discarded the frame.

Another way to look at this is that we shouldn't actually throw away buffers with pending maps ever. Which makes me realize that this is problematic already:

        // Make sure we don't overflow
        if self.pending_frames.len() == self.max_num_pending_frames {
            // Drop previous frame.
            let dropped_frame = self.pending_frames.pop().unwrap();
            self.cache_unused_query_pools(dropped_frame.query_pools);
        }

If we recycle a the query pools from a dropped frame, we still have a map_async going on, so we can't actually reuse it :O
Also on that note any idea why this code is dropping the previous and not the last frame?

JCapucho · 2022-09-15T17:18:30Z

.. presuming I understood correctly and that unwrap is where we bail out with wgpu telling us things went sideways? Sadly wgpu doesn't distinguish any errors there so far :/. If so, I think it would be reasonable to assume that the respective error there means that we already discarded the frame.

Yes, wgpu doesn't provide us with the reason of the error and I think this is tied to the webgpu spec, but we could ask if it's possible if an error reason be returned, or just if it's an error because of an abortion or not.

I don't think we can assume that if it's an error it's because it's already mapped, but we could just ignore that buffer (mark it as tainted or something), or discard that frame.

Another way to look at this is that we shouldn't actually throw away buffers with pending maps ever. Which makes me realize that this is problematic already:

This could prove problematic in slower systems, since it would increase memory pressure and possibly pressure on the gpu.

If we recycle a the query pools from a dropped frame, we still have a map_async going on, so we can't actually reuse it :O Also on that note any idea why this code is dropping the previous and not the last frame?

No idea, seems like a bug

Wumpf · 2022-09-15T17:26:39Z

but we could just ignore that buffer (mark it as tainted or something), or discard that frame.

I like the idea of getting rid of that unwrap() there and handle any kind of mapping failure :). I'll work on something, likely ready this weekend.

This could prove problematic in slower systems, since it would increase memory pressure and possibly pressure on the gpu.

Good point. We mustn't stack up indefinitely.

You're okay with not merging your PR until then, or would it make your life easier somehow if it was as a transitional fix?

JCapucho · 2022-09-15T17:43:06Z

You're okay with not merging your PR until then, or would it make your life easier somehow if it was as a transitional fix?

I don't need it, I'm patching it in my cargo dependencies and that works for me.

Wumpf · 2022-09-15T18:05:34Z

got a clean repro here now \o/
https://github.com/Wumpf/wgpu-profiler/blob/dropped-frames2/tests/dropped_frame_handling.rs
about time this project gets some basic testing other than the example project

Wumpf added a commit that referenced this pull request Sep 15, 2022

add repro test case for #18

4cca67d

Wumpf added a commit that referenced this pull request Sep 17, 2022

add repro test case for #18

d286843

Wumpf mentioned this pull request Sep 17, 2022

Fix crash on dropped frame when buffer mapping hasn't completed yet #20

Merged

Wumpf closed this Sep 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't panic on dropped frames #18

Don't panic on dropped frames #18

JCapucho commented Sep 13, 2022

Wumpf commented Sep 15, 2022

JCapucho commented Sep 15, 2022

Wumpf commented Sep 15, 2022 •

edited

JCapucho commented Sep 15, 2022

Wumpf commented Sep 15, 2022

Don't panic on dropped frames #18

Don't panic on dropped frames #18

Conversation

JCapucho commented Sep 13, 2022

Wumpf commented Sep 15, 2022

JCapucho commented Sep 15, 2022

Wumpf commented Sep 15, 2022 • edited

JCapucho commented Sep 15, 2022

Wumpf commented Sep 15, 2022

Wumpf commented Sep 15, 2022 •

edited