
[WinDX] Handle "graphics device removed" scenario #6265

Open
MichaelDePiazzi opened this issue Mar 22, 2018 · 20 comments

7 participants
@MichaelDePiazzi
Contributor

commented Mar 22, 2018

Occasionally I get a crash report due to a SharpDX exception with error code DXGI_ERROR_DEVICE_REMOVED. I know this can happen normally due to a number of reasons (e.g. video card removed, driver upgrade, driver error, etc).

The reason I am raising this issue is that XNA seems to handle this scenario gracefully and recover from it automatically. MonoGame, on the other hand, currently swallows the exception silently when it first occurs in GraphicsDevice.Present(), so it is simply thrown again by whatever next tries to use the graphics device. (N.B. I tested this using dxcap -forcetdr as described here.)

So ideally, MonoGame should also be able to recover from this. I took a look into how to do this, and found the following article which describes how this scenario should be handled:
https://docs.microsoft.com/en-us/windows/uwp/gaming/handling-device-lost-scenarios

Basically, the Direct3D device must be reinitialised which is straightforward enough to do. But, all device-dependent resources must also be recreated. So using a texture as an example, I imagine this means that the original data must be kept so that it can be used to later recreate the texture after the device has been reinitialised.
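To make the "keep the original data" idea concrete, here is a minimal sketch. It assumes a hypothetical `IGpuDevice` abstraction (not a MonoGame or SharpDX type) in place of the real Direct3D calls: the texture keeps a system-memory copy of its texels and simply re-uploads that copy when it is handed a recreated device.

```csharp
using System;

// Hypothetical stand-in for the Direct3D device; NOT a MonoGame or SharpDX type.
// Implementations are expected to copy the bytes, not keep the array.
public interface IGpuDevice
{
    object CreateTexture(int width, int height, byte[] texels);
}

// A texture that keeps a system-memory copy of its data so it can be
// recreated on a new device after a "device removed" error.
public sealed class ShadowedTexture
{
    private readonly byte[] _shadow; // system-memory copy of the texel data

    public int Width { get; }
    public int Height { get; }
    public object GpuHandle { get; private set; }

    public ShadowedTexture(IGpuDevice device, int width, int height, byte[] texels)
    {
        if (texels == null) throw new ArgumentNullException(nameof(texels));
        Width = width;
        Height = height;
        _shadow = (byte[])texels.Clone(); // later edits to the caller's array don't leak in
        GpuHandle = device.CreateTexture(width, height, _shadow);
    }

    // Call after the device has been recreated: re-upload the shadow copy.
    public void Recreate(IGpuDevice newDevice)
    {
        GpuHandle = newDevice.CreateTexture(Width, Height, _shadow);
    }
}

// Toy device used only to demonstrate the idea; records what was uploaded.
public sealed class CountingDevice : IGpuDevice
{
    public int Uploads { get; private set; }
    public byte[] LastUpload { get; private set; }

    public object CreateTexture(int width, int height, byte[] texels)
    {
        Uploads++;
        LastUpload = (byte[])texels.Clone();
        return new object();
    }
}
```

The trade-off is that every texture's data is held twice, once in system memory and once on the GPU.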

I started messing around with this and seemed to get the device reinitialised correctly, but got stuck on the resource recreation part. fwiw, here's my code from GraphicsDevice.DirectX.PlatformPresent():

```csharp
try
{
    var syncInterval = PresentationParameters.PresentationInterval.GetSyncInterval();

    // The first argument instructs DXGI to block n VSyncs before presenting.
    lock (_d3dContext)
        _swapChain.Present(syncInterval, PresentFlags.None);
}
catch (SharpDX.SharpDXException ex)
{
    if ((ex.ResultCode == SharpDX.DXGI.ResultCode.DeviceRemoved) || (ex.ResultCode == SharpDX.DXGI.ResultCode.DeviceReset))
    {
        // Device lost: discard the old swap chain, recreate the device
        // resources, then reset the presentation state.
        _swapChain.Dispose();
        _swapChain = null;
        CreateDeviceResources();
        Reset();
    }
    else
    {
        throw;
    }
}
```

So I'm not really too sure how to proceed with this. I'm hoping someone else may be able to help out here or offer some insight. But at the very least, I'd say that GraphicsDevice.DirectX.PlatformPresent() should stop swallowing the SharpDX exceptions as that just muddles the true source of the error.

@harmwind

commented Mar 22, 2018

I think this is related to #2539.

In XNA, the GraphicsDevice has a DeviceResourceManager which keeps track of all the resources. When the device has been lost and the GraphicsDevice is being reset, the DeviceResourceManager checks all its resources of type IDynamicGraphicsResource and triggers their ContentLost event. This allows the developer to recreate these resources. Examples of classes that implement IDynamicGraphicsResource in XNA are RenderTargets and Dynamic Vertex and Index buffers.

@KonajuGames

Contributor

commented Mar 22, 2018

Android already does something similar when an activity is recreated; see ContentManager.ReloadGraphicsContent(). Explicitly created resources such as RenderTargets, textures loaded through FromStream() and buffers need to be recreated by the developer, because they are in the best position to know how to recreate their own resources. That is what the DeviceLost and DeviceReset events in GraphicsDevice are for. Android makes sure the OnDeviceReset event is triggered after it has called ReloadGraphicsContent().

XNA does indeed have internal interfaces IGraphicsResource and IDynamicGraphicsResource that appear to try to copy the data for recreation of the resource, but if the device is lost or reset, how do you get the data that was in the buffer?

@harmwind

commented Mar 22, 2018

I was indeed talking about explicitly created resources. Before MonoGame we used XNA, and DeviceLost events were quite common under DirectX 9 (e.g. when the screen was locked or the screensaver kicked in). We used the ContentLost event of our DynamicVertexBuffers to recreate the buffers by refilling them from the source data. In our case these were used for height map tiles in a quadtree (we use dynamic buffers because the source data is modified in real time by excavation equipment, so the tiles need to be updated constantly).
Since switching to MonoGame I haven't received any DeviceLost events, but I believe that is because DirectX 11 works differently.

@MichaelDePiazzi

Contributor Author

commented Mar 23, 2018

Thanks for the insight guys!

I tested a few different texture sources in XNA to see what would happen to them after forcing a "device removed" event. This is what I tested:

  • Texture2D loaded from an XNB via the content manager
  • Texture2D loaded from a PNG using FromStream
  • Texture2D programmatically created using SetData
  • RenderTarget2D with RenderTargetUsage.PreserveContents filled with a colour

Also note that RenderTarget2D has the ContentLost event, whereas Texture2D does not.

All of the Texture2Ds were automatically restored by XNA after the device was "removed". However, the RenderTarget2D was not restored and was blank afterwards, although its ContentLost event was fired.

So it seems that graphics resources with the ContentLost event (I assume these are the IDynamicGraphicsResource types mentioned above) are expected to be recreated by the developer when the event is fired, whereas graphics resources without this event (IGraphicsResource, I'm guessing) are restored automatically.

As for how XNA might be automatically restoring Texture2Ds (regardless of their source), the only thing I can think of is that it keeps a copy of the original data before creating the texture on the GPU, and then copies that data back to the GPU when the device is reset.

This should be easy enough to do in MonoGame (for WinDX at least), but will obviously use more memory.

Any thoughts on this?

@harmwind

commented Mar 23, 2018

XNA uses a resource manager to store all device resources. These resources are stored in (system) memory as WeakReferences. When the device is lost, the resource manager will try to recreate all resources of type IGraphicsResource by restoring the data from system memory. The IDynamicGraphicsResources can't be restored like this (since their data is dynamic), so their ContentLost event is triggered in that case to allow the developer to restore them.
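A rough sketch of the mechanism described above, assuming two hypothetical interfaces (`IRestorableResource`, `IDynamicResource`) that are only loosely modelled on the behaviour described, not XNA's internal IGraphicsResource / IDynamicGraphicsResource. The tracker holds weak references, restores "static" resources from system memory, and raises ContentLost on dynamic ones so the developer can refill them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interfaces; NOT XNA's internal types or signatures.
public interface IRestorableResource
{
    void RestoreFromSystemMemory(); // re-upload data kept in system memory
}

public interface IDynamicResource
{
    void RaiseContentLost(); // tell the owner the GPU contents must be regenerated
}

public sealed class DeviceResourceTracker
{
    // Weak references, so tracking a resource does not keep it alive.
    private readonly List<WeakReference<object>> _resources = new List<WeakReference<object>>();

    public void Track(object resource)
    {
        if (resource == null) throw new ArgumentNullException(nameof(resource));
        _resources.Add(new WeakReference<object>(resource));
    }

    // Call after the device itself has been recreated. Dead references are
    // pruned; returns how many live resources were restored or notified.
    public int OnDeviceRecreated()
    {
        _resources.RemoveAll(wr => !wr.TryGetTarget(out _));
        int processed = 0;
        // Iterate a snapshot in case a ContentLost handler tracks new resources.
        foreach (var wr in _resources.ToArray())
        {
            if (!wr.TryGetTarget(out var target)) continue;
            if (target is IDynamicResource dyn)
                dyn.RaiseContentLost();                 // developer refills it
            else if (target is IRestorableResource restorable)
                restorable.RestoreFromSystemMemory();   // restored automatically
            else
                continue;
            processed++;
        }
        return processed;
    }
}

// Example resources used to exercise the tracker.
public sealed class StaticTexture : IRestorableResource
{
    public int RestoreCount { get; private set; }
    public void RestoreFromSystemMemory() => RestoreCount++;
}

public sealed class DynamicBuffer : IDynamicResource
{
    public event EventHandler ContentLost;
    public void RaiseContentLost() => ContentLost?.Invoke(this, EventArgs.Empty);
}
```

Because only weak references are held, a resource the game has already dropped is skipped and pruned rather than restored.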

When you memory profile XNA, all these WeakReferences are probably what make up the large unmanaged memory block. So this indeed uses more memory (system memory that is, not GPU memory).

I'm not sure how MonoGame handles this, but I didn't find a similar mechanism yet. I also didn't encounter DeviceLost events anyway, but I read that DeviceLost events occur less often in DirectX11 than they did in DirectX9 (XNA).

@MichaelDePiazzi

Contributor Author

commented Mar 24, 2018

Thanks @harmwind.

I might try and take another crack at this if I can find some time. But I'm not going to worry about replicating these internals since they're not part of XNA's public API. Please let me know if you think these are actually required though (cc @KonajuGames @tomspilman).

What I'm taking from this is that the original resource data is kept in system memory and restored automatically when the device is lost. And for any resources with a ContentLost event (i.e. the "dynamic" resources), they won't do this but they must fire this event when the device is lost (btw, XNA also seems to fire this when manually calling GraphicsDevice.Reset).

And I'll also make sure that the GraphicsDevice.DeviceLost event is fired when the device is lost (same as XNA).

> I also didn't encounter DeviceLost events anyway, but I read that DeviceLost events occur less often in DirectX11 than they did in DirectX9 (XNA).

This is my understanding as well from what I've read. Apparently this is mainly because DX9 treats the graphics device as an exclusive resource (so it's "lost" every time something else takes focus), whereas DX11 treats it as a shared resource. They do still happen, just much less often. The differences are described in a bit more detail here.

@harmwind

commented Mar 24, 2018

For me it hasn't been an issue yet that this part of XNA seems to be missing from MonoGame. XNA was only a small part of our software suite, and only last month did I find time to finally migrate from XNA to MonoGame (allowing us to go 64-bit and use a library that is actively maintained and developed).
The ContentLost event of IDynamicGraphicsResource was the only part (in our code) that was not implemented in MonoGame, and I found issue #2539 describing this.

I'm not sure whether this automatic restore mechanism and firing of ContentLost events should be implemented. Our software will soon be field tested (with MonoGame) and then we'll know if DeviceLost events will still occur.
Removing a video card or updating a driver while the software is running are not scenarios we need to support, and I think this also applies to most games.

MonoGame is already used for many different games and applications, and the fact that only a few people have noticed that automatic resource restoring is missing suggests that it's not really an issue.
Have you encountered it yourself, or only by triggering it with forcetdr?

@MichaelDePiazzi

Contributor Author

commented Mar 24, 2018

@harmwind Yeah, I've received several crash reports from players of my game relating to this. It's not common, but it does happen on occasion. I'd definitely prefer to recover from it gracefully if possible rather than my players getting a crash or a message telling them to restart the game.

@nkast

Contributor

commented Mar 24, 2018

@MichaelDePiazzi
At some point you might want to call ContentManager.ReloadGraphicsContent().

FYI, there is a bug there that causes memory leaks.

  1. Some readers create a new resource instead of reloading the existingInstance. The old resource remains in .disposableAssets and .loadedAssets.
  2. The ContentManager does not look up the existingInstance of Shared and Embedded resources. Examples are the VertexBuffers and IndexBuffers of Model.

here is an old PR with all the necessary fixes. #3725

@harmwind

commented Mar 24, 2018

@nkast
Could you explain why the old PR (#3725) was closed? Was it because WP8 support was dropped?
I think it is still relevant to other platforms.

@nkast

Contributor

commented Mar 25, 2018

I don't quite remember but probably it was after the WP8 drop and/or a clean up of stagnant branches.
I still have a patch with those changes in my fork and never had an issue, although the ReloadGraphicsContent() code path is not actively used nowadays.

@MichaelDePiazzi

Contributor Author

commented Mar 25, 2018

Thanks @nkast, I will definitely keep all of that in mind. Although for WinDX desktop, it seems like it will be easier and faster to just keep a copy of the data in system memory.

> ReloadGraphicsContent() code path is not actively used nowadays.

Actually, @KonajuGames mentioned that it was being used by Android. So in this case I'd assume the fixes in your PR would still be useful, right?

@willmotil

Contributor

commented Mar 25, 2018

I thought XNA just called Load() again on a device reset, and set these callback flags for things the user didn't set up in LoadContent and needed to reload or reset manually. So if you set up your render targets in LoadContent, shouldn't they get recreated as well?
Not sure on that but...

@Jjagg

Contributor

commented Mar 25, 2018

> I thought xna just called Load() again on a device reset

If I understand you correctly, you're talking about how it was in XNA 1.0, but they changed this behavior in XNA 2.0 so the user no longer has to worry about device lost.

Details in a blog post by our Lord and savior Shawn Hargreaves: https://blogs.msdn.microsoft.com/shawnhar/2007/12/12/virtualizing-the-graphicsdevice-in-xna-game-studio-2-0/

@tomspilman

Member

commented Mar 27, 2018

> I thought xna just called Load() again on a device reset

Yeah, it doesn't.

> it seems like it will be easier and faster to just keep a copy of the data in system memory.

That should be our path forward on desktop systems where this matters. Note this is a platform specific solution, so it shouldn't be in common code... but in code only executed by platforms that need it.

Does OpenGL on Mac/Linux have "device lost" like events and do we need to restore static texture resources? Or is this just a DirectX on Windows issue? Does it affect iOS? @cra0zy @dellis1972 ?

Android does have this issue... but we solve it by loading from disk as it is not feasible to keep them in memory. That said... would it be better if we hit disk on Windows too?

tomspilman added this to the 3.8 Release milestone Mar 27, 2018

@tomspilman

Member

commented Mar 27, 2018

> FromStream()

I bet XNA restores textures loaded via Texture2D.FromStream().

Also, I bet that if you SetData on a texture loaded from either ContentManager or FromStream, XNA restores it including the modifications made at runtime. That would mean it always modifies the system memory cache first and then copies it back to the GPU.
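The write-through behaviour guessed at here can be sketched as follows. This is a hypothetical `WriteThroughTexture` whose `Upload` delegate stands in for the real GPU copy (it is not a MonoGame or SharpDX API): SetData updates the system-memory copy first and then pushes the same bytes to the GPU, so a later restore automatically includes runtime modifications.

```csharp
using System;

// Hypothetical write-through texture; the Upload delegate is NOT a real API.
public sealed class WriteThroughTexture
{
    // (destination offset, source array, source offset, byte count)
    public delegate void Upload(int dstOffset, byte[] src, int srcOffset, int count);

    private readonly byte[] _shadow; // always holds the latest texel data
    private Upload _upload;          // targets the current device

    public WriteThroughTexture(byte[] initial, Upload upload)
    {
        if (initial == null) throw new ArgumentNullException(nameof(initial));
        _upload = upload ?? throw new ArgumentNullException(nameof(upload));
        _shadow = (byte[])initial.Clone();
        _upload(0, _shadow, 0, _shadow.Length);
    }

    public void SetData(int offset, byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (offset < 0 || offset > _shadow.Length - data.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        Array.Copy(data, 0, _shadow, offset, data.Length); // 1. system memory
        _upload(offset, _shadow, offset, data.Length);      // 2. GPU
    }

    // After the device is recreated the shadow already contains every runtime
    // modification, so a single full upload restores the texture.
    public void Restore(Upload uploadToNewDevice)
    {
        _upload = uploadToNewDevice ?? throw new ArgumentNullException(nameof(uploadToNewDevice));
        _upload(0, _shadow, 0, _shadow.Length);
    }
}
```

This is also why reloading from disk alone would not be enough: the file on disk does not contain the SetData changes, but the shadow copy does.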

This is likely why they do not load content back from disk as they couldn't reproduce the modifications done at runtime.

But again remember... this only happens in the case of a GPU failure of some sort. It does seem like a lot of overhead and trouble for an infrequent issue. Still, we want to be XNA compatible... so we should do it.

@MichaelDePiazzi

Contributor Author

commented Mar 28, 2018

> I bet XNA restores textures loaded via Texture2D.FromStream().

Yeah, it does. I tested this and a few other cases (see #6265 (comment)).

> Also I bet if you SetData on a loaded texture from either ContentManager or FromStream that XNA restores it including the modifications at runtime.

I hadn't specifically tested modifying a loaded texture, but just tried it then and can confirm that XNA also restores it including the modifications.

> But again remember... this only happens in the case of a GPU failure of some sort. It does seem like a lot of overhead and trouble for an infrequent issue. Still we want to be XNA compatible... so we should do it.

Yeah, it does seem like a lot of work for what is an infrequent issue. But I'd definitely prefer to recover from it gracefully if possible rather than my players getting a crash or a message telling them to restart the game. So if I can find some time, I will definitely take another crack at this. And, as you mentioned, I will make sure any changes for this are only in the WinDX platform specific code.

@harmwind

commented Mar 28, 2018

@MichaelDePiazzi
I have been trying to reproduce the DeviceLost events myself, but have only been able to do so by triggering them with dxcap -forcetdr. I'm also a bit worried that my users will experience crashes due to this, but I can't reproduce it. Even on Windows tablets, changing the orientation or resolution while running works properly.
Did you receive crash logs or stack traces from your users? If so I would like to know what can trigger a DeviceLost event using DirectX11 (other than removing the GPU or updating the driver).

@MichaelDePiazzi

Contributor Author

commented Mar 28, 2018

@harmwind
Same here. I've tried a few things, but the only way I've managed to trigger this is with dxcap -forcetdr also.

And yes, I have received crash logs complete with stack traces from my users. Although they don't tell me a whole lot at the moment because:

  1. MonoGame is currently swallowing SharpDX exceptions on Present, so it ends up throwing from the next thing that does something with the graphics device. So it's difficult to know for sure if it was thrown from Present originally (although this is fairly likely based on what the DirectX documentation says).
  2. If you do get a device removed error, I found out you need to call GetDeviceRemovedReason to find out why it was "removed". But I haven't started doing this yet.

So that I can get more info, I've patched MonoGame to no longer swallow the exception on Present. And then if it does happen, I'm calling GetDeviceRemovedReason and logging the result. But I haven't pushed this change to my users yet. I plan to soon though.
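The "don't swallow, log the reason, then recover or rethrow" approach might look roughly like this. The delegates are hypothetical stand-ins for the real calls (SwapChain.Present, the device's GetDeviceRemovedReason, and device recreation), and the sketch assumes the thrown exception carries the DXGI code in `Exception.HResult`; with SharpDX you would more likely compare `SharpDXException.ResultCode`, as the original snippet does.

```csharp
using System;

public static class PresentGuard
{
    // DXGI HRESULTs (winerror.h).
    public const int DxgiErrorDeviceRemoved = unchecked((int)0x887A0005);
    public const int DxgiErrorDeviceReset = unchecked((int)0x887A0007);

    // Returns true if the frame was presented, false if the device was lost
    // and recreated. Any other exception propagates instead of being swallowed.
    public static bool Present(
        Action present,                 // hypothetical: wraps SwapChain.Present
        Func<int> getDeviceRemovedReason, // hypothetical: wraps GetDeviceRemovedReason
        Action recreateDevice,          // hypothetical: rebuilds device + resources
        Action<string> log)
    {
        try
        {
            present();
            return true;
        }
        catch (Exception ex) when (ex.HResult == DxgiErrorDeviceRemoved || ex.HResult == DxgiErrorDeviceReset)
        {
            int reason = getDeviceRemovedReason();
            log($"Present failed with 0x{unchecked((uint)ex.HResult):X8}; device removed reason 0x{unchecked((uint)reason):X8}");
            recreateDevice();
            return false;
        }
    }
}
```

Logging both codes matters because, as noted above, the exception from Present only says the device is gone, while the removed reason says why.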

I'm not sure of all of the scenarios that can cause a "device removed" error in DX11. But the article I linked to earlier lists a few potential examples as follows:

  • The graphics driver is upgraded.
  • The system changes from a power-saving graphics adapter to a performance graphics adapter.
  • The graphics device stops responding and is reset.
  • A graphics adapter is physically attached or removed.

And GetDeviceRemovedReason can return the following potential reasons:

  • DXGI_ERROR_DEVICE_HUNG: The application's device failed due to badly formed commands sent by the application. This is a design-time issue that should be investigated and fixed.
  • DXGI_ERROR_DEVICE_REMOVED: The video card has been physically removed from the system, or a driver upgrade for the video card has occurred. The application should destroy and recreate the device.
  • DXGI_ERROR_DEVICE_RESET: The device failed due to a badly formed command. This is a run-time issue; the application should destroy and recreate the device.
  • DXGI_ERROR_DRIVER_INTERNAL_ERROR: The driver encountered a problem and was put into the device removed state.
  • DXGI_ERROR_INVALID_CALL: The application provided invalid parameter data; this must be debugged and fixed before the application is released.
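Since the removed reason is mostly useful in crash logs, a small lookup that turns the raw HRESULT into a readable name and a rough category might help. The HRESULT values are the standard winerror.h codes; the categories simply restate the list above and are an assumption, not an official Microsoft mapping.

```csharp
using System;

public enum DeviceRemovedAction { None, RecreateDevice, FixApplicationBug, DriverFault, Unknown }

public static class DeviceRemovedReasons
{
    public const int SOk = 0;
    public const int InvalidCall = unchecked((int)0x887A0001);
    public const int DeviceRemoved = unchecked((int)0x887A0005);
    public const int DeviceHung = unchecked((int)0x887A0006);
    public const int DeviceReset = unchecked((int)0x887A0007);
    public const int DriverInternalError = unchecked((int)0x887A0020);

    // Readable name for logging; unknown codes are printed as hex.
    public static string Name(int hr)
    {
        switch (hr)
        {
            case SOk: return "S_OK";
            case InvalidCall: return "DXGI_ERROR_INVALID_CALL";
            case DeviceRemoved: return "DXGI_ERROR_DEVICE_REMOVED";
            case DeviceHung: return "DXGI_ERROR_DEVICE_HUNG";
            case DeviceReset: return "DXGI_ERROR_DEVICE_RESET";
            case DriverInternalError: return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
            default: return $"0x{unchecked((uint)hr):X8}";
        }
    }

    // Rough category, following the descriptions in the list above.
    public static DeviceRemovedAction Classify(int hr)
    {
        switch (hr)
        {
            case SOk: return DeviceRemovedAction.None;
            case DeviceRemoved:
            case DeviceReset: return DeviceRemovedAction.RecreateDevice;
            case DeviceHung:
            case InvalidCall: return DeviceRemovedAction.FixApplicationBug;
            case DriverInternalError: return DeviceRemovedAction.DriverFault;
            default: return DeviceRemovedAction.Unknown;
        }
    }
}
```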
@harmwind

commented Mar 28, 2018

@MichaelDePiazzi
OK, thanks for the info. In my case (MonoGame embedded in WinForms), I'm not using Present at all; instead I'm using SwapChainRenderTargets, which have their own Present. All exceptions that occur there are also swallowed and ignored. The exceptions are always SharpDX exceptions, but often not of the DeviceRemoved or DeviceReset type. I'm getting SharpDX.SharpDXException: 'HRESULT: [0x80070057], Module: [General], ApiCode: [E_INVALIDARG/Invalid Arguments], Message: The parameter is incorrect.'
The solution is still the same, though: I took the code example from your first post and put it in a separate public method for testing.
This is something we have to keep in mind once you find a way to properly recover and reload the graphics resources. When only SwapChainRenderTargets are used, triggering the recovery will be a bit more difficult, since the exception doesn't happen in the GraphicsDevice but in SwapChainRenderTarget.Present (which is called from user code).
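One possible way to handle failures that surface in user-called SwapChainRenderTarget.Present is sketched below, assuming a hypothetical `DeviceLossMonitor` shared by every place that presents. Each failing Present only records the failure, and the game calls `TryRecover` once per frame at a point where rebuilding resources is safe. Whether E_INVALIDARG from a swap chain should count as a device loss is left to the caller; the monitor just records whatever HRESULT it is given.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of MonoGame.
public sealed class DeviceLossMonitor
{
    private int _lost;        // 0 = healthy, 1 = recovery pending
    private int _lastHResult; // most recent failure code, for logging

    public bool RecoveryPending => Volatile.Read(ref _lost) == 1;
    public int LastHResult => Volatile.Read(ref _lastHResult);

    // Called from any Present call site that catches a failure.
    public void ReportFailure(int hresult)
    {
        Volatile.Write(ref _lastHResult, hresult);
        Interlocked.Exchange(ref _lost, 1);
    }

    // Runs recreate at most once per pending loss; returns true if it ran.
    public bool TryRecover(Action recreate)
    {
        if (Interlocked.CompareExchange(ref _lost, 0, 1) != 1) return false;
        try
        {
            recreate();
            return true;
        }
        catch
        {
            Interlocked.Exchange(ref _lost, 1); // leave recovery pending if it failed
            throw;
        }
    }
}
```

Deferring the rebuild this way also means that two swap chains failing in the same frame trigger only one device recreation.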
