GPU Memory Leak in Windows 10, AIR 27 when using Starling RenderTexture. #20

Open
hardcoremore opened this issue Sep 5, 2017 · 36 comments

Comments

@hardcoremore

hardcoremore commented Sep 5, 2017

Hi,

I have found what I assume is a GPU memory leak when using Starling RenderTexture.

Once per enter frame I call this:

override public function clear():void
{
    var child:DisplayObject;
 
    var image:Image;
    var line:StarlingLine;
 
    while(drawSprite.numChildren > 0)
    {
        child = drawSprite.getChildAt(0);
 
        image = child as Image;
        line = child as StarlingLine;
 
        if(image)
        {
            // Am I disposing RenderTexture properly here???
            image.texture.dispose(); // this will always be RenderTexture
            image.dispose();
        }
        else if(line)
        {
            line.dispose();
        }
 
        drawSprite.removeChildAt(0);
    }
}

This is called multiple times (between 50-300 times) on enter frame:

var canvas:Canvas = new Canvas();
var polygon:Polygon = new Polygon(starlingVertices);
 
var alpha:Number = GetFillAlpha();
 
if(isSolid === false)
{
    alpha = 0.1;
}
 
canvas.beginFill(color, alpha);
canvas.drawPolygon(polygon);
canvas.endFill();
 
var texture:RenderTexture = new RenderTexture(width, height);
    texture.draw(canvas);
 
var image:Image = new Image(texture);
    image.x = posX + (width / 2);
    image.y = posY + (height / 2);
    image.alignPivot();
    image.readjustSize();
 
this.drawSprite.addChild(image);
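
One possible way to avoid allocating 50-300 RenderTextures per frame is to keep a single texture alive and redraw into it. The sketch below is only an illustration of that idea, not something tested against the sample project: the class name `SharedCanvasLayer`, its members, and the assumption that all polygons fit into one scene-sized texture are hypothetical.

```actionscript
package
{
    import flash.geom.Matrix;

    import starling.display.Canvas;
    import starling.display.Image;
    import starling.display.Sprite;
    import starling.textures.RenderTexture;

    // Hypothetical helper: one RenderTexture that lives as long as the layer.
    public class SharedCanvasLayer extends Sprite
    {
        private var sharedTexture:RenderTexture;
        private var drawMatrix:Matrix = new Matrix();

        public function SharedCanvasLayer(sceneWidth:int, sceneHeight:int)
        {
            // Allocated once, instead of once per polygon per frame.
            sharedTexture = new RenderTexture(sceneWidth, sceneHeight);
            addChild(new Image(sharedTexture));
        }

        // Call once per frame, before drawing any polygons.
        public function beginFrame():void
        {
            sharedTexture.clear();
        }

        // Call for each polygon; draws the canvas at (posX, posY) in the texture.
        public function drawCanvas(canvas:Canvas, posX:Number, posY:Number):void
        {
            drawMatrix.identity();
            drawMatrix.translate(posX, posY);
            sharedTexture.draw(canvas, drawMatrix);
            canvas.dispose(); // the canvas is no longer needed after drawing
        }

        override public function dispose():void
        {
            // Image.dispose() does not dispose its texture, so do it explicitly.
            sharedTexture.dispose();
            super.dispose();
        }
    }
}
```

Whether this sidesteps the leak on Direct3D 11 is an assumption; it only reduces the number of render textures that are created and destroyed.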

I have an NVIDIA GTX 770 (from 2013) with 2GB of VRAM on my PC, and I have installed Afterburner. These are the images that show the Starling debug display together with the Afterburner GPU statistics.

This memory leak only occurs when I am using RenderTexture so many times on enter frame.

As you can see in Image 1, memory looks good both in Starling and Afterburner, but once I start using RenderTexture on enter frame (Image 2 and Image 3), the memory in Afterburner climbs to the maximum and is only released when I close the app.

Memory only rises in Afterburner; in the Starling debug display it stays fine at all times. But once the memory maxes out in Afterburner, I get an error that back buffer creation is failing.

Image 1:
https://ibb.co/hgQGCQ

Image 2:
https://ibb.co/dCpORk

Image 3:
https://ibb.co/hkiTsQ

Here is the link to download sample project to reproduce this memory leak:

https://goo.gl/VDVe5F

When you run the application, it will start to flicker in about a minute (probably because the back buffer cannot be created once the memory is full) and then it will crash.

I used Flash Builder 4.7 to create this test app.

It is also worth mentioning that I am on Windows 10 and Adobe AIR 27 Beta, which means that Stage3D will load the DirectX 11 DLL.

I am using latest Starling from master branch.

Regards,

Caslav

@PrimaryFeather
Contributor

@hardcoremore Please also create an issue on tracker.adobe.com — this issue tracker here is just an add-on to simplify discussing AIR issues in the community. Thanks in advance!

(And don't forget to post the link here!)

@PrimaryFeather
Contributor

Additional info: I tested this on macOS 10.12.6, where the sample application ran without issues. I could reproduce this on a Windows notebook, though.

@hardcoremore
Author

I do not know how to create an issue in the Adobe tracker for this bug because it is happening through Starling. I cannot describe what steps are needed to reproduce it. I can only write what I am doing in Starling to reproduce this memory leak.

@PrimaryFeather
Contributor

That doesn't matter — Starling does not contain any platform specific code, so if a sample produces a problem on Windows, and not on macOS, it's proof enough for me that it's probably an AIR/driver/etc. issue.

So just include / reference the sample project as you did above, and that's fine.

@hardcoremore
Author

hardcoremore commented Apr 27, 2020

Hi @ajwfrost ,

What is the status of this bug? It is still happening with the latest AIR 33.1.1.98. I have just tested it on Windows 10 and there is still a huge GPU memory leak, so after some time my game becomes unplayable.

Regards,

Caslav

@ajwfrost
Collaborator

Hi @hardcoremore - I have someone looking into the associated bug here with memory leak in the GPU (caused by filters? with Direct3D 11) - it's probable that this is the same root cause I suspect, particularly as this is reproducible only on Windows (and presumably if you drop this back to software or to Direct3D 9 then it no longer happens?)

Just quickly looking at it again on my machine, it runs for a short while before getting very slow. There's a lot of ActionScript going on, but concerningly I'm also seeing a fair amount of garbage collection time in Scout.

Looking at the allocations vs deallocations in Scout, it looks like there are a number of classes where we have a lot of allocations but not deallocations, it would perhaps be a good idea if you trace out the size of some of your lists to see if things aren't being cleaned up as promptly as they could be. But I think this is a red herring in terms of the GPU memory: the critical things (textures/vertex buffers) look okay so this is likely to be the same issue as before, which we think is related to the vertex buffers in D3D11..

thanks

@hardcoremore
Author

hardcoremore commented Apr 28, 2020

Hi @ajwfrost,

> (and presumably if you drop this back to software or to Direct3D 9 then it no longer happens?)

Yes this is correct. Also this is not happening on Mac OS, on Mac OS everything is working fine.
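
To check which Stage3D backend a given run is actually using, something like the following sketch may help. The class name `GpuDiagnostics` is hypothetical. `Context3D.driverInfo` reports the active driver, and `totalGPUMemory` is looked up dynamically because it is assumed to be missing on older runtimes. As noted earlier in this thread, Starling's own memory figure stayed flat while Afterburner climbed, so this counter may not reveal the leak by itself.

```actionscript
package
{
    import flash.display3D.Context3D;

    import starling.core.Starling;

    // Hypothetical diagnostic helper; call it after Starling's context is created.
    public class GpuDiagnostics
    {
        public static function traceGpuState():void
        {
            var context:Context3D = Starling.current.context;

            // Shows which driver Stage3D picked (DirectX 9, DirectX 11, software, ...).
            trace("Stage3D driver:", context.driverInfo);

            // Only what Stage3D believes it has allocated, not the OS/driver view.
            if ("totalGPUMemory" in context)
            {
                var megabytes:Number = Number(context["totalGPUMemory"]) / (1024 * 1024);
                trace("Stage3D GPU memory (MB):", megabytes.toFixed(1));
            }
        }
    }
}
```

To compare against software rendering, `Context3DRenderMode.SOFTWARE` can be passed as the `renderMode` argument of the Starling constructor.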

> There's a lot of ActionScript going on, but concerningly I'm also seeing a fair amount of garbage collection time in Scout.

Yeah sorry about that, I used that project for many other bugs as well. For this one you should only look at dfpck.FilterMemoryLeak class.

> Looking at the allocations vs deallocations in Scout, it looks like there are a number of classes where we have a lot of allocations but not deallocations, it would perhaps be a good idea if you trace out the size of some of your lists to see if things aren't being cleaned up as promptly as they could be

If I understand you correctly, there are a number of ActionScript classes where deallocation is not done properly inside the Adobe AIR source code, internally? I agree there is a lot of GC there.

@ajwfrost
Collaborator

Thanks :-)

The allocations / deallocations thing could become a concern if the number of open objects continued to climb. Normally there are two parts to the 'garbage collection': one is reference counting, and the other is a mark-and-sweep garbage collector. Reference counting is cheap/easy so we want to use that as much as possible, but sometimes we have to use the mark/sweep approach which is where you sometimes see stutters in gameplay if there is a lot going on within the 'sweep' phase. Here, it seems there's a lot being done in 'sweep' which could mean two objects referring to each other still (hence still with reference counts) but being isolated from everything else i.e. hanging around with a circular reference but no way to be accessed from the main application. This is a situation to try to avoid. Ultimately, it will all get cleaned up, it just takes more time and effort.

There is one thing I think where we saw objects being added to a vector that then just grew: those objects would never be garbage collected as long as the vector remained reachable..

Anyway, that's a tangent, we can check on this GPU issue..

thanks!

@aclipski

Just wanted to bump this up, since this issue is causing significant problems for our game's users on Steam. Typically after about 30-60 minutes of continuous play, our Windows players start to see corrupted textures, which forces them to restart the app, and in some cases restart their computer.

Has there been any progress on fixing this issue? Thanks!

@ajwfrost
Collaborator

We were looking at this one yesterday, the developer is trying to narrow things down and checking all the GPU memory allocations/deallocations. It seems there's no quick way to find the problem here so we're having to add a lot of trace output in a lot of places and then go through the resulting data to match up allocations vs deallocations, which is a bit painstaking due to the way in which the GC stuff works..

So in progress and I'm hoping for a good outcome soon, but it's been a bit challenging..

@hardcoremore
Author

Hi @ajwfrost,

Any luck with this one, were you able to resolve it?

Regards,
Caslav

@ajwfrost
Collaborator

@hardcoremore sorry, still under investigation, but we had to pause this as a few higher-priority issues cropped up that the developer had to focus on..

We'd managed to go through more of the code and couldn't see any allocate/deallocate mismatch - I'm starting to understand why this wasn't addressed previously by Adobe! A bit early yet to blame Microsoft/Direct3D11 drivers, we'll keep digging (the other tasks should be completed soon and this is her next priority).

thanks

@hardcoremore
Author

Hi @ajwfrost,

Was there any progress on this issue?

Regards,
Caslav

@ajwfrost
Collaborator

@hardcoremore we spent more time trying to look into this, but the picture is very unclear due to the amount of hanging AS3 objects that are around.. we need to perhaps retry this with intensive GC going on to see if the memory still increases: the D3D11 profiling that we've done suggests that there aren't any leaks happening which may mean (a) the increase in GPU memory is purely down to AS3 objects that haven't yet been garbage collected, or (b) there's something going wrong within D3D11 perhaps. This is a somewhat strange issue...

@hardcoremore
Author

@ajwfrost,

Thanks for the response. Can you please explain a little more what hanging AS3 objects means and why it is happening?

It is really interesting to learn how AIR works internally. Understanding AIR better will definitely allow us to write "smarter" code and optimize it in ways we did not know were possible.

Thanks,
Caslav

@ajwfrost
Collaborator

@hardcoremore would love to write more on this..!

The simple view is that in ActionScript, objects are garbage collected in one of two ways:
a) cheap option = reference counting. You have a local variable, it goes out of scope, it gets cleaned up at that same point (internally it gets added to a 'zero count table' which is frequently cleaned out).
b) expensive option = mark-and-sweep. You have an object that references another one, which has a circular reference back to the first. We only find out about this when we do the full garbage collection which starts off gradually going through and little-by-little marking all the objects that can be reached from one of the 'roots' (e.g. the Stage object, or your first start-up class instance that the runtime creates). Once this has finished, we do a "sweep" (which is where sometimes you'll see the runtime freeze as this is done all at once..) where we clean up all of the unmarked objects such as those with circular references.

There are other things that mean that the reference counting isn't sufficient, so this is a very simplistic view. The issue is that the mark-and-sweep garbage collection only runs based on certain memory triggers, and if your app isn't hitting these, you can end up with ActionScript objects that no one has any reference to but that haven't yet been cleaned up. If they're holding on to native assets such as BitmapData or GPU buffers, then this memory is also still hanging around.

Anyway - we can tweak some of the parameters within the virtual machine to change how the mark-and-sweep garbage collection is triggered to make this really aggressive i.e. always do a full mark-and-sweep between each frame, which would be incredibly bad from a general performance perspective, but will help us isolate issues such as this where it might be that native resources are being held open by ActionScript objects that were going to be cleaned up at some point in the future based on the internal thresholds/triggers...
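
To make the circular-reference case above concrete, here is a minimal sketch. The class `Node` and its members are hypothetical. It assumes, per the explanation above, that two objects pointing at each other keep non-zero reference counts until a mark-and-sweep pass runs, and shows releasing the native resource explicitly rather than waiting for that pass.

```actionscript
package
{
    import flash.display.BitmapData;

    // Hypothetical class: each instance owns native memory and may point at a partner.
    public class Node
    {
        public var partner:Node;
        public var pixels:BitmapData = new BitmapData(1024, 1024, true, 0x0);

        // Release native memory now instead of waiting for garbage collection.
        public function release():void
        {
            if (pixels)
            {
                pixels.dispose(); // frees the bitmap's memory immediately
                pixels = null;
            }
            partner = null;       // breaks the cycle between the two nodes
        }

        // Builds a two-object cycle, then tears it down explicitly.
        public static function demo():void
        {
            var a:Node = new Node();
            var b:Node = new Node();
            a.partner = b;
            b.partner = a;  // a and b now reference each other

            // Without these calls, dropping a and b would leave both objects
            // (and their bitmaps) alive until the next mark-and-sweep pass.
            a.release();
            b.release();
        }
    }
}
```

The same reasoning is why Starling textures expose `dispose()`: the GPU resource should not depend on when the ActionScript object happens to be collected.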

thanks

@hardcoremore
Author

hardcoremore commented Jun 23, 2020

Hi @ajwfrost,

Thanks for the explanation. I have always wondered, especially with today's devices where so much memory is available, why there is no System method in AS3 that completely disables the GC, and a separate method that enables it again. More precisely, I want the GC to keep doing its reference counting and bookkeeping, but not to do any cleanup or heavy work until I allow it.

I imagine it like this: when the user starts playing a level, the game calls an AS3 System method to disable GC cleanup and lets objects stay around no matter how high memory gets. Then the game runs the GC after the player loses a life or at the end of the level. There is no point in dramatically lowering FPS during gameplay (which happens today in AIR while the GC is running) to free several MB of RAM when the user has 8GB of RAM available. We need more freedom and control over the GC.

Is it possible for you to implement just 2 System methods in AIR? One method that prevents GC cleanup no matter what, and another that always triggers the GC when called, 100% of the time and not only based on some parameters.

I want to be able to tell AIR never to run GC cleanup while the user is playing a level, no matter how much memory is allocated. I don't care, I want the control.

This would mean a lot for games and could improve the experience greatly. How difficult would it be to implement?

Thanks,

Caslav

@FliplineStudios

@hardcoremore You do already have System.gc() and System.pauseForGCIfCollectionImminent() with some thresholds to give you some control over when the garbage collection happens. They're more like suggestions to the garbage collector, so they don't always kick in when you'd like. We've had some luck with getting it to trigger some of the time with this (there was a bug or an issue at one point in the past where System.gc() had to be called twice -- I think once to finish marking and once for the sweep):

System.pauseForGCIfCollectionImminent(0.1);
System.gc();
System.gc();

Actually preventing the GC from running at all sounds like a bad idea, there are times when the OS tells an app that it HAS to clean up some resources to free up some memory (especially on mobile), and if it can't run the GC at that point the app would likely just crash.

@hardcoremore
Author

hardcoremore commented Jun 25, 2020

> Actually preventing the GC from running at all sounds like a bad idea, there are times when the OS tells an app that it HAS to clean up some resources to free up some memory (especially on mobile), and if it can't run the GC at that point the app would likely just crash.

That does not sound logical to me at all.
How can OS know what part of the memory do you need and want to use?
Why would OS crash a perfectly running app because it wants you to free 10 or 20 MB of RAM, that doesn't make sense?

And besides, no one can predict all the use cases for AIR on all of the platforms it supports. That's why all I want is control over it: a simple switch that can enable or disable the GC from doing the cleanup part. That's all, it's just a boolean switch.

So that I can be sure that GC cleaning will only occur after the level ends or when character dies and thus preventing FPS drops while game is running.

@ajwfrost
Collaborator

Giving a bit more control of the GC might be an option but it would come with warnings as you may well be making problems for yourself... there are situations in which the GC has to run, but this is generally when the operating system tells us we can't allocate any more memory. For Flash Player there was the concept of a 'hard' and 'soft' memory limit which could be configured on devices and the GC kicked in to try to ensure you didn't go over the soft limit; if we couldn't avoid going over the hard limit then the player aborted and the user saw the "out of memory" icon in their Flash instances. Actually this worked really well on the mobile websites as it allowed you to then re-start particular instances of the plug-in and leave other ones closed.. and it really highlighted which SWF files were just incredibly greedy and inefficient in their memory usage!

Anyway back to AIR: we actually have a customer for whom the application start-up time is crucial, and for them we just disabled the garbage collection until after the first 10 frames have happened. It means there's a higher memory usage during this start-up period, but knocked almost half a second off their application start-up time. (Embedded AIR, not the AIR SDK, but the principles are the same).

One thing we also found though: if we lowered the thresholds and had GC happen more frequently, then it stopped the longer pauses from happening. Little-and-often is how the 'mark' phase happens, but if we do this more frequently then it means it has less to do in the 'sweep' phase which is the bit that has to be done all at once. The thresholds for GC self-adjust based on various timings and profiling of the content and how the runtime is performing (it's very clever really!) but yes it's not like they can always anticipate the usage patterns of the content, so having the application provide hints to them is quite a good idea...
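
Until the runtime offers finer control, an application can only pass hints using the existing `flash.system.System` calls mentioned earlier in the thread. The sketch below is one possible arrangement, not a recommendation from the AIR team: the class name `GcHints`, the 60-frame interval, and the 0.5 imminence value are all hypothetical choices, and the collector is free to ignore the hints.

```actionscript
package
{
    import flash.system.System;

    // Hypothetical helper that concentrates GC work at moments the game chooses.
    public class GcHints
    {
        // Assumed interval; tune it for the actual content.
        private static const FRAMES_BETWEEN_HINTS:int = 60;
        private static var frameCount:int = 0;

        // Call from the ENTER_FRAME handler. Every N frames, suggests that the
        // collector finish its cycle if it is already at least halfway there.
        public static function onEnterFrame():void
        {
            frameCount++;
            if (frameCount % FRAMES_BETWEEN_HINTS == 0)
            {
                System.pauseForGCIfCollectionImminent(0.5);
            }
        }

        // Call at a safe point, such as the end of a level or after losing a life.
        public static function onLevelEnded():void
        {
            // A low imminence value asks for collection even if it is not yet due.
            System.pauseForGCIfCollectionImminent(0.1);
            // Called twice, following the workaround described earlier in the thread.
            System.gc();
            System.gc();
        }
    }
}
```

This cannot stop the collector from running mid-level; it only tries to ensure there is less left to sweep when it does.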

Another thing for our to-do list :-)

@hardcoremore
Author

hardcoremore commented Jun 25, 2020

Yes please @ajwfrost, it would be great to have the option for the GC not to run cleanup during gameplay. Even if the GC runs frequently so that it doesn't have "long pauses", it really makes no sense not to be able to disable it for a certain period of time, especially when there is so much memory in devices these days.

I will always be willing to sacrifice 20, 30, 50 or even 100 MB of higher memory usage in order to never run the GC while a level is running, even if the GC only spends a third of the frame (but it usually spends much more).

The more control we have over GC the better, and we can adjust for the specific use case.

@FliplineStudios

> That does not sound logical to me at all.
> How can OS know what part of the memory do you need and want to use?
> Why would OS crash a perfectly running app because it wants you to free 10 or 20 MB of RAM, that doesn't make sense?

I was thinking specifically of Apple's "Low Memory Warning" on iOS devices, where the OS informs the app that it's reaching the limit of what's available on the device, and if it doesn't lower its memory usage when it receives these warnings the app can be forced to terminate:
https://developer.apple.com/documentation/xcode/improving_your_app_s_performance/reducing_your_app_s_memory_use/responding_to_low-memory_warnings

On older devices with very little RAM (iPhone 5, iPad 2, etc.) we do see these low memory warnings fairly often, where the GC is more aggressive in responding to that -- I'd suspect AIR would still need to allow for scenarios like this where it would ignore your GC preferences, so the app isn't automatically closed because it didn't respond to the OS warnings.

I'd definitely like having more control over the GC though in general!

@hardcoremore
Author

Hi @ajwfrost,

Is there any update on this issue?

@ajwfrost
Collaborator

Hi @hardcoremore - actually yes .. we had been getting a little tied up with the different test cases that were in your project but one of the guys had been looking further into it and realised we were using the wrong case (lots and lots of small balls appearing and the whole thing just stopped after a few frames..)
Now using the correct test case (!) and we see the leak and have identified there's something going wrong with reference counting. So when the rectangle texture that's created by Starling is disposed, we're actually holding on to the underlying object for some reason.
No fix yet, but we're getting closer..

@hardcoremore
Author

Awesome, finally, this bug is so ugly and old :)

Thanks Andrew, hope to see it fixed soon.

@ajwfrost
Collaborator

Yes well I might have spoken too soon... it turns out there are a large number of cached 'state' values involved in the rendering pipeline, so after a few dozen frames, these start to then get cleaned up. So actually the textures are being released after all.. the gpu memory should go up for a while and then stabilise with the same number of textures being destroyed as created.

We're still seeing that the memory usage is increasing .. but it's going to be more of a challenge to find out why :-(

@ajwfrost
Collaborator

Found it! There was a 'render target view' being cached from the start of render-to-texture, but it wasn't being cleaned up properly when the next call was made to overwrite this..

So this will be fixed in our next release..

@hardcoremore
Author

hardcoremore commented Jul 17, 2020

Awesome! So this was not connected to DirectX 11 after all. When can we expect this to be released, as it's a pretty big fix?

Thanks

@ajwfrost
Collaborator

It was in the Direct3D11 implementation of Stage3D, hence being specific to this mechanism..
We're looking at the next release probably early August to get some Android fixes/updates in too...

@hardcoremore
Author

It would be great to push this GPU memory leak fix as soon as possible, as it is very critical.

@aclipski

Agreed. Our game's community is hanging on every word spoken in this thread, we're quite anxious for a remedy :)

@hardcoremore
Author

Hi @ajwfrost,

Is there a new release soon with this fix?

@hardcoremore
Author

I have tested this and it is finally fixed in Harman's AIR SDK 33.1.1.217 that came out today :D. Finally!

It's working fine on my machine.

@aclipski

Unfortunately I'm unable to test out this fix for our users, since 33.1.1.217 has introduced a 100% reproducible crash on Windows, somewhere in the Direct3D code.

From the Windows Event Viewer:

Faulting application name: WarOfOmens.exe, version: 0.0.0.0, time stamp: 0x5f296c48
Faulting module name: d3d11.dll, version: 10.0.17763.1, time stamp: 0x13a31007
Exception code: 0xc0000005
Fault offset: 0x000000000011f4c7
Faulting process id: 0x2ef0
Faulting application start time: 0x01d671b5720cc286
Faulting application path: C:\Program Files (x86)\Steam\steamapps\common\War of Omens\WarOfOmens.exe
Faulting module path: C:\Windows\SYSTEM32\d3d11.dll
Report Id: 234a05c7-6628-4121-802b-2f9c097b4994
Faulting package full name: 
Faulting package-relative application ID:  

We are using Away3D 4.1.6 for some 3D elements in our game, and our Steam client on Windows crashes as soon as it tries to render anything that contains a directional light. As far as I know this doesn't occur on any of our other platforms (Flash Player on Mac/Windows, iOS, Android, and native Mac)

The crash goes away completely if I revert to the AIR version from our previous releases (33.1.1.98).

I'm trying to isolate the exact cause through process of elimination, but until this is fixed, I won't ever be able to upgrade our game past version 33.1.1.98. I'm going to do some more investigation but intend to open a new bug for this crash once I've gathered a little bit more information.

Any idea why this new version might have caused this crash to start occurring? I'd be happy to share our project's code if that would help.

Thanks!

@ajwfrost
Collaborator

@aclipski that's a concern! - are you able to get this to give a full dump file so that we can get at the call stack there? or could we get your application (contents of the "War of Omens" folder should be enough)? I believe the only fix related to D3D11 that we put in was this one, we can go review the change again in case it's not properly protecting against an error condition or similar...

@aclipski

@ajwfrost I'm not sure how to generate a full dump file, but I can probably put together a standalone client that you should be able to run and reproduce the problem. How would you prefer I send this client to you?
