Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC behaviour #96

Closed
HurryStarfish opened this issue Feb 26, 2019 · 9 comments
Closed

GC behaviour #96

HurryStarfish opened this issue Feb 26, 2019 · 9 comments

Comments

@HurryStarfish
Copy link
Member

HurryStarfish commented Feb 26, 2019

It seems that the garbage collector used in NG is extremely aggressive in allocating memory and reluctant to return or reuse it.

I've written the following test program calls a function F that allocates a gigabyte of memory for a local variable which becomes unreachable as soon as the function returns. Then it runs the GC, prints some info to the console and waits for input before running the next iteration:

SuperStrict
Framework BRL.StandardIO

GCCollect
GCCollect
Repeat
	Local a1:Int = GCMemAlloced()
	Local a2:Int = F()
	Local c:Int = GCCollect()
	Local a3:Int = GCMemAlloced()
	
	Print "alloced initally: " + a1
	Print "alloced by F: " + (a2 - a1)
	Print "collected: " + c
	Print "left over: " + (a3 - a1)
	Input
Forever

Function F:Int()
	Local b:Byte[1000 * 1000 * 1000] ' allocate about 1GB
	Local alloced:Int = GCMemAlloced()
	Return alloced
End Function

When run in vanilla BlitzMax on my laptop (starting out with a RAM usage of ~3.5GB out of 8GB), the program produces output that looks like this:

alloced initally: 1316
alloced by F: 1000000024
collected: 1000000024
left over: 0

alloced initally: 3692
alloced by F: 999997648
collected: 1000000024
left over: -2376

alloced initally: 3704
alloced by F: 999997636
collected: 1000000024
left over: -2388

alloced initally: 3704
alloced by F: 999997636
collected: 1000000024
left over: -2388

alloced initally: 3704
alloced by F: 999997636
collected: 1000000024
left over: -2388

alloced initally: 3704
alloced by F: 999997636
collected: 1000000024
left over: -2388

This is the computer's RAM usage during that time:
image

Memory comsumption here shows little spikes, which immediately disappear again. Evidently, vanilla's GC promptly returns the garbage memory that was allocated by F in every iteration (presumably the slightly negative "left over" values from the 2nd iteration onward are due to the strings being passed to Print and Input getting reclaimed as well), and overall memory consumption between iterations stays constant.


On the other hand, this is the output for NG:

alloced initally: 262144
alloced by F: 1000525824
collected: 4614304
left over: 1000525824

alloced initally: 1000787968
alloced by F: 1033555968
collected: 4614304
left over: 1033555968

alloced initally: 2034343936
alloced by F: 1033555968
collected: 4614304
left over: 1033555968

alloced initally: -1227067392
alloced by F: 0
collected: 4614304
left over: 0

alloced initally: -1227067392
alloced by F: 1033555968
collected: 4614304
left over: 1033555968

alloced initally: -193511424
alloced by F: 0
collected: 4614304
left over: 0

image

There is some weirdness in the console output caused by the fact that GCMemAlloced and GCCollect return Ints when they should be returning Size_T, but that aside:
Apparently, NG at first neither returns nor reuses the memory allocated for the array, and instead keeps allocating more. It only starts collecting once it has exhausted all available RAM; at times causing freezes on the order of multiple seconds up to tens of seconds for itself and any other program as the computer starts trying to make room for it by swapping memory between RAM and hard disk.


I am not expecting NG to behave exactly like vanilla does, and I could understand the GC holding on to some memory to make future allocations more efficient. But the current behaviour is akin to a memory leak, which is ...very bad. Is there a way to get the GC to behave itself?

@woollybah
Copy link
Member

It might be the memory pool reservation. I notice when I hit return the first few times there's a small delay, but after that, it's much faster - probably because it now has a big chunk of memory ready for new objects.

If you create a Type and Print something from the Delete() method, you'll see that things are getting freed as you might expect - but obviously there's a lot of memory pre-allocated in the pool.

I haven't found any functions yet, which might be used to shrink it back down.
But I'd say the GC is doing whatever it is designed for.

@HurryStarfish
Copy link
Member Author

HurryStarfish commented Feb 26, 2019

It might be the memory pool reservation. I notice when I hit return the first few times there's a small delay, but after that, it's much faster - probably because it now has a big chunk of memory ready for new objects.

Yes and that's fine in principle - but not when that chunk takes up all available RAM or more, freezing the entire PC (and causing a wait of half a minute long afterwards every time I click on another program because its memory had to get pushed to the harddrive).
This test indicates that the GC in fact does not use that memory as a pool to allocate new objects from. Instead, as long as possible, every iteration of the loop allocates an additional gigabyte of RAM, despite the fact that all the "previous" memory would have been free to reuse.

@GWRon
Copy link
Contributor

GWRon commented Feb 26, 2019

Seems others have similar issues:

crystal-lang/crystal#3997

Some there state that the GC collects stuff after two cycles. First "finalizes" and second "frees".

So you might try to call GCCollect twice in your loop

@HurryStarfish
Copy link
Member Author

HurryStarfish commented Feb 26, 2019

The two-cycle thing should only apply to objects with finalizers, which byte arrays don't have, so that can't be it. Plus, I ran 6+ iterations in every test; and normally GCCollect shouldn't even be called at all, it's supposed to do that on its own. It might be collecting the objects ... but swallowing up (more than) all free RAM on the system despite not needing it is problematic.

@woollybah
Copy link
Member

I've committed some updates to the GC settings. Hopefully that will help a bit.

@woollybah
Copy link
Member

And an update to Boehm GC 8.0.4

@HurryStarfish
Copy link
Member Author

HurryStarfish commented Mar 6, 2019

I've tested this a bit more, but haven't really come to any definite conclusions. It's quite possible that my test program is flawed - possibly due to its high frequency of large allocations? The GC might be trying to estimate much much memory the program might need based on past allocations and be thrown off by this. Adding a sufficient amount of smaller allocations inbetween appears to improve the behaviour; but it's hard to tell for sure, since test results vary. I suppose trying to test GC performance with small sample like this might be a bad idea.

That said, I've noticed something else:
It seems that the GC will not collect any objects with finalizers that are connected by circular chains of references, even when all of them are unreachable. This is caused the GC's topological ordering mechanism - it refuses to finalize objects as long as they're still referenced by other, yet unfinalized, objects. With cyclic references, this results in objects never getting collected, just like with the old single-threaded legacy BlitzMax reference counting "GC". Considering how easily this can cause leaks, I think this should be addressed.
From what I understand, the solution seems to be replacing all calls to GC_REGISTER_FINALIZER with GC_REGISTER_FINALIZER_NO_ORDER and adding -DJAVA_FINALIZATION to the compile flags. The former disables topological ordering, while the latter should ensure that objects reachable during finalization still exist - so that references accessible from a Delete method can lead to objects that have already been finalized, but not objects that have been collected. (I'm not 100% sure about the last part, so it's better if you double-check.)

Test code (with topological ordering, only T3 objects get finalized and collected, while T1 and T2 objects leak):

SuperStrict
Framework BRL.StandardIO

For Local i:Int = 1 To 10
	F
Next
Print Collect() + " bytes collected"
Print Collect() + " bytes collected"
Print Collect() + " bytes collected"
Input


Function F()
	Local o1:T1 = New T1
	Local o2:T2 = New T2
	Local o3:T3 = New T3
	o1.ref2 = o2
	o2.ref1 = o1
	o3.ref2 = o2
End Function

Type T1
	Field ref2:T2
	Method New()
		Print "new T1 " + ToString()
	End Method
	Method Delete()
		Print "delete T1 " + ToString()
	End Method
End Type

Type T2
	Field ref1:T1
	Method New()
		Print "new T2 " + ToString()
	End Method
	Method Delete()
		Print "delete T2 " + ToString()
	End Method
End Type

Type T3
	Field ref2:T2
	Method New()
		Print "new T3 " + ToString()
	End Method
	Method Delete()
		Print "delete T3 " + ToString()
	End Method
End Type

Function Collect:Long()
	GCSuspend
	Print "collecting..."
	GCResume
	Return Long GCCollect()
End Function

@woollybah
Copy link
Member

That seems better. Thanks.
I've had to adjust TList - removing the Delete() from TList because now it is not ordered, I was seeing the TList Delete() being called after some of the list items had already been deleted. The Delete is no longer required as it's now able to sort itself out - regarding references.

But it may raise issues elsewhere, that may rely on Delete() occurring in a predictable way...

@woollybah
Copy link
Member

Maybe this is an encouragement to utilise IDisposable where appropriate :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants