This repository was archived by the owner on Oct 12, 2022. It is now read-only.

Fix cache inconsistency when free'ing from GC #797

Merged
merged 2 commits into dlang:master from etcimon:patch-1
May 22, 2014

Conversation

@etcimon (Contributor) commented May 21, 2014

After a pointer is freed, its size and info are not removed from the cache, so if the memory is later allocated for another purpose, any subsequent `realloc` calls will refer to the previous size.

This can cause memory corruption in an application that uses `GC.free`, `delete`, or `realloc(p, 0)`, or a complete crash if a big allocation's size is cached and the page pool was reduced with `minimize`: the call to `realloc` will try to resize in place rather than fetch a new allocation, and it will try to copy bytes from/to an invalid memory region, causing an Access Violation. vibe-d/vibe.d#470

I also made the `USE_CACHE` checks consistent and had them evaluated at compile time.
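For context, a minimal sketch of the failure mode described above (the sizes and the assumption that the second allocation reuses the freed address are illustrative, not taken from the report):

```d
import core.memory : GC;

void main()
{
    auto p = GC.malloc(4096);
    auto size = GC.sizeOf(p);  // a size query like this can populate the cache
    GC.free(p);                // the cached entry for p is not invalidated

    auto q = GC.malloc(16);    // may be served from the same address as p
    // If q happens to equal p, the stale cache entry makes the GC believe the
    // block is still 4096 bytes, so realloc may resize "in place" and copy
    // past the 16-byte block instead of fetching a new allocation.
    q = GC.realloc(q, 8192);
}
```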

After a pointer is freed, it is not removed from the cache, so if it is allocated for another purpose, any subsequent `realloc` calls will refer to the previous size.

This can cause memory corruption with an application that uses `GC.free`, `delete` or `realloc(p, 0)`, or a complete crash if a big allocation's size is cached and the page pool was reduced with `minimize`, i.e. the call to `realloc` will try to resize in place rather than allocate with `malloc`.

Verify USE_CACHE at compile-time

Fix alignment
```d
if (p == cached_info_key) {
    cached_info_key = cached_info_key.init;
    cached_info_val = cached_info_val.init;
}
```

Member

The cache fields are members of Gcx, so it's `gcx.cached_info_key`.
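Adjusted per that comment, the hunk would presumably read:

```d
if (p == gcx.cached_info_key)
{
    gcx.cached_info_key = gcx.cached_info_key.init;
    gcx.cached_info_val = gcx.cached_info_val.init;
}
```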

@MartinNowak (Member)

Thanks a lot for taking this down.

Testing .editorconfig again...

Indentation should be fine now...
@etcimon (Contributor, Author) commented May 22, 2014

Thanks a lot for taking this down.

Np! I'm also planning to add sampling to the GC right now. I was surprised to see it's an easy optimization that could be added on top of this.

Basically, when marking, you take 1 in X of the references and send them to a specific array that represents the pool they refer to. Then, the next time you're going to collect, you test them individually, and if they're mostly still there, you skip marking/freeing for that particular pool during collection. You can force collection on certain pools every 1 in X collections to even out the average lifetime of the references.

You're going to want a lower acceptable chance of being wrong for big allocations, but basically you're using probabilities to avoid putting a lot of useless load on the processor, especially when you're in a part of the application that's just allocating a lot (sampling will determine that the software is not in a phase of removing data).

http://en.wikipedia.org/wiki/Bayes_factor
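A rough sketch of that idea (names, types, and the threshold are illustrative only, not part of this patch or of druntime):

```d
// Illustrative only: sample some references per pool while marking, then
// before the next collection check how many still hold their old values;
// if most do, skip collecting that pool this cycle.
struct SampledRef
{
    void** slot;    // address of the sampled reference
    void*  oldVal;  // value it held when it was sampled
}

bool poolNeedsCollection(SampledRef[] samples, double changedThreshold = 0.10)
{
    if (samples.length == 0)
        return true;                  // no data: fall back to collecting

    size_t changed = 0;
    foreach (s; samples)
        if (*s.slot != s.oldVal)
            ++changed;

    // Collect only when enough of the sampled references changed to suggest
    // a meaningful amount of garbage in this pool.
    return cast(double) changed / samples.length > changedThreshold;
}
```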

@MartinNowak (Member)

s/taking/tracking/ :).

@MartinNowak (Member)

Auto-merge toggled on

@Safety0ff (Contributor)

I'm in favor of removing the size cache (the info cache can stay).
When I did some tests I noticed that the hit rate was very poor whereas the info cache was acceptable. The only time it's really useful is when you have a sequence of GC calls which all deal with the same memory block.

@etcimon (Contributor, Author) commented May 22, 2014

When I did some tests I noticed that the hit rate was very poor whereas the info cache was acceptable.

Did you test with an appender?

@etcimon (Contributor, Author) commented May 22, 2014

@MartinNowak Have you considered setting `GC.BlkAttr.NO_SCAN` on `appender!string` in phobos? Or when doing `new string` in general? I've never seen an immutable char array carry pointers, and it seems like it would save most of the application memory from being scanned.
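For reference, asking the GC for a block it won't scan looks roughly like this (an illustrative sketch; whether `Appender` or `new string` should do this is exactly the question above):

```d
import core.memory : GC;

void main()
{
    // Plain character data never contains pointers, so the block can be
    // allocated with NO_SCAN and the collector will skip it while marking.
    auto buf = (cast(char*) GC.malloc(1024, GC.BlkAttr.NO_SCAN))[0 .. 1024];
    buf[0 .. 5] = "hello";
}
```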

@Safety0ff (Contributor)

Did you test with an appender?

I actually used the phobos unittest suite as my test. The result was a 1.4% hit rate (120k queries). I don't have the numbers anymore (except the overall one), but only 1 to 3 unittests gave decent hit rates.
The test suite, however, might not accurately reflect how real code behaves, but I never got around to doing more tests.

The info cache had a hit rate of 51%.

@etcimon (Contributor, Author) commented May 22, 2014

I actually used the phobos unittest suite as my test.

It wouldn't represent normal use. I think most applications need to interact with I/O streams, which rely on the appender. When appending, you're often going to call extend about 10-20 times in a row.

MartinNowak added a commit that referenced this pull request May 22, 2014
Fix cache inconsistency when free'ing from GC
@MartinNowak MartinNowak merged commit a3f4378 into dlang:master May 22, 2014
@Safety0ff (Contributor)

When appending, you're often going to call extend about 10-20 times in a row.

With the current implementation, the cache only helps in the early-return case (when you can't extend because the size is greater than the page size).

I want to remove redundant pool searches. Currently the worst-case path in `extend` is two `findPool` calls, which are logarithmic in the number of pools.
If caching really provides a benefit, then perhaps a cache in `findPool` could replace both the info and size caches, since finding the size/info given a pool is done in constant time.
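For illustration, a single-entry cache in front of the pool lookup might look roughly like this (`Pool`, `findPoolImpl`, and the field names are stand-ins, not druntime's actual code):

```d
// Illustrative stand-ins for druntime's pool type and lookup.
struct Pool
{
    void* baseAddr;
    void* topAddr;
}

Pool* findPoolImpl(void* p);  // placeholder for the real (logarithmic) search

// Single-entry cache in front of the pool lookup.
Pool* cachedPool;
void* cachedBase, cachedTop;

Pool* findPoolCached(void* p)
{
    // Hit: the pointer falls inside the range of the last pool we found.
    if (cachedPool !is null && p >= cachedBase && p < cachedTop)
        return cachedPool;

    // Miss: do the usual search and remember the result.
    auto pool = findPoolImpl(p);
    if (pool !is null)
    {
        cachedPool = pool;
        cachedBase = pool.baseAddr;
        cachedTop  = pool.topAddr;
    }
    return pool;
}
```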

@etcimon (Contributor, Author) commented May 22, 2014

I think the best way to remove redundant pool searches is sampling for the `mark()` step. With sampling, you can mark pools to be skipped entirely during collection. For that, you need a little Bayesian probabilistic statistics (machine learning), but the payoff is a >90% faster collection when it's not skipped altogether.

Second, the easiest way to make it faster is to adopt the use of NO_SCAN from a higher level, using the type information. If you're not recursing uselessly, you won't call `findPool`, and the savings are exponential in the size of the application.
http://forum.dlang.org/thread/lll9dr$2ll7$1@digitalmars.com

@Safety0ff (Contributor)

I mean caching `findPool` for calls done outside of marking (I'd expect it would be subject to thrashing in `mark`).

@etcimon (Contributor, Author) commented May 22, 2014

I mean caching `findPool` for calls done outside of marking (I'd expect it would be subject to thrashing in `mark`).

Oh, yes, of course. I don't think it has a significant cost either way, and removing the cache removes complexity, which makes bugs easier to catch ;)

@etcimon etcimon deleted the patch-1 branch May 22, 2014 17:49
@MartinNowak (Member)

NO_SCAN for appender would be great. I have to think a bit more about the sampling, sounds interesting.

@etcimon (Contributor, Author) commented May 22, 2014

In the forums, it was said that NO_SCAN was already implemented via the typeinfo. I'm not sure if it's used with in-scope declarations here: https://github.com/D-Programming-Language/phobos/blob/58112802ed34206c21d39a6c359be2445fd46804/std/array.d#L2289 ?

@MartinNowak (Member)

Ah, Appender is already using NO_SCAN (here and here).
Can you explain the sampling in a little more detail or provide some reading?

@Safety0ff (Contributor)

@etcimon could you please create a test case for this pull request to avoid regression?

@etcimon (Contributor, Author) commented May 23, 2014

Can you explain the sampling in a little more detail or provide some reading?

Ah, yes, it took a while to find it again because my textbooks were in French.

http://books.google.ca/books?id=b-XFrpBQ7d0C&lpg=PA119&pg=PA154#v=onepage&q&f=false

Pages 154 to 183 are relevant; there are a lot of types of probability distributions, but these pages talk about the important ones. At first, a good sampling algorithm can do a great job assuming a binomial, hypergeometric, Bernoulli, or Poisson distribution; they're pretty much a law of nature anyway.

Basically, we'd keep a sample of, say, n = 30 pointers found in the heap that point into a pool, out of a total of N = 400 (a sample size chosen according to the distribution). Then we'd test the probability that >10% of them have been deleted, based on how many of the 30 fail `*(sample.p) == sample.pOldVal`. If the null hypothesis, e.g. H0: "the probability that >10% of the pointers in population N were changed is <25%", is rejected (this could even be determined with a sample of 3-4 pointers for an N = 400 population if we can be flexible), we run a collection on the pool. We can even verify these statistics afterwards and re-adjust the model (standard deviation and mean) from there, to be even more precise and need an even smaller sample the next time.

Ideally the distribution would change dynamically with the Bayes factor (never tried it, but it looks interesting ;), and so would the standard deviation and other parameters. This doesn't need to be complete for the method to save CPU cycles, though; it'll do what it has to do: delay collection for pools that don't need it right away.
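As a purely numerical illustration of that kind of test (the n = 30 and 10% figures mirror the example above; none of this code is from the patch):

```d
import std.math : pow;
import std.stdio : writefln;

// Binomial probability of seeing exactly k "changed" pointers out of n
// samples when each pointer changes with probability p.
double binomialPmf(size_t n, size_t k, double p)
{
    double c = 1;                        // n choose k, built up iteratively
    foreach (i; 0 .. k)
        c = c * (n - i) / (i + 1);
    return c * pow(p, cast(int) k) * pow(1 - p, cast(int) (n - k));
}

void main()
{
    // With n = 30 sampled pointers, suppose at most 1 came back changed.
    // Under the hypothesis that at least 10% of the pointers changed, how
    // likely is such a low count? A small value would argue for rejecting
    // that hypothesis and skipping collection of this pool for now.
    double tail = binomialPmf(30, 0, 0.10) + binomialPmf(30, 1, 0.10);
    writefln("P(at most 1 changed | p = 0.10, n = 30) = %.4f", tail);
}
```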

@MartinNowak (Member)

Interesting idea, but first we'd need to figure out how to save CPU cycles using that, because as it stands you'd still need to scan the pool for pointers to other pools, right? Not sure if simply excluding a pool from being marked saves a lot of cycles. Scanning the whole pool at once, instead of selectively, will be a little faster though.

@etcimon (Contributor, Author) commented May 23, 2014

as it stands you'd still need to scan the pool for pointers to other pools,

Well, the way I understand it is that you're not scanning the pool for pointers if it's set as NO_SCAN. These pools would only contain basic types, like a `char[40]` pointed to by a `char*` from another pool (which would not be in a NO_SCAN pool).

edit: oh I thought you were talking about the NO_SCAN idea from the other thread. I'm tired..

Yes, the way I see it, this would mostly save time if the pools are probably unchanged between collections, as in not many deletions. It would add the ability to skip collection most of the time (80% of collections are done for 0.01% of the data? I don't know), fitting the model to individual pools would help with being more precise, and maybe we could even be more permissive about the importance of some of them (small allocations, meh...)

@MartinNowak (Member)

So that would be during a phase when your program allocates a lot of memory without freeing anything. Maybe a coarser but simpler heuristic to not trigger collection during those phases could have the same effect, for example using the relative amount of memory reclaimed.

@etcimon (Contributor, Author) commented May 23, 2014

A relative amount implies you compare allocations vs. deallocations; how would you go about guessing it in advance? And you're still freeing things, just not enough to trigger collection.

@MartinNowak (Member)

You can't know it in advance, but you could adapt thresholds after a collection or do some more sophisticated prediction for the next collection. The same applies to keeping track of still alive pointers to a certain pool though.
