Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Rework of -profile=gc calculations #2607

Merged
merged 2 commits into from
May 28, 2019
Merged

Conversation

RazvanN7
Copy link
Contributor

@RazvanN7 RazvanN7 commented May 15, 2019

This is a revamp of #1846 . I'm having a hard time understanding the results of the test cases added in the original PR. Any help would be appreciated.

cc @leandro-lucarella-sociomantic

Edit: It looks like there is a missing call to resetThreadStats that causes the tests to fail. Can anyone help?

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @RazvanN7! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2607"

src/gc/impl/conservative/gc.d Outdated Show resolved Hide resolved
@@ -41,24 +42,26 @@ __gshared

extern (C) void profilegc_setlogfilename(string name)
{
logfilename = name;
logfilename = name ~ "\0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they are passed to C functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namely fopen

src/rt/profilegc.d Outdated Show resolved Hide resolved
@rainers
Copy link
Member

rainers commented May 15, 2019

As noted in the original PR, I'm not a fan of this command line switch. Here's an example how to implement proper and extended tracing with a proxy GC: https://github.com/rainers/visuald/blob/dmdserver/tools/tracegc.d#L231

* Resets runtime stats for currently active GC implementation of current thread
* See `core.memory.GC.Stats` for list of available metrics.
*/
static void resetThreadLocalStats() nothrow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reset function is never a good idea with independent clients checking the same value. Just use difference to a previous value.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is detecting overflows of the counter for long lived processes. But it might be acceptable anyway. I have the feeling there were other reasons for the reset, but I can't remember right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counter will overflow just the same as the difference between two consecutive calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage for applications is they can reset the counter and then just look at it later. If they can't reset, they need to save the old value then subtract it from the current value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still a bad API especially with multiple clients. Do you also reset the system clock if you are doing benchmarks?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rainers has a point. The overflowing is still passing quite some complexity to the client but it's true there are workarounds to detect it and deal with it.

@@ -395,6 +398,8 @@ class ConservativeGC : GC
alloc_size = size;
}
gcx.leakDetector.log_malloc(p, size);
bytesAllocated += alloc_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread locals can be slow. Versioning this under a compile time condition might be acceptable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compile time is not a good option IMHO, because then you can't use this feature with a normal compiler. Two solutions come in mind. One is doing the proxy trick and providing some sort of flag to pick the GC you want to use, the tracing GC being one of the options. I don't remember if that mechanism is already in place and if currently any other GC is provided by the official distribution. Another option would be to do a simple runtime if (tracing) bytesAllocated += alloc_size;, that should be fast enough for all users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost is there but should be evaluated relative to the surrounding time. I recall I measured a while ago and an allocation took several thousand cycles even in the best case. The additional overhead would be difficult to measure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a short measurement with the vdparser benchmark, adding the TLS counter costs about 1% for Win64 (which probably has one of the faster implementation of TLS), though that measurement was done on a mobile processor with disabled speedstepping. Thats probably also consistent with the full allocation taking about 1000 cycles, given the additional instructions generated.

@leandro-lucarella-sociomantic
Copy link
Contributor

I'm having a hard time understanding the results of the test cases added in the original PR. Any help would be appreciated.

The test is just allocating some memory and then verified the profiler got the right places and the amount of memory allocated is reported correctly. If you have more details on what needs clarification, please let me know.

Also ping @mihails-strasuns just in case you have some time.

@leandro-lucarella-sociomantic
Copy link
Contributor

As noted in the original PR, I'm not a fan of this command line switch. Here's an example how to implement proper and extended tracing with a proxy GC: https://github.com/rainers/visuald/blob/dmdserver/tools/tracegc.d#L231

Interesting.

@leandro-lucarella-sociomantic
Copy link
Contributor

Thanks @RazvanN7 for picking this up!

@RazvanN7
Copy link
Contributor Author

RazvanN7 commented May 21, 2019

Thanks for the reply @leandro-lucarella-sociomantic . What I don't understand regarding the tests is the amount that is reported to be allocated. For example, the first line for linux 64 in the expected results file says:

288	              1	immutable(char)[][int] D main src/profilegc.d:23

Looking at line 23, we have:

string[int] aa = [1:"one", 2:"two", 3:"three"];

This implementation reports 464 bytes allocated instead 288 for this particular line. I have no idea why this difference occurs; I don't even know why it should report 288; as far as I can tell, it might be possible that the number represents the sum of previous allocations + the current one; it is very likely that a reset should have occurred but it didn't, however I do not know what can/should cause resets. Any information on this is welcome.

cc @mihails-strasuns

@mihails-strasuns
Copy link
Contributor

@RazvanN7 naturally this test case is extremely fragile - it has to check for allocation size which is implementation defined and thus test needs to be updated each time implementation changes. It may also differ between platforms even at a single point of time (thus separate expected outputs per platform).

In case of AA - most likely it used to allocate exactly 288 bytes at the point of implementing original PR, all these numbers are taken from actual GC reports and not calculated manually. It is easily possible that something has changed in druntime/compiler since than and now the expected amount is different.

@RazvanN7
Copy link
Contributor Author

So, simply tweaking the test is an acceptable option?

@mihails-strasuns
Copy link
Contributor

Sure, it must match what GC actually does.

@RazvanN7
Copy link
Contributor Author

RazvanN7 commented May 21, 2019

This seems to be green now, so it will probably be merged soon.

@leandro-lucarella-sociomantic what other stats are you interested in? After this is merged, can we hop on implementing other stats or should we try to come up with a generic design that would potentially expose anything that the GC has to offer?

@RazvanN7
Copy link
Contributor Author

Windows failure is unrelated.

@RazvanN7 RazvanN7 force-pushed the op_profile branch 2 times, most recently from 8f65d6d to cc0349d Compare May 21, 2019 14:38
@codecov
Copy link

codecov bot commented May 21, 2019

Codecov Report

Merging #2607 into master will increase coverage by 0.14%.
The diff coverage is 17.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2607      +/-   ##
==========================================
+ Coverage   73.82%   73.96%   +0.14%     
==========================================
  Files         145      145              
  Lines       16840    16815      -25     
==========================================
+ Hits        12432    12438       +6     
+ Misses       4408     4377      -31
Impacted Files Coverage Δ
src/core/gc/gcinterface.d 0% <ø> (ø) ⬆️
src/gc/impl/proto/gc.d 13.46% <ø> (ø) ⬆️
src/gc/impl/manual/gc.d 1.58% <ø> (ø) ⬆️
src/rt/tracegc.d 0% <0%> (ø) ⬆️
src/gc/proxy.d 50% <0%> (-0.69%) ⬇️
src/core/memory.d 95.63% <0%> (-0.47%) ⬇️
src/rt/profilegc.d 14.75% <20%> (+1.29%) ⬆️
src/gc/impl/conservative/gc.d 78.99% <83.33%> (+0.07%) ⬆️
src/rt/aaA.d 88.01% <0%> (-0.28%) ⬇️
src/rt/lifetime.d 84.48% <0%> (-0.11%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3906c3a...cc0349d. Read the comment docs.

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@@ -395,6 +398,8 @@ class ConservativeGC : GC
alloc_size = size;
}
gcx.leakDetector.log_malloc(p, size);
bytesAllocated += alloc_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost is there but should be evaluated relative to the surrounding time. I recall I measured a while ago and an allocation took several thousand cycles even in the best case. The additional overhead would be difficult to measure.

@@ -41,24 +42,26 @@ __gshared

extern (C) void profilegc_setlogfilename(string name)
{
logfilename = name;
logfilename = name ~ "\0";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namely fopen

@RazvanN7 RazvanN7 force-pushed the op_profile branch 3 times, most recently from 9a20357 to 4aca77c Compare May 21, 2019 15:02
@leandro-lucarella-sociomantic
Copy link
Contributor

@leandro-lucarella-sociomantic what other stats are you interested in? After this is merged, can we hop on implementing other stats or should we try to come up with a generic design that would potentially expose anything that the GC has to offer?

Thanks a lot @RazvanN7 !

I recently notice dmd 2.085 already have some improvements on that front! https://dlang.org/phobos/core_memory.html#.GC.ProfileStats

Probably the most important stats for us are numCollections and totalPauseTime. I just created an issue in ocean so we can make use of it: sociomantic-tsunami/ocean#711.

As I mentioned in #1846 (comment), more info could be provided (allocation info) but at this point this is not a blocker for us, just a nice to have.

The other pending issue we have is about GC.stats being O(n) instead of O(1) (like in D1): #2164

The trade off for this is introducing bugs in the calculation if much easier if calculations are kept up to date all the time (but I think that also happens with the profiling info, so I'm not sure now.

@rainers
Copy link
Member

rainers commented May 21, 2019

Here's an example how to implement proper and extended tracing with a proxy GC: https://github.com/rainers/visuald/blob/dmdserver/tools/tracegc.d#L231

For reference, here is a skeleton that works for hooking the conservative GC with a tracing proxy: https://gist.githubusercontent.com/rainers/6cdf73b48837defb9f88/raw/7e776acd806f44668e5d615a17bdac20047fec64/tracegc.d
One downside of that approach is that you have to analyze the callstack to find the allocating call. That needs to be done anyway for all calls in phobos/druntime that have not been compiled with -profile=gc.

@dlang-bot dlang-bot merged commit 5d2e7f0 into dlang:master May 28, 2019
@leandro-lucarella-sociomantic
Copy link
Contributor

I see this was finally merged, thanks again for picking it up and pushing for it.

About the reset, I agree with @rainers that, even when convenient for most use cases, having a reset is a bad API design and we should consider removing it before a first release.

@wilzbach
Copy link
Member

wilzbach commented Jun 4, 2019

we should consider removing it before a first release.

The next release is on 1.7 - with the first beta appearing on 15.6.

@leandro-lucarella-sociomantic
Copy link
Contributor

#2629

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants