-
-
Notifications
You must be signed in to change notification settings - Fork 422
Conversation
|
Thanks for your pull request and interest in making D better, @RazvanN7! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + druntime#2607" |
| @@ -41,24 +42,26 @@ __gshared | |||
|
|
|||
| extern (C) void profilegc_setlogfilename(string name) | |||
| { | |||
| logfilename = name; | |||
| logfilename = name ~ "\0"; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they are passed to C functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namely fopen
|
As noted in the original PR, I'm not a fan of this command line switch. Here's an example how to implement proper and extended tracing with a proxy GC: https://github.com/rainers/visuald/blob/dmdserver/tools/tracegc.d#L231 |
| * Resets runtime stats for currently active GC implementation of current thread | ||
| * See `core.memory.GC.Stats` for list of available metrics. | ||
| */ | ||
| static void resetThreadLocalStats() nothrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reset function is never a good idea with independent clients checking the same value. Just use difference to a previous value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is detecting overflows of the counter for long lived processes. But it might be acceptable anyway. I have the feeling there were other reasons for the reset, but I can't remember right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The counter will overflow just the same as the difference between two consecutive calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One advantage for applications is they can reset the counter and then just look at it later. If they can't reset, they need to save the old value then subtract it from the current value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still a bad API especially with multiple clients. Do you also reset the system clock if you are doing benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rainers has a point. The overflowing is still passing quite some complexity to the client but it's true there are workarounds to detect it and deal with it.
| @@ -395,6 +398,8 @@ class ConservativeGC : GC | |||
| alloc_size = size; | |||
| } | |||
| gcx.leakDetector.log_malloc(p, size); | |||
| bytesAllocated += alloc_size; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread locals can be slow. Versioning this under a compile time condition might be acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compile time is not a good option IMHO, because then you can't use this feature with a normal compiler. Two solutions come in mind. One is doing the proxy trick and providing some sort of flag to pick the GC you want to use, the tracing GC being one of the options. I don't remember if that mechanism is already in place and if currently any other GC is provided by the official distribution. Another option would be to do a simple runtime if (tracing) bytesAllocated += alloc_size;, that should be fast enough for all users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost is there but should be evaluated relative to the surrounding time. I recall I measured a while ago and an allocation took several thousand cycles even in the best case. The additional overhead would be difficult to measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a short measurement with the vdparser benchmark, adding the TLS counter costs about 1% for Win64 (which probably has one of the faster implementation of TLS), though that measurement was done on a mobile processor with disabled speedstepping. Thats probably also consistent with the full allocation taking about 1000 cycles, given the additional instructions generated.
The test is just allocating some memory and then verified the profiler got the right places and the amount of memory allocated is reported correctly. If you have more details on what needs clarification, please let me know. Also ping @mihails-strasuns just in case you have some time. |
Interesting. |
|
Thanks @RazvanN7 for picking this up! |
|
Thanks for the reply @leandro-lucarella-sociomantic . What I don't understand regarding the tests is the amount that is reported to be allocated. For example, the first line for linux 64 in the expected results file says: Looking at line 23, we have: string[int] aa = [1:"one", 2:"two", 3:"three"];This implementation reports 464 bytes allocated instead 288 for this particular line. I have no idea why this difference occurs; I don't even know why it should report 288; as far as I can tell, it might be possible that the number represents the sum of previous allocations + the current one; it is very likely that a reset should have occurred but it didn't, however I do not know what can/should cause resets. Any information on this is welcome. |
|
@RazvanN7 naturally this test case is extremely fragile - it has to check for allocation size which is implementation defined and thus test needs to be updated each time implementation changes. It may also differ between platforms even at a single point of time (thus separate expected outputs per platform). In case of AA - most likely it used to allocate exactly 288 bytes at the point of implementing original PR, all these numbers are taken from actual GC reports and not calculated manually. It is easily possible that something has changed in druntime/compiler since than and now the expected amount is different. |
|
So, simply tweaking the test is an acceptable option? |
|
Sure, it must match what GC actually does. |
|
This seems to be green now, so it will probably be merged soon. @leandro-lucarella-sociomantic what other stats are you interested in? After this is merged, can we hop on implementing other stats or should we try to come up with a generic design that would potentially expose anything that the GC has to offer? |
|
Windows failure is unrelated. |
8f65d6d to
cc0349d
Compare
Codecov Report
@@ Coverage Diff @@
## master #2607 +/- ##
==========================================
+ Coverage 73.82% 73.96% +0.14%
==========================================
Files 145 145
Lines 16840 16815 -25
==========================================
+ Hits 12432 12438 +6
+ Misses 4408 4377 -31
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
| @@ -395,6 +398,8 @@ class ConservativeGC : GC | |||
| alloc_size = size; | |||
| } | |||
| gcx.leakDetector.log_malloc(p, size); | |||
| bytesAllocated += alloc_size; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost is there but should be evaluated relative to the surrounding time. I recall I measured a while ago and an allocation took several thousand cycles even in the best case. The additional overhead would be difficult to measure.
| @@ -41,24 +42,26 @@ __gshared | |||
|
|
|||
| extern (C) void profilegc_setlogfilename(string name) | |||
| { | |||
| logfilename = name; | |||
| logfilename = name ~ "\0"; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namely fopen
9a20357 to
4aca77c
Compare
Thanks a lot @RazvanN7 ! I recently notice dmd 2.085 already have some improvements on that front! https://dlang.org/phobos/core_memory.html#.GC.ProfileStats Probably the most important stats for us are As I mentioned in #1846 (comment), more info could be provided (allocation info) but at this point this is not a blocker for us, just a nice to have. The other pending issue we have is about The trade off for this is introducing bugs in the calculation if much easier if calculations are kept up to date all the time (but I think that also happens with the profiling info, so I'm not sure now. |
For reference, here is a skeleton that works for hooking the conservative GC with a tracing proxy: https://gist.githubusercontent.com/rainers/6cdf73b48837defb9f88/raw/7e776acd806f44668e5d615a17bdac20047fec64/tracegc.d |
|
I see this was finally merged, thanks again for picking it up and pushing for it. About the reset, I agree with @rainers that, even when convenient for most use cases, having a reset is a bad API design and we should consider removing it before a first release. |
The next release is on 1.7 - with the first beta appearing on 15.6. |
This is a revamp of #1846 . I'm having a hard time understanding the results of the test cases added in the original PR. Any help would be appreciated.
cc @leandro-lucarella-sociomantic
Edit: It looks like there is a missing call to resetThreadStats that causes the tests to fail. Can anyone help?