Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for -XX:+PrintGCCause (will be on by default in Java 8) #66

Closed
ryangardner opened this issue May 22, 2013 · 17 comments
Closed
Milestone

Comments

@ryangardner
Copy link
Contributor

In java 8, this will be on by default. In the next non-security Java 7 release, it is going to be available via the -XX:+PrintGCCause flag (it's currently available in ea build)

The AbstractGCEvent.parse method currently barfs when hit with a GC log message that comes out of something with the PrintGCCause information in it:

the log entries look like:

"GC pause (G1 Evacuation Pause) (young)"

This will not be just for G1, this additional detail will be provided for all collectors - it makes the parsing in AbstractGCEvent.parse a bit trickier because it will have to know that

"GC pause (G1 Evacuation Pause) (young)"
"GC pause (G1 Humongous Allocation) (young)"

both map to:

"GC pause (young)"

Looking at the JDK7 / JDK8 sources - the constants are defined in the gcCause.cpp file:

System.gc()
FullGCAlot
ScavengeAlot
Allocation Profiler
JvmtiEnv ForceGarbageCollection
GCLocker Initiated GC
Heap Inspection Initiated GC
Heap Dump Initiated GC
No GC
Allocation Failure
Tenured Generation Full
Metadata GC Threshold
CMS Generation Full
CMS Initial Mark
CMS Final Remark
Old Generation Expanded On Last Scavenge
Old Generation Too Full To Scavenge
Ergonomics
G1 Evacuation Pause
G1 Humongous Allocation
Last ditch collection
ILLEGAL VALUE - last gc cause - ILLEGAL VALUE
unknown GCCause

Java 7 has

Permanent Generation Full

instead of "Metadata GC Threshold"

I can help with a patch if you have an idea of where this might be easiest to wedge in

@chewiebug
Copy link
Owner

Hi Ryan

I'll have to look into this feature. Do you have a sample log file
showing more how it will look like?

Regards, Jörg

@ryangardner
Copy link
Contributor Author

Here is one where it has the things squashed in along with G1 and the adaptive size policy

2013-05-15T01:42:38.938+0000: 13.780: [GC pause (G1 Evacuation Pause) (young) 13.781: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 89, predicted base time: 20.52 ms, remaining time: 129.48 ms, target pause time: 150.00 ms]
 13.781: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 25 regions, survivors: 4 regions, predicted young region time: 3593.07 ms]
 13.781: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 25 regions, survivors: 4 regions, old: 0 regions, predicted pause time: 3613.59 ms, target pause time: 150.00 ms]
, 0.1418780 secs]
   [Parallel Time: 127.9 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 13780.7, Avg: 13780.9, Max: 13781.1, Diff: 0.3]
      [Ext Root Scanning (ms): Min: 2.6, Avg: 3.8, Max: 8.0, Diff: 5.3, Sum: 30.4]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.6, Max: 5, Diff: 5, Sum: 5]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Object Copy (ms): Min: 119.4, Avg: 123.5, Max: 124.7, Diff: 5.3, Sum: 987.8]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [GC Worker Total (ms): Min: 127.2, Avg: 127.3, Max: 127.5, Diff: 0.3, Sum: 1018.8]
      [GC Worker End (ms): Min: 13908.2, Avg: 13908.3, Max: 13908.3, Diff: 0.1]
   [Code Root Fixup: 1.6 ms]
   [Clear CT: 0.4 ms]
   [Other: 12.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 9.7 ms]
      [Ref Enq: 0.2 ms]
      [Free CSet: 0.1 ms]
   [Eden: 800.0M(800.0M)->0.0B(800.0M) Survivors: 128.0M->128.0M Heap: 924.0M(92.0G)->346.8M(92.0G)]
 [Times: user=0.85 sys=0.17, real=0.14 secs] 
2013-05-15T01:42:42.549+0000: 17.391: [GC pause (G1 Evacuation Pause) (young) 17.391: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 195551, predicted base time: 239.45 ms, remaining time: -89.45 ms, target pause time: 150.00 ms]
 17.391: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 25 regions, survivors: 4 regions, predicted young region time: 2300.04 ms]
 17.391: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 25 regions, survivors: 4 regions, old: 0 regions, predicted pause time: 2539.48 ms, target pause time: 150.00 ms]
, 0.1440030 secs]
   [Parallel Time: 136.3 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 17391.3, Avg: 17391.5, Max: 17391.7, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 2.8, Avg: 3.4, Max: 4.9, Diff: 2.0, Sum: 27.5]
      [Update RS (ms): Min: 47.4, Avg: 48.7, Max: 49.6, Diff: 2.2, Sum: 389.6]
         [Processed Buffers: Min: 88, Avg: 96.6, Max: 109, Diff: 21, Sum: 773]
      [Scan RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.2, Sum: 1.3]
      [Object Copy (ms): Min: 82.6, Avg: 82.9, Max: 83.4, Diff: 0.8, Sum: 663.5]
      [Termination (ms): Min: 0.0, Avg: 0.5, Max: 0.6, Diff: 0.6, Sum: 3.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 135.6, Avg: 135.7, Max: 135.9, Diff: 0.4, Sum: 1085.9]
      [GC Worker End (ms): Min: 17527.2, Avg: 17527.2, Max: 17527.2, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Clear CT: 0.3 ms]
   [Other: 7.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 4.4 ms]
      [Ref Enq: 0.2 ms]
      [Free CSet: 0.1 ms]
   [Eden: 800.0M(800.0M)->0.0B(800.0M) Survivors: 128.0M->128.0M Heap: 1146.8M(92.0G)->550.1M(92.0G)]
 [Times: user=1.01 sys=0.05, real=0.15 secs]

@ryangardner
Copy link
Contributor Author

I've personally also seen a few lines where it has the cause "(G1 Humongous Allocation)" instead of "(G1 Evacuation Pause)"

in theory, there could be a number of other options for each of the given types but I would have to walk the source a lot to get the complete list of possible combinations

@chewiebug
Copy link
Owner

Hi Ryan,

Thank you for the sample. I have generated a few of my own to get a better impression on the possible changes the new flag introduces.

The straight forward place to extend is AbstractGCEvent and add the new Type constants. Probably not the best idea because a lot of new events might be introduced that look very much like duplicates of existing ones.

Probably the best place to introduce this change is AbstractDataReaderSun (Line 369). If null is returned, try to remove the text of the first brackets (for GC pause (G1 Evacuation Pause) (young) this would mean to reduce to G1 (young)) and try another call to parse. I would probably be a good idea to make the attribute type in AbstractGCEvent.Type mutable to be able to retain the full name of the parsed event. Like this one can see the GC reason in the event details tab. With this solution the change is limited to the gc logs produced by Sun / Oracle / OpenJDK JDKs.
Assumption: -XX:+PrintGCCause will always introduce the additional text as the first text in brackets in the event like in the example above.

Should that assumption be wrong, the next best place would probably be to extend AbstractGCEvent.Type to support more than one representation of an event and register all representations with the internal TYPE_MAP. Then the Type#parse method would still be very fast looking up the correct event. It should be well checked in this solution that typeand rep fields don't get confused somewhere else in the code. The distinction is not very clear today.

What do you think about this approach?

Regards, Jörg

@ryangardner
Copy link
Contributor Author

I believe the assumption about -XX:+PrintGCCause is correct for Java 7 and 8

The assumption that you have regarding the PrintGCCause always outputting the cause the way it does now I believe is valid. I think there is one more assumption that could help simplify things. If PrintGCCause is enabled, it should always have a GCCause line - when parsing the first GC log line if it detects that there is a GC Cause field in the log line (this could be detected by trying to parse the event, and if it gets back null have it try to parse the event with the first set of parentheses removed - if that returns something valid, assume that it is using PrintGCCause mode and from then on only send in the ones with the first set of parens stripped out by default [and potentially falling back to having it try to parse one with the parens still in it if parsing it without them failed])

The approach of having it pick which case is the default to try first is just an optimization on the approach you described - we could try that first and then add in an optimization around it later.

Keeping the full text of the cause (making the type mutable) would be valuable - it would help retain that GCCause information and make it easy to see if a certain type of GC has a different duration based on what causes it.

@ecki
Copy link
Contributor

ecki commented May 28, 2013

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7166894

Actually you can see here a discussion which gives some backgrounds:

http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-May/thread.html

And here the commited changeset for JDK8, seems to be the same (but default off) for JDK7:

http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/9d679effd28c

@chewiebug
Copy link
Owner

@ecki: Thanks for the links, they are very good input!

@ryangardner: Your optimisation to my approach sounds valid. I'd first
implement the simplest case and optimise it as the second step if it
helps performance. Would you like to add this patch now?

Regards, Jörg

@ryangardner
Copy link
Contributor Author

I have done some preliminary work on this - it's up on a branch - but it isn't done yet.

Making the "type" string field immutable brings with it another problem: all of the Type objects are reused.

I realized this when I got it working for G1 and for CMS tests with / without the GCCause, but I saw a bunch of unrelated test failures such as:

Expected :GC ParNew:
Actual   :GC (Allocation Failure) ParNew:

For a test that had nothing to do the PrintGCCause testing - mutating that field will change it for all of the instances that re-use it.

To handle parsing logs that have PrintGCCause in them and simply ignoring the PrintGCCause strings - this solution is simple, just remove the line that sets it and it would be parsed as the correct type.

I assume that the reason for using immutable instances of the Type was to avoid a new instance of the Type object every time a GC log line is parsed...

@ryangardner
Copy link
Contributor Author

I just updated the branch so it is useable (it doesn't track the GCCause at the moment, but it will parse a file that has it in there and ignore it)

@chewiebug
Copy link
Owner

I haven't thought of the immutable Type objects beeing reused, but it
certainly makes sense.

Maybe we need to add the name of the event to the AbstractGCEvent class
or introduce a simple "holder" class holding a reference to the current
Type class and the name of the type as parsed from the log. That holder
class should be instantiated for all GCEvents.

@chewiebug
Copy link
Owner

Hi Ryan,

May I pull in what you committed to your branch even if it doesn't cover all features we discussed? I think the implementation as it is is far better than nothing.

Regards, Jörg

@ryangardner
Copy link
Contributor Author

Sure. Want me to send a pull request or do you just want to pull it
directly? Either way works for me.

Ryan
On Sep 8, 2013 12:34 PM, "chewiebug" notifications@github.com wrote:

Hi Ryan,

May I pull in what you committed to your branchhttps://github.com/DealerDotCom/GCViewer/tree/PrintGCCauseeven if it doesn't cover all features we discussed? I think the
implementation as it is is far better than nothing.

Regards, Jörg


Reply to this email directly or view it on GitHubhttps://github.com//issues/66#issuecomment-24024099
.

@chewiebug
Copy link
Owner

Hi Ryan,

Thanks! I'll just pull it in directly within the next few days.

Regards, Jörg

@ryangardner
Copy link
Contributor Author

I had to play around with some stuff today anyway, so I fixed a few more
issues for parsing g1 logs with 1.7u40 logs (including tests for them) and
sent a pull request.
On Sep 8, 2013 4:37 PM, "chewiebug" notifications@github.com wrote:

Hi Ryan,

Thanks! I'll just pull it in directly within the next few days.

Regards, Jörg


Reply to this email directly or view it on GitHubhttps://github.com//issues/66#issuecomment-24028903
.

@chewiebug
Copy link
Owner

Hi Ryan,

I have merged your pull request #77. Now "only" the cause is missing. I'll leave this issue open until we have the cause displayed.

Thank you very much for your contribution!

Regards, Jörg

@chewiebug
Copy link
Owner

Hi Ryan,

I have pushed a commit to show the cause. Does this look like you expect it to?

Regards, Jörg

@chewiebug
Copy link
Owner

Hi Ryan,

I close this issue, since it works for me. If it doesn't for you, please let me hear what you are missing.

Regards, Jörg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants