Commit
Brian Moore committed May 16, 2013
1 parent 908f5af
commit 5f561ad
Showing 25 changed files with 1,471 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
/* | ||
* | ||
* Copyright 2013 Netflix, Inc. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* | ||
*/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
This is gcviz, a set of programs that help generate visualizations | ||
from gc.log, a log file that HotSpot, a Java Virtual Machine, | ||
writes when configured with the following flags: | ||
-verbose:gc | ||
-verbose:sizes | ||
-Xloggc:/apps/tomcat/logs/gc.log | ||
-XX:+PrintGCDetails | ||
-XX:+PrintGCDateStamps | ||
-XX:+PrintTenuringDistribution | ||
|
||
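As a rough illustration of the flag list above, the following sketch reports which of those flags are absent from a JVM argument list. The function name `missing_gc_flags` and the `<path>` placeholder are assumptions made for this example; they are not part of gcviz.

```python
# Flags copied from the list above. -Xloggc is matched by prefix because
# the log path differs between installations.
REQUIRED_FLAGS = [
    "-verbose:gc",
    "-verbose:sizes",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCDateStamps",
    "-XX:+PrintTenuringDistribution",
]
LOG_FLAG_PREFIX = "-Xloggc:"


def missing_gc_flags(jvm_args):
    """Return the flags from the list above that jvm_args does not contain."""
    missing = [flag for flag in REQUIRED_FLAGS if flag not in jvm_args]
    if not any(arg.startswith(LOG_FLAG_PREFIX) for arg in jvm_args):
        missing.append(LOG_FLAG_PREFIX + "<path>")
    return missing


if __name__ == "__main__":
    # Hypothetical argument list, for illustration only.
    print(missing_gc_flags(["-verbose:gc", "-Xloggc:/apps/tomcat/logs/gc.log"]))
```

An empty result means every flag in the list was found.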
gcviz is intended to be used as a webapp when installed on the same | ||
host as tomcat, or any other Java web container. The gcviz program | ||
itself is served by apache httpd inside netflix, but could be served | ||
by any webserver that supports CGI. gcviz can also be used in 'remote' | ||
mode where the visualizer runs on local hardware but the log files are | ||
scraped from a remote tomcat. | ||
|
||
By default gcviz is available at: | ||
http://127.0.0.1:8080/AdminGCViz/index | ||
|
||
Internally, gcviz is a bundle of four sorts of things: | ||
* python programs that require matplotlib, numpy, pylab, etc. | ||
* cgi scripts that invoke these python programs | ||
* some minor assistive perl scripts | ||
* very minor rpm infrastructure to package the previous things. | ||
|
||
I wrote gcviz to address challenges we face inside Netflix. If you | ||
feel that any changes you might propose could be helpful for Netflix | ||
or for the community at large, please write. | ||
|
||
Brian Moore |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
Name: @name@ | ||
Summary: Netflix GC Visualization | ||
Version: @version@ | ||
Release: @release@ | ||
License: NFLX | ||
Packager: Engineering Tools | ||
Vendor: Netflix, Inc. | ||
Group: Netflix Base | ||
AutoReqProv: no | ||
Requires: nflx-python-matplotlib, nflx-python-numpy, nflx-python-scipy | ||
|
||
%description | ||
GC Visualization | ||
---------- | ||
@build.metadata@ | ||
|
||
%install | ||
cat $0 > %{_topdir}/install.txt | ||
mkdir -p $RPM_BUILD_ROOT | ||
mv ${RPM_BUILD_DIR}/* $RPM_BUILD_ROOT | ||
|
||
%clean | ||
rm -rf $RPM_BUILD_ROOT/* | ||
|
||
%files | ||
%defattr(-,root,root) | ||
/ | ||
|
||
%post |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
|
||
# configuration for GC visualization admin app | ||
# mounted under /AdminGCViz and /AdminGCVizImages | ||
|
||
ScriptAlias /AdminGCViz/ "/apps/apache/htdocs/AdminGCViz/" | ||
Alias /AdminGCVizImages/ "/mnt/logs/gc-reports/" | ||
|
||
<LocationMatch "^/AdminGCViz"> | ||
Order deny,allow | ||
Deny from all | ||
Allow from 127.0.0.1/32 | ||
</LocationMatch> | ||
|
||
# send requests to /AdminGCViz [with and without trailing /] to /index | ||
RewriteRule ^/AdminGCViz/?$ /AdminGCViz/index [R] | ||
# Force apache to handle the rest [this catches AdminGCVizImages too] | ||
RewriteRule ^/AdminGCViz - [L] |
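To make the intent of the two RewriteRule lines above easier to check, here is a small Python sketch that mimics how the patterns match request paths. It only models the regular expressions and the order of the rules, not mod_rewrite itself; the `route` function and its return labels are assumptions made for this example.

```python
import re

# Same patterns as the two RewriteRule lines above.
REDIRECT_TO_INDEX = re.compile(r"^/AdminGCViz/?$")
HANDLED_BY_APACHE = re.compile(r"^/AdminGCViz")


def route(path):
    """Return (action, path) for the first rule that matches, in file order."""
    if REDIRECT_TO_INDEX.search(path):
        # [R]: the client is redirected to the index script.
        return ("redirect", "/AdminGCViz/index")
    if HANDLED_BY_APACHE.search(path):
        # "-" with [L]: leave the path unchanged and stop rewriting.
        return ("pass", path)
    return ("other", path)


if __name__ == "__main__":
    for p in ("/AdminGCViz", "/AdminGCViz/", "/AdminGCViz/index",
              "/AdminGCVizImages/report.png", "/elsewhere"):
        print(p, route(p))
```

Because the second pattern has no trailing slash or end anchor, it also matches paths under /AdminGCVizImages, which is what the comment in the config describes.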
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
This is more of a todo list than things that are deeply wrong, but I want to make the fringes that I know about public. | ||
|
||
* all of the BUGs and Potential BUGs in all of the sources | ||
* visualize-cluster does not work | ||
* compute throughput, allocation rate from gc data (do this with PrintGCStats?) | ||
* sar data (cpu, etc) needs to be visualized | ||
|
||
|
||
|
||
* netflix internal: need to visualize facet data | ||
* netflix internal: need to ensure that clients output vms cache refresh event overall and facet-level timing | ||
* gps | ||
* api | ||
* merchweb | ||
* ecweb (currently no facet-level timing for demand-fill) | ||
* accountweb (currently no facet-level timing for demand-fill) | ||
* ... | ||
* netflix-internal: delta fail needs to be added to catalina parsing (right now all lines are plotted as begin/end overall) | ||
* netflix-internal, maybe outside too: truncated gc logs (when ec2rotate logs purges stuff older than 7 days) create "unknown" gc events. | ||
* netflix-internal: have if -z checks for status-properties-1, -2. delete empty files. don't do both if we get the output for one. Consider getting URL from discovery/entrypoints. | ||
* netflix-internal: the squirreled away vms-gc-reports location is <appname>/<date>/<data> and doesn't have instance id in the path... This could be a problem for visualize-cluster. | ||
* netflix-internal: Consider grabbing some number of recent ttime and threaddump files from /apps/tomcat/logs/cores | ||
drwxrwsr-x 2 root nac 94208 Apr 17 23:44 . | ||
-rw-r--r-- 1 merchwebprod nac 168050 Apr 17 23:44 ttime.20120417.234401.1586.txt | ||
lrwxrwxrwx 1 merchwebprod nac 59 Apr 17 23:44 latest -> /apps/tomcat/logs/cores/threaddump.20120417.234401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 1117242 Apr 17 23:44 threaddump.20120417.234401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 168251 Apr 17 23:34 ttime.20120417.233401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 1119200 Apr 17 23:34 threaddump.20120417.233401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 168371 Apr 17 23:24 ttime.20120417.232401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 1120525 Apr 17 23:24 threaddump.20120417.232401.1586.txt | ||
-rw-r--r-- 1 merchwebprod nac 168490 Apr 17 23:15 ttime.20120417.231401.1586.txt | ||
* netflix-internal: visualize objectCache lines in ${OUTPUTDIR}/vms-object-cache-stats |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
* This program is split into three conceptual parts: | ||
* a top-level "driver" (visualize-instance.sh and visualize-cluster.py) | ||
* a remote data collection component (remote-data-collection/collect_remote_data.sh) | ||
* a visualization component (visualize-gc.py) | ||
|
||
It's difficult to get a cross-platform visualization component, so I | ||
opted for python's matplotlib, which is cross platform and has a | ||
single-click installer available for windows, macintosh and linux. | ||
|
||
* To visualize the data you'll need python's matplotlib. One | ||
simple-to-install (and free) distribution that contains this is EPD: | ||
http://www.enthought.com/repo/free/ | ||
Its installer will attempt to patch your .profile equivalent to place itself first on your PATH. | ||
|
||
* A note about how GC events are parsed. This software does not, at | ||
the time of this writing, use PrintGCFixup. Instead it uses the | ||
-XX:+PrintGCDateStamps datetime stamps (if available, secs since vm | ||
boot if not) as an anchor, and treats each of those things as an | ||
event. | ||
|
||
In this context: | ||
2012-04-04T19:07:40.395+0000: 510958.888: [GC [1 CMS-initial-mark: 18431999K(18432000K)] 18939679K(29491200K), 0.5050890 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] | ||
2012-04-04T19:07:40.903+0000: 510959.397: [CMS-concurrent-mark-start] | ||
2012-04-04T19:07:56.564+0000: 510975.058: [CMS-concurrent-mark: 15.410/15.661 secs] [Times: user=49.94 sys=1.89, real=15.66 secs] | ||
2012-04-04T19:07:56.565+0000: 510975.058: [CMS-concurrent-preclean-start] | ||
2012-04-04T19:08:23.054+0000: 511001.548: [Full GC 511001.549: [CMS2012-04-04T19:08:48.906+0000: 511027.400: [CMS-concurrent-preclean: 51.957/52.341 secs] [Times: user=76.72 sys=0.15, real=52.34 secs] | ||
(concurrent mode failure): 18431999K->16174249K(18432000K), 106.0788490 secs] 29491199K->16174249K(29491200K), [CMS Perm : 69005K->69005K(115372K)], 106.0801410 secs] [Times: user=106.01 sys=0.00, real=106.06 secs] | ||
2012-04-04T19:10:09.150+0000: 511107.644: [GC [1 CMS-initial-mark: 16174249K(18432000K)] 16363184K(29491200K), 0.0263250 secs] [Times: user=0.02 sys=0.00, real=0.03 secs] | ||
|
||
GC events of this form: | ||
2012-04-04T19:08:23.054+0000: 511001.548: [Full GC 511001.549: [CMS2012-04-04T19:08:48.906+0000: 511027.400: [CMS-concurrent-preclean: 51.957/52.341 secs] [Times: user=76.72 sys=0.15, real=52.34 secs] | ||
(concurrent mode failure): 18431999K->16174249K(18432000K), 106.0788490 secs] 29491199K->16174249K(29491200K), [CMS Perm : 69005K->69005K(115372K)], 106.0801410 secs] [Times: user=106.01 sys=0.00, real=106.06 secs] | ||
|
||
are represented as a Full GC requiring 106 seconds rather than the | ||
CMS-preclean part of ~52 seconds. I'm not sure if ~106 or 106-52 is | ||
the stop-the-world part, but at that long of a pause, I'm not entirely | ||
convinced that it matters. Bill Jackson votes that 106-52 is the stop | ||
the world part. He's probably right. I'm surprised that the preclean | ||
wasn't aborted. | ||
|
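The anchoring described above can be sketched roughly as follows. This is not the code in visualize-gc.py; it is a minimal illustration that assumes a datestamp at the start of a line opens a new event and that lines without one continue the previous event. The names `split_events` and `full_gc_seconds` and the regular expressions are made up for this example. The sample lines are taken from the log excerpt above.

```python
import re

# Datestamp from -XX:+PrintGCDateStamps followed by seconds since VM boot.
ANCHOR = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): (\d+\.\d+): "
)
# Elapsed time that closes a collection, e.g. ", 106.0801410 secs]".
ELAPSED = re.compile(r", (\d+\.\d+) secs\]")
# Wall-clock part of a preclean record, e.g. "51.957/52.341 secs".
PRECLEAN = re.compile(r"CMS-concurrent-preclean: \d+\.\d+/(\d+\.\d+) secs")


def split_events(lines):
    """Group lines into events; a leading datestamp starts a new event."""
    events = []
    for line in lines:
        line = line.strip()
        match = ANCHOR.match(line)
        if match:
            events.append({
                "datestamp": match.group(1),
                "uptime": float(match.group(2)),
                "text": line[match.end():],
            })
        elif events and line:
            # No leading datestamp: continuation of the previous event.
            events[-1]["text"] += " " + line
    return events


def full_gc_seconds(text):
    """Return (outermost elapsed secs, embedded preclean wall secs), or None for each if absent."""
    elapsed = [float(s) for s in ELAPSED.findall(text)]
    preclean = PRECLEAN.search(text)
    return (elapsed[-1] if elapsed else None,
            float(preclean.group(1)) if preclean else None)


SAMPLE = [
    "2012-04-04T19:07:56.565+0000: 510975.058: [CMS-concurrent-preclean-start]",
    "2012-04-04T19:08:23.054+0000: 511001.548: [Full GC 511001.549: [CMS2012-04-04T19:08:48.906+0000: 511027.400: [CMS-concurrent-preclean: 51.957/52.341 secs] [Times: user=76.72 sys=0.15, real=52.34 secs]",
    " (concurrent mode failure): 18431999K->16174249K(18432000K), 106.0788490 secs] 29491199K->16174249K(29491200K), [CMS Perm : 69005K->69005K(115372K)], 106.0801410 secs] [Times: user=106.01 sys=0.00, real=106.06 secs]",
]

events = split_events(SAMPLE)
total, preclean = full_gc_seconds(events[1]["text"])

if __name__ == "__main__":
    print(len(events), total, preclean, round(total - preclean, 3))
```

With this sample the Full GC event keeps the whole ~106 second duration, and the difference from the embedded preclean time comes to roughly 53.7 seconds, which is the "106-52" figure discussed above. The datestamp embedded in the middle of the Full GC line does not start a new event because the pattern is only tried at the start of each line.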
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
ParNew (stop-the-world) | ||
CMS-initial-mark (stop-the-world) | ||
CMS-concurrent-mark (concurrent; includes yields to other threads) | ||
CMS-concurrent-abortable-preclean (concurrent) | ||
CMS-concurrent-preclean (concurrent) | ||
CMS-remark (stop-the-world) | ||
CMS-concurrent-sweep (concurrent) | ||
CMS-concurrent-reset (concurrent?) | ||
concurrent mode failure (stop-the-world) | ||
promotion failed (stop-the-world) | ||
Full GC (stop-the-world) | ||
|
||
markers | ||
CMS-concurrent-mark-start | ||
CMS-concurrent-preclean-start | ||
CMS-concurrent-sweep-start | ||
CMS-concurrent-reset-start | ||
CMS-concurrent-abortable-preclean-start |
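The table above could be captured in code along these lines. This is a sketch only; the set names and the `classify` function are hypothetical, and CMS-concurrent-reset is placed with the concurrent events even though the list marks it with a question mark.

```python
# Event names copied from the list above.
STOP_THE_WORLD = {
    "ParNew",
    "CMS-initial-mark",
    "CMS-remark",
    "concurrent mode failure",
    "promotion failed",
    "Full GC",
}
CONCURRENT = {
    "CMS-concurrent-mark",
    "CMS-concurrent-abortable-preclean",
    "CMS-concurrent-preclean",
    "CMS-concurrent-sweep",
    "CMS-concurrent-reset",  # listed as "concurrent?" above, so treat as uncertain
}
MARKERS = {
    "CMS-concurrent-mark-start",
    "CMS-concurrent-preclean-start",
    "CMS-concurrent-sweep-start",
    "CMS-concurrent-reset-start",
    "CMS-concurrent-abortable-preclean-start",
}


def classify(name):
    """Return 'marker', 'stop-the-world', 'concurrent', or 'unknown'."""
    if name in MARKERS:
        return "marker"
    if name in STOP_THE_WORLD:
        return "stop-the-world"
    if name in CONCURRENT:
        return "concurrent"
    return "unknown"


if __name__ == "__main__":
    for name in ("ParNew", "CMS-concurrent-sweep", "CMS-concurrent-mark-start"):
        print(name, "->", classify(name))
```

Markers are checked first so that a "-start" line is never mistaken for the phase it announces.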