diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..e49a236
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,17 @@
+/*
+ *
+ * Copyright 2013 Netflix, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
diff --git a/README b/README
new file mode 100644
index 0000000..4fb803c
--- /dev/null
+++ b/README
@@ -0,0 +1,31 @@
+This is gcviz, a set of programs that help generate visualizations
+from gc.log, a log file that HotSpot, a Java Virtual Machine,
+writes when configured with the following flags:
+    -verbose:gc
+    -verbose:sizes
+    -Xloggc:/apps/tomcat/logs/gc.log
+    -XX:+PrintGCDetails
+    -XX:+PrintGCDateStamps
+    -XX:+PrintTenuringDistribution
+
+gcviz is intended to be used as a webapp when installed on the same
+host as tomcat, or any other Java web container. The gcviz program
+itself is served by apache httpd inside Netflix, but could be served
+by any webserver that supports CGI. gcviz can also be used in 'remote'
+mode where the visualizer runs on local hardware but the log files are
+scraped from a remote tomcat.
+
+By default gcviz is available at:
+    http://127.0.0.1:8080/AdminGCViz/index
+
+Internally, gcviz is a bundle of four sorts of things:
+  * python programs that require matplotlib, numpy, pylab, etc.
+  * cgi scripts that invoke these python programs
+  * some minor assistive perl scripts
+  * very minor rpm infrastructure to package the previous things.
+
+I wrote gcviz to address challenges we face inside Netflix.
If you +feel that any changes you might propose could be helpful for Netflix +or for the community at large, please write. + +Brian Moore diff --git a/README.md b/README.md deleted file mode 100644 index 083c96f..0000000 --- a/README.md +++ /dev/null @@ -1,4 +0,0 @@ -gcviz -===== - -Garbage Collector Visualization Tool/Framework diff --git a/conf/rpm.spec b/conf/rpm.spec new file mode 100644 index 0000000..3e08557 --- /dev/null +++ b/conf/rpm.spec @@ -0,0 +1,29 @@ +Name: @name@ +Summary: Netflix GC Visualization +Version: @version@ +Release: @release@ +License: NFLX +Packager: Engineering Tools +Vendor: Netflix, Inc. +Group: Netflix Base +AutoReqProv: no +Requires: nflx-python-matplotlib, nflx-python-numpy, nflx-python-scipy + +%description +GC Visualization +---------- +@build.metadata@ + +%install +cat $0 > %{_topdir}/install.txt +mkdir -p $RPM_BUILD_ROOT +mv ${RPM_BUILD_DIR}/* $RPM_BUILD_ROOT + +%clean +rm -rf $RPM_BUILD_ROOT/* + +%files +%defattr(-,root,root) +/ + +%post diff --git a/root/apps/apache/conf.d/admin_gc_viz.conf b/root/apps/apache/conf.d/admin_gc_viz.conf new file mode 100644 index 0000000..a66619c --- /dev/null +++ b/root/apps/apache/conf.d/admin_gc_viz.conf @@ -0,0 +1,17 @@ + +# configuration for GC visualization admin app +# mounted under /AdminGCViz and /AdminGCVizImages + +ScriptAlias /AdminGCViz/ "/apps/apache/htdocs/AdminGCViz/" +Alias /AdminGCVizImages/ "/mnt/logs/gc-reports/" + + + Order deny,allow + Deny from all + Allow from 127.0.0.1/32 + + +# send requests to /AdminGCViz [with and without trailing /] to /index +RewriteRule ^/AdminGCViz/?$ /AdminGCViz/index [R] +# Force apache to handle the rest [this catches AdminGCVizImages too] +RewriteRule ^/AdminGCViz - [L] diff --git a/root/apps/apache/htdocs/AdminGCViz/BUGS b/root/apps/apache/htdocs/AdminGCViz/BUGS new file mode 100644 index 0000000..e6d74e9 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/BUGS @@ -0,0 +1,32 @@ +This is more of a todo list than things that are deeply wrong, 
but I want to make the fringes that I know about public.
+
+* all of the BUGs and Potential BUGs in all of the sources
+* visualize-cluster does not work
+* compute throughput, allocation rate from gc data (do this with PrintGCStats?)
+* sar data (cpu, etc) needs to be visualized
+
+* netflix internal: need to visualize facet data
+* netflix internal: need to ensure that clients output vms cache refresh event overall and facet-level timing
+  * gps
+  * api
+  * merchweb
+  * ecweb (currently no facet-level timing for demand-fill)
+  * accountweb (currently no facet-level timing for demand-fill)
+  * ...
+* netflix-internal: delta fail needs to be added to catalina parsing (right now all lines are plotted as begin/end overall)
+* netflix-internal, maybe outside too: truncated gc logs (left behind when ec2rotate purges logs older than 7 days) create "unknown" gc events.
+* netflix-internal: have if -z checks for status-properties-1, -2. delete empty files. don't do both if we get the output for one. Consider getting URL from discovery/entrypoints.
+* netflix-internal: the squirreled away vms-gc-reports location is // and doesn't have instance id in the path... This could be a problem for visualize-cluster.
+* netflix-internal: Consider grabbing some number of recent ttime and threaddump files from /apps/tomcat/logs/cores
+    drwxrwsr-x 2 root nac 94208 Apr 17 23:44 .
+    -rw-r--r-- 1 merchwebprod nac  168050 Apr 17 23:44 ttime.20120417.234401.1586.txt
+    lrwxrwxrwx 1 merchwebprod nac      59 Apr 17 23:44 latest -> /apps/tomcat/logs/cores/threaddump.20120417.234401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac 1117242 Apr 17 23:44 threaddump.20120417.234401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac  168251 Apr 17 23:34 ttime.20120417.233401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac 1119200 Apr 17 23:34 threaddump.20120417.233401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac  168371 Apr 17 23:24 ttime.20120417.232401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac 1120525 Apr 17 23:24 threaddump.20120417.232401.1586.txt
+    -rw-r--r-- 1 merchwebprod nac  168490 Apr 17 23:15 ttime.20120417.231401.1586.txt
+* netflix-internal: visualize objectCache lines in ${OUTPUTDIR}/vms-object-cache-stats
diff --git a/root/apps/apache/htdocs/AdminGCViz/README b/root/apps/apache/htdocs/AdminGCViz/README
new file mode 100644
index 0000000..5826306
--- /dev/null
+++ b/root/apps/apache/htdocs/AdminGCViz/README
@@ -0,0 +1,40 @@
+* This program is split into three conceptual parts:
+  * a top-level "driver" (visualize-instance.sh and visualize-cluster.py)
+  * a remote data collection component (remote-data-collection/collect_remote_data.sh)
+  * a visualization component (visualize-gc.py)
+
+It's difficult to get a cross-platform visualization component, so I
+opted for python's matplotlib, which is cross platform and has a
+single-click installer available for Windows, Macintosh and Linux.
+
+* To visualize the data you'll need python's matplotlib. One
+  simple-to-install (and free) distribution that contains this is EPD:
+  http://www.enthought.com/repo/free/
+  This will attempt to patch your .profile equivalent to place itself first on your PATH.
+
+* A note about how GC events are parsed. This software does not, at
+  the time of this writing, use PrintGCFixup.
Instead it uses the + -XX:+PrintGCDateStamps datetime stamps (if available, secs since vm + boot if not) as an anchor, and treats each of those things as an + event. + +In this context: +2012-04-04T19:07:40.395+0000: 510958.888: [GC [1 CMS-initial-mark: 18431999K(18432000K)] 18939679K(29491200K), 0.5050890 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] +2012-04-04T19:07:40.903+0000: 510959.397: [CMS-concurrent-mark-start] +2012-04-04T19:07:56.564+0000: 510975.058: [CMS-concurrent-mark: 15.410/15.661 secs] [Times: user=49.94 sys=1.89, real=15.66 secs] +2012-04-04T19:07:56.565+0000: 510975.058: [CMS-concurrent-preclean-start] +2012-04-04T19:08:23.054+0000: 511001.548: [Full GC 511001.549: [CMS2012-04-04T19:08:48.906+0000: 511027.400: [CMS-concurrent-preclean: 51.957/52.341 secs] [Times: user=76.72 sys=0.15, real=52.34 secs] + (concurrent mode failure): 18431999K->16174249K(18432000K), 106.0788490 secs] 29491199K->16174249K(29491200K), [CMS Perm : 69005K->69005K(115372K)], 106.0801410 secs] [Times: user=106.01 sys=0.00, real=106.06 secs] +2012-04-04T19:10:09.150+0000: 511107.644: [GC [1 CMS-initial-mark: 16174249K(18432000K)] 16363184K(29491200K), 0.0263250 secs] [Times: user=0.02 sys=0.00, real=0.03 secs] + +GC events of this form: +2012-04-04T19:08:23.054+0000: 511001.548: [Full GC 511001.549: [CMS2012-04-04T19:08:48.906+0000: 511027.400: [CMS-concurrent-preclean: 51.957/52.341 secs] [Times: user=76.72 sys=0.15, real=52.34 secs] + (concurrent mode failure): 18431999K->16174249K(18432000K), 106.0788490 secs] 29491199K->16174249K(29491200K), [CMS Perm : 69005K->69005K(115372K)], 106.0801410 secs] [Times: user=106.01 sys=0.00, real=106.06 secs] + +are represented as a Full GC requiring 106 seconds rather than the +CMS-preclean part of ~52 seconds. I'm not sure if ~106 or 106-52 is +the stop-the-world part, but at that long of a pause, I'm not entirely +convinced that it matters. Bill Jackson votes that 106-52 is the stop +the world part. He's probably right. 
I'm surprised that the preclean
wasn't aborted.
diff --git a/root/apps/apache/htdocs/AdminGCViz/gc_event_types b/root/apps/apache/htdocs/AdminGCViz/gc_event_types
new file mode 100644
index 0000000..24a3595
--- /dev/null
+++ b/root/apps/apache/htdocs/AdminGCViz/gc_event_types
@@ -0,0 +1,18 @@
+ParNew (stop-the-world)
+CMS-initial-mark (stop-the-world)
+CMS-concurrent-mark (concurrent; includes yields to other threads)
+CMS-concurrent-abortable-preclean (concurrent)
+CMS-concurrent-preclean (concurrent)
+CMS-remark (stop the world)
+CMS-concurrent-sweep (concurrent)
+CMS-concurrent-reset (concurrent?)
+concurrent mode failure (stop the world)
+promotion failed (stop the world)
+Full GC (stop the world)
+
+markers
+CMS-concurrent-mark-start
+CMS-concurrent-preclean-start
+CMS-concurrent-sweep-start
+CMS-concurrent-reset-start
+CMS-concurrent-abortable-preclean-start
diff --git a/root/apps/apache/htdocs/AdminGCViz/generate b/root/apps/apache/htdocs/AdminGCViz/generate
new file mode 100755
index 0000000..b595de9
--- /dev/null
+++ b/root/apps/apache/htdocs/AdminGCViz/generate
@@ -0,0 +1,206 @@
+#!/bin/bash
+
+# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/generate#3 $
+# $DateTime: 2013/05/15 18:34:23 $
+# $Author: mooreb $
+# $Change: 1838706 $
+
+if [[ "POST" != "${REQUEST_METHOD}" ]] ; then
+    cat <<END_OF_405
+405 Method Not Allowed
+

Error

+

generate cannot be called with anything but a POST

+ + +END_OF_405 + exit 0; +fi + +############################################################################################################################################### +## +## lifted from http://oinkzwurgl.org/bash_cgi thank you Philippe +## +# (internal) routine to store POST data +function cgi_get_POST_vars() +{ + # check content type + # FIXME: not sure if we could handle uploads with this.. + [ "${CONTENT_TYPE}" != "application/x-www-form-urlencoded" ] && \ + echo "bash.cgi warning: you should probably use MIME type "\ + "application/x-www-form-urlencoded!" 1>&2 + # save POST variables (only first time this is called) + [ -z "$QUERY_STRING_POST" \ + -a "$REQUEST_METHOD" = "POST" -a ! -z "$CONTENT_LENGTH" ] && \ + read -n $CONTENT_LENGTH QUERY_STRING_POST + # prevent shell execution + local t + t=${QUERY_STRING_POST//%60//} # %60 = ` + t=${t//\`//} + t=${t//\$(//} + QUERY_STRING_POST=${t} + return +} + +# (internal) routine to decode urlencoded strings +function cgi_decodevar() +{ + [ $# -ne 1 ] && return + local v t h + # replace all + with whitespace and append %% + t="${1//+/ }%%" + while [ ${#t} -gt 0 -a "${t}" != "%" ]; do + v="${v}${t%%\%*}" # digest up to the first % + t="${t#*%}" # remove digested part + # decode if there is anything to decode and if not at end of string + if [ ${#t} -gt 0 -a "${t}" != "%" ]; then + h=${t:0:2} # save first two chars + t="${t:2}" # remove these + v="${v}"`echo -e \\\\x${h}` # convert hex to special char + fi + done + # return decoded string + echo "${v}" + return +} + +# routine to get variables from http requests +# usage: cgi_getvars method varname1 [.. varnameN] +# method is either GET or POST or BOTH +# the magic varible name ALL gets everything +function cgi_getvars() +{ + [ $# -lt 2 ] && return + local q p k v s + # prevent shell execution + t=${QUERY_STRING//%60//} # %60 = ` + t=${t//\`//} + t=${t//\$(//} + QUERY_STRING=${t} + # get query + case $1 in + GET) + [ ! 
-z "${QUERY_STRING}" ] && q="${QUERY_STRING}&"
+            ;;
+        POST)
+            cgi_get_POST_vars
+            [ ! -z "${QUERY_STRING_POST}" ] && q="${QUERY_STRING_POST}&"
+            ;;
+        BOTH)
+            [ ! -z "${QUERY_STRING}" ] && q="${QUERY_STRING}&"
+            cgi_get_POST_vars
+            [ ! -z "${QUERY_STRING_POST}" ] && q="${q}${QUERY_STRING_POST}&"
+            ;;
+    esac
+    shift
+    s=" $* "
+    # parse the query data
+    while [ ! -z "$q" ]; do
+        p="${q%%&*}"  # get first part of query string
+        k="${p%%=*}"  # get the key (variable name) from it
+        v="${p#*=}"   # get the value from it
+        q="${q#$p&*}" # strip first part from query string
+        # decode and evaluate var if requested
+        [ "$1" = "ALL" -o "${s/ $k /}" != "$s" ] && \
+            eval "$k=\"`cgi_decodevar \"$v\"`\""
+    done
+    return
+}
+
+cgi_getvars POST ALL
+
+##
+##
+##
+###############################################################################################################################################
+
+#cat <<END_OF_EARLY_RETURN
+#200 OK: early return
+#
+#jmap_histo_live is ${jmap_histo_live}
+#vms_refresh_events is ${vms_refresh_events}
+#
+#
+#END_OF_EARLY_RETURN
+#exit 0
+
+if [ -z "$jmap_histo_live" ]; then
+    jmap_histo_live="nojmap_histo_live"
+else
+    jmap_histo_live="jmap_histo_live"
+fi
+
+if [ -z "$vms_refresh_events" ]; then
+    vms_refresh_events="novms_refresh_events"
+else
+    vms_refresh_events="vms_refresh_events"
+fi
+
+NFENV=/etc/profile.d/netflix_environment.sh
+if [ -f ${NFENV} ]; then
+    . ${NFENV}
+else
+    NFENV=""
+fi
+
+cd `dirname $0`
+
+prog=`basename $0`
+
+cat <<EndOfHeader
+
+ AdminGCViz
+
+
+EndOfHeader
+
+echo "
"
+NOW=`date +%Y-%m-%dT%T`
+bash /apps/apache/htdocs/AdminGCViz/remote-data-collection/collect_remote_data.sh ${NOW} ${jmap_histo_live} ${vms_refresh_events}
+/usr/bin/python2.7 /apps/apache/htdocs/AdminGCViz/visualize-gc.py ${NOW} /mnt/logs/gc-reports/${NOW}
+
+
+if [ -n "${NFENV}" ]; then
+    GCREPORT=/mnt/logs/gc-reports/${NOW}.tar.gz 
+    S3LOCATION=s3://netflix.bulkdata.${NETFLIX_ENVIRONMENT}/gc-report-${NETFLIX_APP}-${NOW}.tar.gz
+    echo bundling up results on ${GCREPORT}
+    tar -C /mnt/logs/gc-reports -zcf ${GCREPORT} ${NOW}
+    echo copying gc report to ${S3LOCATION}
+    s3cp ${GCREPORT} ${S3LOCATION}
+fi
+
+echo
+echo java -version is:
+cat /mnt/logs/gc-reports/${NOW}/java-version
+echo
+
+echo
+echo java commandline is:
+cat /mnt/logs/gc-reports/${NOW}/cmdline
+echo
+
+echo starting to render gc events and heap size images:
+echo "
" + +for fullpath in `ls /mnt/logs/gc-reports/${NOW}/*.png`; +do + filename=`basename ${fullpath}` + echo "" +done + +cat < + +EndOfFooter diff --git a/root/apps/apache/htdocs/AdminGCViz/index b/root/apps/apache/htdocs/AdminGCViz/index new file mode 100755 index 0000000..ebfbece --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/index @@ -0,0 +1,52 @@ +#!/bin/bash + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/index#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Author: mooreb $ +# $Change: 1838706 $ + +cd `dirname $0` + +prog=`basename $0` + +NFENV=/etc/profile.d/netflix_environment.sh +if [ -f ${NFENV} ]; then + . ${NFENV} + NETFLIX_VMS_EVENTS=checked +else + NFENV="" +fi + +cat < + + AdminGCViz + + +EndOfHeader + +cat < + jmap -histo:live
+ parse catalina logs looking for netflix VMS events
+ + +EndOfGenerate + +echo Look at previous reports +echo "
    " +for f in `find /mnt/logs/gc-reports -type f -name "*.png" | LANG=C sort -rn`; +do + u=`echo ${f} | sed 's|/mnt/logs/gc-reports/||'` + echo "
  • ${f}" +done +echo "
" + + +cat < + +EndOfFooter diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/collect_remote_data.sh b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/collect_remote_data.sh new file mode 100755 index 0000000..5d8541f --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/collect_remote_data.sh @@ -0,0 +1,149 @@ +#!/bin/sh + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/collect_remote_data.sh#5 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +# This script must be run on the instance where the gc.log and catalina.out must be analyzed. + +NFENV=/etc/profile.d/netflix_environment.sh +if [ -f ${NFENV} ]; then + . ${NFENV} +else + NFENV="" +fi + +# NOW=`date +%Y-%m-%dT%T` +if [ -z "$1" ]; then + echo Usage: $0 {iso8601 combined date and time representation} [ options ... ] + echo " where options are: [no]jmap_histo_live [no]vms_refresh_events ... (absence implies the option is on)" + exit 1 +fi +NOW=$1 + +# start processing optional arguments +if [ -z "$2" ]; then + jmap_histo_live=true + vms_refresh_events=true +else + shift + while [ $# -gt 0 ]; + do + case "$1" in + nojmap_histo_live) + jmap_histo_live=false + ;; + jmap_histo_live) + jmap_histo_live=true + ;; + novms_refresh_events) + vms_refresh_events=false + ;; + vms_refresh_events) + vms_refresh_events=true + ;; + esac + shift + done +fi + + +# Potential BUG: this assumes tomcat. It might be better to use jps, but the corresponding jps expression: +# JAVAPID=`jps | grep Bootstrap | awk '{print $1}'` +# doesn't seem much (any?) better +JAVAPID=`ps -elfww | grep java | grep Bootstrap | grep -v grep | awk '{print $4}'` +if [ -z "${JAVAPID}" ]; then + echo CANNOT DETERMINE JAVAPID. EXITING. + exit 1 +fi + +JAVACMDLINE=`cat /proc/${JAVAPID}/cmdline` +# This does not work due to crontabs or other programs (or users!) 
being able to touch files in /proc +# BOOTTIME=`ls --full-time /proc/${JAVAPID}/cmdline | awk '{print $6,$7,$8}' | sed 's/[.][0-9][0-9]*//' | sed 's/ /T/' | sed 's/ //'` +BOOTTIME_LOCAL=`ps h -p ${JAVAPID} -o lstart` +BOOTTIME=`date --date="${BOOTTIME_LOCAL}" "+%Y-%m-%dT%T+0000"` + +RDCS=/apps/apache/htdocs/AdminGCViz/remote-data-collection +OUTPUTROOT=/mnt/logs/gc-reports +OUTPUTDIR=${OUTPUTROOT}/${NOW} + +echo making output directories +mkdir -p ${OUTPUTDIR}/raw-gc-data +mkdir -p ${OUTPUTDIR}/raw-sar-data + +if [ "true" = "$jmap_histo_live" ]; then + echo 'getting jmap -histo:live logs' + mkdir -p /apps/tomcat/logs/jmap-histos + jmap -histo:live ${JAVAPID} > /apps/tomcat/logs/jmap-histos/jmap-histo-live-`date +%Y-%m-%d--%H-%M-%S` 2>&1 + cp -ar /apps/tomcat/logs/jmap-histos ${OUTPUTDIR} +fi + +echo copying gc logs +cp -a /apps/tomcat/logs/archive/*gc.log* /apps/tomcat/logs/gc.log ${OUTPUTDIR}/raw-gc-data +for f in `ls ${OUTPUTDIR}/raw-gc-data` +do + g=${OUTPUTDIR}/raw-gc-data/${f} + if [ "$(tail -c 1 ${g})" != "" ]; then + echo >> ${g} + fi +done + +echo processing gc logs +cat ${OUTPUTDIR}/raw-gc-data/*tomcat_gc.log* ${OUTPUTDIR}/raw-gc-data/gc.log | tr -d '\000' | perl -w ${RDCS}/gcdotlog_one_event_per_line.pl > ${OUTPUTDIR}/gc-events-one-per-line +cat ${OUTPUTDIR}/gc-events-one-per-line | python ${RDCS}/gcdotlog_convert_relative_into_absolute_time.py ${BOOTTIME} ${OUTPUTDIR} > ${OUTPUTDIR}/gc-events-one-per-line-in-absolute-time +cat ${OUTPUTDIR}/gc-events-one-per-line-in-absolute-time | perl -w ${RDCS}/gcdotlog_extract_time.pl ${OUTPUTDIR} > ${OUTPUTDIR}/gc-events-duration-in-seconds-only +cat ${OUTPUTDIR}/gc-events-duration-in-seconds-only |grep -v secs_since_jvm_boot | awk -F, '{print $1}' | python ${RDCS}/prepend_epoch.py > /tmp/epochs +paste --delim=, /tmp/epochs ${OUTPUTDIR}/gc-events-duration-in-seconds-only > /tmp/foo +mv /tmp/foo ${OUTPUTDIR}/gc-events-duration-in-seconds-only +bash ${RDCS}/gc_events_by_type.sh ${OUTPUTDIR} +cat 
${OUTPUTDIR}/gc-events-one-per-line-in-absolute-time | perl -w ${RDCS}/gcdotlog_extract_sizes.pl ${OUTPUTDIR} > ${OUTPUTDIR}/gc-sizes.csv +cat ${OUTPUTDIR}/gc-sizes.csv |grep -v secs_since_jvm_boot | awk -F, '{print $1}' | python ${RDCS}/prepend_epoch.py > /tmp/epochs +paste --delim=, /tmp/epochs ${OUTPUTDIR}/gc-sizes.csv > /tmp/foo +mv /tmp/foo ${OUTPUTDIR}/gc-sizes.csv +rm /tmp/epochs + +if [ -n "${NFENV}" ] && [ "true" = "$vms_refresh_events" ]; then + echo looking for vms refresh events + grep --no-filename vms-refreshexecutor /apps/tomcat/logs/archive/*catalina.out* /apps/tomcat/logs/catalina.out | egrep -e RefreshStat: -e 'Refreshing caches for country' > ${OUTPUTDIR}/vms-cache-refresh-events + grep -o 'CacheRefresh:[0-9][0-9]*' ${OUTPUTDIR}/vms-cache-refresh-events | awk -F : '{print $2}' > ${OUTPUTDIR}/vms-cache-refresh-events-milliseconds + grep --no-filename CacheManagerStats /apps/tomcat/logs/archive/*catalina.out* /apps/tomcat/logs/catalina.out | grep RefreshStat: | grep OVERALL > ${OUTPUTDIR}/vms-cache-refresh-overall-events + grep -o 'CacheRefresh:[0-9][0-9]*.*duration:[0-9][0-9]*' ${OUTPUTDIR}/vms-cache-refresh-overall-events | awk '{print $1,$7}' | awk -F : '{print $2,$3}' | awk '{print $1,$3}' > ${OUTPUTDIR}/vms-cache-refresh-overall-events-milliseconds + + echo looking for vms facet info + grep --no-filename BaseManager /apps/tomcat/logs/archive/*catalina.out* /apps/tomcat/logs/catalina.out | grep CacheRefreshInfo: > ${OUTPUTDIR}/vms-cache-refresh-facet-info + awk '{print $1,$2,$8,$9,$11,$12,$13,$14}' ${OUTPUTDIR}/vms-cache-refresh-facet-info | python ${RDCS}/vms_facet_info_transform.py > ${OUTPUTDIR}/vms-cache-refresh-facet-info.csv 2> ${OUTPUTDIR}/vms-cache-refresh-facet-info-rejected-lines + bash ${RDCS}/facet_events_by_type.sh ${OUTPUTDIR} + bash ${RDCS}/facet_events_by_country.sh ${OUTPUTDIR} + + echo looking for vms objectCache stats + egrep --no-filename -r -e VMClientCacheManager -e objectCache /apps/tomcat/logs/archive/*catalina.out* 
/apps/tomcat/logs/catalina.out | egrep -e Processed -e objectCache > ${OUTPUTDIR}/vms-object-cache-stats + python ${RDCS}/process_vms_object_cache_stats.py ${OUTPUTDIR}/vms-object-cache-stats ${OUTPUTDIR} +fi + +echo grabbing environment +cat /proc/${JAVAPID}/cmdline | tr '\000' '\n' > ${OUTPUTDIR}/cmdline +echo ${BOOTTIME} > ${OUTPUTDIR}/jvm_boottime +env | LANG=C sort > ${OUTPUTDIR}/env +md5sum /apps/tomcat/webapps/ROOT/WEB-INF/lib/* | LANG=C sort -k2 > ${OUTPUTDIR}/web-inf-lib +/proc/${JAVAPID}/exe -version > ${OUTPUTDIR}/java-version 2>&1 +cat /proc/${JAVAPID}/maps > ${OUTPUTDIR}/maps +python ${RDCS}/parse-proc-pid-maps.py ${OUTPUTDIR}/maps > ${OUTPUTDIR}/maps-parsed +cat /proc/meminfo > ${OUTPUTDIR}/meminfo +cat /proc/cpuinfo > ${OUTPUTDIR}/cpuinfo +free -m > ${OUTPUTDIR}/free-m + +if [ -n "${NFENV}" ]; then + echo grabbing netflix-specific environment + find /apps/tomcat/webapps/ROOT/WEB-INF/classes -name "*.properties" -exec grep videometadata /dev/null {} \; > ${OUTPUTDIR}/videometadata-properties 2> /dev/null + rpm -q ${NETFLIX_APP} > ${OUTPUTDIR}/netflix-rpm-version + rpm -qa | LANG=C sort > ${OUTPUTDIR}/netflix-rpm-all-packages-versions +fi + +echo collecting sar data +for f in `ls /var/log/sa/ | grep ^sa | grep -v sar`; do sar -f /var/log/sa/${f} > ${OUTPUTDIR}/raw-sar-data/sar-cpu-${f}; done +for f in `ls /var/log/sa/ | grep ^sa | grep -v sar`; do sar -n ALL -f /var/log/sa/${f} > ${OUTPUTDIR}/raw-sar-data/sar-network-${f}; done + +# BUG: fix this implicit agreement between collect_remote_data.sh and visualize-instance.sh +# echo bundling up results +# tar -C ${OUTPUTROOT} -zcf ${OUTPUTDIR}.tar.gz ${NOW} diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_country.sh b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_country.sh new file mode 100755 index 0000000..5da80e2 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_country.sh @@ -0,0 +1,23 @@ 
+#!/bin/bash + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_country.sh#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +if [ -z "$1" ]; then + echo Usage: $0 output-dir + exit 1 +fi + +OUTPUTDIR=$1 +OUTDIR=${OUTPUTDIR}/facet-events-by-country +INFILE=${OUTPUTDIR}/vms-cache-refresh-facet-info.csv +mkdir -p ${OUTDIR} +EVENT_TYPES=`awk -F, '{print $3}' ${INFILE} | sort -u | grep -v country` +echo event_Types is ${EVENT_TYPES} +for e in ${EVENT_TYPES}; +do + echo egrep -e ,${e}, -e seconds_since_epoch ${INFILE} \> ${OUTDIR}/${e} + egrep -e ,${e}, -e seconds_since_epoch ${INFILE} > ${OUTDIR}/${e} +done diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_type.sh b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_type.sh new file mode 100755 index 0000000..0ae02b4 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_type.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/facet_events_by_type.sh#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +if [ -z "$1" ]; then + echo Usage: $0 output-dir + exit 1 +fi + +OUTPUTDIR=$1 +OUTDIR=${OUTPUTDIR}/facet-events-by-cache +INFILE=${OUTPUTDIR}/vms-cache-refresh-facet-info.csv +mkdir -p ${OUTDIR} +EVENT_TYPES=`awk -F, '{print $4}' ${INFILE} | sort -u | grep -v cache` +echo event_Types is ${EVENT_TYPES} +for e in ${EVENT_TYPES}; +do + echo egrep -e ${e} -e seconds_since_epoch ${INFILE} \> ${OUTDIR}/${e} + egrep -e ${e} -e seconds_since_epoch ${INFILE} > ${OUTDIR}/${e} +done diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gc_events_by_type.sh b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gc_events_by_type.sh new file mode 100755 index 0000000..42e1531 --- /dev/null 
+++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gc_events_by_type.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gc_events_by_type.sh#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +if [ -z "$1" ]; then + echo Usage: $0 output-dir + exit 1 +fi + +OUTPUTDIR=$1 +OUTDIR=${OUTPUTDIR}/gc-events-duration-by-event +INFILE=${OUTPUTDIR}/gc-events-duration-in-seconds-only +mkdir -p ${OUTDIR} +EVENT_TYPES=`awk -F, '{print $4}' ${INFILE} | sort -u | grep -v gc_event_type` +echo event_Types is ${EVENT_TYPES} +for e in ${EVENT_TYPES}; +do + echo egrep -e ${e} -e secs_since_epoch ${INFILE} \> ${OUTDIR}/${e} + egrep -e ${e} -e secs_since_epoch ${INFILE} > ${OUTDIR}/${e} +done diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_convert_relative_into_absolute_time.py b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_convert_relative_into_absolute_time.py new file mode 100755 index 0000000..a81f746 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_convert_relative_into_absolute_time.py @@ -0,0 +1,63 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_convert_relative_into_absolute_time.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import calendar +import fileinput +import math +import os +import sys +import time +import re + +# Ugh. 
+sys.path.insert(0, "/apps/apache/htdocs/AdminGCViz") +import vmsgcvizutils + +def mayne(): + TMPFILE = '/tmp/gcdotlog-relative-time-tmpfile' + numArgs = len(sys.argv) - 1 + if(2 != numArgs): + print "Usage: %s {iso8601 combined date and time representations} outputdir" % (sys.argv[0],) + sys.exit(1); + + bootTimeStringISO8601 = sys.argv[1] + bootTimeSecondsSinceEpochString = vmsgcvizutils.timestamp_to_epoch(bootTimeStringISO8601) + bootTimeSecondsSinceEpoch = float(bootTimeSecondsSinceEpochString) + outputDir = sys.argv[2] + bootTimeEpochFile = open(outputDir + '/jvm_boottime.epoch', 'w') + bootTimeEpochFile.write("%s\n" % bootTimeSecondsSinceEpochString) + bootTimeEpochFile.close() + + tmpFile = open(TMPFILE, 'w') + lineStartsWithFloatingPointNumberPattern = re.compile("^([0-9]+[.][0-9]+): ") + lastSecsSinceBoot = 0.0; + for line in fileinput.input('-'): + line = line.rstrip('\r\n') + found = lineStartsWithFloatingPointNumberPattern.search(line) + if found: + secsSinceBootString = found.group(1) + secsSinceBoot = float(secsSinceBootString) + if secsSinceBoot < lastSecsSinceBoot: + # now we need to truncate the output file, since we have + # seen a restart; this is not the most recent JVM boot + tmpFile.close() + tmpFile = open(TMPFILE, 'w') + lastSecsSinceBoot = secsSinceBoot + timeStamp = vmsgcvizutils.convertTimeStamp(bootTimeSecondsSinceEpoch, secsSinceBoot) + tmpFile.write("%s: %s\n" % (timeStamp, line)) + else: + tmpFile.write("%s\n" % (line,)) + + tmpFile.close() + for line in fileinput.input(TMPFILE): + line = line.rstrip('\r\n') + print "%s" % (line,) + + os.unlink(TMPFILE) + +if __name__ == "__main__": + mayne() diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_sizes.pl b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_sizes.pl new file mode 100755 index 0000000..7e27076 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_sizes.pl @@ -0,0 +1,38 @@ 
+#!/usr/bin/perl -w
+
+# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_sizes.pl#2 $
+# $DateTime: 2013/05/15 18:34:23 $
+# $Change: 1838706 $
+# $Author: mooreb $
+
+use strict;
+
+my $num_args = $#ARGV + 1;
+
+if($num_args != 1) {
+    die "Usage: $0 output-dir";
+}
+
+my $outputdir = $ARGV[0];
+my $fname = "${outputdir}/gcdotlog_extract_sizes_rejected_lines";
+open(REJECTS, ">$fname") or die "cannot open $fname for writing";
+
+print "datetimestamp,secs_since_jvm_boot,young_begin_k,young_end_k,young_total_k,whole_heap_begin_k,whole_heap_end_k,whole_heap_total_k\n";
+while(<STDIN>) {
+    my $line = $_;
+    chomp($line);
+
+    if(0) {
+        # make all the lines elsif to make moving blocks easier
+    }
+    elsif($line =~ m/->.*->.*->/) {
+        # reject lines with three arrows
+        print REJECTS "$line\n";
+    }
+    elsif($line =~ m/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[.][0-9]{3}[+]0000): ([0-9]+[.][0-9]{3}): .* ([0-9]+)K->([0-9]+)K\(([0-9]+)K\).* ([0-9]+)K->([0-9]+)K\(([0-9]+)K\)/) {
+        print "$1,$2,$3,$4,$5,$6,$7,$8\n";
+    }
+    else {
+        print REJECTS "$line\n";
+    }
+}
diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_time.pl b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_time.pl
new file mode 100755
index 0000000..690a1d3
--- /dev/null
+++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_time.pl
@@ -0,0 +1,90 @@
+#!/usr/bin/perl -w
+
+# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_extract_time.pl#2 $
+# $DateTime: 2013/05/15 18:34:23 $
+# $Change: 1838706 $
+# $Author: mooreb $
+
+use strict;
+
+sub classify_gc_event_type {
+    my ($line) = @_;
+
+    if(0) {
+        # make all the lines "elsif" so that they are interchangeable
+    }
+    elsif($line =~ m/Full GC/) {
+        return "FullGC";
+    }
+    elsif($line =~ m/\(concurrent mode failure\)/) {
+        return
"concurrent-mode-failure";
+    }
+    elsif($line =~ m/\(promotion failed\)/) {
+        return "promotion-failed";
+    }
+    elsif($line =~ m/ParNew/) {
+        return "ParNew";
+    }
+    elsif($line =~ m/CMS-initial-mark/) {
+        return "CMS-initial-mark";
+    }
+    elsif($line =~ m/CMS-concurrent-mark/) {
+        return "CMS-concurrent-mark";
+    }
+    elsif($line =~ m/CMS-concurrent-abortable-preclean/) {
+        return "CMS-concurrent-abortable-preclean";
+    }
+    elsif($line =~ m/CMS-concurrent-preclean/) {
+        return "CMS-concurrent-preclean";
+    }
+    elsif($line =~ m/CMS-remark/) {
+        return "CMS-remark";
+    }
+    elsif($line =~ m/CMS-concurrent-sweep/) {
+        return "CMS-concurrent-sweep";
+    }
+    elsif($line =~ m/CMS-concurrent-reset/) {
+        return "CMS-concurrent-reset";
+    }
+    elsif($line =~ /PSYoungGen/) {
+        return "ParallelScavengeYoungGen";
+    }
+    elsif($line =~ /DefNew/) {
+        return "DefNew";
+    }
+    else {
+        return "unknown";
+    }
+}
+
+sub mayne {
+    my $num_args = $#ARGV + 1;
+
+    if($num_args != 1) {
+        die "Usage: $0 output-dir";
+    }
+
+    my $outputdir = $ARGV[0];
+    my $fname = "${outputdir}/gcdotlog_extract_time_rejected_lines";
+    open(FP, ">$fname") or die "cannot open $fname";
+
+    print "datetimestamp,secs_since_jvm_boot,gc_event_type,gc_event_duration_in_seconds\n";
+    while(<STDIN>) {
+        my $line = $_;
+        chomp($line);
+        # -XX:+PrintGCDateStamps
+        if($line =~ m/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[.][0-9]{3}[+]0000): ([0-9]+[.][0-9]+): .*real=([0-9][0-9]*[.][0-9][0-9]*) secs\]\s*$/) {
+            my $datestamp = $1;
+            my $secs_since_jvm_boot = $2;
+            my $gctime_in_seconds = $3;
+            my $gc_event_type = classify_gc_event_type($line);
+            print "${datestamp},${secs_since_jvm_boot},${gc_event_type},${gctime_in_seconds}\n";
+        }
+        else {
+            print FP "$line\n";
+        }
+    }
+    close(FP);
+}
+
+mayne()
diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_one_event_per_line.pl b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_one_event_per_line.pl
new file mode 100755
index 0000000..baf14f5
--- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_one_event_per_line.pl @@ -0,0 +1,38 @@ +#!/usr/bin/perl -w + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/gcdotlog_one_event_per_line.pl#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +use strict; + +sub flush_previous_record { + my ($previous_record) = @_; + if ("" ne "$previous_record") { + print "$previous_record\n"; + } +} + +my $previous_record = ""; +while(<STDIN>) { + my $line = $_; + chomp($line); + # -XX:+PrintGCDateStamps + if($line =~ m/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}/) { + # starting a new record + flush_previous_record($previous_record); + $previous_record = $line; + } + elsif($line =~ m/^[0-9]+[.][0-9]+: /) { + # starting a new record + flush_previous_record($previous_record); + $previous_record = $line; + } + else { + # append to the previous record + $previous_record = $previous_record . 
$line; + } +} +flush_previous_record($previous_record); + diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/parse-proc-pid-maps.py b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/parse-proc-pid-maps.py new file mode 100755 index 0000000..f28c137 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/parse-proc-pid-maps.py @@ -0,0 +1,52 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/parse-proc-pid-maps.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import sys + +def bytesToHumanReadable(n): + kay = 1024 + meg = kay*1024 + gig = meg*1024 + if ((0 <= n) and (n < kay)): + return "%d bytes" % n + elif ((kay <= n) and (n < meg)): + return "%.2fkb" % ((n+0.0)/kay) + elif ((meg <= n) and (n < gig)): + return "%.2fmb" % ((n+0.0)/meg) + else: + return "%.2fgb" % ((n+0.0)/gig) + +def readfile(filename): + segments = [] + numSegments=0L + totalBytes=0L + infile = open(filename, 'r') + line = infile.readline() + while line: + line = line.rstrip() + fields = line.split(None, 5) + if 6 == len(fields): + (memRange, perms, offset, dev, inode, pathname) = fields + elif 5 == len(fields): + (memRange, perms, offset, dev, inode) = fields + pathname = '' + else: + raise Exception('cannot unpack %s' % (line,)) + (begin,end) = memRange.split('-') + t = long(end, 16) - long(begin, 16) + totalBytes = totalBytes + t + numSegments = numSegments + 1 + x = (t, memRange, pathname) + segments.append(x) + line = infile.readline() + print filename + print "\tnum segments = %s" % (numSegments,) + print "\ttotal bytes = %s (%s)" % (totalBytes, bytesToHumanReadable(totalBytes)) + for i in sorted(segments, key=lambda x: x[0], reverse=True): + print "\t\t%12s bytes (%s) %33s %s" % (i[0], bytesToHumanReadable(i[0]), i[1], i[2]) + +readfile(sys.argv[1]) diff --git 
a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/prepend_epoch.py b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/prepend_epoch.py new file mode 100755 index 0000000..d39e384 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/prepend_epoch.py @@ -0,0 +1,22 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/prepend_epoch.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import fileinput +import sys + +# Ugh. +sys.path.insert(0, "/apps/apache/htdocs/AdminGCViz") +import vmsgcvizutils + +def mayne(): + print "secs_since_epoch" + for line in fileinput.input('-'): + line = line.rstrip('\r\n') + print "%s" % (vmsgcvizutils.timestamp_to_epoch(line),) + +if __name__ == "__main__": + mayne() diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/process_vms_object_cache_stats.py b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/process_vms_object_cache_stats.py new file mode 100755 index 0000000..8636d02 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/process_vms_object_cache_stats.py @@ -0,0 +1,73 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/process_vms_object_cache_stats.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import fileinput +import os +import re +import sys + +# Ugh. 
+sys.path.insert(0, "/apps/apache/htdocs/AdminGCViz") +import vmsgcvizutils + +numArgs = len(sys.argv) - 1 +if(2 != numArgs): + print "Usage: %s object-cache-stats-file outputdir" % (sys.argv[0],) + sys.exit(1); + +objectCacheStatsFile = sys.argv[1] +baseOutputDir = sys.argv[2] +outputDir = baseOutputDir + os.path.sep + 'vms-object-cache-stats-by-cache' +os.mkdir(outputDir) +outputFiles = {} +rejects = open(objectCacheStatsFile + '.rejects', 'w') + +timestampPattern = re.compile('^([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3}) INFO (main|vms-timer-refresh) VMClientCacheManager - Processed Countries') +objectCachePattern = re.compile('objectCache\(([^)]*)\) references\(([^)]*)\) size\(([^)]*)\) ratio\(([^)]*)\) prevsize\(([^)]*)\) additions\(([^)]*)\) transfers\(([^)]*)\) hits\(([^)]*)\) orphans\(([^)]*)\)') + +header="secsSinceEpoch,iso8601Timestamp,references,size,ratio,prevsize,additions,transfers,hits,orphans\n" + +iso8601Timestamp = "1970-01-01T00:00:00.000+0000" +secsSinceEpoch = "0.000" +for line in fileinput.input(objectCacheStatsFile): + line = line.rstrip('\r\n') + foundTimestamp = timestampPattern.search(line) + if foundTimestamp: + ymd = foundTimestamp.group(1) + hms = foundTimestamp.group(2) + milliseconds = foundTimestamp.group(3) + iso8601Timestamp = '%sT%s.%s+0000' % (ymd,hms,milliseconds) + secsSinceEpoch = vmsgcvizutils.timestamp_to_epoch(iso8601Timestamp) + continue + + foundObjectCache = objectCachePattern.search(line) + if foundObjectCache: + cacheName = foundObjectCache.group(1) + references = foundObjectCache.group(2) + size = foundObjectCache.group(3) + ratio = foundObjectCache.group(4) + prevsize = foundObjectCache.group(5) + additions = foundObjectCache.group(6) + transfers = foundObjectCache.group(7) + hits = foundObjectCache.group(8) + orphans = foundObjectCache.group(9) + l = "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n" % (secsSinceEpoch,iso8601Timestamp,references,size,ratio,prevsize,additions,transfers,hits,orphans) + f = 
outputFiles.get(cacheName) + if f: + f.write(l) + else: + f = open(outputDir + os.path.sep + cacheName, 'w') + f.write(header) + f.write(l) + outputFiles[cacheName] = f + continue + else: + rejects.write("%s\n" % (line,)) + +rejects.close() +for fp in outputFiles.values(): + fp.close() diff --git a/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/vms_facet_info_transform.py b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/vms_facet_info_transform.py new file mode 100755 index 0000000..945bb8f --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/vms_facet_info_transform.py @@ -0,0 +1,45 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/remote-data-collection/vms_facet_info_transform.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import fileinput +import re +import sys + +# Ugh. +sys.path.insert(0, "/apps/apache/htdocs/AdminGCViz") +import vmsgcvizutils + +# Input: +# 2012-03-30 00:47:11,771 country(GF) cache(VideoImages) numItems(24189) totalTime(1397531) timeToCopyToDisc(5550) timeToFill(1391981) +# ... +# +# Output +# seconds_since_epoch,datetimestamp,country,cache,numitems,totalTime,timeToCopyToDisc,timeToFill +# 1333068431.771,2012-03-30T00:47:11.771+0000,GF,VideoImages,24189,1397531,5550,1391981 +# ... 
+ +facetPattern = re.compile('^([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3}) country\(([A-Z]{2})\) cache\(([^)]+)\) numItems\(([0-9]+)\) totalTime\(([0-9]+)\) timeToCopyToDisc\(([0-9]+)\) timeToFill\(([0-9]+)\)$') + +print 'seconds_since_epoch,datetimestamp,country,cache,numitems,totalTime,timeToCopyToDisc,timeToFill' +for line in fileinput.input('-'): + line = line.rstrip('\r\n') + found = facetPattern.search(line) + if found: + ymd = found.group(1) + hms = found.group(2) + milliseconds = found.group(3) + iso8601Timestamp = '%sT%s.%s+0000' % (ymd,hms,milliseconds) + secsSinceEpoch = vmsgcvizutils.timestamp_to_epoch(iso8601Timestamp) + country = found.group(4) + cache = found.group(5) + numItems = found.group(6) + totalTime = found.group(7) + timeToCopyToDisc = found.group(8) + timeToFill = found.group(9) + print '%s,%s,%s,%s,%s,%s,%s,%s' % (secsSinceEpoch, iso8601Timestamp, country, cache, numItems, totalTime, timeToCopyToDisc, timeToFill) + else: + sys.stderr.write("%s\n" % (line,)) diff --git a/root/apps/apache/htdocs/AdminGCViz/visualize-facets.py b/root/apps/apache/htdocs/AdminGCViz/visualize-facets.py new file mode 100755 index 0000000..ad848fc --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/visualize-facets.py @@ -0,0 +1,110 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/visualize-facets.py#2 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.mlab as mlab +import matplotlib.ticker as ticker +import matplotlib.dates as mdates +import matplotlib.lines as lines +import pylab +import sys +import os +import vmsgcvizutils +from mpl_toolkits.mplot3d import Axes3D + +numArgs = len(sys.argv) - 1 +if(2 != numArgs): + print "Usage: %s now:iso8601timestamp vms-gc-report-directory" % (sys.argv[0],) + sys.exit(1); + +now = sys.argv[1] +vmsGCReportDirectory = sys.argv[2] + +# 
seconds_since_epoch,datetimestamp,country,cache,numitems,totalTime,timeToCopyToDisc,timeToFill +# 1333073736.242,2012-03-30T02:15:36.242+0000,BS,VideoEDFulfillmentData,23242,1352,757,595 +fnameFullPath = vmsGCReportDirectory + os.path.sep + 'vms-cache-refresh-facet-info.csv' +recordset = mlab.csv2rec(fnameFullPath) + +countries = recordset.country +countriesDict = {} +countriesList = [] +countryNum = 0 +for c in countries: + cPrime = countriesDict.get(c) + if cPrime: + countriesList.append(cPrime) + else: + countryNum = countryNum + 1 + countriesDict[c] = countryNum + countriesList.append(countryNum) + +caches = recordset.cache +cachesDict = {} +cachesList = [] +cacheNum = 0 +for c in caches: + cPrime = cachesDict.get(c) + if cPrime: + cachesList.append(cPrime) + else: + cacheNum = cacheNum + 1 + cachesDict[c] = cacheNum + cachesList.append(cacheNum) + +allTimeSpentInAllFacets = 0 +perFacetRecordSets = [] +facetEventByCacheDir = vmsGCReportDirectory + os.path.sep + 'facet-events-by-cache' +dirList=os.listdir(facetEventByCacheDir) +for fname in dirList: + fnameFullPath = facetEventByCacheDir + os.path.sep + fname + r = mlab.csv2rec(fnameFullPath) + allTimeSpentInFacet = sum(r.totaltime) + allTimeSpentInAllFacets = allTimeSpentInAllFacets + allTimeSpentInFacet + timeToCopyFacetToDisc = sum(r.timetocopytodisc) + timeToFillFacet = sum(r.timetofill) + d = {'totaltime' : allTimeSpentInFacet, + 'copytime' : timeToCopyFacetToDisc, + 'filltime' : timeToFillFacet, + 'facetName' : fname, + 'recordset' : r} + perFacetRecordSets.append(d) +perFacetRecordSets.sort(reverse=True, key=lambda d: d['totaltime']) # sort by all time spent in facet + +facetReportFileName = vmsGCReportDirectory + os.path.sep + 'facet-report.txt' +facetReportFP = open(facetReportFileName, 'w') +for r in perFacetRecordSets: + timeThisFacet = r['totaltime'] + s = '%10s milliseconds spent in %25s (%5.2f%%) {copy: %5.2f%%; fill: %5.2f%%}' % ( + timeThisFacet, + r['facetName'], + 
((timeThisFacet*100.0)/allTimeSpentInAllFacets), + (r['copytime']*100.0/allTimeSpentInAllFacets), + (r['filltime']*100.0/allTimeSpentInAllFacets), + ) + facetReportFP.write("%s\n" % (s,)) + print s +facetReportFP.close() + +# BUG +sys.exit(0) + +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.plot(countriesList, recordset.totaltime/1000, zs=cachesList, zdir='z', marker='o') + +ax.set_xlabel('country') +ax.set_ylabel('time to fill this country/facet (seconds)') +ax.set_zlabel('facet') +ax.set_title('total time to fill each facet for each country') + +# BUG: save all generated figures. + +plt.show() + +# sort by total time descending: +# sort -t , -k 6 -rn vms-cache-refresh-facet-info.csv | more diff --git a/root/apps/apache/htdocs/AdminGCViz/visualize-gc.py b/root/apps/apache/htdocs/AdminGCViz/visualize-gc.py new file mode 100755 index 0000000..bd70336 --- /dev/null +++ b/root/apps/apache/htdocs/AdminGCViz/visualize-gc.py @@ -0,0 +1,231 @@ +#!/usr/bin/python2.7 + +# $Id: //depot/cloud/rpms/nflx-webadmin-gcviz/root/apps/apache/htdocs/AdminGCViz/visualize-gc.py#4 $ +# $DateTime: 2013/05/15 18:34:23 $ +# $Change: 1838706 $ +# $Author: mooreb $ + +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.mlab as mlab +import matplotlib.ticker as ticker +import matplotlib.dates as mdates +import matplotlib.lines as lines +import pylab +import sys +import os +import vmsgcvizutils + +def isNetflixInternal(): + try: + open("/etc/profile.d/netflix_environment.sh", "r") + return True + except IOError: + return False + +numArgs = len(sys.argv) - 1 +if(2 != numArgs): + print "Usage: %s now:iso8601timestamp vms-gc-report-directory" % (sys.argv[0],) + sys.exit(1); + +now = sys.argv[1] +vmsGCReportDirectory = sys.argv[2] +gcEventDirectory = vmsGCReportDirectory + os.path.sep + 'gc-events-duration-by-event' + +event_to_symbol_and_color = { +"FullGC" : ("mD", 15), #(stop the world) +"concurrent-mode-failure" : ("rh", 10), #(stop the world) +"promotion-failed" : 
("rH", 10), #(stop the world) +"ParNew" : ("ro", 5), #(stop-the-world) +"CMS-initial-mark" : ("r^", 5), #(stop-the-world) +"CMS-remark" : ("rs", 5), #(stop the world) +"CMS-concurrent-mark" : ("g,", 1), #(concurrent includes yields to other theads) +"CMS-concurrent-abortable-preclean" : ("g,", 1), #(concurrent) +"CMS-concurrent-preclean" : ("g,", 1), #(concurrent) +"CMS-concurrent-sweep" : ("g,", 1), #(concurrent) +"CMS-concurrent-reset" : ("g,", 1), #(concurrent?) +"ParallelScavengeYoungGen" : ("ro", 5), #(stop-the-world) +"DefNew" : ("ro", 5), #(stop-the-world) +"unknown" : ("r*", 15), #??? +} + +# These markers are present in gc.log but not accounted for above as they have no duration. +# They might be interesting to visualize but we currently minimize the CMS events anyway. +# CMS-concurrent-mark-start +# CMS-concurrent-preclean-start +# CMS-concurrent-sweep-start +# CMS-concurrent-reset-start +# CMS-concurrent-abortable-preclean-start + +## Read environmental information +fullEnvDict = vmsgcvizutils.envFileAsDictionary(vmsGCReportDirectory + os.path.sep + 'env') +smallEnvDict = { + 'ec2PublicHostname' : fullEnvDict.get('EC2_PUBLIC_HOSTNAME', 'no-public-hostname'), + 'instanceID' : fullEnvDict.get('EC2_INSTANCE_ID', 'no-instance-id'), + 'instanceType' : fullEnvDict.get('EC2_INSTANCE_TYPE', 'no-instance-type'), + 'appname' : fullEnvDict.get('NETFLIX_APP', 'unknown-app'), + 'asg' : fullEnvDict.get('NETFLIX_AUTO_SCALE_GROUP', 'unknown-asg'), + 'env' : fullEnvDict.get('NETFLIX_ENVIRONMENT', 'unknown-env'), +} + +## Read GC event records, one file per type. +# The gc event records have this format: +# secs_since_epoch,datetimestamp,secs_since_jvm_boot,gc_event_type,gc_event_duration_in_seconds +# for example: +# 1333055023.424,2012-03-29T21:03:43.424+0000,11.272,ParNew,0.19 +# in numpy-speak: +# dtype=[('secs_since_epoch', '