
Flame Graphs for Spark - Tools and Notes

In this note you can find a few links and basic examples relevant to using Flame Graphs for profiling Apache Spark workloads running in the JVM on Linux.

TL;DR: use async-profiler

Download from https://github.com/jvm-profiling-tools/async-profiler
Build as in the README (export JAVA_HOME and run make)
Find the pid of the JVM running the Spark executor, for example:

$ jps
171657 SparkSubmit
171870 Jps

Profile the JVM and create the flamegraph, for example:

./profiler.sh -d 30 -f $PWD/flamegraph1.svg <pid_of_JVM>

Visualize the on-CPU flamegraph:

firefox flamegraph1.svg

Intro

Stack profiling and on-CPU Flame Graph visualization are very useful techniques for investigating CPU-bound workloads.
See Brendan Gregg's page on Flame Graphs
Stack profiling is useful for understanding and drilling down on "hot code": it shows which parts of the code consume a considerable amount of time and provides insights for troubleshooting. Flame Graph visualization of the stack profiles adds further value: besides being an appealing interface, it provides context on how the code is being executed, for example by showing the parent functions of each frame.

The main challenge that the various JVM profiling tools address is how to collect stack frames accurately and with low overhead. For more details on the challenges of profiling Java/the JVM, see:

A list of profilers relevant for troubleshooting Spark workloads:


Flame Graph repo:

Download: git clone https://github.com/brendangregg/FlameGraph
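
The scripts in this repo (stackcollapse-*.pl and flamegraph.pl) are used in the perf and JFR examples further below. As a minimal sketch of the general two-step workflow, assuming stack samples have already been collected with perf as described later in this note (file names are just placeholders):

# fold the captured stacks into one line per unique stack, then render the interactive SVG
perf script | FlameGraph/stackcollapse-perf.pl > out.folded
FlameGraph/flamegraph.pl out.folded > flamegraph.svg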

Example of usage of async-profiler

Download from https://github.com/jvm-profiling-tools/async-profiler
Build as in the README (export JAVA_HOME and run make)
Find the pid of the JVM running the Spark executor, for example:

$ jps
171657 SparkSubmit
171870 Jps

Profile the JVM and create the flamegraph, for example:

./profiler.sh -d 30 -f $PWD/flamegraph1.svg <pid_of_JVM>

Visualize the on-CPU flamegraph:

firefox flamegraph1.svg

Example of the output:
Click here to get the SVG version of the on-CPU Flamegraph Example


By default async-profiler records stack traces on CPU events; it can also be configured to record stack traces on other types of events. The list of available events can be obtained as in this example:

./profiler.sh list <pid_of_JVM>

Perf events:
  cpu
  page-faults
  context-switches
  cycles
  instructions
  cache-references
  cache-misses
  branches
  branch-misses
  bus-cycles
  L1-dcache-load-misses
  LLC-load-misses
  dTLB-load-misses
Java events:
  alloc
  lock

Example of profiling on alloc (heap memory allocation) events:

./profiler.sh -d 30  -e alloc -f $PWD/flamegraph_heap.svg <pid_of_JVM>
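
Similarly, the other events in the list above can be profiled. For example, a sketch of profiling on lock (Java lock contention) events, assuming the same profiler.sh syntax; the output file name is just a placeholder:

./profiler.sh -d 30 -e lock -f $PWD/flamegraph_lock.svg <pid_of_JVM>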

This is the syntax for an older version of async-profiler, used to profile heap memory allocations (eventually to be deleted from this doc):

# obsolete syntax for an older version of async-profiler
./profiler.sh -d 30  -m heap -o collapsed -f $PWD/flamegraph_heap.txt <pid_of_JVM>
../FlameGraph/flamegraph.pl --colors=mem flamegraph_heap.txt >flamegraph_heap.svg

Example of the output:
Click here to get the SVG version of the Heap Flamegraph Example


Example of usage of perf for Java/Spark:

Get perf-map-agent and build it following instructions at:
https://github.com/jvm-profiling-tools/perf-map-agent

Set JAVA_HOME and AGENT_HOME for FlameGraph/jmaps

Run Spark with extra Java options. Examples:
--conf "spark.driver.extraJavaOptions"="-XX:+PreserveFramePointer"
or:
--conf "spark.driver.extraJavaOptions"="-XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
Note:
similarly, add the options on the executors with --conf "spark.executor.extraJavaOptions"=... (a combined launch example is sketched below)
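
As an illustration, a minimal launch sketch combining driver and executor options (the master and any other launch arguments are hypothetical placeholders):

# hypothetical example: preserve frame pointers on both the driver and executor JVMs
spark-shell --master yarn \
  --conf "spark.driver.extraJavaOptions"="-XX:+PreserveFramePointer" \
  --conf "spark.executor.extraJavaOptions"="-XX:+PreserveFramePointer"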

Gather data with (example): perf record -a -g -F 99 -p <pid> sleep 10; FlameGraph/jmaps

Generate the flamegraph: perf script |../FlameGraph/stackcollapse-perf.pl | ../FlameGraph/flamegraph.pl > perfFlamegraph1.svg
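
Putting the two steps above together, a minimal end-to-end sketch (assuming the FlameGraph and perf-map-agent repos are available at the paths shown and the environment variables for jmaps are set as noted above):

# sample on-CPU stacks of the Spark JVM at 99 Hz for 10 seconds
perf record -a -g -F 99 -p <pid_of_JVM> sleep 10
# run jmaps right after sampling to write /tmp/perf-<pid>.map, so perf can resolve JIT-compiled Java frames
FlameGraph/jmaps
# fold the stacks and render the flame graph
perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > perfFlamegraph1.svg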


Example of usage of JMC and Java Flight Recorder

Start Spark with the extra Java options (only driver options needed if running in local mode):

--conf "spark.driver.extraJavaOptions"="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:FlightRecorderOptions=stackdepth=1024"
--conf "spark.executor.extraJavaOptions"="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:FlightRecorderOptions=stackdepth=1024"

Gather data:

jcmd 146903 JFR.start filename=sparkProfile1.jfr duration=30s
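
While the recording is in progress, its status can be checked with the standard JFR diagnostic commands (a sketch; with duration=30s the recording stops and is written to the file automatically):

# list the recordings currently active in the target JVM
jcmd 146903 JFR.check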

Process the Java Flight Recorder file with jfr-report-tool, see instructions at: https://github.com/lhotari/jfr-report-tool

jfr-report-tool/jfr-report-tool -e none -m 1 sparkProfile1.jfr

Alternatively, you can use:
https://github.com/chrishantha/jfr-flame-graph
jfr-flame-graph/run.sh -f sparkProfile1.jfr -o spark_jfr_out.txt
../FlameGraph/flamegraph.pl spark_jfr_out.txt > perf2.svg