profiling
Profiling is the process of measuring how much execution time each line of code or function takes. This allows you to spot bottlenecks. It is sometimes loosely referred to as benchmarking.
See also: tracing.
Compile CasADi and your C++ program with the -pg flag.
With the runsnakerun package:
python -m cProfile -o stats.prof myscript.py
runsnake stats.prof
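If no graphical viewer is at hand, the standard library can collect and print the same kind of statistics. The following is a minimal sketch, not part of CasADi: `busy_work` and `profile_to_text` are hypothetical names used for illustration, and `busy_work` merely stands in for the code you want to profile.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the code you want to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_text(func, *args, top=5):
    """Profile one call; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_to_text(busy_work, 100000)
    print(report)
```

A file written with `-o stats.prof` can be inspected the same way by passing its name to `pstats.Stats("stats.prof")`.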
Note that Python profiling stops at the SWIG boundary. To profile the C++ code as well, try 'perf', a statistical profiler:
perf record -g -- python myscript.py
perf report -n -g graph,0.5,caller --comm=python
Things to try: press E/C to expand/collapse all call chains. Press the right arrow twice to see timings annotated on the source code.
Note that perf can possibly disrupt your system permanently.
Profiling memory usage in a Python program:
import os
import psutil  # method names below follow the psutil >= 4.0 API
from casadi import *

# Handle on the current process
p = psutil.Process(os.getpid())

def getinfo(p):
    # Snapshot of process-wide memory counters plus the mapped regions that belong to CasADi
    ret = []
    ret.append(p.memory_info())
    ret.append(p.memory_full_info())
    ret += [m for m in p.memory_maps() if "casadi" in m.path]
    return ret

pre = getinfo(p)
print(pre)

n = 1000
s = Sparsity.dense(n, n)
# Assuming int32
expected = ((n + 1) + n * n) * 4
print("expected [bytes]:", expected)

post = getinfo(p)
print(post)

def showdiff(pre, post):
    # Print each integer field before and after, with the delta as a percentage of `expected`
    for k in pre._fields:
        pre_ = getattr(pre, k)
        post_ = getattr(post, k)
        if isinstance(pre_, int):
            print("%20s: %10d -> %10d | delta = %10d (%0.2f %%)" % (k, pre_, post_, post_ - pre_, 100 * float(post_ - pre_) / expected))

for pre_, post_ in zip(pre, post):
    print("-" * 80)
    if hasattr(pre_, "path"):
        print(pre_.path)
    showdiff(pre_, post_)
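The before/after comparison can be tried in isolation, without psutil or CasADi installed. The sketch below is only an illustration: `Snapshot` and `field_deltas` are hypothetical names, and the memory figures are made up. It reports each integer field's growth in bytes and as a percentage of the expected allocation.

```python
from collections import namedtuple

# Hypothetical snapshot type standing in for a psutil memory record
Snapshot = namedtuple("Snapshot", ["rss", "vms"])


def field_deltas(pre, post, expected):
    """Return {field: (delta_bytes, percent_of_expected)} for integer fields."""
    out = {}
    for name in pre._fields:
        before, after = getattr(pre, name), getattr(post, name)
        if isinstance(before, int):
            delta = after - before
            out[name] = (delta, 100.0 * delta / expected)
    return out


if __name__ == "__main__":
    n = 1000
    expected = ((n + 1) + n * n) * 4  # same int32 assumption as the script above
    pre = Snapshot(rss=50000000, vms=200000000)   # made-up numbers
    post = Snapshot(rss=54004004, vms=204100000)  # made-up numbers
    for name, (delta, pct) in field_deltas(pre, post, expected).items():
        print("%5s: delta = %10d bytes (%6.2f %% of expected)" % (name, delta, pct))
```

A percentage close to 100 for the resident size would suggest that the allocation costs about what the formula predicts; much larger values point to overhead elsewhere.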
In your CasADi script, do:
CasadiOptions.startProfiling('prof.log')
On the terminal, run
casadi-build-dir/bin/profilereport prof.log
This will generate a local web page with statistics.
This requires CasADi to be built with the ENABLE_PROFILING option set to ON.
There is some work underway to make a combined call-graph and treemap representation in the web page.
First generate C code with
Function::generateCode(generate_main=true)
You may want to edit the generated main function to make sure inputs are never 0 (to avoid division by zero) or to increase the length of the loop for better statistics.
Compile that code with profiling options, like
gcc -pg my_fun.c -o my_fun -lm
Then you run that code with
./my_fun
and generate a profile report with
gprof my_fun
If you want to get really fancy, get graphviz and gprof2dot.py, and run something like
gprof my_fun | ./gprof2dot.py | dot -Tpng -o output.png
You will get a beautiful graph.
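The compile, run and report steps above can be scripted so that they are easy to repeat after regenerating the code. This is a rough sketch under a few assumptions: `gprof_commands` and `run_all` are hypothetical helper names, the file names are the ones used above, and `gcc` and `gprof` are expected on the PATH. `-lm` is placed after the source file because some linkers resolve libraries in command-line order.

```python
import shlex
import subprocess


def gprof_commands(source="my_fun.c", binary="my_fun"):
    """Return the compile, run and report steps as argument lists."""
    return [
        # -lm comes after the source file; some linkers need that order
        ["gcc", "-pg", source, "-o", binary, "-lm"],
        # running the instrumented binary writes gmon.out to the working directory
        ["./" + binary],
        # gprof reads the binary together with gmon.out and prints the report
        ["gprof", binary],
    ]


def run_all(commands, runner=subprocess.check_call):
    """Echo and execute each command in order.

    With the default runner, a failing step raises and stops the sequence.
    """
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        runner(cmd)


if __name__ == "__main__":
    run_all(gprof_commands())
```

The `runner` argument exists only so the sequence can be dry-run, for instance by passing `print` to list the commands without executing them.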
With callgrind you can run a binary and magically get profiling output. The disadvantage is that it does not handle very large generated code as well as gprof does. First compile your binary:
gcc -g my_fun.c -o my_fun -lm
Run the binary with
valgrind --tool=callgrind ./my_fun
This generates a file like callgrind.out.1234. You can view this output with a tool like kcachegrind:
kcachegrind callgrind.out.1234
KCachegrind has great interactive tools.
More advanced usages of callgrind:
- Investigate only a portion of your program: --instr-atstart=no. Perform callgrind_control -i on within your program and end with callgrind_control -i off.
- Profiling jitted/external parts of CasADi needs some extra care. shell_compiler issues a dlclose, which makes callgrind ignore the annotated source code. Make sure to dump before the compiled function goes out of scope: callgrind_control -d
See https://docs.kde.org/stable/en/kdesdk/kcachegrind/using-kcachegrind.html for more info.
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=test.prof python file.py
pprof --callgrind `which python` test.prof > test.callgrind
kcachegrind test.callgrind