Profiling is the process of measuring how much execution time each line of code or method takes, which allows you to spot bottlenecks. It is sometimes also referred to as benchmarking.
See also: tracing.
How to profile a CasADi C++ program?
Compile CasADi and your C++ program with a
How to profile a CasADi Python program?
python -m cProfile -o stats.prof myscript.py
runsnake stats.prof
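When no graphical viewer such as runsnake is at hand, the saved statistics can also be read with the standard-library pstats module. The following is a minimal sketch, not part of CasADi: `work` is a hypothetical stand-in for the body of myscript.py, and the helper names `profile_to_file` and `top_entries` are made up for illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def work():
    # Hypothetical stand-in for the body of myscript.py
    return sum(i * i for i in range(100000))


def profile_to_file(func, path):
    # Similar in spirit to: python -m cProfile -o <path> myscript.py
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(path)


def top_entries(path, count=10):
    # Load the saved statistics and format the most expensive entries as text
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.strip_dirs().sort_stats("cumulative").print_stats(count)
    return out.getvalue()


stats_path = os.path.join(tempfile.mkdtemp(), "stats.prof")
profile_to_file(work, stats_path)
report = top_entries(stats_path)
print(report)
```

Sorting by cumulative time puts the functions that dominate the total run time at the top, which is usually the first thing to look at when hunting for a bottleneck.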
Note that Python profiling stops at the SWIG boundary. To profile the C++ side as well, try 'perf', a statistical profiler:
perf record -g -- python myscript.py
perf report -n -g graph,0.5,caller --comm=python
Things to try: press E/C, or press the right arrow twice to see timings on the source code.
Note that perf can possibly disrupt your system permanently.
Profiling memory use in a Python program:
import os

import psutil
from casadi import *

p = psutil.Process(os.getpid())

def getinfo(p):
    # Collect process-wide memory counters plus the mapped regions that belong to CasADi
    ret = []
    ret.append(p.memory_info())
    ret.append(p.memory_full_info())
    ret += [i for i in p.memory_maps() if "casadi" in i.path]
    return ret

pre = getinfo(p)
print(pre)

n = 1000
s = Sparsity.dense(n, n)
expected = ((n + 1) + n * n) * 4  # Assuming int32
print("expected [bytes]:", expected)

post = getinfo(p)
print(post)

def showdiff(pre, post):
    # Print every integer field before and after, with the change relative to the expected size
    for k in pre._fields:
        pre_ = getattr(pre, k)
        post_ = getattr(post, k)
        if isinstance(pre_, int):
            print("%20s: %10d -> %10d | delta = %10d (%0.2f %%)" % (k, pre_, post_, post_ - pre_, 100 * float(post_ - pre_) / expected))

for pre_, post_ in zip(pre, post):
    print("-" * 80)
    if hasattr(pre_, "path"):
        print(pre_.path)
    showdiff(pre_, post_)
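The expected figure in the script counts the integers of a compressed-column sparsity pattern: one offset per column plus one, and one row index per nonzero (n*n for a dense matrix). Here is a small sketch of that arithmetic with the integer width as a parameter, since the script's comment only assumes int32. The helper name `expected_sparsity_bytes` is made up for illustration and is not a CasADi function.

```python
def expected_sparsity_bytes(ncol, nnz, int_size=4):
    # ncol + 1 column offsets plus one row index per structural nonzero
    return ((ncol + 1) + nnz) * int_size


n = 1000
dense_int32 = expected_sparsity_bytes(n, n * n)
dense_int64 = expected_sparsity_bytes(n, n * n, int_size=8)
print("int32:", dense_int32, "bytes")  # int32: 4004004 bytes
print("int64:", dense_int64, "bytes")  # int64: 8008008 bytes
```

If the measured delta is close to twice the int32 figure, the build may be using 64-bit integers for its index vectors.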
How to profile CasADi virtual machines
In your CasADi script, do:
On the terminal, run
This will generate a local web page with statistics.
This requires CasADi to be built with
ENABLE_PROFILING options set to
There is some work underway to make a combined call-graph and treemap representation in the webpage.
Generate C code and use C profiling tools
First generate C code with
You may want to edit the generated main function to make inputs never 0 (avoid divide by zero) or increase the length of the loop for better statistics.
Compile that code with profiling options, like
gcc -pg my_fun.c -o my_fun -lm
Then you run that code with
and generate a profile report with
If you want to get really fancy, get graphviz and gprof2dot.py, and run something like
gprof my_fun | ./gprof2dot.py | dot -Tpng -o output.png
You will get a beautiful graph.
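The compile, run and report steps above can be collected in a small script. This is a sketch under assumptions: the file names my_fun.c and my_fun come from the examples on this page, running the binary as ./my_fun and reporting with gprof my_fun are assumed to be the intended commands, and the helper name `gprof_commands` is hypothetical. The sketch only prints the commands; it does not execute them.

```python
import shlex


def gprof_commands(source="my_fun.c", binary="my_fun"):
    # Compile with -pg, run once to produce gmon.out, then ask gprof for a report
    return [
        ["gcc", "-pg", source, "-o", binary, "-lm"],
        ["./" + binary],
        ["gprof", binary],
    ]


commands = gprof_commands()
for cmd in commands:
    # Print each step; pass cmd to subprocess.run(cmd, check=True) to execute it
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if the generated file name contains spaces.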
callgrind + kcachegrind
With callgrind you can run a binary and get profiling output without instrumenting it yourself. The disadvantage is that, unlike gprof, it does not handle very large generated code well. First compile your binary:
gcc -g my_fun.c -o my_fun -lm
Run the binary with
valgrind --tool=callgrind ./my_fun
This generates a file like callgrind.out.1234. You can view this output with a tool like kcachegrind.
KCachegrind has great interactive tools.
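Because the numeric suffix of the callgrind output file changes from run to run, it can help to locate the newest file automatically before opening it in a viewer. The following is a minimal sketch: the helper name `latest_callgrind_output` is made up for illustration, and the demonstration uses empty placeholder files in a scratch directory instead of real callgrind output.

```python
import glob
import os
import tempfile


def latest_callgrind_output(directory="."):
    # Pick the most recently modified callgrind.out.* file, or None if there is none
    candidates = glob.glob(os.path.join(directory, "callgrind.out.*"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Demonstration with two empty placeholder files in a scratch directory
scratch = tempfile.mkdtemp()
for suffix, mtime in (("1234", 1000), ("5678", 2000)):
    path = os.path.join(scratch, "callgrind.out." + suffix)
    open(path, "w").close()
    os.utime(path, (mtime, mtime))

newest = latest_callgrind_output(scratch)
print(os.path.basename(newest))
```

The returned path can then be handed to the viewer of your choice.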
More advanced usages of callgrind:
- Investigate only a portion of your program: start with callgrind_control -i on within your program and end with callgrind_control -i off.
- Profiling jitted/external parts of CasADi needs some extra care.
shell_compiler issues a dlclose, which makes callgrind ignore the annotated source code. Make sure to dump before the compiled function goes out of scope:
See https://docs.kde.org/stable/en/kdesdk/kcachegrind/using-kcachegrind.html for more info.