jgillis edited this page Dec 16, 2019 · 32 revisions

Profiling is the process of measuring how much execution time each line of code or method takes, which lets you spot bottlenecks. It is sometimes loosely referred to as benchmarking.

See also: tracing.

How to profile a CasADi C++ program?

Compile CasADi and your C++ program with a -pg flag.

How to profile a CasADi Python program?

With the runsnakerun package:

python -m cProfile -o stats.prof myscript.py
runsnake stats.prof
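
If no GUI viewer is at hand, the same kind of statistics can be inspected from Python itself with the standard-library cProfile and pstats modules. The following is a minimal sketch; `work` and `profile_to_text` are hypothetical names, with `work` standing in for the CasADi calls in your script.

```python
import cProfile
import io
import pstats


def work(n=20000):
    # Hypothetical workload standing in for your CasADi calls
    return sum(i * i for i in range(n))


def profile_to_text(func, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Collect the report in a string instead of printing to stdout
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


report = profile_to_text(work)
print(report)
```

A file written by `python -m cProfile -o stats.prof myscript.py` can be loaded the same way with `pstats.Stats("stats.prof")`.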

Note that Python profiling stops at the SWIG boundary. To profile the C++ side as well, try 'perf', a statistical profiler:

perf record -g -- python myscript.py
perf report -n -g graph,0.5,caller --comm=python

Things to try: press E/C to expand or collapse the call chains. Press the right arrow twice to see timings on the source code.

Note that perf can possibly disrupt your system permanently.

Profiling memory usage in a Python program:

import os

import psutil
from casadi import *

pid = os.getpid()
p = psutil.Process(pid)

def getinfo(p):
  # Snapshot of the process memory counters, plus the memory maps of CasADi libraries
  ret = []
  ret.append(p.memory_info())
  ret.append(p.memory_full_info())
  ret += [i for i in p.memory_maps() if "casadi" in i.path]
  return ret

pre = getinfo(p)
print(pre)

n = 1000
s = Sparsity.dense(n, n)

# Assuming int32 entries; adjust the factor 4 if your build uses wider integers
expected = ((n+1)+n*n)*4
print("expected [bytes]:", expected)

post = getinfo(p)
print(post)


def showdiff(pre, post):
  # psutil returns namedtuples, so iterate over their fields
  for k in pre._fields:
    pre_ = getattr(pre, k)
    post_ = getattr(post, k)
    if isinstance(pre_, int):
      print("%20s: %10d -> %10d | delta = %10d  (%0.2f %%)" % (k, pre_, post_, post_ - pre_, 100*float(post_ - pre_)/expected))

for pre_, post_ in zip(pre, post):
  print("-"*80)
  if hasattr(pre_, "path"):
    print(pre_.path)
  showdiff(pre_, post_)
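
The `expected` figure in the script above looks like the size of a compressed-column pattern: n+1 column offsets plus one row index per nonzero, each a 4-byte integer. Here is a minimal sketch of that arithmetic. `expected_sparsity_bytes` and `int_bytes` are hypothetical names introduced here, and the storage layout is an assumption, not a statement about CasADi internals.

```python
def expected_sparsity_bytes(nrow, ncol, int_bytes=4):
    """Estimate storage for a dense pattern in compressed-column form.

    Assumes ncol+1 column offsets and one row index per nonzero,
    each stored as an integer of int_bytes bytes.
    """
    nnz = nrow * ncol  # dense pattern: every entry is a structural nonzero
    return ((ncol + 1) + nnz) * int_bytes


n = 1000
print("int32:", expected_sparsity_bytes(n, n, 4))  # 4004004
print("int64:", expected_sparsity_bytes(n, n, 8))  # 8008008
```

If the measured delta is closer to the second figure, the build probably stores indices as 64-bit integers.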

How to profile CasADi virtual machines

In your CasADi script, do: CasadiOptions.startProfiling('prof.log')

On the terminal, run

casadi-build-dir/bin/profilereport prof.log

This will generate a local web page with statistics. It requires CasADi to be built with the ENABLE_PROFILING option set to ON.

There is some work underway to make a combined call-graph and treemap representation in the web page. [Figure: bench]

Generate C code and use C profiling tools

First generate C code with

Function::generateCode(generate_main=true)

You may want to edit the generated main function to make inputs never 0 (avoid divide by zero) or increase the length of the loop for better statistics.

Gprof

Compile that code with profiling options, like

gcc -pg my_fun.c -lm -o my_fun

Then you run that code with

./my_fun

and generate a profile report with

gprof my_fun

If you want to get really fancy, get graphviz and gprof2dot.py, and run something like

gprof my_fun | ./gprof2dot.py | dot -Tpng -o output.png

You will get a beautiful call graph in output.png.

callgrind + kcachegrind

With callgrind you can run a binary and get profiling output without adding profiling instrumentation at compile time. The disadvantage is that, unlike gprof, it does not cope well with very large generated code. First compile your binary:

gcc -g my_fun.c -lm -o my_fun

Run the binary with

valgrind --tool=callgrind ./my_fun

This generates a file like callgrind.out.1234. You can view this output with a tool like kcachegrind:

kcachegrind callgrind.out.1234

Kcachegrind has great interactive tools, such as a treemap view and a line-by-line source view. [Figures: callgrind_treemap, callgrind_line_by_line]

More advanced usages of callgrind:

  • Investigate only a portion of your program: start valgrind with --instr-atstart=no. Run callgrind_control -i on from within your program where the region of interest starts, and callgrind_control -i off where it ends.
  • Profiling jitted/external parts of CasADi needs some extra care. shell_compiler issues a dlclose, which makes callgrind ignore the annotated source code. Make sure to dump before the compiled function goes out of scope: callgrind_control -d. See https://docs.kde.org/stable/en/kdesdk/kcachegrind/using-kcachegrind.html for more info.
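
The instrumentation toggling described in the list above can be wrapped in a context manager so that only the region of interest is measured. This is a minimal sketch: `callgrind_region` and its `runner` parameter are hypothetical names. It assumes the script runs under valgrind --tool=callgrind --instr-atstart=no and that callgrind_control is on the PATH.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def callgrind_region(dump=False, runner=subprocess.call):
    """Enable callgrind instrumentation only inside the with-block."""
    runner(["callgrind_control", "-i", "on"])
    try:
        yield
    finally:
        if dump:
            # Dump while the profiled code is still loaded
            runner(["callgrind_control", "-d"])
        runner(["callgrind_control", "-i", "off"])


# Dry run with a recording runner, so the sketch works without valgrind installed
issued = []
with callgrind_region(dump=True, runner=issued.append):
    total = sum(i * i for i in range(1000))  # hypothetical region of interest

for cmd in issued:
    print(" ".join(cmd))
```

In a real run, drop the `runner` argument so the commands are actually executed, and keep any jitted function alive until the with-block has exited.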

pprof (part of gperftools 2.0)

LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=test.prof python file.py
pprof --callgrind `which python` test.prof > test.callgrind
kcachegrind test.callgrind