jgillis edited this page Aug 21, 2018 · 30 revisions

Profiling is the process of measuring how much execution time each line of code or method takes, which lets you spot bottlenecks. It is sometimes also referred to as benchmarking.

See also: tracing.

How to profile a CasADi C++ program?

Compile CasADi and your C++ program with a -pg flag.

How to profile a CasADi Python program?

With the runsnakerun package:

python -m cProfile -o
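The same profiler can also be driven from inside a script. The following is a minimal sketch using only the standard-library `cProfile` and `pstats` modules; `slow_sum` and `profile_call` are hypothetical names, and `slow_sum` merely stands in for a real CasADi workload.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical workload standing in for a CasADi script
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```

Sorting by cumulative time puts the outermost expensive calls first, which is usually the quickest way to see where a script spends its time.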

Note that Python profiling stops at the SWIG boundary. To profile the C++ side as well, try 'perf', a statistical profiler:

perf record -g -- python
perf report -n -g graph,0.5,caller --comm=python

Things to try: press E or C to expand or collapse all call chains, and press the right arrow twice to see timings on the source code.

Note that perf can possibly disrupt your system permanently.

Profiling memory usage in a Python program:

import os
import psutil
from casadi import *

p = psutil.Process(os.getpid())

def getinfo(p):
  # Memory maps of this process whose path mentions casadi
  return [i for i in p.memory_maps() if "casadi" in i.path]

pre = getinfo(p)
print(pre)

n = 1000
s = Sparsity.dense(n,n)

# Assuming int32
expected = ((n+1)+n*n)*4
print("expected [bytes]:", expected)

post = getinfo(p)
print(post)

def showdiff(pre,post):
  # Compare every integer field of one memory-map entry before and after
  for k in pre._fields:
    pre_ = getattr(pre,k)
    post_= getattr(post,k)
    if isinstance(pre_,int):
      print("%20s: %10d -> %10d | delta = %10d  (%0.2f %%)" % (k,pre_,post_, post_ - pre_,100*float(post_ - pre_)/expected))

for pre_,post_ in zip(pre,post):
  print("-"*80)
  if hasattr(pre_,"path"):
    print(pre_.path)
  showdiff(pre_,post_)
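The `expected` value in the script above can be read as a count of integers multiplied by the integer size. The sketch below spells out that arithmetic, assuming the pattern is stored as (number of columns + 1) column offsets plus one row index per nonzero, which is what the formula suggests. The helper name `expected_sparsity_bytes` is hypothetical and not part of CasADi.

```python
def expected_sparsity_bytes(ncol, nnz, int_size=4):
    """Estimated storage for a compressed-column sparsity pattern.

    Assumption: (ncol + 1) column offsets plus nnz row indices,
    each stored as an integer of int_size bytes.
    """
    return ((ncol + 1) + nnz) * int_size


n = 1000
dense = expected_sparsity_bytes(n, n * n)   # every entry is a nonzero
diagonal = expected_sparsity_bytes(n, n)    # one nonzero per column

print("dense    [bytes]:", dense)
print("diagonal [bytes]:", diagonal)
```

For the dense case this reproduces the figure used in the script, so any measured delta can be compared against it as a percentage.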

How to profile CasADi virtual machines

In your CasADi script, do: CasadiOptions.startProfiling('prof.log')

On the terminal, run

casadi-build-dir/bin/profilereport prof.log

This will generate a local web page with statistics. It requires CasADi to be built with the ENABLE_PROFILING option set to ON.

There is some work underway to make a combined call-graph and treemap representation in the webpage.

Generate C code and use C profiling tools

First generate C code with


You may want to edit the generated main function to make inputs never 0 (avoid divide by zero) or increase the length of the loop for better statistics.


Compile that code with profiling options, like

gcc -pg my_fun.c -o my_fun -lm

Then you run that code with


and generate a profile report with

gprof my_fun

If you want to get really fancy, get graphviz and a script that converts gprof output to dot format, and run something like

gprof my_fun | ./ | dot -Tpng -o output.png

You will get a beautiful graph.

callgrind + kcachegrind

With callgrind you can run a binary and get profiling output without compiling in any profiling instrumentation. The disadvantage is that, unlike gprof, it does not cope well with very large generated code. First compile your binary:

gcc -g my_fun.c -o my_fun -lm

Run the binary with

valgrind --tool=callgrind ./my_fun

This generates a file like callgrind.out.1234. You can view this output with a tool like kcachegrind

kcachegrind callgrind.out.1234

Kcachegrind has great interactive tools, such as a treemap view and a line-by-line view.

More advanced usages of callgrind:

  • Investigate only a portion of your program: start valgrind with --instr-atstart=no, run callgrind_control -i on from within your program where the region of interest begins, and end it with callgrind_control -i off.
  • Profiling jitted/external parts of CasADi needs some extra care. shell_compiler issues a dlclose, which makes callgrind ignore the annotated source code. Make sure to dump before the compiled function goes out of scope: callgrind_control -d
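The two points above can be combined into a small helper for Python scripts. This is only a sketch: the name `callgrind_instrumentation` and the `run` parameter are hypothetical, and it assumes `callgrind_control` is on the PATH and can reach the profiled process in your valgrind setup. The demonstration at the bottom is a dry run that records the commands instead of executing them.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def callgrind_instrumentation(run=subprocess.check_call):
    """Collect callgrind data only for the code inside the with-block.

    `run` is a hypothetical hook: it receives each command as a list,
    and defaults to actually executing it.
    """
    run(["callgrind_control", "-i", "on"])
    try:
        yield
    finally:
        # Dump first, so the data is written while compiled code is still loaded
        run(["callgrind_control", "-d"])
        run(["callgrind_control", "-i", "off"])


# Dry run: record the commands rather than calling callgrind_control
issued = []
with callgrind_instrumentation(run=issued.append):
    hot = sum(i * i for i in range(1000))  # placeholder for the region of interest

print(issued)
```

Dumping inside the `finally` clause means the dump is still requested when the region raises an exception, while any compiled function created inside the block is typically still in scope.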

see for more info
