Support for Python's cProfile was recently added to Cython (http://hg.cython.org/cython-devel/rev/181618626844). This lets one profile code across the Python/C boundary. See the profiling tutorial at http://docs.cython.org/src/tutorial/profiling_tutorial.html.
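As a rough illustration of what driving cProfile looks like, here is a minimal sketch using only the standard library. The function ``sum_of_squares`` and the helper ``profile_call`` are hypothetical names, not part of Cython; the function is a pure-Python stand-in for something that would normally live in a compiled module. For a real Cython module, the tutorial linked above describes enabling profiling with the ``# cython: profile=True`` directive at the top of the ``.pyx`` file.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Hypothetical stand-in for a function from a profiled Cython module."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in a string instead of writing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(sum_of_squares, 1000)
print(report)
```

Sorting by cumulative time puts the outermost callers first, which is usually the easiest place to start reading a profile.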
The overhead is quite small, e.g. on a 2.33GHz Intel Core 2 Duo:
>>> from profile_profile import *
>>> time time_cdef_prof(10**8)
CPU times: user 1.00 s, sys: 0.00 s, total: 1.00 s
Wall time: 1.01 s
3.3333332833346318e+23
>>> time time_cdef_no_prof(10**8)
CPU times: user 0.38 s, sys: 0.00 s, total: 0.38 s
Wall time: 0.39 s
3.3333332833346318e+23
>>> time time_def_prof(10**7)
CPU times: user 2.07 s, sys: 0.01 s, total: 2.08 s
Wall time: 2.08 s
49999995000000.0
>>> time time_def_no_prof(10**7)
CPU times: user 2.01 s, sys: 0.01 s, total: 2.02 s
Wall time: 2.03 s
49999995000000.0
>>> (1.01 - 0.39) / 10**8
6.2000000000000001e-09
>>> (2.08 - 2.03) / 10**7
5.0000000000000266e-09
In each case, with profiling support compiled in but not used, the overhead was on the order of several nanoseconds per call, or roughly a dozen clock cycles. If the function bodies were at all non-trivial, one probably wouldn't notice it at all.
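The conversion from wall-clock differences to clock cycles can be checked with a short calculation. This is only a sketch of the arithmetic: the wall times and call counts are the ones from the session above, the clock rate is the 2.33GHz quoted for the machine, and the helper name ``per_call_overhead`` is made up for illustration.

```python
CLOCK_HZ = 2.33e9  # clock rate of the Core 2 Duo quoted above


def per_call_overhead(wall_prof, wall_no_prof, calls):
    """Extra seconds per call when profiling support is compiled in."""
    return (wall_prof - wall_no_prof) / calls


# Wall times and call counts taken from the timings above.
cdef_overhead = per_call_overhead(1.01, 0.39, 10**8)
def_overhead = per_call_overhead(2.08, 2.03, 10**7)

for label, seconds in [("cdef", cdef_overhead), ("def", def_overhead)]:
    nanoseconds = seconds * 1e9
    cycles = seconds * CLOCK_HZ
    print(f"{label}: {nanoseconds:.1f} ns per call, about {cycles:.0f} cycles")
```

Both cases come out between 10 and 15 cycles per call, which is where the "dozen clock cycles" figure comes from.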
There is limited support for Robert Kern's line profiler; see https://github.com/cython/cython/blob/master/tests/run/line_profile_test.srctree
Only def functions are supported at the moment.
valgrind --tool=callgrind -v --dump-instr=yes --trace-jump=yes --callgrind-out-file=callgrind.log python myscript.py
kcachegrind callgrind.log
OS X itself provides a very good profiling tool called
``Instruments.app``. Valgrind now also runs under OS X, and kcachegrind can be installed via MacPorts.
A quick way to get profiling info is to