Profiling Using Valgrind
How to "profile" (analyze time and memory usage) your Castle Game Engine applications
Note that this document is just a summary. For the full description, read the FPC documentation and the Valgrind and Callgrind manuals.
If you use our Build Tool, just recompile your project like this:
castle-engine clean
castle-engine compile --mode=valgrind
Otherwise, configure your compilation options manually. Make sure to use these options:
- You MUST use the -gv option; it adds the information that Valgrind needs.
- You SHOULD use -gl (line info) to get line number information.
- You SHOULD NOT use -Xs (strip debug info); it would strip useful function information from your executable.
With the exception of the options mentioned above, everything else should be configured as for a release build. Otherwise you may find serious "time eaters" in code related to range or overflow checking, and they will skew your results. You want to profile the version of the application that you release to users, which should have range and overflow checks turned off (for maximum speed). See here for a description of what range and overflow checks are.
If you compile on the command line using a direct fpc ... command and the @castle-fpc.cfg file, then you can apply the options indicated above inside the castle-fpc.cfg file. Search castle-fpc.cfg for the Valgrind options and uncomment them. Be sure to also comment out any option that the list above says you should not use.
Make sure to recompile all the units. Call castle-engine clean, or whatever other command you use, to force recompiling all the code. Otherwise, you will not get profiling info inside some routines.
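For a direct fpc call, the rules above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name fpc_valgrind_build and the DRY_RUN variable are hypothetical, -O2 stands in for "whatever your release build uses", and the unit search paths that a real Castle Game Engine project needs are omitted.

```shell
# Hypothetical helper: compile a program with FPC options suitable for Valgrind.
# Usage: fpc_valgrind_build my_program.lpr
fpc_valgrind_build() {
  # -gv : add the information required by Valgrind (mandatory)
  # -gl : include line info, so reports can show line numbers
  # -O2 : optimize as in a release build (assumption; use your own release options)
  # -B  : rebuild all units, so every routine gets profiling info
  # -Xs is deliberately absent, because it would strip function information.
  set -- fpc -gv -gl -O2 -B "$@"
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command, without running the compiler.
    echo "$*"
  else
    "$@"
  fi
}
```

Running it with DRY_RUN=1 set prints the command line instead of compiling, which is a cheap way to check the options before a long rebuild.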
Note that running a program through Callgrind adds an enormous slowdown, especially with instrumentation turned on (this is when the actual measurements take place). So it's advised to start without instrumentation, and only turn it on for the part of the code you are interested in.
valgrind --tool=callgrind --instr-atstart=no ./my-program
# from other shell:
callgrind_control -i on
callgrind_control -i off
# investigate the report:
kcachegrind
There's lots of useful information shown by kcachegrind. Personally I found it easiest to look at the "Call Graph" tab. "Drill down" by moving in this graph (and clicking on routines) to find the bottleneck that you can fix.
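By default Callgrind writes its report to a file named callgrind.out.<pid> in the current directory, so after several runs a few of them pile up. The following is a minimal sketch of a helper that picks the most recent one; the function name latest_callgrind_out is hypothetical and not part of Valgrind.

```shell
# Hypothetical helper: print the most recently modified callgrind.out.* file.
# Looks in the directory given as $1, or in the current directory by default.
latest_callgrind_out() {
  dir=${1:-.}
  # ls -t sorts by modification time, newest first; keep only the first entry.
  ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}
```

It can be combined with the viewer, for example as kcachegrind "$(latest_callgrind_out)".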
Analyze memory usage
valgrind --tool=massif --run-libc-freeres=no ./my-program
There are some more useful Valgrind options; we keep them in the massif_fpc script at https://github.com/castle-engine/cge-scripts/blob/master/massif_fpc . So just get https://raw.githubusercontent.com/castle-engine/cge-scripts/master/massif_fpc , place it somewhere on your $PATH, and then execute it.
Afterwards, investigate the resulting massif.out.xxx file by running:
ms_print massif.out.xxx > massif_output.txt
Then open massif_output.txt in any text editor. It may look scary, but remain calm :)
You usually want to find the "peak" snapshot (the moment when your application was using the most memory). You can find it by looking at the lines like this:
Number of snapshots: 58
 Detailed snapshots: [1, 2, ..., 42 (peak), 46, 52]
The "peak" is at the 42nd snapshot in the example above. The graph (above the Number of snapshots line) should confirm that this is the moment of highest memory usage.
Then find the analysis of this "peak" in the massif_output.txt file, e.g. by searching with a regexp.
Browse it, and the main "memory eater" should be visible.
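Reading the peak snapshot number off the report can also be scripted. The following is a minimal sketch that relies only on the "Detailed snapshots" line format shown above; the function name massif_peak_snapshot is hypothetical and not part of Valgrind.

```shell
# Hypothetical helper: print the number of the peak snapshot from ms_print output.
# Reads the file given as an argument, or standard input when no file is given.
massif_peak_snapshot() {
  # The "Detailed snapshots" line marks the peak as "<number> (peak)".
  grep 'Detailed snapshots' "$@" |
    sed -n 's/.*[^0-9]\([0-9][0-9]*\) (peak).*/\1/p' |
    head -n 1
}
```

For the example report above, massif_peak_snapshot massif_output.txt prints 42, which you can then use as the search term in your text editor.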
Note that the memory may be allocated in some other library, e.g. inside OpenGL.
This often happens because you use a lot of texture memory. Use TextureMemoryProfiler to analyze your texture memory usage. Use various optimization hints related to textures to decrease texture memory usage.
Alternative profiling methods, without Valgrind
Valgrind is really powerful, and I advise getting familiar with it. But if it seems too difficult (or is not available on your platform), there are other ways to profile the speed and memory usage of your programs.
You can measure the speed of operations using TCastleProfiler (since CGE 6.5). It's automatically used for various CGE loading operations (all you need to do is enable it and show the report somewhere). It's also trivial to use it for your own routines. The gathered times are grouped in a tree structure, so you can see what contributed to what.
You can measure the speed of your routines using ProcessTimer from the CastleTimeUtils unit. There's example code under that link.
You can measure the memory usage of your textures using TextureMemoryProfiler. It measures the memory usage on the GPU, so it's actually something very different from what massif measures, and it is useful independently of it.
See the manual about optimization for more ideas.
As a general rule, avoid judging the speed "by a hunch". Our intuitions about "what is fast / what is slow" are often wrong; it's always better to actually measure the thing you want to optimize. And optimized code is usually harder to read and maintain, so it is wise to optimize only what is really necessary.