Alex is a software profiler for C and C++ programs on Linux. With it, you can locate performance issues and the parts of your code that cause them.
C and C++ dependencies
git clone https://github.com/curtsinger-lab/alex.git
cd alex
make
To see an example of Alex in action:
npm run example
To collect performance data from a program:
node . collect /path/to/your/program
To visualize data already collected from a program:
node . visualize /path/to/your/data.bin
How does it work?
Alex has three main components: data collection, visualization, and analysis.
The data collection component of Alex works as an LD_PRELOADed shared object. It uses the Linux libpfm4 libraries to measure performance attributes of a target program. The primary measurements are the number of CPU cycles and the number of instructions, which together indicate how fast the program is executing. Additionally, stack frames are used to find the call stack of a given sample. Various other events can also be added, such as MEM_LOAD_RETIRED.L3_MISS (which counts retired memory load instructions that missed the L3 cache or its counterpart) and MEM_LOAD_RETIRED.L3_HIT (which counts such instructions that hit it). The collector then outputs this data as protocol buffers, a space-efficient data format.
The visualization portion of Alex is contained in an Electron app, which takes the results of the data collection and creates scatterplots of resource usage over time using D3. A plot is displayed for each resource collected by the data collector, and data points are colored differently depending on how tightly packed they are.
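To illustrate coloring points by how tightly packed they are, here is a minimal Python sketch. It is not Alex's actual code (the app itself uses D3 inside Electron), and the grid-binning approach, the function name density_per_point, and the cell sizes are all assumptions chosen for illustration. Each point gets a count of how many points share its grid cell, which a plot could then map onto a color scale.

```python
from collections import Counter

def density_per_point(points, cell_width, cell_height):
    """Return, for each (x, y) point, how many points fall in its grid cell.

    Hypothetical helper: the README does not say how density is computed.
    """
    def cell(p):
        # Map a point to the integer coordinates of the grid cell containing it.
        return (int(p[0] // cell_width), int(p[1] // cell_height))

    counts = Counter(cell(p) for p in points)
    return [counts[cell(p)] for p in points]

# Three points clustered near the origin and one isolated point.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (5.0, 5.0)]
densities = density_per_point(points, cell_width=1.0, cell_height=1.0)
```

A higher count means a denser neighborhood, so the clustered points would be drawn in a different color from the isolated one.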
Alex's analysis is initiated when you select a region of the scatterplots. You might consider selecting regions with strange spikes, dips, or density, or you might analyze any random part of a plot; regardless, Alex compares the functions found within the selected regions to the ones found outside of them. It applies the statistical technique of logistic regression to accomplish this, using stochastic gradient descent as a minimization algorithm to provide accurate results with minimal delays.
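As a rough illustration of this idea, the Python sketch below fits a logistic regression with stochastic gradient descent. It is not Alex's implementation: the feature encoding (one binary feature per function, set when the function appears in a sample's call stack), the function name rank_functions, and the learning rate and epoch count are assumptions made for the example. The label for each sample is whether it lies inside the selected region.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rank_functions(samples, selected, epochs=200, lr=0.1, seed=0):
    """Rank functions by how strongly they predict the selected region.

    samples:  list of sets of function names (one call stack per sample).
    selected: list of bools, True if the sample is inside the selection.
    Returns (function, weight) pairs sorted by weight, highest first.
    """
    rng = random.Random(seed)
    weights = {f: 0.0 for f in set().union(*samples)}
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            z = bias + sum(weights[f] for f in samples[i])
            # Gradient of the log loss with respect to z is (prediction - label).
            error = sigmoid(z) - (1.0 if selected[i] else 0.0)
            bias -= lr * error
            for f in samples[i]:  # features are 1 for functions on the stack
                weights[f] -= lr * error
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical samples: "compress" appears only inside the selection,
# "parse" only outside it, and "main" everywhere.
samples = [{"main", "parse"}, {"main", "parse"}, {"main", "compress"},
           {"main", "compress"}, {"main", "parse"}, {"main", "compress"}]
selected = [False, False, True, True, False, True]
ranked = rank_functions(samples, selected)
```

Functions with large positive weights are the ones most associated with the selected region. In this toy data, compress ranks first and parse last.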
What resources do you currently profile?
- Cache hit and miss counts for the L1, L2, and L3 caches, converted to miss rates when graphed
- Branch misprediction rate
- Instructions per cycle
- Overall power, CPU power, and memory power usage
- And more, eventually!
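The rates in this list are simple ratios of raw counters. The Python sketch below shows the arithmetic under that assumption. The function names and the counter values are made up for illustration and are not taken from Alex.

```python
def miss_rate(hits, misses):
    # Fraction of accesses that missed; 0.0 when no accesses were recorded.
    total = hits + misses
    return misses / total if total else 0.0

def branch_misprediction_rate(branches, mispredicted):
    # Fraction of branches that were mispredicted.
    return mispredicted / branches if branches else 0.0

def instructions_per_cycle(instructions, cycles):
    # Average number of instructions retired per CPU cycle.
    return instructions / cycles if cycles else 0.0

# Hypothetical counter values for a single sample.
l3_rate = miss_rate(hits=900, misses=100)
branch_rate = branch_misprediction_rate(branches=1000, mispredicted=50)
ipc = instructions_per_cycle(instructions=2_000_000, cycles=1_000_000)
```

Guarding against zero denominators keeps a sample with no recorded events from raising an error while graphing.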