# A quick introduction to profilers
## Valgrind and gprof


So that you can skip the long struggle I've been through.


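Valgrind is a free, open-source framework for building dynamic analysis tools. It ships with:

* a memory error detector (Memcheck) ✔️
* two thread error detectors (Helgrind and DRD)
* a cache and branch-prediction profiler (Cachegrind)
* a call-graph generating cache and branch-prediction profiler (Callgrind)
* a heap profiler (Massif) ✔️

The two tools marked ✔️ are the ones covered below.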

**Installation:**
Download and install from: http://valgrind.org/downloads/current.html


## Memory errors: Memcheck

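Memcheck detects invalid memory accesses and memory leaks, which makes it very useful for C/C++ code.

Just run: `valgrind --tool=memcheck <your_exe> <exe options>`

By default only the leak summary is printed. Add `--leak-check=full` to get the individual loss records shown below, and compile with `-g` so that Valgrind can report file names and line numbers:
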
```bash
LEAK SUMMARY:
   definitely lost: 48 bytes in 3 blocks.
   indirectly lost: 32 bytes in 2 blocks.
     possibly lost: 96 bytes in 6 blocks.
   still reachable: 64 bytes in 4 blocks.
        suppressed: 0 bytes in 0 blocks.
        
8 bytes in 1 blocks are definitely lost in loss record 1 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:39)

88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:25)
```

## Heap profiler: Massif

Useful when memory is limited (thanks, PIC). **⚠️** It slows your program down by roughly 40x.

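Just run: `valgrind --tool=massif --massif-out-file=massif.out <your_exe> <exe options> && ms_print massif.out`

Without `--massif-out-file`, Massif writes to `massif.out.<pid>`, and that is the file to pass to `ms_print`.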




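Here is an example. The output below was produced with `--time-unit=B`, so the x-axis ("time") counts bytes allocated rather than instructions executed, which is Massif's default: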

```c
#include <stdlib.h>

void g(void)
{
   malloc(4000);            /* line 5: 4,000 B per call, never freed */
}

void f(void)
{
   malloc(2000);            /* line 10: 2,000 B, never freed */
   g();                     /* line 11 */
}

int main(void)
{
   int i;
   int* a[10];

   for (i = 0; i < 10; i++) {
      a[i] = malloc(1000);  /* line 20: 10 blocks of 1,000 B */
   }

   f();                     /* line 23 */

   g();                     /* line 25 */

   for (i = 0; i < 10; i++) {
      free(a[i]);
   }

   return 0;
}
```

```bash
    KB
19.63^                                               ###                      
     |                                              #                        
     |                                              #  ::                    
     |                                              #  : :::                 
     |                                     :::::::::#  : :  ::               
     |                                     :        #  : :  : ::             
     |                                     :        #  : :  : : :::          
     |                                     :        #  : :  : : :  ::        
     |                           :::::::::::        #  : :  : : :  : :::     
     |                           :         :        #  : :  : : :  : :  ::   
     |                       :::::         :        #  : :  : : :  : :  : :: 
     |                    @@@:   :         :        #  : :  : : :  : :  : : @
     |                  ::@  :   :         :        #  : :  : : :  : :  : : @
     |               :::: @  :   :         :        #  : :  : : :  : :  : : @
     |             :::  : @  :   :         :        #  : :  : : :  : :  : : @
     |           ::: :  : @  :   :         :        #  : :  : : :  : :  : : @
     |        :::: : :  : @  :   :         :        #  : :  : : :  : :  : : @
     |      :::  : : :  : @  :   :         :        #  : :  : : :  : :  : : @
     |   :::: :  : : :  : @  :   :         :        #  : :  : : :  : :  : : @
     | :::  : :  : : :  : @  :   :         :        #  : :  : : :  : :  : : @
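   0 +----------------------------------------------------------------------->KB
     0                                                                   29.48

Number of snapshots: 25
 Detailed snapshots: [9, 14 (peak), 24]
```

In the graph, `:` marks a normal snapshot, `@` a detailed snapshot and `#` the peak snapshot. The detailed snapshots listed under the graph are then dumped by `ms_print`, including the allocation tree of the peak: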

```bash
--------------------------------------------------------------------------------
  n     time(B)     total(B) useful-heap(B) extra-heap(B)  stacks(B)
--------------------------------------------------------------------------------
 10      10,080       10,080         10,000            80          0
 11      12,088       12,088         12,000            88          0
 12      16,096       16,096         16,000            96          0
 13      20,104       20,104         20,000           104          0
 14      20,104       20,104         20,000           104          0
 
99.48% (20,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.74% (10,000B) 0x804841A: main (example.c:20)
| 
->39.79% (8,000B) 0x80483C2: g (example.c:5)
| ->19.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->19.90% (4,000B) 0x8048431: main (example.c:23)
| |   
| ->19.90% (4,000B) 0x8048436: main (example.c:25)
|   
->09.95% (2,000B) 0x80483DA: f (example.c:10)
  ->09.95% (2,000B) 0x8048431: main (example.c:23)
```

## Speed profiler: gprof

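For execution speed, I use the GNU profiler `gprof` with Fortran and C/C++:

* Compile and link with `-pg`
* Run your program as usual: `<your_exe>`. When it exits normally, it writes `gmon.out` to the current directory
* Read the profile: `gprof <your_exe> gmon.out`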

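### Flat profile: time spent in each function

`self seconds` is the time spent in the function's own code, and `cumulative seconds` is the running total down the table. Both are estimated by sampling the program counter. The per-call columns (`self` and `total` time per call) are trimmed from this excerpt.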
```bash
Each sample counts as 0.01 seconds.
  %   cumulative   self              
 time   seconds   seconds    calls     name
 58.56    219.11   219.11     2268   functions_mp_extract_grid_
 21.52    299.63    80.52      161   functions_mp_motion_boundaries_
  8.58    331.73    32.10      162   functions_mp_extrapolation_
  3.49    344.80    13.07 54495528   functions_mp_boris_rotation_
  3.38    357.44    12.64 876903620  walls_mp_periodicity_
  3.04    368.83    11.39      161   mccollisions_mp_monte_carlo_collisions_
  0.31    369.99     1.16      162   hist_diag_mp_diagnostics_run_
  0.20    370.72     0.73      162   functions_mp_particle_to_grid_
  0.18    371.40     0.68            __intel_rtc_CheckStackVars
  0.14    371.94     0.54            walls_mp_injecting_couples_
```

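### Call graph: function decomposition

gprof's second table, the call graph (introduced by a `granularity:` line), breaks each function's time down by caller and callee. In this excerpt the primary line is `functions_mp_particle_to_grid_` and the lines under it are its callees. For each callee, `self` and `children` are the time spent in that callee and in its own callees on behalf of this caller, and `called` reads calls from this caller / total (non-recursive) calls to the callee.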
```bash
% time    self  children    called     name
  61.7    0.73  230.25     162       functions_mp_particle_to_grid_
        219.11   11.07    2268/2268    functions_mp_extract_grid_
          0.06    0.00    2428/2750    functions_mp_grid_exchange_
          0.01    0.00    2428/2750    functions_mp_edges_
```