# Understand and use performance log files in TRUST

## How performance is measured in TRUST

To obtain performance statistics for a test case, TRUST uses counters. Each counter can be referred to by a description/name and an ID. These IDs are declared in the file stat_counters.h and are global variables of TRUST. Some counters belong to a counter family. In practice, families are used to build aggregated statistics that track similar types of operations. If a counter does not belong to any family, its Counter_family is set to (null).

A unique object of the class Statistiques is then defined. It stores the data associated with the counters in an object called Stats_Internals. The Statistiques class also provides functions for manipulating counters.

During a numerical simulation, counters are started and stopped throughout the calculation using, respectively, the functions begin_count(const Stat_Counter_Id& counter_id, bool track_comm) and end_count_(const int id, int quantity = 0, int count = 1). The end_count function also updates two variables, called count and quantity. By default, count is the number of times a counter has been called, so it is incremented by one at each call to end_count; for some particular counters, the increment is different. The variable quantity is a custom storage variable whose meaning differs from counter to counter. By default, it is set to 0. It is mostly used to store the amount of data exchanged between processors during a given operation.
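To make the role of count and quantity concrete, here is a minimal Python model of a counter. It is a hypothetical sketch, not the TRUST C++ API: the class ToyCounter is invented for illustration, and it assumes that quantity is accumulated (summed) over calls, just like count.

```python
import time


class ToyCounter:
    """Hypothetical Python model of a TRUST counter (not the TRUST C++ API)."""

    def __init__(self, name, family="(null)"):
        self.name = name
        self.family = family
        self.time = 0.0    # accumulated elapsed time in seconds
        self.count = 0     # incremented by `count` at each end_count call
        self.quantity = 0  # custom payload, e.g. bytes exchanged (assumed summed)
        self._start = None

    def begin_count(self):
        self._start = time.perf_counter()

    def end_count(self, quantity=0, count=1):
        # Same defaults as the C++ signature: quantity = 0, count = 1
        self.time += time.perf_counter() - self._start
        self._start = None
        self.quantity += quantity
        self.count += count


# Usage: a communication-like counter called three times
c = ToyCounter("MPI_send", family="MPI_sendrecv")
for nbytes in (1024, 2048, 512):
    c.begin_count()
    c.end_count(quantity=nbytes)
print(c.count, c.quantity)  # 3 3584
```

With the default arguments, count simply counts the calls, while quantity only grows when a non-zero value is passed.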

At the end of the calculation, two files storing performance statistics are created. The first one is named CASE_NAME.TU. It is created by the function print_statistics_analyse(const char * message, int mode_append) and contains macro and aggregated performance statistics of the computed test case.
The second file is called CASE_NAME_csv.TU. It contains the raw performance data of every counter, aggregated over processors, together with statistics aggregated over each counter family. By default, only statistics averaged over all processors are printed. To access the per-processor detail, add 'stat_per_proc_perf_log 1' in the data file.

This notebook explains how to reconstruct some of the aggregated statistics of the CASE_NAME.TU file from this second file.

For reference, a table giving, for each counter, its name, ID, family, quantity increment and whether it is a communication counter is provided at the end of this notebook. A second table lists the counters whose count increment is not equal to 1.
 

### Description of the csv file
 
The performance log file CASE_NAME_csv.TU stores the statistics of every counter for a given test case in a table with the following columns:

"Overall_simulation_step", "Processor_Number", "Counter_family", "Counter_name",  "Counter_level", "Is_comm", "%_total_time", "time_(s)", "t_min", "t_max", "t_SD", "count", "c_min", "c_max", "c_SD", "time_per_step", "tps_min", "tps_max", "tps_SD", "Quantity", "q_min", "q_max", "q_SD"

The column delimiter is a tab character. The first four columns together form a unique key for each row.
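The key structure can be illustrated with the standard csv module alone. The sketch below parses a hypothetical excerpt (only the four key columns plus count, with invented values and the same padding whitespace as the real file), strips each field and checks that the four-column key is unique.

```python
import csv
import io

# Hypothetical excerpt: only the four key columns plus "count", with invented values
sample = (
    "Overall_simulation_step\tProcessor_Number\tCounter_family\tCounter_name\tcount\n"
    " Statistiques de resolution du probleme \t -1 \t (null) \t Temps total \t 1 \n"
    " Statistiques de resolution du probleme \t -1 \t (null) \t Resoudre (timestep loop) \t 20 \n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
header = [h.strip() for h in next(reader)]

rows = {}
for fields in reader:
    fields = [f.strip() for f in fields]
    # The first four columns form the unique key of the row
    key = (fields[0], int(fields[1]), fields[2], fields[3])
    if key in rows:
        raise ValueError(f"duplicate key: {key}")
    rows[key] = dict(zip(header[4:], fields[4:]))

print(rows[("Statistiques de resolution du probleme", -1, "(null)", "Resoudre (timestep loop)")])
```

The pandas MultiIndex used later in this notebook is built from exactly these four columns.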

"Overall_simulation_step" can take the following three values:
   - "Statistiques d'initialisation du calcul"
   - "Statistiques de resolution du probleme"
   - "Statistiques de post resolution"

These correspond respectively to the three main steps of the calculation: initialisation, resolution and post-processing.

The Is_comm column indicates whether the counter is a communication counter: it is equal to 1 for a communication counter and 0 otherwise.

"%_total_time" gives the percentage of the time of the corresponding main step (initialisation, resolution or post-processing) spent in a given counter.

Statistics averaged over all processors are stored under Processor_Number == -1. By default, only these averaged rows are printed; to access the per-processor detail, add 'stat_per_proc_perf_log 1' in the data file. Moreover, a count equal to zero means that the counter was not called during the computation.
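With pandas, these conventions translate into simple selections on the Processor_Number level. The sketch below uses a small hypothetical DataFrame (invented values) indexed like the one built later in this notebook; DataFrame.xs and Index.get_level_values are standard pandas methods.

```python
import pandas as pd

# Hypothetical rows (invented values) indexed like the DataFrame built later in this notebook
index = pd.MultiIndex.from_tuples(
    [
        ("Statistiques de resolution du probleme", -1, "(null)", "Temps total"),
        ("Statistiques de resolution du probleme", -1, "(null)", "m1"),
        ("Statistiques de resolution du probleme", 0, "(null)", "Temps total"),
        ("Statistiques de resolution du probleme", 1, "(null)", "Temps total"),
    ],
    names=["Overall_simulation_step", "Processor_Number", "Counter_family", "Counter_name"],
)
df = pd.DataFrame({"count": [1.0, 0.0, 1.0, 1.0]}, index=index)

# Rows averaged over all processors (the Processor_Number level is dropped)
averaged = df.xs(-1, level="Processor_Number")
# Rows detailed per processor (only present with 'stat_per_proc_perf_log 1')
per_proc = df[df.index.get_level_values("Processor_Number") >= 0]
# Counters never called during the computation
unused = averaged[averaged["count"] == 0].index.get_level_values("Counter_name").tolist()
print(len(per_proc), unused)
```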

Statistics aggregated over each counter family are also precomputed and stored under the counter name "Aggregated over family".

SD stands for standard deviation. Min and max denote the minimum and maximum over all processors of the corresponding quantity. The "time_per_step" column stores the average time per time step spent in the tracked operation.
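The min, max and SD columns summarize one value per processor. The sketch below shows this reduction on hypothetical per-processor times. It assumes the averaged value is the arithmetic mean and that SD is the population standard deviation; TRUST may use a different convention, so treat it as an illustration rather than a reimplementation.

```python
import statistics


def summarize_over_processors(per_proc_values):
    """Mean, min, max and SD of one counter column across processors.

    Assumption: SD is the population standard deviation (statistics.pstdev);
    TRUST may use another convention.
    """
    return {
        "mean": statistics.fmean(per_proc_values),
        "min": min(per_proc_values),
        "max": max(per_proc_values),
        "SD": statistics.pstdev(per_proc_values),
    }


# Hypothetical per-processor times (s) of one counter on 4 processors
summary = summarize_over_processors([2.0, 4.0, 4.0, 6.0])
print(summary)
```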


## Example on the simple Obstacle test case

In [None]:
from trustutils import run

run.TRUST_parameters("1.9.4_beta")

In [None]:
run.reset()
case1 = run.addCase(".","Obstacle.data",nbProcs=2)
case1.partition()
run.printCases()
run.runCases()

In [None]:
run.tablePerf()

In [None]:
import os
os.chdir(run.BUILD_DIRECTORY)

First, we need the number of comment lines that precede the table. These lines always start with the character #.

In [None]:
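# Read all lines; the `with` block closes the file automatically
with open('PAR_Obstacle_csv.TU', "r") as perf_file:
    lines = perf_file.readlines()

# Count the comment lines (starting with '#') before the table
nb_comment_lines = 0
while nb_comment_lines < len(lines) and lines[nb_comment_lines].startswith('#'):
    nb_comment_lines += 1

print(nb_comment_lines)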


The file contains padding whitespace to make it human-readable. This whitespace has to be removed to obtain a clean MultiIndex. To do so, three helper functions are defined: strip, make_int and make_float.

The column names are specified in the variable col_names. Here, the same names as in the header row of the table are used, but you can customize them.

Then, the file is opened with the pandas library and converted into a DataFrame. A quick user guide for pandas can be found at: https://pandas.pydata.org/docs/user_guide/10min.html

The file is read with the function pd.read_csv. The first argument is the path to the .csv file. sep='\t' specifies that the table delimiter is a tab, and skiprows=nb_comment_lines skips the comment lines. header=0 tells pandas that the first remaining row is the header, whose names are replaced by those of col_names. index_col gives the positions of the columns used as the row key (the first four), which creates a MultiIndex. Finally, converters remove the useless whitespace and convert each column to the proper type.


In [None]:
import math

import numpy as np
import pandas as pd

def strip(text):
    """Remove the padding whitespace around a text field."""
    try:
        return text.strip()
    except AttributeError:
        return text

def make_int(text):
    """Convert a padded (possibly quoted) field to an int."""
    return int(text.strip('" '))

def make_float(text):
    """Convert a padded (possibly quoted) field to a float."""
    return float(text.strip('" '))

col_names = ["Overall_simulation_step", "Processor_Number", "Counter_family", "Counter_name", "Counter_level", "Is_comm",
             "%_total_time", "time_(s)", "t_min", "t_max", "t_SD",
             "count", "c_min", "c_max", "c_SD",
             "time_per_step", "tps_min", "tps_max", "tps_SD",
             "Quantity", "q_min", "q_max", "q_SD"]

# Text columns are stripped, integer columns parsed as int, all remaining columns as float
converters = {name: strip for name in ["Overall_simulation_step", "Counter_family", "Counter_name"]}
converters.update({name: make_int for name in ["Processor_Number", "Counter_level", "Is_comm"]})
converters.update({name: make_float for name in col_names[6:]})

M = pd.read_csv('PAR_Obstacle_csv.TU', sep='\t', skiprows=nb_comment_lines, header=0,
                names=col_names, index_col=(0, 1, 2, 3), converters=converters)
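If you want to experiment with these read_csv options without running a TRUST case, the same call can be applied to an in-memory string. The excerpt below is hypothetical (a subset of the columns, invented values, one comment line); only documented pandas parameters (sep, skiprows, header, names, index_col, converters) are used.

```python
import io

import pandas as pd


def strip(text):
    return text.strip()


def make_int(text):
    return int(text.strip('" '))


def make_float(text):
    return float(text.strip('" '))


# Hypothetical padded excerpt (subset of the real columns, invented values)
sample = (
    "# comment line\n"
    "Overall_simulation_step\tProcessor_Number\tCounter_family\tCounter_name\tcount\n"
    " Statistiques de resolution du probleme \t -1 \t (null) \t Temps total \t 1 \n"
    " Statistiques de resolution du probleme \t -1 \t (null) \t Resoudre (timestep loop) \t 20 \n"
)
cols = ["Overall_simulation_step", "Processor_Number", "Counter_family", "Counter_name", "count"]

df = pd.read_csv(io.StringIO(sample), sep="\t", skiprows=1, header=0, names=cols,
                 index_col=[0, 1, 2, 3],
                 converters={"Overall_simulation_step": strip, "Processor_Number": make_int,
                             "Counter_family": strip, "Counter_name": strip, "count": make_float})
df = df.sort_index()  # sorted MultiIndex: faster, warning-free full-key lookups

key = ("Statistiques de resolution du probleme", -1, "(null)", "Resoudre (timestep loop)")
print(df.loc[key, "count"])
```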


To inspect the MultiIndex, you can print it:

In [None]:
idx = M.index

print(M.index)


You can also get general information about the DataFrame with:

In [None]:
M.info()

An advantage of the MultiIndex is that it makes selections on a partial key easy. First, let's create three sub-DataFrames, one for each overall simulation step:

In [None]:
M_init = M.loc[("Statistiques d'initialisation du calcul")]
M_reso = M.loc[("Statistiques de resolution du probleme")]
M_post = M.loc[("Statistiques de post resolution")]
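The cells that follow read single values with the pattern np.asarray(M_reso.loc[[(proc, family, name)], [column]])[0][0]. A shorter equivalent is a full-key lookup with .loc, which returns a scalar directly on a unique MultiIndex. The helper get_stat below is hypothetical (not part of trustutils), and it is demonstrated on an invented sub-DataFrame shaped like M_reso.

```python
import pandas as pd


def get_stat(df, counter_name, column, family="(null)", proc=-1):
    """Return one scalar from a sub-DataFrame indexed by
    (Processor_Number, Counter_family, Counter_name).

    Hypothetical helper, equivalent to
    np.asarray(df.loc[[(proc, family, counter_name)], [column]])[0][0].
    """
    return df.loc[(proc, family, counter_name), column]


# Hypothetical sub-DataFrame with invented values, shaped like M_reso
index = pd.MultiIndex.from_tuples(
    [(-1, "(null)", "Resoudre (timestep loop)"), (-1, "(null)", "Temps total")],
    names=["Processor_Number", "Counter_family", "Counter_name"],
)
demo = pd.DataFrame({"time_(s)": [9.5, 10.0], "count": [20.0, 1.0]}, index=index)

print(get_stat(demo, "Resoudre (timestep loop)", "count"))
```

For example, get_stat(M_reso, "Temps total", "time_(s)") would play the role of the total_time lookup used below.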

## How to reconstruct the aggregated data of the .TU file

The .TU file already contains some useful aggregated values. Below, we show how to reconstruct them easily from the .csv file. First, let's print the file CASE_NAME.TU:

In [None]:
# The `with` block closes the file automatically
with open('PAR_Obstacle.TU', "r") as aggregated_perf_file:
    # Each line already ends with '\n', so do not add a second newline
    for l in aggregated_perf_file:
        print(l, end="")

First, we need the number of processors used for the simulation. If per-processor statistics are printed, it can be obtained with:

In [None]:
Nb_proc = np.max( idx.get_level_values(1) ) + 1    
print(Nb_proc)

Otherwise, this information is also given in the second comment line of the file:

In [None]:
with open('PAR_Obstacle_csv.TU', "r") as perf_file:
    lines = perf_file.readlines()

print(lines[1])

# Keep the (last) integer found in the second comment line
for c in lines[1].split():
    if c.isdigit():
        Nb_proc = int(c)



Below, we give an example of processing the statistics of the resolution step. To process another overall step, simply replace M_reso by M_init or M_post.

First, let's access the total time of the corresponding overall step:

In [None]:
total_time = np.asarray(M_reso.loc[[(-1,"(null)","Temps total")],["time_(s)"]])[0][0]
print("Total time of the resolution =",total_time, "s")

The number of time steps is simply the count of the counter "Resoudre (timestep loop)". It can be accessed with:

In [None]:
timesteps=np.asarray(M_reso.loc[[(-1,"(null)","Resoudre (timestep loop)")],["count"]])[0][0]
print('Number of time steps =', timesteps)

Provided the number of time steps is strictly positive, most of the aggregated quantities of the CASE_NAME.TU file can be retrieved as follows:

In [None]:
solveur_Axb = np.asarray(M_reso.loc[[(-1,"(null)","SolveurSys::resoudre_systeme")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step taken by the Ax = b solver =', solveur_Axb, "s")

In [None]:
solveur_diffusion_implicite = np.asarray(M_reso.loc[[(-1,"(null)","Equation_base::Gradient_conjugue_diff_impl")],["t_max"]])[0][0]/timesteps
print('Maximum time taken for solving the diffusion part per time step = ',solveur_diffusion_implicite ,"s")


In [None]:
assemblage_matrice_implicite = np.asarray(M_reso.loc[[(-1,"(null)","Assembleur::assembler")],["t_max"]])[0][0]/timesteps
print('Maximum time taken for creating the matrix per time step, when using implicit solver = ',assemblage_matrice_implicite,"s")


In [None]:
mettre_a_jour = np.asarray(M_reso.loc[[(-1,"(null)","::mettre_a_jour")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step taken by ::mettre_a_jour (update) = ', mettre_a_jour, "s")


In [None]:
update_vars = np.asarray(M_reso.loc[[(-1,"(null)","Schema_Implicite_4eqs::update_vars")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for Schema_Implicite_4eqs::update_vars =', update_vars, 's')


In [None]:
update_fields = np.asarray(M_reso.loc[[(-1,"(null)","Probleme_Diphasique_base::updateGivenFields")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for Probleme_Diphasique_base::updateGivenFields =', update_fields, 's')


In [None]:
operateurs_convection = np.asarray(M_reso.loc[[(-1,"(null)","Operateur_Conv::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the operator of convection =',operateurs_convection,'s')


In [None]:
operateurs_diffusion =  np.asarray(M_reso.loc[[(-1,"(null)","Operateur_Diff::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the diffusion operator =',operateurs_diffusion,'s')


In [None]:
operateurs_decroissance =  np.asarray(M_reso.loc[[(-1,"(null)","Operateur_Decr::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the decay operator =',operateurs_decroissance,'s')


In [None]:
operateurs_gradient = np.asarray(M_reso.loc[[(-1,"(null)","Operateur_Grad::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the gradient =',operateurs_gradient,'s')    


In [None]:
operateurs_divergence =  np.asarray(M_reso.loc[[(-1,"(null)","Operateur_Div::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the divergence operator =',operateurs_divergence,'s')


In [None]:
operateurs_source = np.asarray(M_reso.loc[[(-1,"(null)","Source::ajouter/calculer")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing source terms =',operateurs_source,'s')          


In [None]:
operations_post_traitement =  np.asarray(M_reso.loc[[(-1,"(null)","Pb_base::postraiter")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for post-processing operations =',operations_post_traitement,'s')


In [None]:
calcul_dt = np.asarray(M_reso.loc[[(-1,"(null)","Operateur::calculer_pas_de_temps")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for computing the time step =',calcul_dt,'s')          


In [None]:
modele_turbulence = np.asarray(M_reso.loc[[(-1,"(null)","ModeleTurbulence*::mettre_a_jour")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for updating the turbulence model =',modele_turbulence,'s')


In [None]:
operations_sauvegarde = np.asarray(M_reso.loc[[(-1,"(null)","Probleme_base::sauver")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for writing back-up files =',operations_sauvegarde,'s')


In [None]:
marqueur1 = np.asarray(M_reso.loc[[(-1,"(null)","m1")],["t_max"]])[0][0]/timesteps        
print('Maximum time per time step for doing the operation tracked by m1 =',marqueur1,'s')

In [None]:
marqueur2 = np.asarray(M_reso.loc[[(-1,"(null)","m2")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for doing the operation tracked by m2 =',marqueur2,'s')          


In [None]:
marqueur3 = np.asarray(M_reso.loc[[(-1,"(null)","m3")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for doing the operation tracked by m3 =',marqueur3,'s')          


In [None]:
calcul_divers = np.asarray(M_reso.loc[[(-1,"(null)","Divers")],["t_max"]])[0][0]/timesteps
print('Maximum time per time step for operations tracked by the Divers counter =',calcul_divers,'s')
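The cells from the Ax = b solver to Divers all compute t_max / timesteps for one counter. The same pattern can be written as a loop over counter names. The function max_time_per_step below is a hypothetical sketch that skips counters absent from the table instead of raising a KeyError; it is shown on an invented sub-DataFrame shaped like M_reso.

```python
import pandas as pd


def max_time_per_step(df, counter_names, timesteps, family="(null)", proc=-1):
    """Return {counter name: t_max / timesteps} for the counters present in df.

    Counters absent from df are skipped instead of raising a KeyError.
    """
    result = {}
    for name in counter_names:
        key = (proc, family, name)
        if key in df.index:
            result[name] = df.loc[key, "t_max"] / timesteps
    return result


# Hypothetical sub-DataFrame (invented values) shaped like M_reso
index = pd.MultiIndex.from_tuples(
    [(-1, "(null)", "Operateur_Conv::ajouter/calculer"),
     (-1, "(null)", "SolveurSys::resoudre_systeme")],
    names=["Processor_Number", "Counter_family", "Counter_name"],
)
demo = pd.DataFrame({"t_max": [2.0, 8.0]}, index=index)

per_step = max_time_per_step(
    demo, ["SolveurSys::resoudre_systeme", "Operateur_Conv::ajouter/calculer", "m1"], timesteps=4)
print(per_step)
```

On the real data, max_time_per_step(M_reso, [...], timesteps) would return the same values as the individual cells for the counters that exist.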


In [None]:
Nb_echange_espace_virtuel_per_ts = np.asarray(M_reso.loc[[(-1,"(null)","DoubleVect/IntVect::echange_espace_virtuel")],["c_max"]])[0][0]/timesteps
print("Maximum number of virtual space exchanges between subdomains per time step =",Nb_echange_espace_virtuel_per_ts)


In [None]:
Nb_MPI_allreduce_per_ts = np.asarray(M_reso.loc[[(-1,"MPI_allreduce","Aggregated over family")],["c_max"]])[0][0]/timesteps
print('Maximum number of MPI_allreduce calls per time step =',Nb_MPI_allreduce_per_ts)


In [None]:
Nb_solveur_per_ts = np.asarray(M_reso.loc[[(-1,"(null)","SolveurSys::resoudre_systeme")],["c_max"]])[0][0]/timesteps
print('Maximum number of solver calls per time step =',Nb_solveur_per_ts)


In [None]:
solver_key = [(-1,"(null)","SolveurSys::resoudre_systeme")]
Nb_solveur_calls = np.asarray(M_reso.loc[solver_key,["c_max"]])[0][0]
if (Nb_solveur_calls > 0):
    Secondes_per_solveur = np.asarray(M_reso.loc[solver_key,["t_max"]])[0][0] / Nb_solveur_calls
    Iterations_per_solveur = np.asarray(M_reso.loc[solver_key,["q_max"]])[0][0] / Nb_solveur_calls
    print('Maximum time per solver call =', Secondes_per_solveur, 's')
    print('Number of iterations per solver call =', Iterations_per_solveur)

In [None]:
sauver_key = [(-1,"(null)","Probleme_base::sauver")]
Nb_sauvegardes = np.asarray(M_reso.loc[sauver_key,["c_max"]])[0][0]
if (Nb_sauvegardes > 0):
    # Quantity of the averaged row is per processor: multiply by Nb_proc, then convert bytes to MB
    Data_per_sauvegarde = Nb_proc * np.asarray(M_reso.loc[sauver_key,["Quantity"]])[0][0] / (Nb_sauvegardes * 1024 * 1024)
    print('Maximum number of back-ups created:', Nb_sauvegardes)
    print('Average data stored per back-up:', Data_per_sauvegarde, 'MB')

In [None]:
if (np.asarray(M_reso.loc[[(-1,"GPU_copy","GPU_copyToDevice")],["c_max"]])>0):
    GPU_libraries = np.asarray(M_reso.loc[[(-1,"GPU_library","GPU_library")],["c_max"]])[0][0]/np.asarray(M_reso.loc[[(-1,"GPU_library","GPU_library")],["t_max"]])[0][0]/1024/1024/1024
    GPU_kernel = np.asarray(M_reso.loc[[(-1,"GPU_kernel","GPU_kernel")],["c_max"]])[0][0]/np.asarray(M_reso.loc[[(-1,"GPU_kernel","GPU_kernel")],["t_max"]])[0][0]/1024/1024/1024
    Copy_H2D = np.asarray(M_reso.loc[[(-1,"GPU_copy","GPU_copyToDevice")],["c_max"]])[0][0]
    print(GPU_libraries)
    print(GPU_kernel)
    print(Copy_H2D)


In [None]:
if (np.asarray(M_reso.loc[[(-1,"IO","write")],["time_(s)"]]) >0):
    Debit_write_seq = Nb_proc*np.asarray(M_reso.loc[[(-1,"IO","write")],["Quantity"]])[0][0]/np.asarray(M_reso.loc[[(-1,"IO","write")],["t_max"]])[0][0]/1024/1024
    print('Sequential write throughput:',Debit_write_seq, 'MB/s')


In [None]:
if (np.asarray(M_reso.loc[[(-1,"IO","MPI_File_write_all")],["time_(s)"]])[0][0] >0):
    Debit_write_par = Nb_proc*np.asarray(M_reso.loc[[(-1,"IO","MPI_File_write_all")],["Quantity"]])[0][0]/np.asarray(M_reso.loc[[(-1,"IO","MPI_File_write_all")],["time_(s)"]])[0][0]/1024/1024
    print('Parallel write throughput:',Debit_write_par, 'MB/s')
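Both write throughputs follow the same formula: the Quantity of the averaged row is multiplied by the number of processors to recover the total volume, divided by a time, and converted from bytes to MB (1 MB = 1024 * 1024 bytes, as in the cells above). The function write_throughput_mb_s is a hypothetical sketch of that formula, with invented input values.

```python
def write_throughput_mb_s(nb_proc, avg_quantity_bytes, time_s):
    """Aggregated write throughput in MB/s (1 MB = 1024 * 1024 bytes).

    avg_quantity_bytes is the Quantity of the averaged row (Processor_Number == -1),
    so it is multiplied by nb_proc to recover the total written volume.
    Returns None when no time was recorded, like the `> 0` guards above.
    """
    if time_s <= 0:
        return None
    return nb_proc * avg_quantity_bytes / time_s / (1024 * 1024)


# Hypothetical values: 2 processors, 3 MB written on average per processor in 1.5 s
print(write_throughput_mb_s(2, 3 * 1024 * 1024, 1.5))  # 4.0
```

Note that the sequential cell divides by t_max while the parallel cell divides by time_(s); the function leaves that choice to the caller.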


In [None]:
if (np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["c_max"]])> 0):
    Communications_avg = 0.1 * math.floor(1000* ( np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["time_(s)"]])[0][0] + np.asarray(M_reso.loc[[(-1,"MPI_allreduce","Aggregated over family")],["time_(s)"]])[0][0] )/ (total_time + 0.001) )
    Communications_max = 0.1 * math.floor(1000* ( np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["t_max"]])[0][0] + np.asarray(M_reso.loc[[(-1,"MPI_allreduce","Aggregated over family")],["t_max"]])[0][0] )/ (total_time + 0.001) )
    Communications_min = 0.1 * math.floor(1000* ( np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["t_min"]])[0][0] + np.asarray(M_reso.loc[[(-1,"MPI_allreduce","Aggregated over family")],["t_min"]])[0][0] )/ (total_time + 0.001) )
    print('Average percent of time of communications = ',Communications_avg)
    print('Minimum percent of time of communications = ',Communications_min)
    print('Maximum percent of time of communications = ',Communications_max)
    if (np.asarray(M_reso.loc[[(-1,"MPI_sendrecv", "MPI_send_recv")],["time_(s)"]]) > 0):
        max_bandwidth = 1.0e-6 * np.asarray(M_reso.loc[[(-1,"MPI_sendrecv", "MPI_send_recv")],["q_max"]])[0][0]/np.asarray(M_reso.loc[[(-1,"MPI_sendrecv", "MPI_send_recv")],["t_min"]])[0][0]
        print('Evaluation of the maximum bandwidth = ',max_bandwidth, ' MB/s')
    Total_network_traffic = Nb_proc * 1.0e-6 * np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["Quantity"]])[0][0] / timesteps
    Average_message_size = np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["Quantity"]])[0][0] / np.asarray(M_reso.loc[[(-1,"MPI_sendrecv","Aggregated over family")],["count"]])[0][0]
    print('Average total size of sent messages per time step = ',Total_network_traffic, ' MB per time step')
    print('Average size of sent messages per exchange = ',Average_message_size, ' B')
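The communication percentages above use the expression 0.1 * floor(1000 * t / (total_time + 0.001)), which gives a percentage truncated to one decimal; the 0.001 s offset presumably guards against a zero total time. The sketch below isolates this formula and the average message size (total bytes divided by the number of exchanges). Both function names and the input values are hypothetical.

```python
import math


def percent_of_total(part_s, total_s):
    """Percentage of total_s spent in part_s, truncated to one decimal.

    Mirrors 0.1 * floor(1000 * part / (total + 0.001)) used in the cell above.
    """
    return 0.1 * math.floor(1000 * part_s / (total_s + 0.001))


def average_message_size(quantity_bytes, count):
    """Average message size in bytes: total exchanged bytes / number of exchanges."""
    return quantity_bytes / count if count > 0 else 0.0


# Hypothetical values: 2.5 s of MPI time out of 10 s, 4096 bytes sent in 8 messages
print(percent_of_total(2.5, 10.0))
print(average_message_size(4096, 8))  # 512.0
```

Because of the offset and the truncation, 2.5 s out of 10 s is reported as about 24.9 % rather than 25 %.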

## Table of the counters in TRUST


| Counter_Id_number | Counter_ID | Counter name | Counter family | Is comm | Quantity increment|
|:--------:| :--------:|:--------:|:--------:|:--------:|:--------:|
| 0 | temps_total_execution_counter_ | Temps total | Null | 0 | 0 |
| 1 | initialisation_calcul_counter_ | Preparer calcul | Null | 0 | 0 |
| 2 | timestep_counter_ | Resoudre (timestep loop) | Null | 0 | 0 |
| 3 | solv_sys_counter_ | SolveurSys::resoudre_systeme | Null | 0 | 0 |
| 4 | solv_sys_petsc_counter_ | Solveurpetsc::resoudre_systeme | Null | 0 | 1 |
| 5 | diffusion_implicite_counter_ | Equation_base::Gradient_conjugue_diff_impl | Null | 0 | 0 |
| 6 | dt_counter_ | Operateur::calculer_pas_de_temps | Null | 0 | 0 | 
| 7 | nut_counter_ | ModeleTurbulence*::mettre_a_jour | Null | 0 | 0| 
| 8 | convection_counter_ | Operateur_Conv::ajouter/calculer | Null | 0 | 0 |
| 9 | diffusion_counter_ | Operateur_Diff::ajouter/calculer | Null | 0 | 0 |
| 10 | decay_counter_ | Operateur_Decr::ajouter/calculer | Null | 0 | 0 |
| 11 | gradient_counter_ | Operateur_Grad::ajouter/calculer | Null | 0 | 0 |
| 12 | divergence_counter_ | Operateur_Div::ajouter/calculer | Null | 0 | 0 |
| 13 | source_counter_ | Source::ajouter/calculer | Null | 0 | 0 |
| 14 | postraitement_counter_ | Pb_base::postraiter | Null | 0 | 0 |
| 15 | sauvegarde_counter_ | Probleme_base::sauver | Null | 0 | number of bytes saved when using the function int Probleme_base::sauvegarder(Sortie& os) |
| 16 | temporary_counter_ | temporary | Null | 0 | 0 |
| 17 | assemblage_sys_counter_ | Assembleur::assembler | Null | 0 | 0 |
| 18 | update_vars_counter_ | Schema_Implicite_4eqs::update_vars | Null | 0 | 0 |
| 19 | update_fields_counter_ | Probleme_Diphasique_base::updateGivenFields | Null | 0 | 0 |
| 20 | mettre_a_jour_counter_ | ::mettre_a_jour | Null | 0 | 0 |
| 21 | divers_counter_ | Divers | Null | 0 | 0 |
| 22 | m1_counter_ | m1 | Null | 0 | 0 |
| 23 | m2_counter_ | m2 | Null | 0 | 0 |
| 24 | m3_counter_ | m3 | Null | 0 | 0 |
| 25 | probleme_fluide_ | pb_fluide | Null | 0 | 0 |
| 26 | probleme_combustible_ | pb_combustible | Null | 0 | 0 |
| 27 | echange_vect_counter_ | DoubleVect/IntVect::echange_espace_virtuel | 0 | 1 | 0 |
| 28 | mpi_sendrecv_counter_ | MPI_send_recv | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 29 | mpi_send_counter_ | MPI_send | MPI_sendrecv | 1 | size of the exchanged data in bytes | 
| 30 | mpi_recv_counter_ | MPI_recv | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 31 | mpi_bcast_counter_ | MPI_broadcast | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 32 | mpi_alltoall_counter_ | MPI_alltoall | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 33 | mpi_allgather_counter_ | MPI_allgather | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 34 | mpi_gather_counter_ | MPI_gather | MPI_sendrecv | 1 | size of the exchanged data in bytes |
| 35 | mpi_partialsum_counter_ | MPI_partialsum | MPI_allreduce | 1 | 0 |
| 36 | mpi_sumdouble_counter_ | MPI_sumdouble | MPI_allreduce | 1 | 0 |
| 37 | mpi_mindouble_counter_ | MPI_mindouble | MPI_allreduce | 1 | 0 |
| 38 | mpi_maxdouble_counter_ | MPI_maxdouble | MPI_allreduce | 1 | 0 |
| 39 | mpi_sumfloat_counter_ | MPI_sumfloat | MPI_allreduce | 1 | 0 |
| 40 | mpi_minfloat_counter_ | MPI_minfloat | MPI_allreduce | 1 | 0 |
| 41 | mpi_maxfloat_counter_  | MPI_maxfloat | MPI_allreduce | 1 | 0 |
| 42 | mpi_sumint_counter_ | MPI_sumint | MPI_allreduce | 1 | 0 |
| 43 | mpi_minint_counter_ | MPI_minint | MPI_allreduce | 1 | 0 |
| 44 | mpi_maxint_counter_ | MPI_maxint | MPI_allreduce | 1 | 0 |
| 45 | mpi_barrier_counter_ | MPI_barrier | MPI_allreduce | 1 | 0 |
| 46 | gpu_library_counter_  | GPU_library | GPU_library | 0 | 0 |
| 47 | gpu_kernel_counter_  | GPU_kernel | GPU_kernel | 0 | 0 |
| 48 | gpu_copytodevice_counter_ | GPU_copyToDevice | GPU_copy | 0 | Size of the exchanged data |
| 49 | gpu_copyfromdevice_counter_ | GPU_copyFromDevice | GPU_copy | 0 | 0 if called in Solv_Petsc::Update_solution(DoubleVect& solution) , number of rows of the local TRUST Matrix if used in Solv_rocALUTION::resoudre_systeme(const Matrice_Base& a, const DoubleVect& b, DoubleVect& x) |
| 50 | IO_EcrireFicPartageMPIIO_counter_ | MPI_File_write_all | IO | 0 | size of written data | 
| 51 | IO_EcrireFicPartageBin_counter_ | write | IO | 0 | buffer size of the accumulated data on each processor when using syncfile()|
| 52 | interprete_scatter_counter_ | Scatter | Null | 0 | 0 |

Moreover, two counters have a custom count increment:

| Counter_Id_number | Counter_ID | Counter name | count custom increment|
|:--------:| :--------:|:--------:|:--------:|
| 28 | mpi_sendrecv_counter_ | MPI_send_recv | mpi_nrequests_ |
| 47 | gpu_kernel_counter_ | GPU_kernel | number of exchanges |

    