Darshan LDMS Metric Definitions
Below is a table of the metrics currently collected and published to LDMS streams. The set of metrics is likely to change depending on end users' wants and needs.
Type Of Data: To reduce the size of JSON messages published to an LDMS streams daemon, any meta data metrics (i.e. MET) are replaced with "N/A" when `type=MOD`.
Constant: Indicates whether a metric stays the same throughout the entire application run unless the darshanConnector is reconnected or restarted.
Metric Name | Definition | Type Of Data | Constant |
---|---|---|---|
schema | Schema name of the data collected by the darshanConnector. This is an LDMS related metric and is only used for storing the data to the correct location in DSOS. | MET | Yes |
job_id | The Job ID of the application run. | MOD | Yes |
uid | User ID of the job run. | MOD | Yes |
exe | Full path to the application executable. Only set to the full path when the type metric is set to MET. Otherwise it is set to N/A. | MET | Yes |
ProducerName | Name of the compute node the application is running on. | MOD | Yes |
file | Path to the file the I/O operations are performed on. Only set to the full path when the type metric is set to MET. Otherwise it is set to N/A. | MET | No |
record_id | Darshan file record ID of the file the dataset belongs to. | MOD | No |
module | Name of the Darshan module data being collected. | MOD | No |
switches | Number of times access alternated between read and write. | MOD | No |
rank | Rank of the process performing the I/O. | MOD | No |
flushes | Number of times the flush operation was performed. For H5F and H5D it is the HDF5 file flush and dataset flush operation counts, respectively. | MOD | No |
max_byte | Highest offset byte read and written (i.e. Darshan's `<MODULE>_MAX_BYTE_*` parameter). | MOD | No |
type | The type of JSON data being published. It is either set to MOD for gathering module data or MET for gathering static meta data (e.g. record ID, rank). | MOD | No |
op | Type of operation being performed (i.e. read, open, close, write). | MOD | No |
cnt | The count of the operations (op field) performed per module per rank. Resets to 0 after each close operation. | MOD | No |
seg | Contains the following array metrics from the operation (op field). | MOD | No |
seg:pt_sel | HDF5 number of different access selections. | MOD | No |
seg:reg_hslab | HDF5 number of regular hyperslabs. | MOD | No |
seg:irreg_hslab | HDF5 number of irregular hyperslabs. | MOD | No |
seg:ndims | HDF5 number of dimensions in dataset's dataspace. | MOD | No |
seg:npoints | HDF5 number of points in dataset's dataspace. | MOD | No |
seg:off | Cumulative total bytes read and cumulative total bytes written, respectively, for each module per rank. (i.e. Darshan's "offset" DXT parameter) | MOD | No |
seg:len | Number of bytes read/written for the given operation per rank. | MOD | No |
seg:start | Start time (seconds) of each I/O operation performed for the given rank. | MOD | No |
seg:dur | Duration of each operation performed for the given rank. (i.e. a rank takes "X" time to perform a r/w/o/c operation.) | MOD | No |
seg:total | Cumulative time since the start of the application run, measured after the I/O operation completes (i.e. start of application + dur). | MOD | No |
seg:timestamp | End time of the given operation (op field) for the given rank (rank field), in epoch time. | MOD | No |
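
The "N/A" substitution described for the Type Of Data column can be illustrated with a minimal Python sketch. It is not the darshanConnector implementation: the function name `mask_meta`, the message layout (including `seg` as a list holding one object), and all sample values are assumptions made for illustration. Only `exe` and `file` are masked here, because those are the two rows whose definitions state the substitution explicitly.

```python
import json

# Assumption: only `exe` and `file` are masked, since those are the two rows
# whose definitions explicitly say they are set to "N/A" outside MET messages.
MASKED_FIELDS = ("exe", "file")


def mask_meta(message: dict) -> dict:
    """Return a copy of `message` with meta data fields set to "N/A" when type is MOD."""
    if message.get("type") != "MOD":
        return dict(message)
    return {key: ("N/A" if key in MASKED_FIELDS else value)
            for key, value in message.items()}


# Hypothetical message: every value below is made up for illustration.
sample = {
    "schema": "example_schema",
    "job_id": 1234,
    "uid": 1000,
    "exe": "/path/to/app",
    "ProducerName": "node01",
    "file": "/path/to/output.dat",
    "record_id": 42,
    "module": "POSIX",
    "type": "MOD",
    "op": "write",
    "cnt": 3,
    "seg": [{"off": 0, "len": 4096, "start": 0.5, "dur": 0.01,
             "total": 0.51, "timestamp": 1700000000.0}],
}

masked = mask_meta(sample)
payload = json.dumps(masked)  # the JSON string a publisher would send
```

Dropping the long path strings from every MOD message is what keeps the per-operation payload small; a message with `type` set to MET passes through `mask_meta` unchanged.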

The `ENABLE_LDMS_EXTRA` environment variable (set with `export ENABLE_LDMS_EXTRA=` in the `set-darshan-<machine-name>.sh` file) is not part of the SOS store and has not been completely tested yet, so please avoid enabling it.
- This functionality will not be included in the final darshanConnector code pull request. It may only be added later on if there is demand for these metrics. Also, more "extra" parameters might be added.
- The current plan is to create a new JSON string with the following Darshan metrics and publish it to a new stream:
  - module: Name of the Darshan module data being collected.
  - ProducerName: Name of the compute node the application is running on.
  - file: Path to the file the I/O operations are performed on. Only set to the full path when the "type" metric is set to "MET". Otherwise it is set to N/A.
  - rank: Rank of the process performing the I/O.
  - record_id: Darshan file record ID of the file the dataset belongs to.
  - type: The type of JSON data being published. It is either set to MOD for gathering "module" data or MET for gathering static "meta" data (e.g. record ID, rank).
  - job_id: The Job ID of the application run.
  - fast_rank: Fastest rank calculated for the application run.
  - fast_rank_tm: Time of the fastest rank calculated for the application run.
  - slow_rank: Slowest rank calculated for the application run.
  - slow_rank_tm: Time of the slowest rank calculated for the application run.
  - op: Type of operation being performed (i.e. read, open, close, write).
  - seg: The following array metrics contain the histogram of total sizes for read and write operations: 100_1K, 1K_10K, 10K_100K, 100K_1M, 1M_4M, 4M_10M, 10M_100M, 100M_1G, 1G_PLUS.
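
To make the histogram bucket names concrete, the sketch below maps an access size in bytes to one of the names listed above. It is an illustration only: the function name `size_bucket` is hypothetical, and the boundaries are an assumption (binary units, so 1K is taken as 1024 bytes, with inclusive upper bounds). Sizes of 100 bytes or less return `None` because no bucket for them appears in the list.

```python
from collections import Counter

# Assumption: upper bounds are inclusive and use binary units (1K = 1024 bytes).
BUCKETS = [
    ("100_1K", 1024),
    ("1K_10K", 10 * 1024),
    ("10K_100K", 100 * 1024),
    ("100K_1M", 1024 ** 2),
    ("1M_4M", 4 * 1024 ** 2),
    ("4M_10M", 10 * 1024 ** 2),
    ("10M_100M", 100 * 1024 ** 2),
    ("100M_1G", 1024 ** 3),
]


def size_bucket(nbytes: int):
    """Return the bucket name for an access size, or None below the first listed bucket."""
    if nbytes <= 100:
        return None  # no bucket for these sizes is listed in this document
    for name, upper in BUCKETS:
        if nbytes <= upper:
            return name
    return "1G_PLUS"


# Hypothetical access sizes in bytes, bucketed into a histogram.
sizes = [512, 4096, 4096, 5 * 1024 ** 2, 2 * 1024 ** 3]
histogram = Counter(size_bucket(s) for s in sizes)
```

With these sample sizes, `histogram` counts two accesses in `1K_10K` and one each in `100_1K`, `4M_10M`, and `1G_PLUS`.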