Darshan LDMS Metric Definitions

Sara W edited this page Mar 13, 2024 · 11 revisions

Metric Definitions

The table below lists the metrics currently collected and published to LDMS streams. The list is likely to change depending on end users' wants or needs.

Type Of Data: To reduce the size of the JSON messages published to an LDMS streams daemon, any metadata metrics (i.e. those marked MET) are replaced with "N/A" when type=MOD.

Constant: Indicates whether a metric stays the same for the entire application run, unless the darshanConnector is reconnected or restarted.
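
The Type Of Data rule above can be sketched in a few lines of Python. This is only an illustration, not the darshanConnector implementation: the helper name `mask_meta_fields`, the choice of `exe` and `file` as the masked fields (taken from the rows whose definition mentions N/A), and all sample values are assumptions.

```python
import json

# Assumption: the fields masked in MOD messages are the ones whose
# definitions in the table mention "N/A" (exe and file).
META_FIELDS = ("exe", "file")


def mask_meta_fields(record: dict) -> dict:
    """Return a copy of `record` with metadata fields set to "N/A" when type is MOD."""
    masked = dict(record)  # shallow copy so the caller's record is untouched
    if masked.get("type") == "MOD":
        for name in META_FIELDS:
            if name in masked:
                masked[name] = "N/A"
    return masked


# Hypothetical records; every value here is invented for illustration.
met = {"type": "MET", "exe": "/path/to/app", "file": "/path/to/data.h5", "rank": 0}
mod = {"type": "MOD", "exe": "/path/to/app", "file": "/path/to/data.h5", "rank": 0}

print(json.dumps(mask_meta_fields(met)))  # full paths are kept
print(json.dumps(mask_meta_fields(mod)))  # exe and file become "N/A"
```

Masking on a copy keeps the original record available, which is convenient if the same data is later published as a MET message.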

| Metric Name | Definition | Type Of Data | Constant |
| --- | --- | --- | --- |
| schema | Schema name of the data collected by the darshanConnector. This is an LDMS-related metric and is only used for storing the data in the correct location in DSOS. | MET | Yes |
| job_id | Job ID of the application run. | MOD | Yes |
| uid | User ID of the job run. | MOD | Yes |
| exe | Full path to the application executable. Only set to the full path when the type metric is set to MET. Otherwise it is set to N/A. | MET | Yes |
| ProducerName | Name of the compute node the application is running on. | MOD | Yes |
| file | Path to the file the I/O operations are performed on. Only set to the full path when the type metric is set to MET. Otherwise it is set to N/A. | MET | No |
| record_id | Darshan file record ID of the file the dataset belongs to. | MOD | No |
| module | Name of the Darshan module data being collected. | MOD | No |
| switches | Number of times access alternated between read and write. | MOD | No |
| rank | Rank of the process performing the I/O. | MOD | No |
| flushes | Number of times the flush operation was performed. For H5F and H5D it is the HDF5 file flush and dataset flush operation counts, respectively. | MOD | No |
| max_byte | Highest offset byte read and written (i.e. Darshan's <MODULE>_MAX_BYTE_* parameter). | MOD | No |
| type | Type of JSON data being published. Set to MOD for module data or MET for static metadata (e.g. record ID, rank). | MOD | No |
| op | Type of operation being performed (i.e. read, open, close, write). | MOD | No |
| cnt | Count of the operations (op field) performed per module per rank. Resets to 0 after each close operation. | MOD | No |
| seg | Contains the following array metrics for the operation (op field). | MOD | No |
| seg:pt_sel | HDF5 number of different access selections. | MOD | No |
| seg:reg_hslab | HDF5 number of regular hyperslabs. | MOD | No |
| seg:irreg_hslab | HDF5 number of irregular hyperslabs. | MOD | No |
| seg:ndims | HDF5 number of dimensions in the dataset's dataspace. | MOD | No |
| seg:npoints | HDF5 number of points in the dataset's dataspace. | MOD | No |
| seg:off | Cumulative total bytes read and cumulative total bytes written, respectively, for each module per rank (i.e. Darshan's "offset" DXT parameter). | MOD | No |
| seg:len | Number of bytes read/written for the given operation per rank. | MOD | No |
| seg:start | Start time (seconds) of each I/O operation performed for the given rank. | MOD | No |
| seg:dur | Duration of each operation performed for the given rank (i.e. the time a rank takes to perform a read/write/open/close operation). | MOD | No |
| seg:total | Cumulative time since the start of the application run, measured after the I/O operation (i.e. start of application + dur). | MOD | No |
| seg:timestamp | End time of the given operation (op field) for the given rank (rank field), in epoch time. | MOD | No |
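
As a rough sketch of how a consumer might read one of these messages, the Python below parses a JSON string and flattens the seg array into one row per segment. The message layout is an assumption: this page does not show a raw message, so the sample (a subset of fields, with seg as a list of objects keyed by the part after "seg:") and all of its values are hypothetical. The derived `start_epoch` column is also an illustration that assumes dur is expressed in seconds.

```python
import json

# Hypothetical MOD message: layout and values are assumptions, and only
# a subset of the metrics from the table is shown.
SAMPLE = """
{
  "job_id": 1234, "ProducerName": "node01", "exe": "N/A", "file": "N/A",
  "record_id": 42, "module": "POSIX", "rank": 0, "type": "MOD",
  "op": "write", "cnt": 3,
  "seg": [{"off": 4096, "len": 1024, "dur": 0.25, "timestamp": 1700000000.5}]
}
"""


def flatten_segments(message: str) -> list:
    """Return one flat dict per seg entry, with keys prefixed by "seg:"."""
    record = json.loads(message)
    segments = record.pop("seg", [])  # remaining keys are the per-message fields
    rows = []
    for seg in segments:
        row = dict(record)
        for key, value in seg.items():
            row[f"seg:{key}"] = value
        # seg:timestamp is the end time of the operation, so subtracting the
        # duration gives an approximate start time (assumes dur is in seconds).
        if "timestamp" in seg and "dur" in seg:
            row["start_epoch"] = seg["timestamp"] - seg["dur"]
        rows.append(row)
    return rows


for row in flatten_segments(SAMPLE):
    print(row["module"], row["op"], row["seg:len"], row["start_epoch"])
```

Flattening to one row per segment keeps the message-level fields (job_id, rank, op, and so on) next to each segment's values, which is usually easier to load into a table or data frame than the nested form.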

SECTION BELOW IS OUTDATED AS OF 1/12/2024 - DO NOT REFER TO THIS!!!

Extra Parameters

The environment variable ENABLE_LDMS_EXTRA (set with export ENABLE_LDMS_EXTRA= in the set-darshan-<machine-name>.sh file) is not part of the SOS store and has not been fully tested yet, so please avoid enabling it.

  • This functionality will not be included in the final darshanConnector code pull request. It may only be added later on if there is demand for these metrics. Also, more "extra" parameters might be added.

For Documentation Purposes Only

  • The current plan is to create a new JSON string with the following Darshan metrics and publish it to a new stream:

  • module: Name of the Darshan module data being collected.

  • ProducerName: Name of the compute node the application is running on.

  • file: Path to the file the I/O operations are performed on. Only set to the full path when the "type" metric is set to "MET". Otherwise it is set to N/A.

  • rank: Rank of the process performing the I/O.

  • record_id: Darshan file record ID of the file the dataset belongs to.

  • type: Type of JSON data being published. Set to MOD for "module" data or MET for static "meta" data (e.g. record ID, rank).

  • job_id: The Job ID of the application run.

  • fast_rank: Fastest rank calculated for the application run.

  • fast_rank_tm: Time of the fastest rank calculated for the application run.

  • slow_rank: Slowest rank calculated for the application run.

  • slow_rank_tm: Time of the slowest rank calculated for the application run.

  • op: Type of operation being performed (i.e. read, open, close, write).

  • seg: The following array metrics contain the histogram of total sizes for read and write operations:

100_1K
1K_10K
10K_100K
100K_1M
1M_4M
4M_10M
10M_100M
100M_1G
1G_PLUS
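
For documentation purposes only, the bin names above can be read as size ranges. The Python sketch below maps a size in bytes to one of the listed names. The exact boundaries are not stated on this page, so the code assumes binary units (1K = 1024 bytes) and inclusive upper bounds; the helper name `size_bin` is hypothetical, and sizes of 100 bytes or less return None because no bin for them is listed here.

```python
import bisect

KIB, MIB, GIB = 1024, 1024**2, 1024**3

# Assumed inclusive upper bound of every bin except the open-ended last one.
UPPER_BOUNDS = [1 * KIB, 10 * KIB, 100 * KIB, 1 * MIB,
                4 * MIB, 10 * MIB, 100 * MIB, 1 * GIB]

# Bin names exactly as listed in this section.
BIN_NAMES = ["100_1K", "1K_10K", "10K_100K", "100K_1M", "1M_4M",
             "4M_10M", "10M_100M", "100M_1G", "1G_PLUS"]


def size_bin(nbytes: int):
    """Return the histogram bin name for an access size, or None if below the first bin."""
    if nbytes <= 100:
        return None  # no bin for these sizes is listed in this section
    # bisect_left finds the first upper bound >= nbytes, so a size equal to a
    # bound stays in the lower bin; sizes above 1 GiB fall through to 1G_PLUS.
    return BIN_NAMES[bisect.bisect_left(UPPER_BOUNDS, nbytes)]


for size in (512, 1024, 1025, 5 * MIB, 2 * GIB):
    print(size, size_bin(size))
```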