Resource summary file on BU #59

smorovic · 2015-01-21T17:17:24Z

Presently, information on the state of CPU resource usage is available through the box info files that each FU updates on the ramdisk. It was proposed that hltd on the BU should instead summarize this into a number of available resources and provide it to consumers (the BU application).

In the updated version, a JSON file /fff/ramdisk/appliance/resource_summary is written. It also contains other summarized information, taken only from box files updated within the last 10s. For example:
{
"ramdisk_occupancy": 0.32000000000000001,
"active_resources": 1,
"activeFURun": 127042,
"activeRunNumQueuedLS": 0,
"broken": 0,
"idle": 0,
"used": 1,
"cloud": 0
}

ramdisk_occupancy - ratio between the used and total size of the ramdisk partition
active_resources - sum of idle and used resources on the FUs
activeFURun - most recent run found in the active_runs of all boxinfo files
activeRunNumQueuedLS - worst-case number of lumisections whose data is sitting in the anelastic.py queue on the FUs.
This is the number of EoLS files found in the anelastic.py queue, which stores inotify file events before the script handles them. A high value can indicate problems with disk I/O or with NFS file copying to the BU. The value is -1 if there is no FU active run or the script is not initialized yet. The value is only taken from FUs whose last active run is the same as the one indicated in the summary.
broken/idle/used/cloud - summarize core resources in more detail
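
A consumer could read the file roughly as follows. This is only a minimal sketch, not the BU application's actual code: the helper names read_summary and interpret are hypothetical, and only the file path and the field names come from the description above.

```python
import json

# Path of the summary file as given in this issue.
SUMMARY_PATH = "/fff/ramdisk/appliance/resource_summary"

# The example content shown above, kept here so the sketch can be tried offline.
EXAMPLE = """{
"ramdisk_occupancy": 0.32000000000000001,
"active_resources": 1,
"activeFURun": 127042,
"activeRunNumQueuedLS": 0,
"broken": 0,
"idle": 0,
"used": 1,
"cloud": 0
}"""


def read_summary(path=SUMMARY_PATH):
    """Load the JSON summary from disk (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def interpret(summary):
    """Turn the raw fields into values a consumer might act on (hypothetical helper)."""
    queued = summary.get("activeRunNumQueuedLS", -1)
    return {
        "available": summary.get("active_resources", 0),
        "run": summary.get("activeFURun"),
        # -1 means no FU active run, or anelastic.py is not initialized yet.
        "queued_ls": queued if queued >= 0 else None,
        "ramdisk_percent": round(100 * summary.get("ramdisk_occupancy", 0.0), 1),
    }


if __name__ == "__main__":
    print(interpret(json.loads(EXAMPLE)))
```

Mapping the -1 sentinel to None keeps a consumer from mistaking "not known yet" for an empty queue.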

smorovic · 2015-03-20T16:24:56Z

We have already switched to using the resource_summary file.

The current version contains:
"activeRunCMSSWMaxLS" - integer, initial value: -1
(this will show the max LS seen by any CMSSW process once initialized; needs version >=7_3_2_patch5)

hltd 1.7 will add:
"stale_resources" - integer, default value: 0

This will be 0 unless we detect lag or problems with updating files on the ramdisk via the data network (assuming box files are updated through the control network).
In case of problems, the affected resources will be counted here instead of in active_resources and the counters accounted for in it (idle, used, broken). "cloud" resources are never counted as stale.
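
Because these two fields only appear in newer versions, a consumer may want to fill in the documented initial values when they are missing. The following is a minimal sketch under that assumption; with_defaults and stale_warning are hypothetical helper names, and only the field names and default values come from this comment.

```python
# Fields added after the first version, with the initial/default values stated above.
OPTIONAL_DEFAULTS = {
    "activeRunCMSSWMaxLS": -1,  # -1 until a CMSSW process has reported an LS
    "stale_resources": 0,       # 0 unless box file updates lag or fail
}


def with_defaults(summary):
    """Return a copy of the summary with missing optional fields filled in."""
    merged = dict(OPTIONAL_DEFAULTS)
    merged.update(summary)
    return merged


def stale_warning(summary):
    """Return a message if any resources are reported as stale, else None."""
    stale = with_defaults(summary)["stale_resources"]
    if stale > 0:
        return "%d resources reported as stale" % stale
    return None
```

For example, stale_warning({"active_resources": 1}) returns None for a summary written by a version that does not yet report stale_resources.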

smorovic added the enhancement label Jan 21, 2015

smorovic self-assigned this Jan 23, 2015

smorovic closed this as completed Mar 20, 2015
