Presently, the state of CPU resource usage is available through box info files that each FU updates on the ramdisk. It was proposed that the BU hltd should instead summarize this into a number of available resources and provide it to consumers (the BU application).
In the updated version, a JSON file, /fff/ramdisk/appliance/resource_summary, is written. It also contains other summarized information, taken only from box files updated within the last 10 s. For example:
{
"ramdisk_occupancy": 0.32000000000000001,
"active_resources": 1,
"activeFURun": 127042,
"activeRunNumQueuedLS": 0,
"broken": 0,
"idle": 0,
"used": 1,
"cloud": 0
}
ramdisk_occupancy - ratio between the used and total size of the ramdisk partition
active_resources - sum of idle and used resources in FUs
activeFURun - most recent run found in all active_runs boxinfo files
activeRunNumQueuedLS - worst-case number of lumisections sitting in the anelastic.py queue on FUs.
This is the number of EoLS files found in the anelastic.py queue, which stores inotify file events before the script handles them. A high value can indicate problems with disk IO or with NFS file copying to the BU. The value is -1 if no FU has an active run or the script is not initialized yet. The value is taken only from FUs whose last active run matches the one indicated in the summary.
broken/idle/used/cloud - summarize core resources in more detail
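The aggregation rules above can be sketched as follows. This is only an illustration, not the hltd implementation: the per-FU record layout (`mtime`, `idle`, `used`, `broken`, `cloud`, `active_run`, `queued_ls`) is hypothetical, as is returning -1 for activeFURun when no FU reports a run. ramdisk_occupancy is left out because it depends on the local filesystem.

```python
import time

FRESHNESS_WINDOW_S = 10  # from the issue text: box files updated within the last 10 s


def summarize(boxes, now=None):
    """Aggregate per-FU box records into a resource_summary-like dict.

    `boxes` is a list of dicts with hypothetical keys: mtime, idle, used,
    broken, cloud, active_run (int or None), queued_ls (int, -1 if the
    anelastic.py script is not initialized).
    """
    now = time.time() if now is None else now
    # Ignore box records that were not refreshed recently.
    fresh = [b for b in boxes if now - b["mtime"] <= FRESHNESS_WINDOW_S]

    idle = sum(b["idle"] for b in fresh)
    used = sum(b["used"] for b in fresh)

    # Most recent run seen on any fresh FU; -1 is an assumed placeholder.
    runs = [b["active_run"] for b in fresh if b["active_run"] is not None]
    active_run = max(runs) if runs else -1

    # Worst-case queue length, only from FUs on the same active run.
    queued = [b["queued_ls"] for b in fresh
              if b["active_run"] == active_run and b["queued_ls"] >= 0]

    return {
        "active_resources": idle + used,
        "activeFURun": active_run,
        "activeRunNumQueuedLS": max(queued) if queued else -1,
        "broken": sum(b["broken"] for b in fresh),
        "idle": idle,
        "used": used,
        "cloud": sum(b["cloud"] for b in fresh),
    }
```

Passing `now` explicitly keeps the freshness check deterministic when testing against recorded box data.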
We have already switched to using the resource_summary file.
The current version also contains:
"activeRunCMSSWMaxLS" - integer, initial value: -1
(this shows the max LS seen by any CMSSW once initialized; needs version >=7_3_2_patch5)
hltd 1.7 will add:
"stale_resources" - integer, default value: 0
This will be 0 unless we detect lag or problems with updating files on the ramdisk via the data network (assuming box files are updated through the control network).
In case of problems, resources will be counted here instead of as active_resources and the categories accounted for in it (idle, used, broken). "cloud" resources are never counted as stale.
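On the consumer side, reading the summary could look like the sketch below. It assumes that a summary written by an older hltd simply omits the newer fields, so the defaults quoted in this thread are filled in; the helper names `load_summary` and `has_stale_resources` are hypothetical and not part of hltd.

```python
import json

# Defaults quoted in this thread for fields that may be absent
# (assumption: older hltd versions do not write these keys at all).
OPTIONAL_DEFAULTS = {
    "activeRunCMSSWMaxLS": -1,
    "stale_resources": 0,
}


def load_summary(path="/fff/ramdisk/appliance/resource_summary"):
    """Read the JSON summary and fill in defaults for optional fields."""
    with open(path) as f:
        summary = json.load(f)
    for key, default in OPTIONAL_DEFAULTS.items():
        summary.setdefault(key, default)
    return summary


def has_stale_resources(summary):
    """True when hltd reported resources whose box files lag behind."""
    return summary["stale_resources"] > 0
```

A consumer would then use `summary["active_resources"]` as the number of available resources and treat a non-zero `stale_resources` as a hint that ramdisk updates over the data network are lagging.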