
Graphic operational monitoring

Marco Roda edited this page Jun 2, 2023 · 4 revisions

Opmon has the capability to broadcast the metrics and the ERS logs (all severity levels) to a database. From there, a graphical user interface based on Grafana can be used to monitor the status of the system.

In general, the receiving database and the Grafana server can be set up anywhere using pocket. When running at CERN, however, it is possible to use the system already installed there. All that is left to the user is to tell the configuration python script where to broadcast the data. This is done with a block like

  "boot": {
    "disable_trace": true,
    "opmon_impl": "cern",
    "ers_impl": "cern"
  },

in the json fed to daqconf. These instructions focus on the CERN variant of the implementation, which can be used from any server in the CERN domain, not only np04 machines.
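If you prefer to add this block programmatically rather than by hand, a minimal Python sketch could look like the following. The helper name with_cern_monitoring is hypothetical and not part of daqconf; the only keys it touches are the two shown above.

```python
import json


def with_cern_monitoring(config: dict) -> dict:
    """Return a copy of a daqconf-style configuration with the CERN
    opmon and ERS implementations selected in the "boot" section.

    Hypothetical helper: it only sets the two keys shown in the block
    above and leaves every other setting untouched.
    """
    updated = dict(config)
    boot = dict(updated.get("boot", {}))  # keep existing boot settings
    boot.update({"opmon_impl": "cern", "ers_impl": "cern"})
    updated["boot"] = boot
    return updated


if __name__ == "__main__":
    # Example input: a configuration that only disables trace.
    example = {"boot": {"disable_trace": True}}
    print(json.dumps(with_cern_monitoring(example), indent=2))
```

The function returns a new dictionary instead of modifying its input, so the original configuration can still be compared against the result before writing anything to disk.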

Access to the CERN Grafana server

From your own machine, outside the CERN domain, the first thing to do is to open an SSH tunnel. This can be done with

ssh -N -D 8080 <your_cern_username>@lxtunnel.cern.ch

You will be asked for your password, then the command will appear to hang. This is expected: the tunnel stays open for as long as the command is running.
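Since the ssh command gives no feedback once the tunnel is up, it can be handy to check from a second terminal that something is listening on the local port. The following is a minimal sketch using only the Python standard library; the function name port_is_open is made up for this example, and the check only confirms that the port accepts connections, not that the proxy can actually reach the CERN network.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8080,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:8080")
    else:
        print("Nothing is listening on localhost:8080")
```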

The next thing to do is to configure your browser with a SOCKS proxy. I found that Firefox has the best support for this feature: you simply need to install the FoxyProxy plugin. Whatever solution you use to add the proxy, the settings will be

host: localhost
port: 8080
no username or password should be required
[note from Kurt: choose Proxy Type of SOCKS5, and ensure that the "Send DNS through SOCKS5 proxy" selection is ON]

The last thing to do is to connect to the CERN grafana server at the address http://np04-srv-009:3000/ via the configured browser.

The grafana instance is configured to take you directly to the main DAQ dashboard. If you want to be able to edit the dashboards, you need credentials: instructions are in the section "Credentials for the CERN grafana".

Grafana dashboards

Once you log in you should see something like the DAQ overview dashboard.

From here you can just select your partition and start monitoring your processes. The dashboard can be opened either before or after you start the processes: it does not matter because the intermediate database is always online.

The whole thing is rather intuitive, yet documentation is available here.

Your own processes can be monitored and separated from all others because each of your metrics has a specific name that depends on the partition, application, etc. If a dashboard is well designed, variables can be used to select the partition, the application, etc., so there is no need to change the dashboard every time a user changes the partition.

Dashboards can be saved and shared as json files. They are stored in a dedicated repository: grafana dashboards.
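To see which variables an exported dashboard defines, the json file can be inspected directly. The sketch below assumes the usual Grafana dashboard JSON layout, where variables are listed under "templating" and then "list"; the helper name dashboard_variables and the variable names in the example are invented for illustration and are not taken from the actual DAQ dashboards.

```python
import json


def dashboard_variables(dashboard: dict) -> list:
    """Return the names of the template variables of a dashboard.

    Assumes the Grafana dashboard JSON model, in which variables live
    under dashboard["templating"]["list"].
    """
    entries = dashboard.get("templating", {}).get("list", [])
    return [entry["name"] for entry in entries if "name" in entry]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed dashboard export.
    example = json.loads("""
    {
      "title": "example dashboard",
      "templating": {
        "list": [
          {"name": "partition", "type": "query"},
          {"name": "application", "type": "query"}
        ]
      }
    }
    """)
    print(dashboard_variables(example))
```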

Useful options for configuration

It is possible to configure how often the system publishes metric values. This can be done by changing the boot.json file created by daqconf. The section env can contain a variable

"DUNEDAQ_OPMON_INTERVAL": 2

This is the number of seconds between get_info() calls. The default is 10 (seconds).
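Editing the file by hand is usually enough, but the change can also be scripted. The sketch below assumes that env is a top-level key of boot.json holding a dictionary of variables; check this against your own file before using it. The function names are hypothetical and not part of daqconf.

```python
import json
from pathlib import Path


def set_opmon_interval(boot: dict, seconds: int) -> dict:
    """Return a copy of the boot configuration with the opmon
    publishing interval set to the given number of seconds.

    Assumption: "env" is a top-level dictionary in boot.json.
    """
    if seconds <= 0:
        raise ValueError("the interval must be a positive number of seconds")
    updated = dict(boot)
    env = dict(updated.get("env", {}))  # keep the other variables
    env["DUNEDAQ_OPMON_INTERVAL"] = seconds
    updated["env"] = env
    return updated


def update_boot_file(path: str, seconds: int) -> None:
    """Rewrite the given boot.json in place with the new interval."""
    boot_path = Path(path)
    boot = json.loads(boot_path.read_text())
    boot_path.write_text(json.dumps(set_opmon_interval(boot, seconds), indent=4) + "\n")


if __name__ == "__main__":
    # In-memory example; the PATH entry is a placeholder value.
    print(json.dumps(set_opmon_interval({"env": {"PATH": "getenv"}}, 2), indent=4))
```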

Credentials for the CERN grafana

Credentials for grafana are necessary in case you want editing privileges.

The username and password for this server have to be given to you. Post a message in the #np04-daq-integration Slack channel and someone will send you an email with a link to create your account. The link you receive may start with localhost:, in which case it might not work as is. Copy it into the browser configured with the SOCKS proxy and replace localhost with np04-srv-009; it should then work.
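The host replacement described above can also be done with a few lines of Python. This is only an illustrative sketch: the function name fix_invite_link and the link path used in the example are made up, and the function expects a complete URL including the http:// prefix.

```python
from urllib.parse import urlsplit, urlunsplit


def fix_invite_link(link: str, server: str = "np04-srv-009") -> str:
    """Replace a localhost host in the link with the grafana server name,
    keeping the port, path and query unchanged."""
    parts = urlsplit(link)
    if parts.hostname != "localhost":
        return link  # nothing to fix
    netloc = server if parts.port is None else f"{server}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # The path below is a placeholder, not a real invitation link.
    print(fix_invite_link("http://localhost:3000/invite/abc"))
```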
