New Monitoring

New Roles

DeepSea now recognizes the prometheus and grafana roles and deploys the monitoring stack accordingly. This allows, for example, multiple Prometheus instances in an HA setup. The prometheus role also includes the Alertmanager.

DeepSea also removes Prometheus and Grafana from nodes that do not have the respective roles. If a Prometheus or Grafana installation on a DeepSea minion is managed outside of DeepSea, make sure to add rescind-[prometheus|grafana]: default-nop to your pillar; otherwise DeepSea will remove that installation.
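As a minimal sketch, the opt-out described above could look like the following pillar fragment. The two key names are simply the expansion of the rescind-[prometheus|grafana] pattern; placing them in /srv/pillar/ceph/stack/global.yml (rather than in a per-minion file) is an assumption made for illustration.

```yaml
# Assumed location: /srv/pillar/ceph/stack/global.yml
# Keep an externally managed Prometheus installation in place.
rescind-prometheus: default-nop
# Keep an externally managed Grafana installation in place.
rescind-grafana: default-nop
```

Set only the key for the service that is actually managed outside of DeepSea.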

Pillar variables

The pillar variables below are available to all nodes by default. To change them for all nodes, edit the global file /srv/pillar/ceph/stack/global.yml; to change them for a specific minion only, edit /srv/pillar/ceph/stack/<cluster_name>/minions/<host>. Refer to the stack pillar documentation for details.

monitoring:
  alertmanager:
    config: salt://path/to/config
    additional_flags: ''
  grafana:
    ssl_cert: False # self-signed certs are created by default
    ssl_key: False # self-signed certs are created by default
  prometheus:
    # pass additional configuration to prometheus
    additional_flags: ''
    alert_relabel_config: []
    rule_files: []
    # per exporter config variables
    scrape_interval:
      ceph: 10
      node_exporter: 10
      prometheus: 10
      grafana: 10
    relabel_config:
      alertmanager: []
      ceph: []
      node_exporter: []
      prometheus: []
      grafana: []
    metric_relabel_config:
      ceph: []
      node_exporter: []
      prometheus: []
      grafana: []
    target_partition:
      ceph: '1/1'
      node_exporter: '1/1'
      prometheus: '1/1'
      grafana: '1/1'
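
To illustrate how the defaults above might be overridden, here is a hedged sketch of a fragment for /srv/pillar/ceph/stack/global.yml. It assumes that the stack pillar merges nested keys, so that only the values being changed need to be listed; the values themselves are arbitrary examples and use the same bare-integer form as the defaults.

```yaml
# Assumed location: /srv/pillar/ceph/stack/global.yml
monitoring:
  prometheus:
    scrape_interval:
      # Scrape the ceph exporter less often than the default of 10.
      ceph: 30
      # Other exporters are left at their defaults (assumes nested merge).
```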

Prometheus

Exporter-based configuration can be passed through the pillar. The groups map to the exporters that provide data: the node exporter is present on all nodes, ceph metrics are exported by the mgr nodes, and prometheus and grafana metrics are exported by the respective prometheus and grafana nodes.

scrape_interval: Sets how often an exporter is scraped.
target_partition: When multiple Prometheus instances are deployed, it can be desirable to partition scrape targets so that each instance scrapes only part of all exporter instances (currently this is implemented only for node_exporter targets). For example, with two Prometheus instances, the available node_exporter targets can be divided between them: configure the pillar so that one instance sees monitoring:prometheus:target_partition:node_exporter:'1/2' while the other sees monitoring:prometheus:target_partition:node_exporter:'2/2'. A Prometheus instance that sees 0/X in its pillar removes all scrape targets of that kind.

relabel_config and metric_relabel_config: Refer to the Prometheus documentation.
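
The two-instance partitioning described for target_partition could be sketched as follows. The host names prom1 and prom2 are hypothetical, and each YAML document stands for the per-minion pillar file of one Prometheus node under /srv/pillar/ceph/stack/<cluster_name>/minions/.

```yaml
# Per-minion pillar for the first Prometheus node (hypothetical host: prom1)
monitoring:
  prometheus:
    target_partition:
      node_exporter: '1/2'   # scrape the first half of the node_exporter targets
---
# Per-minion pillar for the second Prometheus node (hypothetical host: prom2)
monitoring:
  prometheus:
    target_partition:
      node_exporter: '2/2'   # scrape the second half of the node_exporter targets
```

Leaving both instances at the default of '1/1' would instead have each of them scrape every node_exporter target, which is the simpler choice when redundancy matters more than load.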

Alertmanager

config: Because the Alertmanager config contains only user-specific configuration, DeepSea relies on the user to provide an Alertmanager config through the pillar. The file must be accessible through Salt's salt:// file server URL; for instance, /srv/salt/ceph/monitoring/alertmanager/files/myconfig.yml translates to salt://ceph/monitoring/alertmanager/files/myconfig.yml as the pillar content. DeepSea then takes this file and deploys it. If the pillar variable is not set, DeepSea only ensures that a config file exists. That file can be either the default config file installed by the RPM or a user-managed file.
additional_flags: DeepSea creates the --cluster.peer flags needed for a highly available Alertmanager setup (if more than one node has the prometheus role). To pass additional flags (see prometheus-alertmanager --help for available flags), list them as a space-separated string in this pillar variable.
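
Putting both Alertmanager variables together, a pillar fragment might look like the sketch below. The config path is the example from the text above; the two flags are chosen only for illustration and should be checked against the output of prometheus-alertmanager --help for the installed version.

```yaml
monitoring:
  alertmanager:
    # File served from /srv/salt/ceph/monitoring/alertmanager/files/myconfig.yml
    config: salt://ceph/monitoring/alertmanager/files/myconfig.yml
    # Space-separated extra flags (illustrative); the --cluster.peer flags
    # for an HA setup are generated by DeepSea and need not be listed here.
    additional_flags: '--log.level=debug --data.retention=72h'
```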