Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement CEPH OSD topology #75

Closed
wants to merge 10 commits into from

Conversation

cristicalin
Copy link
Contributor

@cristicalin cristicalin commented Nov 8, 2017

This PR adds a new metric called ceph_osd_exists which is computed from the ceph osd tree output.
This metric has a topology label which is composed of the tree topology (not sorted). This sort of implements the requirement from #40

The resulting metric is a bit messy at the moment and needs metric_relabel_configs on the prometheus side to be cleanly stored.

The idea behind this is to have a lookup metric we can use with the label_values function to extract templated variables for use by grafana.

Here is my job definition with an example metric_relabel_config:

  - job_name: 'ceph-exporter'
    static_configs:
      - targets: ['172.18.0.2:9128']
        labels:
          alias: ceph-exporter
    metric_relabel_configs:
      - source_labels: [topology]
        regex:         '.*,root=([\w\-]+),.*'
        target_label:  'root'
        replacement:   '$1'
      - source_labels: [topology]
        regex:         '.*,room=([\w\-]+),.*'
        target_label:  'room'
        replacement:   '$1'
      - source_labels: [topology]
        regex:         '.*,host=([\w\-]+),.*'
        target_label:  'host'
        replacement:   '$1'
      - regex: 'topology'
        action: labeldrop

I'm currently asking for some feedback on the approach thus the [WIP] in the name.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.8%) to 85.323% when pulling 71736a5 on cristicalin:add_osd_tree into ac79cf7 on digitalocean:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.8%) to 85.323% when pulling 0bc08b3 on cristicalin:add_osd_tree into ac79cf7 on digitalocean:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.0%) to 85.115% when pulling 5735461 on cristicalin:add_osd_tree into ac79cf7 on digitalocean:master.

OSDExists: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: cephNamespace,
Name: "osd_exists",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just name this osd_tree. That way it will be easier to understand what it means. We can also add more labels to it in future if need be.

@@ -500,6 +533,79 @@ func (o *OSDCollector) collectOSDDump() error {

}

func (o *OSDCollector) collectOSDTree() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to throw a quick test for collectOSDTree()? There are examples in osd_test.go which should help provide a sample.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.0%) to 85.115% when pulling 2e50ab7 on cristicalin:add_osd_tree into ac79cf7 on digitalocean:master.

... to simplify function calls in the recursive walker

[ci skip]
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.9%) to 85.218% when pulling 2de1cfe on cristicalin:add_osd_tree into c41f6ae on digitalocean:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.9%) to 85.209% when pulling 40433e5 on cristicalin:add_osd_tree into c41f6ae on digitalocean:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.5%) to 84.552% when pulling 7454d50 on cristicalin:add_osd_tree into c41f6ae on digitalocean:master.

@cristicalin
Copy link
Contributor Author

I changed the approach to fetch the topology in the beginning of the run.
The compiled topology is then used for each osd specific metric.
The result of this approach is that OSD metrics will be duplicated in the output if the crushmap has multiple overlapping roots.
This also breaks the tests so still WIP.

Again comments are welcome to validate the approach before I ask for a merge.

@neurodrone neurodrone closed this Jun 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants