Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

elastic - add missing metrics #2758

Merged
merged 1 commit into from Sep 6, 2016

Conversation

mdelaney
Copy link
Contributor

What does this PR do?

This adds a bunch of metrics that are missing. I've tracked down the versions of Elasticsearch that each new metric was added, so this should be useful regardless of what version of ES you run.

It's worth noting that there are definitely more metrics missing (primary shard level metrics in particular). If we end up needing these metrics I'll create another pull request.

Motivation

We need these metrics to accurately see what's happening inside our production clusters. DataDog support advised me that there was no specific timeline to getting these metrics added so I decided to fix it myself and contribute so others can benefit.

Additional Notes

I wasn't able to get the tests to pass locally. I've set up a single Elasticsearch node locally running on the correct port and tests fail (with or without my changes). There seem to be some issues in the tests for the elastic check that should probably be addressed at some point (among which is expecting a cluster to be yellow, this seems wrong). At any rate, I've confirmed these to be working in a live environment (we're actively using this in our production systems now). I don't really have time to debug the full test script but if there is any advise on how to get these tests passing I'd appreciate it. :-)

@remh remh self-assigned this Aug 12, 2016
@remh remh added this to the 5.9.0 milestone Aug 12, 2016
@@ -227,14 +244,75 @@ class ESCheck(AgentCheck):
}

ADDITIONAL_METRICS_POST_1_4_0 = {
"elasticsearch.indices.indexing.throttle_time_in_millis": ("guage", "indices.indexing.throttle_time_in_millis", lambda v: float(v)/1000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos everywhere here :) guage -> gauge

@remh
Copy link
Contributor

remh commented Aug 12, 2016

Thanks! Looks like tests are passing on travis.

Could you fix the tipos btw ?

@mdelaney mdelaney force-pushed the add-missing-elasticsearch-metrics branch from a30d3e0 to bf9ef19 Compare August 12, 2016 22:51
@mdelaney
Copy link
Contributor Author

gah, sorry about the typos! Fixed :)

@truthbk
Copy link
Member

truthbk commented Aug 19, 2016

@mdelaney there's a couple elasticsearch tests failing. Versions 2.0.2 and 2.1.1 not totally sure if they're related to your changes. Will re-run tests once more, if they fail again, please take a look.

"elasticsearch.indices.segments.doc_values_memory_in_bytes": ("gauge", "indices.segments.doc_values_memory_in_bytes"),
"elasticsearch.indices.segments.norms_memory_in_bytes": ("gauge", "indices.segments.norms_memory_in_bytes"),
"elasticsearch.indices.segments.stored_fields_memory_in_bytes": ("gauge", "indices.segments.stored_fields_memory_in_bytes"),
"elasticsearch.indices.segments.term_memory_in_bytes": ("gauge", "indices.segments.term_memory_in_bytes"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stat term_memory_in_bytes is unavailable - might be a typo that was forgotten here. This is what's causing the two 2.0+ tests to fail. Removal of the metric works fine for me locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gah, should have been term_vectors_memory_in_bytes

@truthbk
Copy link
Member

truthbk commented Aug 24, 2016

@mdelaney the issue is with the term_memory_in_bytes being unavailable (terms_memory_in_bytes is however, so possibly just a typo) - removing makes the tests pass. Could you please fix and then we should be good to go?

"elasticsearch.breakers.fielddata.estimated_size_in_bytes": ("gauge", "breakers.fielddata.estimated_size_in_bytes"),
"elasticsearch.breakers.fielddata.overhead": ("gauge", "breakers.fielddata.overhead"),
"elasticsearch.breakers.fielddata.tripped": ("gauge", "breakers.fielddata.tripped"),
"elasticsearch.breakers.parent.estimated_size_in_bytes": ("gauge", "breakers.fielddata.estimated_size_in_bytes"),
Copy link
Member

@truthbk truthbk Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these should be:

"elasticsearch.breakers.parent.estimated_size_in_bytes": ("gauge", "breakers.parent.estimated_size_in_bytes"),
"elasticsearch.breakers.parent.overhead": ("gauge", "breakers.parent.overhead"),
"elasticsearch.breakers.parent.tripped": ("gauge", "breakers.parent.tripped"),
"elasticsearch.breakers.request.estimated_size_in_bytes": ("gauge", "breakers.request.estimated_size_in_bytes"),
"elasticsearch.breakers.request.overhead": ("gauge", "breakers.request.overhead"),
"elasticsearch.breakers.request.tripped": ("gauge", "breakers.request.tripped"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@truthbk
Copy link
Member

truthbk commented Aug 25, 2016

@mdelaney added some more feedback as a product of some copy/paste typos.

@mdelaney
Copy link
Contributor Author

Thanks for the feedback. I'll address these today.

@mdelaney mdelaney force-pushed the add-missing-elasticsearch-metrics branch from bf9ef19 to 502f7d4 Compare August 26, 2016 18:36
@mdelaney
Copy link
Contributor Author

The tests related to the metrics added are passing locally now, so assuming they should pass ci tests here now.

@truthbk
Copy link
Member

truthbk commented Aug 26, 2016

@mdelaney thanks a lot for addressing those. We're almost ready to go.... But I do have to pester you with one more thing. I'm so sorry 😅

So the deal is, we have gauges for all new metrics, and I'm not sure if that should always be the case.

For example: elasticsearch.indices.query_cache.hit_count is this a strictly increasing metric, such that its value will never decline over time (until a process restart)? In that case we would probably want to make that metric a rate - to see the rate at which that metric increases. I think there are several metrics where this could be relevant.

On the other hand, there are metrics, such as elasticsearch.thread_pool.force_merge.threads where the number of threads might go up or down during the course of time, and that's what would make a great gauge metric.

Let me know if you need any help with that, or you have any doubts with specific metrics - I know I often do.

Thanks!

@bai
Copy link

bai commented Sep 5, 2016

elasticsearch.indices.query_cache.hit_count will reset on node restart.
I'm really keen to see this PR merged for the next release of the agent, it'd be really helpful.

@mdelaney appreciate your work on this!

@truthbk
Copy link
Member

truthbk commented Sep 6, 2016

Just in case there are doubts, It doesn't really matter if a counter is reset on a node restart, as the rate will reset with it. The main thing here is being able to extract meaning from an otherwise less insightful ever-growing (resets aside) metric.

@mdelaney
Copy link
Contributor Author

mdelaney commented Sep 6, 2016

Sorry for the delay, I had gotten a bit busy. I should have time to finish this up today.

metrics are added for various versions of elasticsearch
@mdelaney mdelaney force-pushed the add-missing-elasticsearch-metrics branch from 502f7d4 to 3c74140 Compare September 6, 2016 18:18
@mdelaney
Copy link
Contributor Author

mdelaney commented Sep 6, 2016

I think I've caught all the stats I added that should not be gauges and caught a small naming mistake for throttle times.

It's worth mentioning that there are almost certainly existing metrics that should not be gauges as well. Likely I'll submit another PR in the future to address that if no one else has. Also, there are definitely other metrics that are still missing but that will be left to a future PR as well :-)

@truthbk
Copy link
Member

truthbk commented Sep 6, 2016

Thanks so much @mdelaney! I agree with the choice of metrics types. And yeah, I wouldn't be surprised if we have some metrics in there that are submitted as gauges when they shouldn't really. Test are green, and all comments have been addressed, so lets :shipit: 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants