Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document metrics generated within OpenWhisk #3884

Merged
merged 3 commits into from
Jul 25, 2018

Conversation

chetanmeh
Copy link
Member

OpenWhisk generates quite a few metrics via Kamon. This PR aims to document all currently generated metrics such that system administrators can configure the right metrics for monitoring

See here for the rendered HTML version of doc

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@rabbah rabbah requested a review from vvraskin July 17, 2018 12:49
@rabbah
Copy link
Member

rabbah commented Jul 17, 2018

@vvraskin @mhenke1 would you review this?

@chetanmeh this is very useful thanks for doing this.

@codecov-io
Copy link

codecov-io commented Jul 17, 2018

Codecov Report

Merging #3884 into master will decrease coverage by 4.72%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3884      +/-   ##
==========================================
- Coverage   75.78%   71.05%   -4.73%     
==========================================
  Files         145      145              
  Lines        6897     6921      +24     
  Branches      418      410       -8     
==========================================
- Hits         5227     4918     -309     
- Misses       1670     2003     +333
Impacted Files Coverage Δ
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.08%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...la/whisk/core/database/cosmosdb/CosmosDBUtil.scala 92% <0%> (-4%) ⬇️
...er/src/main/scala/whisk/core/invoker/Invoker.scala 0% <0%> (ø) ⬆️
...la/src/main/scala/whisk/common/TransactionId.scala 93.44% <0%> (+0.1%) ⬆️
...cala/whisk/core/entitlement/LocalEntitlement.scala 95.65% <0%> (+0.65%) ⬆️
...sk/core/containerpool/docker/DockerContainer.scala 77.92% <0%> (+0.89%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e96c1bb...118e964. Read the comment docs.

Copy link
Contributor

@vvraskin vvraskin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neatly done, thank you!
Just a few minor comments.

docs/metrics.md Outdated
@@ -75,6 +75,183 @@ The docker image exposes StatsD via the (standard) port 8125 and a Grafana dashb

The address of your docker host has to be configured in the `metrics_kamon_statsd_host` configuration property.

### Metric Names

All metric names have to prefixed by a prefix that you specify and are subject to modification by graphite, datadog, or statsd. For example if prefix used is `openwhisk` then metric names would be like `openwhisk.counter.controller_activation_start`. This document assumes that metric name prefix is `openwhisk`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some typos, have to be prefixed probably

docs/metrics.md Outdated

#### Controller metrics

Metrics below are emitted from with a Controller instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from within, maybe?

docs/metrics.md Outdated

##### Controller Startup

* `openwhisk.counter.controller_startup<controller_id>_count` (counter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its openwhisk.counter.controller<controller_id>_startup_count

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the stats in our setup and they are named like openwhisk.counter.controller_startup0_count. Also in code the prefix is startup$id

https://github.com/apache/incubator-openwhisk/blob/8b5abe7bbeb4464af586c1993fc3590e0fe8d516/common/scala/src/main/scala/whisk/common/Logging.scala#L257

docs/metrics.md Outdated

Metrics below are emitted per kafka topic.

* `openwhisk.histogram.kafka_<topic name>.delay_start` - Time delay between when a message was pushed to kafka and when it is read within a consumer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also mention that Delay is being emitted for each pool by Invoker, while Queue metric is emitted every 10 seconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that it was be done via a scheduled task. Per default config it should be emitted every 60 seconds

@markusthoemmes markusthoemmes added review Review for this PR has been requested and yet needs to be done. documentation labels Jul 23, 2018
@chetanmeh
Copy link
Member Author

@vvraskin Thanks for review. One aspect I am struggling a bit in our monitoring is the variable metric names i.e. name which are function of invoker or controller id. They make it hard to aggregate metrics across invoker or controller.

It would be better if we record the id as kamon tag and then use it on observation side to segregate metrics per specific invoker/controller.

Copy link
Contributor

@vvraskin vvraskin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vvraskin
Copy link
Contributor

@chetanmeh
Which backend do you use for the system metrics?
If you use Grafana+InfluxDB, it allows us to aggregate metrics of several components using regex filters. I'm wondering whether the same approach would be feasible...

@chetanmeh
Copy link
Member Author

@vvraskin We use Datadog and it does not seem to support regex in metric names. Which makes it tricky to aggregate metrics across various invokers.

Does Grafana+Influx support tags? Looking for common denominator here

@vvraskin
Copy link
Contributor

Our statsd integration with influx won't support tagging, but I think that we already have tagging option for Kamon introduced in this PR 6f1a445

@vvraskin vvraskin merged commit b542fa6 into apache:master Jul 25, 2018
@chetanmeh
Copy link
Member Author

Opened #3917 to discuss possible options for supporting setups which do not support regex

BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
* Document metrics

* Trim trailing space

* Update based on Vadim's review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants