
First version of a monitoring standard proposal. #12

Closed

Conversation

@pilhuhn (Contributor) commented Dec 7, 2016

No description provided.


Data is exposed via REST over http under the `/health` base path

* Metrics will respond to GET requests and are exposed in a tree-like fashion, with sub-trees for the various cases described below.
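
For illustration only, the tree layout this implies could look roughly like the following (the `/metrics` base path and the base/vendor/application sub-trees are taken from later parts of this thread; the concrete paths and metric names are not fixed by the proposal):

GET /metrics                          returns all sub-trees at once
GET /metrics/base                     returns the required metrics
GET /metrics/vendor                   returns vendor-specific metrics
GET /metrics/application              returns application-specific metrics
GET /metrics/base/total-memory-used   returns a single metric (name is a candidate only)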

@pilhuhn (Author, Contributor) commented Dec 7, 2016

Describe how it is possible to fetch multiple metrics at once

@OndroMih (Contributor) commented Dec 8, 2016

Proposal for Monitoring specification

! DO NOT MERGE this Pull Request BEFORE REVIEW is finished!

Merging this Pull Request means that the proposal is officially accepted by the MicroProfile Core Team as a MicroProfile incubation project. A public discussion and an agreement within the MicroProfile Core Team is expected before a proposal is accepted/rejected.

If this PR is not a proposal for a new specification, delete this template description.

Suggested steps to follow before this proposal is accepted

  • add label proposal to this PR on github
  • initiate a public discussion on the mP mailing list
    • Email Subject: "Review announcement: <<PROPOSAL_NAME>>"
    • Email Body: Based on review announcement template
      • fill in <<PROPOSAL NAME>>, <<REVIEW END DATE>> (not mandatory, can be announced later, but should be at least 7 days after the announcement email is sent)
        and <<REVIEW MANAGER NAME>> (the name of the sender)
      • update the link to a proposal (NNNN-proposal.md) to point to this Pull Request

After the review, the Core Team should reach an agreement on whether to accept the proposal or to ask its author for amendments.

Once this Pull Request is merged, the proposal is considered accepted and an incubation repository should be created to facilitate collaboration.

@pilhuhn (Author, Contributor) commented Dec 8, 2016

You mean at the top of the proposal, and it will be removed by someone once it is reviewed and ready to merge (or even merged)?

I can't add a label, as probably most people can't.

@OndroMih (Contributor) commented Dec 8, 2016

The description is mainly for people who can add labels and merge the pull request, not for the submitter :) It is there to guide the core team members on what to do with the proposal, to avoid accepting the PR prematurely.

@pilhuhn pilhuhn force-pushed the pilhuhn:monitoring branch 2 times, most recently from 08c5e79 to e2ac53b Dec 8, 2016
@karianna karianna added the proposal label Dec 15, 2016
@Emily-Jiang (Member) commented Dec 16, 2016

I think the proposal sounds reasonable.
Q: for the metadata exposed via /monitor, I assume applications can add/update/remove the default info. Do you have any recommendations on useful default metadata, or is it to be defined at a later stage?

@pilhuhn (Author, Contributor) commented Dec 17, 2016

Thank you @Emily-Jiang
It is up to the respective implementation how to create/set the metadata. It must not change during the process lifetime. So if the process says on startup that it is using "Bytes" as the unit for a gauge, it can't later say it is using seconds or kilobytes for it. The reason is that e.g. a monitoring agent on Kubernetes will read the metadata once it sees the new process and store it; it will not periodically re-query the process for the metadata.
In fact, that metadata should probably not change during the lifetime of the whole container image, as all containers spawned from it will be "the same" and form part of an app; it would be confusing in an overall view if the same metric had different metadata.

For additional (optional) metadata, I also see an optional description and a "display name" (a human-readable name of the metric). Possibly others, but those four would be the basic set.

I will update the proposal.
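
As a sketch only, metadata for a single metric carrying those four fields might look like this (the metric name, unit value, and texts are illustrative, not part of the proposal):

{
  "total-memory-used": {
    "unit": "bytes",
    "type": "gauge",
    "description": "Total memory currently used by the process",
    "displayName": "Total memory used"
  }
}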

@pilhuhn pilhuhn force-pushed the pilhuhn:monitoring branch from e2ac53b to dd6ddd3 Dec 19, 2016
@heiko-braun (Contributor) commented Dec 20, 2016

Can you remove the content before the headline "Monitoring via HTTP endpoints"? It doesn't belong in the proposal itself.

@pilhuhn (Author, Contributor) commented Dec 20, 2016

@heiko-braun Oh, I apparently misunderstood #12 (comment). Will remove it. Thanks!

@pilhuhn pilhuhn force-pushed the pilhuhn:monitoring branch from dd6ddd3 to fd18076 Dec 20, 2016

left a comment

Really liked the proposal; I just added some comments so we can improve the description of this doc. I would like to participate more actively in these discussions.


Example:

if `GET /metrics/foo` exposes:

@ivanjunckes commented Dec 25, 2016

Can we have a GET /metrics to get all available metrics? If we have 3 metrics:
/metrics/cpu
/metrics/disk
/metrics/memory
Then the response would be an object with all available metrics and their values.
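
For illustration, such an aggregated response could look like this (the values, and whether each entry is a plain number or a nested object, are exactly what would need to be specified; the numbers are made up):

{
  "cpu": 0.75,
  "disk": 81234567,
  "memory": 268435456
}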

@pilhuhn (Author, Contributor) commented Feb 16, 2017

Will add this.


## Required metrics

Required metrics are exposed under `/metrics/base`

@ivanjunckes commented Dec 25, 2016

Can we have a small description in this section saying that required metrics are the metrics all vendors must implement? I could only understand that after reading the vendor-specific data section.

Also, can we start describing which metrics will be required?

  • Memory
  • Disk
  • CPU
  • JVM
  • Database pool
  • Network
  • Processes

@pilhuhn (Author, Contributor) commented Feb 16, 2017

Good one on the short description.
And yes, those should be described with the names to be used.
I would also like to think that the list of required metrics should be rather universal, so that they could theoretically also be exposed by non-MP applications (e.g. running on Vert.x) or even non-JVM runtimes (e.g. Node.js).
E.g. for memory, it should be total-memory-used or perhaps heap/non-heap, but not individual pools inside the heap.

Also, some metrics are rather hard to obtain inside the VM and often require native libraries (Sigar, Oshi), which we should rather avoid.
On top of that, monitoring agents often also monitor the machine/container the application is running on, so I think we don't necessarily need to gather this data inside the application as well.
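
To illustrate that direction, a response from GET /metrics/base might then look like this (the metric names are only candidates mentioned in this thread, and the numbers are made up; nothing is decided):

{
  "total-memory-used": 268435456,
  "heap-memory-used": 201326592,
  "non-heap-memory-used": 67108864
}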



## Vendor specific data

@ivanjunckes commented Dec 25, 2016

Can we have some examples of vendor specific metrics?

@Emily-Jiang (Member) commented Jan 10, 2017

This proposal is about providing metrics rather than monitoring. The monitoring should be performed by a monitoring agent on Kubernetes, which relies on the metrics data. Can this proposal be changed to be Metrics to avoid confusion?

@ebullient commented Jan 18, 2017

Yes please -- can the title of the proposal please be switched to use "Metrics" instead of monitoring?

@pilhuhn (Author, Contributor) commented Feb 16, 2017

Yes.

* unit: a fixed set of string units from e.g. [1], [UoM] or [Metrics2.0]
* type:
  * counter: a monotonically increasing or decreasing numeric value (e.g. total number of requests received)
  * gauge: a numeric value that can arbitrarily go up and down

@ivanjunckes commented Dec 25, 2016

Can we have a real-life example of a gauge? It is not clear to me.

@pilhuhn (Author, Contributor) commented Jan 2, 2017

CPU or disk usage are examples. Will add them.
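
To sketch the difference with those examples (metric names and numbers are illustrative only): a gauge such as CPU usage can move arbitrarily up and down between samples, while a counter such as the total number of requests received changes monotonically.

GET /metrics/base/cpu-usage   {"cpu-usage": 0.42}   ... later ...   {"cpu-usage": 0.17}
GET /metrics/base/requests    {"requests": 1500}    ... later ...   {"requests": 1730}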


{"fooVal": 12345}

then `OPTIONS /metrics/foo` will expose:

@ivanjunckes commented Dec 25, 2016

How will OPTIONS work if we have multiple values like the example provided?

{
  "thread-count": 33,
  "peak-thread-count": 47,
  "total-started-thread-count": 49,
  "current-thread-cpu-time": 66855000,
  "current-thread-user-time": 64003000
}

Like this?

{
  "thread-count": {
    "unit": "u",
    "type": "counter",
    "description": "",
    "displayName": "",
    "tags": "app=thread"
  },
  "peak-thread-count": {
    "unit": "u",
    "type": "counter",
    "description": "",
    "displayName": "",
    "tags": "app=thread"
  },
  ...
}

I think it would be good to have an example just to be clear.

* 200 for successful retrieval of an object
* 204 when retrieving a subtree that would exist, but has no content. E.g. when the application-specific subtree has no application specific metrics defined.
* 404 if a directly-addressed item does not exist. This may be a non-existing sub-tree or a non-existing object
* 503 to indicate that a health check resulted in "bad health". The body SHOULD contain details { "details": <text> }
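
For illustration, the mapping could play out like this (the paths are examples only, not defined by the proposal):

GET /metrics/base/memory           ->  200 OK, body contains the metric
GET /metrics/application           ->  204 No Content (no application-specific metrics defined)
GET /metrics/base/does-not-exist   ->  404 Not Found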

@ivanjunckes commented Dec 25, 2016

503 means "Service Unavailable"; I'm wondering if this is the right HTTP status, as sometimes we could have very high CPU usage, which can mean bad health but not unavailable.

@pilhuhn (Author, Contributor) commented Jan 4, 2017

I think this code is a bit of a copy-and-paste leftover from when I also looked at the health checks, where the semantics are different.
If the server is able to respond, then the "real" values should be reported. If it is not able to calculate values, it should be a 500 Internal Server Error. What do you think?

@ivanjunckes commented Jan 10, 2017

That makes sense, thanks.

@keilw commented Mar 9, 2017

+1 I like the idea of at least optional meta-data like "unit" for certain metrics.

And thanks to the Eclipse IP (and SmartHome) team, the same libraries that e.g. Parfait uses to support them in a JVM environment are also available to Eclipse projects like MicroProfile.

@struberg commented Mar 14, 2017

+1

Some additional thoughts, maybe for a 1.1 version:
What about defining a few standard metrics?

  • CPU count and utilisation
  • Memory sizes and utilisation
  • thread information (kind of remote-thread-dump in memory)

Just a few ideas.
It should not be too much, but having access to a few important pieces of information in well-specified locations is almost always useful.

@pilhuhn (Author, Contributor) commented Mar 14, 2017

@struberg Absolutely - this is actually something I want to do once that initial version is merged. If you look at e.g. https://github.com/eclipse/microprofile-evolution-process/pull/12/files#diff-f201d60fc8e92791bfb67d7887a30fcfR159 there are some open questions already marked around this.

I think that, while this is targeted at the JVM, we should expose those in a way that could also work for other environments.

Some metrics are rather hard to obtain from within Java without additional libraries like Oshi or Sigar though.

@pilhuhn pilhuhn force-pushed the pilhuhn:monitoring branch from 90fa81e to 1c99466 May 5, 2017
Signed-off-by: Heiko W. Rupp <hrupp@redhat.com>
@pilhuhn pilhuhn force-pushed the pilhuhn:monitoring branch from 8eb917d to 7a05281 May 8, 2017
@pilhuhn (Author, Contributor) commented May 8, 2017

I'll re-open a new one

@pilhuhn pilhuhn closed this May 8, 2017
@gadekarl commented May 11, 2017

@pilhuhn Why do we want to expose metadata via OPTIONS /metrics/foo? Why not expose metadata under a different base path, so that a lot of generic tools can easily consume the data? For example:
GET /metadata/metrics/foo

Not sure if this is the right place to comment on the proposal as the issue appears to be in closed state.

@pilhuhn (Author, Contributor) commented May 11, 2017

@gadekarl This one was superseded by #26; your question is still valid though.

I see OPTIONS /bla as the describing counterpart to the other verbs like GET, POST etc. See the HTTP/1.1 RFC: https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

The OPTIONS method represents a request for information about the communication options available on the request/response chain identified by the Request-URI. This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.

See also http://zacstewart.com/2012/04/14/http-options-method.html

I agree though that a separate sub-tree accessed via GET would be easier for e.g. browser usage. In that case we should not open a new endpoint next to /metrics, but rather one inside it, e.g. /metrics/metadata, so as not to "pollute" the endpoint space more than necessary.
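
To make the two variants concrete (illustrative only; neither form is specified yet), both would return the same metadata document for the earlier foo example, e.g.:

OPTIONS /metrics/foo
GET /metrics/metadata/foo

{"fooVal": {"unit": "u", "type": "gauge", "description": "", "displayName": ""}}

the difference being that the second one is reachable with a plain GET from a browser or a simple tool.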

@gadekarl commented May 12, 2017

@pilhuhn I am aware of the links that you posted. I noted my comments specifically from the tooling-consumption point of view. /metric/metadata/ would also be fine. Something to consider. Thanks.
