Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with Microprofile Metrics #257

Merged
merged 2 commits into from
May 3, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions spec/src/main/asciidoc/configuration.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -154,3 +154,12 @@ For instance the following config will disable bulkhead policy globally:
Policy will be disabled everywhere ignoring existing policy annotations on methods and classes.

If the above configurations patterns are used with another value than `false` (i.e. `<classname>/<methodname>/<annotation>/enabled=whatever`) non-portable behaviour results.

=== Configuring Metrics Integration

The integration with MicroProfile Metrics can be disabled by setting a config property named `MP_Fault_Tolerance_Metrics_Enabled` to the value `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since MP configuration values can be changed during runtime it should be specified what happens if the value changes. I guess any change during runtime wouldn't have any effect but it's worth to clarify this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will add this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

If this property is absent or set to `true` then the integration with MicroProfile Metrics will be enabled and the metrics listed earlier in this specification
will be added automatically for every method annotated with a `@Retry`, `@Timeout`, `@CircuitBreaker` or `@Bulkhead` annotation.

In order to prevent from any unexpected behaviours, the property `MP_Fault_Tolerance_Metrics_Enabled` will only be read when the application starts.
Any dynamic changes afterwards will be ignored until the application is restarted.
237 changes: 237 additions & 0 deletions spec/src/main/asciidoc/metrics.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,237 @@
//
// Copyright (c) 2018 Contributors to the Eclipse Foundation
//
// See the NOTICE file(s) distributed with this work for additional
// information regarding copyright ownership.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// You may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Contributors:
// Andrew Rouse

== Integration with Microprofile Metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we clarified on the hangout that all Counters are monotonic and we're going to remove _total suffix we need to state this also in the spec. The Metrics counters can be monotonic but can also be decremented, see Counted.

I suggest to replace "Counter" in every metric type with "Monotonic Counter"

Copy link
Member

@donbourne donbourne Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpMetrics doesn't currently add _total to the end of counters -- I'd suggest we indicate that counter names should end in _total. If mpMetrics adds _total to the end of counter names in prometheus format it should avoid doing so to metric names that already end in _total. (as you've already commented in eclipse/microprofile-metrics#172 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a note at the bottom that all the counters are monotonic. As we're now keeping the total suffix, I think this is sufficient.

I'm reluctant to put "Monotonic Counter" in the type column of the table as Counter is the interface in the Metrics API which must be used.


When Microprofile Fault Tolerance and Microprofile Metrics are used together, metrics are automatically added for each of
the methods annotated with a `@Retry`, `@Timeout`, `@CircuitBreaker` or `@Bulkhead` annotation.

=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question - if multiple apps are deployed and some of them use the same class with the same FT annotations, would that mean there's only one combined metric for both annotated methods in both apps?

AFAIK, FT Metrics 1.0 doesn't allow exposing the same metric multiple times for each application. It's possible to add tags to a metric but that wouldn't help if 2 metrics would have the same name.

There's an issue open to add support for metrics coming from different applications: eclipse/microprofile-metrics#65.

We should at least specify that FT metrics should be exposed in the same way as application-defined metrics. Later, when the Metrics spec comes to a solution, FT metrics from different apps would be handled automatically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good point. I've already added a note to say that the metric names may change in response to enhancements and changes in the metrics spec.

It would also be good to call out that they'll appear in the same way as application-defined metrics.

As well as the problem of metrics from two applications clashing, we've also got a similar problem if a class has two methods with the same name and different arguments. That's on my list to bring up at tomorrow's meeting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this in the Notes section at the bottom of the file.

In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name.

=== Metrics added for `@Retry`, `@Timeout`, `@CircuitBreaker` and `@Bulkhead`

Implementations must ensure that if any of these annotations are present on a method, then the following metrics are added only once for that method.

[cols="8,3,4,9"]
|===
| Name | Type | Unit | Description

|`ft.<name>.invocations.total`
| Counter | None
| The number of times the method was called

|`ft.<name>.invocations.failed.total`
| Counter | None
| The number of times the method was called and, after all Fault Tolerance actions had been processed, threw a `Throwable`
|===

=== Metrics added for `@Retry`

[cols="8,3,4,9"]
|===
| Name | Type | Unit | Description

|`ft.<name>.retry.callsSucceededNotRetried.total`
| Counter | None
| The number of times the method was called and succeeded without retrying

|`ft.<name>.retry.callsSucceededRetried.total`
| Counter | None
| The number of times the method was called and succeeded after retrying at least once

|`ft.<name>.retry.callsFailed.total`
| Counter | None
| The number of times the method was called and ultimately failed after retrying

|`ft.<name>.retry.retries.total`
| Counter | None
| The total number of times the method was retried
|===

Note that the sum of `ft.<name>.retry.callsSucceededNotRetried.total`, `ft.<name>.retry.callsSucceededRetried.total` and `ft.<name>.retry.callsFailed.total` will give the total number of calls for which the Retry logic was run.

=== Metrics added for `@Timeout`

[cols="8,3,4,9"]
|===
| Name | Type | Unit | Description

|`ft.<name>.timeout.executionDuration`
| Histogram | Nanoseconds
| Histogram of execution times for the method

|`ft.<name>.timeout.callsTimedOut.total`
| Counter | None
| The number of times the method timed out

|`ft.<name>.timeout.callsNotTimedOut.total`
| Counter | None
| The number of times the method completed without timing out
|===

Note that the sum of `ft.<name>.timeout.callsTimedOut.total` and `ft.<name>.timeout.callsNotTimedOut.total` will give the total number of calls for which the Timeout logic was run. This may be larger than the total number of invocations of the method it's also annotated with `@Retry` because the Timeout logic is applied to each Retry attempt, not to the whole method invocation time.

=== Metrics added for `@CircuitBreaker`

[cols="8,3,4,9"]
|===
| Name | Type | Unit | Description

|`ft.<name>.circuitbreaker.callsSucceeded.total`
| Counter | None
| Number of calls allowed to run by the circuit breaker that returned successfully

|`ft.<name>.circuitbreaker.callsFailed.total`
| Counter | None
| Number of calls allowed to run by the circuit breaker that then failed

|`ft.<name>.circuitbreaker.callsPrevented.total`
| Counter | None
| Number of calls prevented from running by an open circuit breaker

|`ft.<name>.circuitbreaker.open.total`
| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in open state

|`ft.<name>.circuitbreaker.halfOpen.total`
| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in half-open state

|`ft.<name>.circuitbreaker.closed.total`
| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in closed state

|`ft.<name>.circuitbreaker.opened.total`
| Counter | None
| Number of times the circuit breaker has moved from closed state to open state
|===

Note that the sum of `ft.<name>.circuitbreaker.callsSucceeded.total`, `ft.<name>.circuitbreaker.callsFailed.total` and `ft.<name>.circuitbreaker.callsPrevented.total` will give the total number of calls for which the Bulkhead logic was run.

Similarly, the sum of `ft.<name>.circuitbreaker.open.total`, `ft.<name>.circuitbreaker.halfOpen.total` and `ft.<name>.circuitbreaker.closed.total` will give the total time that the circuit breaker has been active for.


=== Metrics added for `@Bulkhead`

[cols="8,3,4,9"]
|===
| Name | Type | Unit | Description

|`ft.<name>.bulkhead.concurrentExecutions`
| Gauge<Long> | None
| Number of currently running executions

|`ft.<name>.bulkhead.callsAccepted.total`
| Counter | None
| Number of calls accepted by the bulkhead

|`ft.<name>.bulkhead.callsRejected.total`
| Counter | None
| Number of calls rejected by the bulkhead

|`ft.<name>.bulkhead.executionDuration`
| Histogram | Nanoseconds
| Histogram of method execution times. This does not include any time spent waiting in the bulkhead queue.

|`ft.<name>.bulkhead.waitingQueue.population`*
| Gauge<Long> | None
| Number of executions currently waiting in the queue

|`ft.<name>.bulkhead.waiting.duration`*
| Histogram | Nanoseconds
| Histogram of the time executions spend waiting in the queue
|===

*Only added if the method is also annotated with `@Asynchronous`

== Notes

Metrics added by this specification will appear as application metrics for the application which uses the Fault Tolerance annotations.

Future versions of this specification may change the definitions of the metrics which are added to take advantage of
enhancements in the MicroProfile Metrics specification.

If more than one annotation is applied to a method, the metrics associated with each annotation will be added for that method.

All of the counters are monotonic and count the number of events which occurred since the application started.
It is expected that these counters will be sampled regularly by monitoring software which is then able to compute deltas
or moving averages from the gathered samples.

=== Annotation Example

[source, java]
----
package com.exmaple;

@Timeout(1000)
public class MyClass {

@Retry
public void doWork() {
// work
}

}
----

This class would result in the following metrics being added.

* `ft.com.example.MyClass.doWork.invocations.total`
* `ft.com.example.MyClass.doWork.invocations.failed`
* `ft.com.example.MyClass.doWork.retry.callsSucceededNotRetried.total`
* `ft.com.example.MyClass.doWork.retry.callsSucceededRetried.total`
* `ft.com.example.MyClass.doWork.retry.callsFailed.total`
* `ft.com.example.MyClass.doWork.retry.retries.total`
* `ft.com.example.MyClass.doWork.timeout.executionDuration`
* `ft.com.example.MyClass.doWork.timeout.callsTimedOut.total`
* `ft.com.example.MyClass.doWork.timeout.callsNotTimedOut.total`

Now imagine the `doWork()` method is called and the invocation goes like this:
* On the first attempt, the invocation takes more than 1000ms and times out
* On the second attempt, something goes wrong and the method throws an `IOException`
* On the third attempt, the method returns successfully and the result of this attempt is returned to the user

After this sequence, the value of these metrics would be as follows:

`ft.com.example.MyClass.doWork.invocations.total = 1` +
The method has been called once.

`ft.com.example.MyClass.doWork.invocations.failed = 0` +
No exceptions were propagated back to the caller.

`ft.com.example.MyClass.doWork.retry.callsSucceededNotRetried.total = 0` +
`ft.com.example.MyClass.doWork.retry.callsSucceededRetried.total = 1` +
`ft.com.example.MyClass.doWork.retry.callsFailed.total = 0` +
Only one call was made, and it succeeded after some retries.

`ft.com.example.MyClass.doWork.retry.retries.total = 2` +
Two retries were made during the invocation.

`ft.com.example.MyClass.doWork.timeout.executionDuration` +
The `Histogram` will have been updated with the length of time taken for each attempt. It will show a count of `3` and will have calculated averages and percentiles from the execution times.

`ft.com.example.MyClass.doWork.timeout.callsTimedOut.total = 1` +
One of the attempts timed out.

`ft.com.example.MyClass.doWork.timeout.callsNotTimedOut.total = 2` +
Two of the attempts did not time out.
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ include::circuitbreaker.asciidoc[]

include::bulkhead.asciidoc[]

include::configuration.asciidoc[]

include::metrics.asciidoc[]

include::configuration.asciidoc[]

7 changes: 7 additions & 0 deletions tck/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,13 @@
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.eclipse.microprofile.metrics</groupId>
<artifactId>microprofile-metrics-api</artifactId>
<version>1.1</version>
<scope>provided</scope>
</dependency>

<dependency>
<!-- actually only referenced in JavaDoc -->
<groupId>org.apache.geronimo.specs</groupId>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,6 @@

/**
* Verify the asynchronous invocation
*
* @author
*/
public class AsynchronousTest extends Arquillian {

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
/*
*******************************************************************************
* Copyright (c) 2018 Contributors to the Eclipse Foundation
*
* See the NOTICE file(s) distributed with this work for additional
* information regarding copyright ownership.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* You may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*******************************************************************************/
package org.eclipse.microprofile.fault.tolerance.tck.metrics;

import java.time.temporal.ChronoUnit;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

import javax.enterprise.context.RequestScoped;

import org.eclipse.microprofile.faulttolerance.Asynchronous;
import org.eclipse.microprofile.faulttolerance.Bulkhead;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;

@RequestScoped
public class AllMetricsBean {

@Retry(maxRetries = 5)
@Bulkhead(3)
@Timeout(value = 1000, unit = ChronoUnit.MILLIS)
@CircuitBreaker(failureRatio = 1.0, requestVolumeThreshold = 20)
@Asynchronous
public Future<Void> doWork() {
return CompletableFuture.completedFuture(null);
}

}
Loading