Metrics support for EdgeHub #1290

Closed
wants to merge 26 commits into from


@varunpuranik (Contributor) commented Jun 3, 2019

  • Remove existing metrics from EdgeHub codebase
  • Remove existing metrics infrastructure in Util
  • Add new generic metrics infrastructure in Util
  • Add support for Prometheus-style metrics
  • Add messaging metrics to EdgeHub
    Other metrics coming up in next PR.

@varunpuranik varunpuranik marked this pull request as ready for review June 4, 2019 01:40
@varunpuranik varunpuranik changed the title Draft: Metrics support for EdgeHub Metrics support for EdgeHub Jun 4, 2019
@myagley (Contributor) commented Jun 6, 2019

Could you add a dump from the HTTP endpoint to this PR description, so we can see how the metrics are reported?

<PackageReference Include="App.Metrics.Reporting.InfluxDB" Version="3.0.0" />
<PackageReference Include="App.Metrics.Reporting.TextFile" Version="3.0.0" />
<PackageReference Include="Microsoft.Extensions.Logging" Version="2.2.0" />
<PackageReference Include="Newtonsoft.Json" Version="12.0.1" />
<PackageReference Include="Nito.AsyncEx" Version="5.0.0-pre-05" />
<PackageReference Include="OpenCensus" Version="0.1.0-alpha-42253" />
<PackageReference Include="OpenCensus.Exporter.Prometheus" Version="0.1.0-alpha-42253" />
Contributor

Are these the latest versions? I understand OpenCensus is still early, but is there a non-dev version of App.Metrics?

Contributor Author

I forgot to remove these. We don't need them.

static class Metrics
{
static readonly IMetricsTimer MessagesTimer = Util.Metrics.Metrics.Instance.CreateTimer(
"message_send_latency_milliseconds",
Contributor

Per the Prometheus naming best practices, we should probably standardize on seconds?

Contributor

message_send_duration_seconds
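
To make the unit suggestion concrete, here is a minimal sketch of reporting a duration in seconds instead of milliseconds. `DurationUnits` and its methods are hypothetical helpers, not part of this PR or of the App.Metrics API; the only point is that the recorded value has to be converted together with the `_seconds` suffix in the name.

```csharp
using System;

// Hypothetical helper (an assumption, not code from this PR): keeps the
// recorded value consistent with a metric name that ends in "_seconds".
public static class DurationUnits
{
    // Convert a measured TimeSpan to fractional seconds.
    public static double ToSeconds(TimeSpan elapsed) => elapsed.TotalSeconds;

    // Convert a value that is already expressed in milliseconds to seconds.
    public static double MillisecondsToSeconds(double milliseconds) => milliseconds / 1000.0;
}
```

With this, a 134 ms observation would be recorded as 0.134 under a `..._duration_seconds` metric.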

@@ -234,6 +235,86 @@ async Task HandleTwinOperationException(string correlationId, Exception e)
}
}

#region IDeviceProxy

public Task SendC2DMessageAsync(IMessage message) => this.underlyingProxy.SendC2DMessageAsync(message);
Contributor

I'm trying to figure out why this diff is so big and what changed. Is this new code?

Contributor Author

No, I just moved the product code above the Events class (it got moved around during a cleanup some time back). No changes here other than the Metrics class.


public static IDisposable MessageLatency(IIdentity identity) => Util.Metrics.Latency(GetTags(identity), EdgeHubMessageLatencyOptions);
static readonly IMetricsHistogram MessagesHistogram = Util.Metrics.Metrics.Instance.CreateHistogram(
"message_size",
Contributor

message_size_bytes?

Contributor Author

Yes, changed.

}
}
static readonly IMetricsHistogram MessagesProcessLatency = Util.Metrics.Metrics.Instance.CreateHistogram(
"message_process_latency_milliseconds",
Contributor

message_process_duration_seconds

Task completedTask = await Task.WhenAny(taskCompletionSource.Task, Task.Delay(MessageResponseTimeout));
if (completedTask != taskCompletionSource.Task)
static readonly IMetricsTimer MessagesTimer = Util.Metrics.Metrics.Instance.CreateTimer(
"message_send_latency_milliseconds",
Contributor

message_send_duration_seconds

static class Metrics
{
static readonly IMetricsMeter SentMessagesMeter = Util.Metrics.Metrics.Instance.CreateMeter(
"messages_sent",
Contributor

messages_sent_total?

Contributor Author

So, if we use a meter, App.Metrics adds _total to the name! :)

static class Metrics
{
static readonly IMetricsMeter MessagesMeter = Util.Metrics.Metrics.Instance.CreateMeter(
"messages_received",
Contributor

messages_received_total?

"metrics": {
"enabled": true,
"listener": {
"port": 80,
Contributor

We should probably try to allocate a port here: https://github.com/prometheus/prometheus/wiki/Default-port-allocations and then default to it. For now, we can start with something around 9600.

Contributor Author

cool, sounds good

@@ -12,11 +12,11 @@ namespace Microsoft.Azure.Devices.Edge.Util
using App.Metrics.Timer;
using Microsoft.Extensions.Configuration;

- public static class Metrics
+ public static class Metrics2
Contributor

Is this the old one?

Contributor Author

oops :(

{
public interface IEdgeMetrics
{
void InitPrometheusMetrics(int port);
Contributor

InitMetrics? We should probably find a way to remove Prometheus from the name.

Contributor Author

Emm, the reason I kept Prometheus in the name is that it also starts an HTTP server. The idea was to have different methods for push and pull modes.

Maybe I should rename it to InitMetricsListener.

@varunpuranik (Contributor Author) commented Jun 6, 2019

Here is the response from the HTTP endpoint:

# HELP edgehub_gettwin_total 
# TYPE edgehub_gettwin_total counter
edgehub_gettwin_total{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 1

# HELP edgehub_messages_received_total 
# TYPE edgehub_messages_received_total counter
edgehub_messages_received_total{Protocol="Amqp",id="d101/Sender0",edgeDevice="d101"} 82

# HELP edgehub_messages_sent_total 
# TYPE edgehub_messages_sent_total counter
edgehub_messages_sent_total{Protocol="Amqp",from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101"} 81

edgehub_messages_sent_total{to="IoTHub",from="d101/Sender0",edgeDevice="d101"} 81

# HELP edgehub_reported_properties_update_total 
# TYPE edgehub_reported_properties_update_total counter
edgehub_reported_properties_update_total{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 5

# HELP edgehub_message_process_duration_milliseconds 
# TYPE edgehub_message_process_duration_milliseconds summary
edgehub_message_process_duration_milliseconds_sum{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101"} 7112
edgehub_message_process_duration_milliseconds_count{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101"} 81
edgehub_message_process_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.5"} 134
edgehub_message_process_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.75"} 140
edgehub_message_process_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.95"} 145
edgehub_message_process_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.99"} 233

edgehub_message_process_duration_milliseconds_sum{to="IoTHub",from="d101/Sender0",edgeDevice="d101"} 35969
edgehub_message_process_duration_milliseconds_count{to="IoTHub",from="d101/Sender0",edgeDevice="d101"} 81
edgehub_message_process_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.5"} 138
edgehub_message_process_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.75"} 147
edgehub_message_process_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.95"} 2229
edgehub_message_process_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.99"} 2729

# HELP edgehub_message_size_bytes 
# TYPE edgehub_message_size_bytes summary
edgehub_message_size_bytes_sum{edgeDevice="d101"} 27596
edgehub_message_size_bytes_count{edgeDevice="d101"} 82
edgehub_message_size_bytes{edgeDevice="d101",quantile="0.5"} 337
edgehub_message_size_bytes{edgeDevice="d101",quantile="0.75"} 337
edgehub_message_size_bytes{edgeDevice="d101",quantile="0.95"} 337
edgehub_message_size_bytes{edgeDevice="d101",quantile="0.99"} 337

# HELP edgehub_gettwin_duration_milliseconds 
# TYPE edgehub_gettwin_duration_milliseconds summary
edgehub_gettwin_duration_milliseconds_sum{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 0.1515155
edgehub_gettwin_duration_milliseconds_count{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 1
edgehub_gettwin_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.5"} 0.1515155
edgehub_gettwin_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.75"} 0.1515155
edgehub_gettwin_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.95"} 0.1515155
edgehub_gettwin_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.99"} 0.1515155

# HELP edgehub_message_send_duration_milliseconds 
# TYPE edgehub_message_send_duration_milliseconds summary
edgehub_message_send_duration_milliseconds_sum{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101"} 2.5499204
edgehub_message_send_duration_milliseconds_count{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101"} 81
edgehub_message_send_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.5"} 0.0309645
edgehub_message_send_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.75"} 0.0324292
edgehub_message_send_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.95"} 0.0350137
edgehub_message_send_duration_milliseconds{from="d101/Sender0",to="d101/Receiver0",edgeDevice="d101",quantile="0.99"} 0.0829133

edgehub_message_send_duration_milliseconds_sum{to="IoTHub",from="d101/Sender0",edgeDevice="d101"} 6.3626492
edgehub_message_send_duration_milliseconds_count{to="IoTHub",from="d101/Sender0",edgeDevice="d101"} 64
edgehub_message_send_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.5"} 0.0585582
edgehub_message_send_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.75"} 0.0594094
edgehub_message_send_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.95"} 0.0677852
edgehub_message_send_duration_milliseconds{to="IoTHub",from="d101/Sender0",edgeDevice="d101",quantile="0.99"} 2.6312966

# HELP edgehub_reported_properties_update_duration_milliseconds 
# TYPE edgehub_reported_properties_update_duration_milliseconds summary
edgehub_reported_properties_update_duration_milliseconds_sum{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 1.0222126
edgehub_reported_properties_update_duration_milliseconds_count{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101"} 5
edgehub_reported_properties_update_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.5"} 0.1568278
edgehub_reported_properties_update_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.75"} 0.1592095
edgehub_reported_properties_update_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.95"} 0.1592095
edgehub_reported_properties_update_duration_milliseconds{Target="IoTHub",id="d101/$edgeHub",edgeDevice="d101",quantile="0.99"} 0.1592095
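
One way to read the summary entries in this dump: dividing a `_sum` by its `_count` gives the mean observation for that label set. Below is a minimal sketch; `SummaryStats.Mean` is a hypothetical helper (not part of the PR), and the inputs in the note afterwards are copied from the `edgehub_message_process_duration_milliseconds` lines above.

```csharp
using System;

// Hypothetical helper (an assumption, not code from this PR).
public static class SummaryStats
{
    // Mean observation of a Prometheus summary: _sum divided by _count.
    public static double Mean(double sum, long count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count), "count must be positive");
        }

        return sum / count;
    }
}
```

`SummaryStats.Mean(7112, 81)` is roughly 87.8 ms for the Sender0 to Receiver0 route, while `SummaryStats.Mean(35969, 81)` is roughly 444.1 ms for the route to IoTHub, which is consistent with the much heavier tail (2229 at the 0.95 quantile) reported for that route.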

@myagley (Contributor) commented Jun 10, 2019

A couple of comments:

  1. We should make all labels snake_case to be consistent. Right now they look like a mix of snake_case and PascalCase.
  2. Is it possible to get more percentiles? Specifically, p999 and p9999 would be interesting.
  3. Should target/to IoTHub be upstream or $upstream instead?
  4. If the library supports descriptions, it would be nice to have them show up in the HELP sections.
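
For point 1, a minimal sketch of normalizing label names to snake_case. `LabelNames.ToSnakeCase` is a hypothetical helper, not something in this PR or in App.Metrics, and it only handles simple PascalCase and camelCase names such as the ones in the dump (`Target`, `Protocol`, `edgeDevice`).

```csharp
using System.Text;

// Hypothetical helper (an assumption, not code from this PR).
public static class LabelNames
{
    // Lower-cases the name and inserts '_' before an upper-case letter
    // that directly follows a lower-case letter or a digit.
    public static string ToSnakeCase(string name)
    {
        var sb = new StringBuilder(name.Length + 4);
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (char.IsUpper(c))
            {
                if (i > 0 && (char.IsLower(name[i - 1]) || char.IsDigit(name[i - 1])))
                {
                    sb.Append('_');
                }

                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

Applied to the labels in the dump, `edgeDevice` becomes `edge_device`, and `Target` and `Protocol` become `target` and `protocol`; `id`, `from` and `to` are unchanged.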

@@ -22,6 +22,8 @@

<ItemGroup>
<PackageReference Include="App.Metrics" Version="3.0.0" />
<PackageReference Include="App.Metrics.Formatters.Prometheus" Version="3.2.0-dev0002" />
Contributor

Is 3.2.0-dev0002 something like an RC version? Should we use 3.1.0 instead?

Contributor Author

Yeah, good point, maybe we should.

{
void Increment(long amount);
void Decrement(long amount);
void Increment(long amount, Dictionary<string, string> tags);
Contributor

What kind of tags will be used here? Should they be modeled more specifically?

Contributor Author

Emm, I don't think so. This is intended to be a generic Metrics API for the Edge.

3 participants