Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include event and handler name as tags with handle_in metrics #187

Merged
merged 4 commits into from
Jan 22, 2023

Conversation

clandry94
Copy link
Contributor

@clandry94 clandry94 commented Dec 9, 2022

Change description

Adds the event name that is present from the channel_handled_in telemetry payload as a tag for the duration metric.

I also noticed that :handler was included in the tags field, but wasn't actually set under tag_values. Maybe this was a bug? I added it as a tag value as well.

There's a lot of value including this tag with duration metrics. Right now, if a specific event handler function is slow, there's no way to identify that with without the event dimension. With it added, it's now possible to group by events and monitor for slow handlers.

Example usage

Nothing really required from the user's side to set this up, they'll just need to update their PromEx version. channel_handled_in_duration_milliseconds metrics will now include the event and handler tags. Example output:

phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="10"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="100"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="500"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="1000"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="5000"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="10000"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket",le="+Inf"} 1
phoenix_channel_handled_in_duration_milliseconds_sum{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket"} 0.721542
phoenix_channel_handled_in_duration_milliseconds_count{endpoint="Server.Endpoint",event="select_elements",handler="Elixir.Server.UserSocket"} 1

Checklist

  • I have added unit tests to cover my changes.
  • I have added documentation to cover my changes.
  • My changes have passed unit tests and have been tested E2E in an example project.

@clandry94 clandry94 changed the title feat: include event and handler name as tags with handle_in metrics Include event and handler name as tags with handle_in metrics Dec 9, 2022
}
end,
tags: [:endpoint],
tags: [:endpoint, :handler, :event],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the phoenix tests only check for event counts and not the tag values, so this is a bit hard to test without editing a decent chunk of code. I could add additional test coverage there for this whole module in a follow-up PR if that would be preferred!

%{
endpoint: normalize_module_name(endpoint)
endpoint: normalize_module_name(endpoint),
event: event,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. The only part that gives me pause is the event since it is a binary and has the potential to be an unbounded value. I think it would be best if this module accepts an option to normalize event values. The default implementation would be a passthrough:

fn event -> 
  event
end

But then the user could override this to do something like so:

fn 
  "join" -> "join"
  "leave" -> "leave"
  _ -> "unknown"
end

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @akoutmos good idea and I can implement it like you've described. Just might be a little slow due to the holiday season!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all my friend! Happy holidays!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR with support for a normalizer function. I also added some new tests- one to test for handled_in metrics generally which I noticed were missing and another to test that configuring the normalize_event_names option works as expected.

@clandry94
Copy link
Contributor Author

To verify this works in a Phoenix app, I've pushed up this branch to one of Felt's preview instances. Here's some of the metrics emitted-

# HELP phoenix_channel_handled_in_duration_milliseconds The time it takes for the application to respond to channel messages.
# TYPE phoenix_channel_handled_in_duration_milliseconds histogram
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="10"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="100"} 1
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="500"} 2
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="1000"} 2
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="5000"} 2
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="10000"} 2
phoenix_channel_handled_in_duration_milliseconds_bucket{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket",le="+Inf"} 2
phoenix_channel_handled_in_duration_milliseconds_sum{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket"} 186.01561399999997
phoenix_channel_handled_in_duration_milliseconds_count{endpoint="FeltServerWeb.Endpoint",event="add_elements",handler="Elixir.FeltServerWeb.UserSocket"} 2

Looks like things are working as expected!

@clandry94
Copy link
Contributor Author

Hey @akoutmos, now that the holidays are over just wanted to poke for another review on this PR.

@coveralls
Copy link

coveralls commented Jan 11, 2023

Coverage Status

Coverage: 79.135% (+0.1%) from 79.022% when pulling b25cc9a on clandry94:channel-names into 22664c1 on akoutmos:master.

@akoutmos
Copy link
Owner

It looks like there are some test failures. Could you address those?

@clandry94
Copy link
Contributor Author

@akoutmos the tests should be fixed now. They were failing in different test files because I've added 2 additional test events for channel_handled_in and those tests depended on counts of metrics appearing.

Copy link
Owner

@akoutmos akoutmos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@akoutmos akoutmos merged commit cefe02c into akoutmos:master Jan 22, 2023
@akoutmos
Copy link
Owner

Thanks for the contribution!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants