APM instrumentation #1141

jyeshe · 2023-01-23T13:58:53Z

Goal

Make errors and performance bottlenecks visible by instrumenting the Mdw to collect logs, metrics, and spans.

Parameters

OpenTelemetry has become the main standard for this and could be used wherever a stable library exists for the monitoring concern in question.
DataDog would be the initial APM service to be supported.

Notes

For now, only trace collection is stable with OpenTelemetry. For logs and metrics there are alternatives based on Phoenix telemetry and exporters such as Fluentd.

dincho · 2023-01-26T08:22:42Z

The infrastructure in development supports only Prometheus. Not sure where you read about OpenTelemetry becoming a main standard, but that's not quite true IMO.

dincho · 2023-01-26T08:23:39Z

With regard to logs, you should not worry about them; just support console output plus a switchable machine-readable format, usually JSON.

dincho · 2023-01-26T08:25:06Z

The new infrastructure does not implement any tracing capabilities yet, but in the future we'll add https://www.jaegertracing.io

jyeshe · 2023-01-26T12:23:48Z

> The infrastructure in development supports only Prometheus. Not sure where you read about OpenTelemetry becoming a main standard, but that's not quite true IMO.

I get your point. I did not mean a de facto standard, since multiple APM services still use different protocols. But for APMs that work in push mode (DataDog being the first one we will support), OpenTelemetry is currently the most discussed standardized approach for exporting data from Elixir (see forums and Code BEAM talks for reference). That is the protocol side; internally, to instrument the app, there is also another set of mainstream libraries named telemetry*.

jyeshe · 2023-01-26T12:39:57Z

> With regard to logs, you should not worry about them; just support console output plus a switchable machine-readable format, usually JSON.

Okay, could we have them exported? If not, we can discuss using a free tool. One of the goals would be to receive notifications when a 500 error occurs, which we can also instrument in the app if you prefer. Being able to query the logs would be helpful in some situations, so that we do not have to ask for them from time to time (on a release basis at least).

jyeshe · 2023-01-26T12:44:47Z

> The new infrastructure does not implement any tracing capabilities yet, but in the future we'll add https://www.jaegertracing.io

No problem. Meanwhile, I will capture the duration of some internal operations as metrics instead.
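
To illustrate the idea of recording the duration of an internal operation as a metric instead of a span, here is a minimal Python sketch. It is not the Mdw's Elixir code: the `timed` helper, the in-memory `durations` store, and the operation name are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: operation name -> list of durations in seconds.
durations = defaultdict(list)


@contextmanager
def timed(name, clock=time.perf_counter):
    """Record how long the wrapped block takes under the given metric name."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are measured too.
        durations[name].append(clock() - start)


# "sync_step" is an illustrative operation name, not taken from the Mdw.
with timed("sync_step"):
    sum(range(1000))
```

A real setup would hand each measurement to a metrics reporter rather than keep it in a list.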

dincho · 2023-01-26T13:02:10Z

> Okay, could we have them exported? If not, we can discuss using a free tool. One of the goals would be to receive notifications when a 500 error occurs, which we can also instrument in the app if you prefer. Being able to query the logs would be helpful in some situations, so that we do not have to ask for them from time to time (on a release basis at least).

Of course, that's the point of the logs. By "just keep them flooding the console" I meant that they would usually be collected from the console once we're ready to run this in k8s.

Until then, logging to files should work (with DataDog), but JSON format would be better. You will need to support that format in the future anyway.
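
For illustration, console output with a switchable machine-readable format could look like the following minimal Python sketch. It is not the Mdw's Elixir logger: the `LOG_FORMAT` environment variable, the `build_logger` helper, and the JSON field names are assumptions made for this example.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, log_format=None, stream=None):
    """Return a console logger; log_format "json" switches to JSON lines."""
    # LOG_FORMAT is a hypothetical environment variable for this sketch.
    log_format = log_format or os.environ.get("LOG_FORMAT", "text")
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        )
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("example", log_format="json")
log.info("request served in %d ms", 12)
```

Because the output still goes to the console in both modes, a collector that reads stdout can pick it up unchanged; only the line format differs.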

dincho · 2023-01-26T13:03:10Z

> receive notifications when a 500 error,

I wouldn't recommend alerting on logs; alert on metrics instead. Do you already have response metrics, perhaps by status?

jyeshe · 2023-01-26T15:26:56Z

> Until then, logging to files should work (with DataDog), but JSON format would be better. You will need to support that format in the future anyway.

Okay, there is one that adheres to the attribute list.

jyeshe · 2023-01-26T15:31:59Z

> receive notifications when a 500 error,

> I wouldn't recommend alerting on logs; alert on metrics instead. Do you already have response metrics, perhaps by status?

I had thought of logging a 500 as an event, but treating it as a measurement, I will also provide it as a counter tagged with the request path and parameters. Thanks!
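
As a rough illustration of a tagged response counter that an alert could be based on, here is a minimal Python sketch. It is not the Mdw's Elixir implementation: the `RequestCounter` class, its in-memory store, and the example routes are hypothetical. The sketch tags by route template rather than raw path, which is an assumption made here to keep the number of tag combinations bounded.

```python
from collections import Counter


class RequestCounter:
    """In-memory request counter keyed by (status, route) tags."""

    def __init__(self):
        self._counts = Counter()

    def observe(self, status, route):
        # One increment per finished request, tagged with status and route.
        self._counts[(status, route)] += 1

    def total(self, status_class):
        """Sum counts for a status class, e.g. 5 for all 5xx responses."""
        return sum(
            n for (status, _route), n in self._counts.items()
            if status // 100 == status_class
        )

    def error_ratio(self):
        """Fraction of observed requests that ended with a 5xx status."""
        all_requests = sum(self._counts.values())
        return self.total(5) / all_requests if all_requests else 0.0


# The routes below are made-up examples, not Mdw endpoints.
requests = RequestCounter()
requests.observe(200, "/items/:id")
requests.observe(200, "/items/:id")
requests.observe(500, "/items/:id")
requests.observe(500, "/status")
print(requests.total(5))       # 2
print(requests.error_ratio())  # 0.5
```

An alert rule would then compare a value such as `error_ratio()` against a threshold, instead of scanning log lines for 500 errors.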

jyeshe self-assigned this Jan 23, 2023

jyeshe mentioned this issue Jan 25, 2023

feat: add metrics observability #1145

Merged

This was referenced Jan 27, 2023

feat: monitor error 500 #1149

Merged

feat: add optional json logger #1161

Merged

jyeshe mentioned this issue Mar 9, 2023

chore: aggregate request metric per route #1225

Merged

yaboiishere closed this as completed Jul 18, 2024
