APM instrumentation #1141

Closed
jyeshe opened this issue Jan 23, 2023 · 10 comments

@jyeshe
Member

jyeshe commented Jan 23, 2023

Goal

Make errors and performance bottlenecks visible by instrumenting the Mdw to collect logs, metrics and spans.

Parameters

  • OpenTelemetry has become the main standard for this and could be used wherever a stable library exists for the monitoring concern in question.
  • DataDog would be the initial APM service to be supported.

Notes

For now, only trace collection is stable with OpenTelemetry. For logs and metrics there are alternatives based on Phoenix telemetry and exporters such as Fluentd.

@dincho
Member

dincho commented Jan 26, 2023

The infrastructure in development supports only Prometheus. Not sure where you read about OpenTelemetry becoming a main standard, but that's not quite true IMO.

@dincho
Member

dincho commented Jan 26, 2023

Regarding logs, you should not worry about them; just support console output plus a switchable machine-readable format, usually JSON.
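For illustration only, a minimal sketch of what "console output plus a switchable format" could look like. It is written in Python with the standard library purely to keep it short; the Mdw itself would use its own logger backend. The `LOG_FORMAT` environment variable, the `build_logger` helper and the JSON field names are assumptions made for this example, not anything the project defines.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream, log_format: str) -> logging.Logger:
    """Return a logger that writes to `stream` as plain text or as JSON."""
    handler = logging.StreamHandler(stream)
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    # LOG_FORMAT is a hypothetical switch: "json" for collectors, anything else for humans.
    log = build_logger("mdw", sys.stdout, os.environ.get("LOG_FORMAT", "text"))
    log.info("sync started")
```

The application code keeps calling the same logger either way; only the formatter attached to the console handler changes.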

@dincho
Member

dincho commented Jan 26, 2023

The new infrastructure does not yet implement any tracing capabilities, but in the future we'll add https://www.jaegertracing.io

@jyeshe
Member Author

jyeshe commented Jan 26, 2023

> The infrastructure in development supports only Prometheus. Not sure where you read about OpenTelemetry becoming a main standard, but that's not quite true IMO.

I get your point. I'm not calling it the de facto standard, because multiple APM services still use different protocols. But for APMs that work in push mode (DataDog being the first one we will support), OpenTelemetry is currently the most discussed standardized approach for exporting data from Elixir (see forums and Code BEAM talks for reference). That is the protocol side. Internally, to instrument the app, there is another set of libraries called telemetry* that are part of the mainstream.

@jyeshe
Member Author

jyeshe commented Jan 26, 2023

> Regarding logs, you should not worry about them; just support console output plus a switchable machine-readable format, usually JSON.

Okay, could we have them exported? If not, we can discuss using a free tool. One of the goals would be to receive notifications when a 500 error occurs, which we can also instrument in the app if you prefer. Being able to query the logs would help in some situations, so that we don't have to ask for them from time to time (on a release basis at least).

@jyeshe
Member Author

jyeshe commented Jan 26, 2023

> The new infrastructure does not yet implement any tracing capabilities, but in the future we'll add https://www.jaegertracing.io

No problem. Meanwhile I will wrap them up as metrics for the duration of some inner workings.

@dincho
Member

dincho commented Jan 26, 2023

> Okay, could we have them exported? If not, we can discuss using a free tool. One of the goals would be to receive notifications when a 500 error occurs, which we can also instrument in the app if you prefer. Being able to query the logs would help in some situations, so that we don't have to ask for them from time to time (on a release basis at least).

Of course, that's the point of the logs. By "just keep them flooding the console" I meant that they would usually be collected from the console once we're ready to run this in k8s.

Until then, logging to files should work (with DataDog), but JSON format would be better. You will need to support that format in the future anyway.

@dincho
Member

dincho commented Jan 26, 2023

> receive notifications when a 500 error occurs,

I wouldn't recommend alerting on logs; alert on metrics instead. Do you already have response metrics, perhaps broken down by status?

@jyeshe
Member Author

jyeshe commented Jan 26, 2023

> Until then, logging to files should work (with DataDog), but JSON format would be better. You will need to support that format in the future anyway.

Okay, there is one that adheres to the attribute list

@jyeshe
Member Author

jyeshe commented Jan 26, 2023

> > receive notifications when a 500 error occurs,
>
> I wouldn't recommend alerting on logs; alert on metrics instead. Do you already have response metrics, perhaps broken down by status?

I had thought about logging a 500 as an event, but treating it as a measurement, I will also provide it as a counter tagged with the request path and parameters. Thanks!
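As a rough sketch of the counter idea, assuming an in-memory store rather than any real metrics client: responses are counted per tag set, and an alert rule would then watch the 5xx total instead of scanning logs. The `ResponseMetrics` class and the paths are hypothetical, the example is in Python only for brevity, and it tags by status and path only, leaving request parameters out to keep it small.

```python
from collections import Counter


class ResponseMetrics:
    """In-memory response counter keyed by (status, path) tags."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, status: int, path: str) -> None:
        """Increment the counter for one finished request."""
        self._counts[(status, path)] += 1

    def count(self, status: int, path: str) -> int:
        """Return how many responses were seen for this exact tag pair."""
        return self._counts[(status, path)]

    def server_errors(self) -> int:
        """Total number of 5xx responses across all paths."""
        return sum(n for (status, _), n in self._counts.items() if 500 <= status < 600)


if __name__ == "__main__":
    metrics = ResponseMetrics()
    metrics.record(200, "/status")  # hypothetical paths
    metrics.record(500, "/txs")
    print(metrics.server_errors())
```

In a real setup the counter would live in the metrics library and be scraped or pushed to the APM; the point here is only that the status code is a tag on a measurement, so a threshold on it can drive the notification.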


3 participants