feat: add opentelemetry tracing and metrics #202

Open: tqwewe wants to merge 16 commits into base: main

@tqwewe tqwewe commented Mar 28, 2023

Todo:

  • Each process should have its own single context.
  • Ensure doc comments are written correctly.
  • Make sure host function trait bounds are only what's required.
  • Figure out how to add the environment id to prometheus target_info. (Turns out it's not possible)
  • Add push and take functions for resource sharing. This should not "move" the resource, though; it would probably be better to let multiple processes share metric resources. I don't think this is needed in this PR for now.
  • What do I do with distributed state?
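
On the push/take item above, here is a minimal sketch of what "share, don't move" could look like, using plain threads to stand in for processes. `SharedCounter` and `record_from_workers` are hypothetical names for illustration only; they are not part of this PR or of opentelemetry.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a metric instrument (not a type from this PR).
/// Cloning it clones the `Arc`, so every clone updates the same value.
#[derive(Clone, Default)]
pub struct SharedCounter {
    value: Arc<AtomicU64>,
}

impl SharedCounter {
    /// Adds `n` to the shared value.
    pub fn add(&self, n: u64) {
        self.value.fetch_add(n, Ordering::Relaxed);
    }

    /// Reads the current shared value.
    pub fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }
}

/// Gives each worker its own handle; the caller's handle stays usable,
/// which is the difference from a "take" that moves the resource away.
pub fn record_from_workers(counter: &SharedCounter, workers: u64, per_worker: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let handle = counter.clone(); // shares the value, does not copy it
            thread::spawn(move || {
                for _ in 0..per_worker {
                    handle.add(1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
}
```

The original handle can still record and read after the workers finish, which a move-based push/take pair would not allow.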

Spans cannot be shared across processes: they form a tree, and sharing them would make it possible to drop a parent span before its child, which wouldn't make sense.
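
To illustrate the ordering constraint, here is a rough sketch of span bookkeeping that refuses to end a parent while a child is still open. `SpanTree` and `EndError` are hypothetical and only model the tree invariant; they are not how this PR or opentelemetry tracks spans.

```rust
use std::collections::HashMap;

/// Hypothetical span bookkeeping, for illustration only.
#[derive(Default)]
pub struct SpanTree {
    next_id: u64,
    /// Open span id -> (parent id, number of still-open children).
    open: HashMap<u64, (Option<u64>, usize)>,
}

#[derive(Debug, PartialEq)]
pub enum EndError {
    UnknownSpan,
    HasOpenChildren(usize),
}

impl SpanTree {
    /// Starts a span. Returns `None` if the requested parent is not open.
    pub fn start(&mut self, parent: Option<u64>) -> Option<u64> {
        if let Some(p) = parent {
            let entry = self.open.get_mut(&p)?;
            entry.1 += 1;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.open.insert(id, (parent, 0));
        Some(id)
    }

    /// Ends a span, rejecting the call while any child is still open.
    pub fn end(&mut self, id: u64) -> Result<(), EndError> {
        let (parent, children) = *self.open.get(&id).ok_or(EndError::UnknownSpan)?;
        if children > 0 {
            return Err(EndError::HasOpenChildren(children));
        }
        self.open.remove(&id);
        if let Some(p) = parent {
            if let Some(entry) = self.open.get_mut(&p) {
                entry.1 -= 1;
            }
        }
        Ok(())
    }
}
```

If a child handle lived in another process, nothing local could guarantee that the child is ended first, which is the case the paragraph above rules out.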

Based on the spawn process benchmark, this PR does not seem to affect the performance of spawning processes.

Related PRs:
open-telemetry/opentelemetry-rust#1009
open-telemetry/opentelemetry-rust#1018


Screenshots for examples/metrics.rs

https://github.com/lunatic-solutions/lunatic-rs/blob/4681561eb78d1164bc1b2eef7c436bcab36622ab/examples/metrics.rs#L21-L78

Terminal

[2023-04-06T07:22:53Z INFO  metrics] Additional log message, with formatting!
[2023-04-06T07:22:53Z INFO  metrics] formatted object
[2023-04-06T07:22:53Z INFO  metrics] debug object
[2023-04-06T07:22:53Z INFO  metrics] person object
[2023-04-06T07:22:53Z INFO  my_app] a log from my_app
[2023-04-06T07:22:53Z INFO  metrics] a log under my_span

Jaeger
[screenshot: jaeger]

Prometheus
[screenshot: prometheus]

Comment on lines +648 to +656
tracer: Arc::new(BoxedTracer::new(Box::new(NoopTracer::new()))),
tracer_context: Arc::new(Context::new()),
process_context: Context::new(),
meter_provider: GlobalMeterProvider::new(NoopMeterProvider::new()),
logger: Arc::new(
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Off)
        .build(),
),
Member Author

I don't fully understand new_dist_state yet, so I have made all logging for it a no-op.

Is there any advice on what I should do here? Should logging to the terminal print on the control server?

Contributor

So dist state only holds references to the control client and the "node". The node implements communication with other nodes (for example, it receives a spawn command from another node). The control client talks to the control server.

We can leave it out for now.

@@ -127,6 +130,7 @@ anyhow = "1.0"
bincode = "1.3"
dashmap = "5.4"
log = "0.4"
opentelemetry = { version = "0.19", git = "https://github.com/tqwewe/opentelemetry-rust", branch = "cow", features = ["metrics"] }
Member Author

Unfortunately, this PR depends on changes in opentelemetry-rust that are not yet published.

The two PRs are:
open-telemetry/opentelemetry-rust#1009
open-telemetry/opentelemetry-rust#1018

Contributor

I saw the PRs got merged. Do they plan to release a new version soon?

Member Author

Looking at their previous releases, it seems they don't push releases very frequently :(

I've just opened a discussion about it; hopefully we can get more insight there:
open-telemetry/opentelemetry-rust#1031

@tqwewe tqwewe marked this pull request as ready for review April 10, 2023 09:30
@tqwewe tqwewe changed the title from "feat: add opentelemetry metrics and host functions" to "feat: add opentelemetry tracing and metrics" Apr 10, 2023
Contributor

@withtypes withtypes left a comment

Overall it looks good! All metrics include some attributes like: node id, environment id, process id. They are not implicitly set, right? We have to set them on each call? We also want to set them in the VM, not the guest, so that they can be trusted.

tqwewe commented Apr 18, 2023

> Overall it looks good! All metrics include some attributes like: node id, environment id, process id. They are not implicitly set, right? We have to set them on each call? We also want to set them in the VM, not the guest, so that they can be trusted.

Actually, process_id and environment_id are not attached to every trace/metric; they are only attached to the parent spans.
This might be a problem, though: if the runtime exits without closing a span properly (which is quite likely), we might not know which environment the spans come from.

I'll work on injecting this data into every span/log/metric.
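
As a rough sketch of that injection step (not the code in this PR): the host merges its own identifiers into whatever attributes the guest supplied, and the host values win on conflict so they can be trusted. `ProcessIdentity`, `with_host_attributes`, and the attribute key names are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical identifiers the VM knows about a process.
pub struct ProcessIdentity {
    pub node_id: u64,
    pub environment_id: u64,
    pub process_id: u64,
}

/// Builds the final attribute set for one span/log/metric record.
/// Key names here are assumed, not taken from the PR.
pub fn with_host_attributes(
    identity: &ProcessIdentity,
    guest_attributes: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut attrs: BTreeMap<String, String> = guest_attributes
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    // Inserted last so host values replace any guest-supplied keys
    // with the same name.
    attrs.insert("node_id".to_string(), identity.node_id.to_string());
    attrs.insert(
        "environment_id".to_string(),
        identity.environment_id.to_string(),
    );
    attrs.insert("process_id".to_string(), identity.process_id.to_string());
    attrs
}
```

Applying this to every record, rather than only to parent spans, means a span that is never closed still carries its environment.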

@tqwewe tqwewe marked this pull request as draft April 18, 2023 05:05
@tqwewe tqwewe marked this pull request as ready for review April 18, 2023 05:30