Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics, java: add support for Micrometer metrics #2496

Merged
merged 1 commit into from Mar 27, 2024

Conversation

mobuchowski
Copy link
Member

@mobuchowski mobuchowski commented Mar 7, 2024

This PR adds support for Micrometer-based telemetry. This comprises of:

  • adding MeterRegistryFactory mechanism that allows to construct Micrometer's MeterRegistry based on passed configuration
  • adding MicrometerProvider mechanism that allows you to configure metrics from any OpenLineage integration and get configured MeterRegistry instance
  • adding proper metrics config to OpenLineage config
  • adding StatsDMetricsBuilder
  • implementing instrumentation into Java client.

The metrics mechanism can forward metrics to any Micrometer compatible implementation - list here - however compatible wrapper MeterRegistryFactory needs to be added. This PR adds only support for StatsD based one, as well as in-memory SimpleMetricsBuilder and CompositeMetricsBuilder that allows you to configure multiple backends. However, global registry provided by MicrometerProvider is a CompositeMetricsBuilder, so that using it manually is generally unadvisable.

@d-m-h
Copy link
Contributor

d-m-h commented Mar 7, 2024

I know that this is in draft, and I like the idea of gathering telemetry, however can we make this feature opt-in? That is, a user needs to add the JAR to the classpath for telemetry to be gathered.

My motivations are:

  1. Gathering telemetry will require additional compute resources. A team that I am working with in order to gather lineage from has really tight SLOs, and they're quite vocal about the impact adding new things to their offerings because of the impacts it can add.
  2. I'm not certain what's the goal of gathering telemetry?
  3. If we're going to go the telemetry way, instead of creating a bespoke telemetry offering, can we not use something like OpenTelemetry instead?

Comment on lines 103 to 105
sourceCompatibility = JavaVersion.VERSION_11
targetCompatibility = JavaVersion.VERSION_11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break the Spark integration tests, as they use JDK 8. Furthermore, upgrading the Spark integration tests to JDK 11 may cause the Spark 2.4.8 integration test to break - though I cannot say for sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not worth yet to do full review, I'll revert this back to JDK 8 and let you review once this goes out of draft.

However the questions around whole idea are good and worth having, just not particular lines of code :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @d-m-h it's ready for review.

@mobuchowski
Copy link
Member Author

mobuchowski commented Mar 8, 2024

@d-m-h yes - it will be optional, and not collected when not enabled, except in case of debug context, when the metrics would get attached to debugFacet. We will add micrometer-core to OpenLineage-Spark jar, but not particular implementations like OLTP, StatsD - those would be compile only and had to be provided by users.

I'm not certain what's the goal of gathering telemetry?

The goal is to the contrary to what you've said in the first sentence - with telemetry we aim to make using OpenLineage integration easier operationally. Some large organizations want answers for questions like:

  • Is the plugin running?
  • How busy is the plugin?
  • What's the version of Plugin/Spark
  • Were there any dropped events, or was integration killed in the middle of emitting events?
  • How much memory was consumed when event processing started?

Additionally, we've seen some cases where lineage extraction was taking a long time. Having timing metrics around lineage extraction would help us understand where the things are slow, besides obvious candidates. It will be also easier for some of us to debug cases where things go wrong, as compared to you, we usually don't have access to people's jobs, just what they decide to send to us, which are mostly logs and events.

If we're going to go the telemetry way, instead of creating a bespoke telemetry offering, can we not use something like OpenTelemetry instead?

It's not bespoke - you can send it to OLTP: https://docs.micrometer.io/micrometer/reference/implementations/otlp.html as well as other implementations like StatsD/Datadog which one of our clients use.
Micrometer is just simple abstraction for timers, meters, gauges that can be used with multiple backends, a library similar to SLF4J for logging.

Copy link
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primo PR. Beside other comments and questions, my major concern is: does it make sense to combine in a single PR for general metrics feature + metrics per builder/visitor to be included in debug facet? I would say it can be easier to split it into 2 PRs.

@Override
public void onApplicationStart(SparkListenerApplicationStart applicationStart) {
initializeContextFactoryIfNotInitialized(applicationStart.appName());
appStartCounter.increment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to implement similar counter for other event types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@mobuchowski mobuchowski force-pushed the openlineage-metrics branch 16 times, most recently from 7c013ee to f673013 Compare March 17, 2024 16:45
@mobuchowski mobuchowski marked this pull request as ready for review March 17, 2024 17:18
@mobuchowski mobuchowski requested a review from harels March 17, 2024 17:18
@mobuchowski mobuchowski changed the title metrics: add support for Java Micrometer metrics metrics, java: add support for Micrometer metrics Mar 18, 2024
Copy link
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mobuchowski thank you for updating docs and splitting PRs.

Left few minor wording comments - are we using properly meter vs metric through class naming?

@mobuchowski mobuchowski force-pushed the openlineage-metrics branch 2 times, most recently from 9e4581f to 448bd7d Compare March 26, 2024 12:12
Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
@mobuchowski mobuchowski merged commit 2d1ee64 into main Mar 27, 2024
40 checks passed
@mobuchowski mobuchowski deleted the openlineage-metrics branch March 27, 2024 13:47
mobuchowski added a commit that referenced this pull request Mar 29, 2024
Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
blacklight pushed a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
…eage#2496)

Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants