Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(metrics): collect gauge values asynchronously #213

Closed

Conversation

justenwalker
Copy link
Contributor

What

When using the "otel" collector for metrics-reporter. You will get a log message like:

WARNING: Instrument X has recorded multiple values for the same attributes.

This is because the Gauge builder in the OpenTelemetry Java SDK can only collect gauge values asynchronously.

The old code essentially redefined the gauge every time with a callback that produces the same value. The first registration succeeds since there is no gauge defined; but subsequent gauge updates failed because of the duplicate attributes.

How

This fix attempts to solve this by using a synchronous map and only defining the gauge once. The callback uses the sync map to get the latest value. Subsequent calls to gauge update the value in the map rather than re-defining the gauge again.

This sort of a hack. The way that the API of OpenTelemetry is designed is such that you should define your gauge up-front, passing a callback which would perform the measurements periodically by the SDK. However, this API does not conform to the MetricClient interface so this adapter is necessary unless the metrics client is refactored to support this use-case.

Fixes airbytehq/airbyte#15623

Recommended reading order

  1. airbyte-metrics/metrics-lib/src/main/java/io/airbyte/metrics/lib/OpenTelemetryMetricClient.java
  2. airbyte-metrics/metrics-lib/src/test/java/io/airbyte/metrics/lib/OpenTelemetryMetricClientTest.java

Can this PR be safely reverted / rolled back?

  • YES πŸ’š
  • NO ❌

🚨 User Impact 🚨

Should be no user impact (aside from fixing the bug)

@justenwalker justenwalker force-pushed the jwalker/monitoring-otel branch 3 times, most recently from 765be8d to 8aae09b Compare April 4, 2023 14:30
When using the "otel" collector for metrics-reporter.
You will get a log message like:

   WARNING: Instrument X has recorded multiple values for the same attributes.

This is because the Gauge builder in the OpenTelemetry Java SDK can only
collect gauge values asynchronously.

The old code essentially redefined the gauge every time with a callback
that produces the same value. The first registration succeeds since
there is no gauge defined; but subsequent gauge updates failed because
of the duplicate attributes.

This fix attempts to solve this by using a synchronous map and only
defining the gauge once. The callback uses the sync map to get the
latest value. Subsequent calls to gauge update the value in the map
rather than re-defining the gauge again.

This sort of a hack. The way that the API of OpenTelemetry is designed
is such that you should define your gauge up-front, passing a callback
which would perform the measurements periodically  by the SDK.
However, this API does not conform to the MetricClient interface
so this adapter is necessary unless the metrics client is refactored
to support this use-case.

Fixes #15623
@justenwalker
Copy link
Contributor Author

@xiaohansong - would perhaps be the best person to review according to git history.

- Add some comments to explain the implementation and why it is done
  this way.
- A little refactoring of the method body
Copy link
Contributor

@xiaohansong xiaohansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

I'm thinking maybe gauge registration should happen in initialization function, while the gauge function is simply registering the metric/attributes into map you defined above.

MetricClient is supposed to be singleton. The callback from gauge initialization will query from the map, just like the way you implemented. What do you think?

@justenwalker
Copy link
Contributor Author

justenwalker commented Apr 4, 2023

Thanks for your contribution!

I'm thinking maybe gauge registration should happen in initialization function, while the gauge function is simply registering the metric/attributes into map you defined above.

MetricClient is supposed to be singleton. The callback from gauge initialization will query from the map, just like the way you implemented. What do you think?

We could create all gauges in initialization; however I think that does mean that it's possible to call the gauge method with a metric name that does not exist in the map. It also means that we'll have to remember to add new gauges here or it will fail (perhaps silently, depending on implementation). I'll defer to your judgement if that's acceptable.

Copy link
Contributor

@xiaohansong xiaohansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justenwalker Ah I misunderstood the change. Sounds like each metric will need to register onto opentelemetry gauge and their corresponding callback functions will be triggered periodically. This makes sense, please ignore my previous comment. Thanks!

@virtualtam
Copy link

virtualtam commented Apr 5, 2023

Hi, OP for airbytehq/airbyte#15623 πŸ‘‹

We tested the PR using a local Airbyte v0.42.1 stack (similar to the one documented on the Discourse thread), and:

  • rebuilt the airbyte/worker and airbyte/metrics-reporter Docker images locally
  • configured Airbyte to send metrics via OpenTelemetry
  • configured several Airbyte connections to run multiple sync jobs

Gauges are properly emitted, and we see no errors in the metrics-reporter nor otel-collector service logs.

image

@xiaohansong
Copy link
Contributor

/create-oss-pr

@xiaohansong
Copy link
Contributor

I run the format fix and then merged the PR. 13b08e6

Thanks for the contribution!

@xiaohansong xiaohansong closed this Apr 5, 2023
@virtualtam
Copy link

Thanks @justenwalker for the contribution and @xiaohansong for merging πŸ‘

One quick remark though: the resulting commit, 13b08e6 , does not reference the original PR ( I guess #5658 is an internal support ticket ID).

The commit does not even reference the original author, and there are 20 people identified as co-authors, which feels kind of weird?

I have wondered at this several times since the airbyte-platform repository was created, as this makes it hard when looking at the history, to find the original PR or issue a given commit is meant to tackle.

Could such commits credit the original author(s), and reference the original issues and PRs, e.g. by adding links in the commit message?

@xiaohansong
Copy link
Contributor

Ack - sorry about that, I think we identified a bug in our merging command. These 20 people are mostly generated from empty commits by merging main branch into this one.

I understand it's frustrating and discouraging to lose author's information. We are pretty new when converting to this airbyte-platform repository and this type of behavior was not tested well, so I apologize for not crediting the contributor, and our team is working on fixing it. Thank you both agin for your contribution.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

Your branch is not currently up-to-date with main. Please update your branch before attempting to snapshot your PR.

@justenwalker justenwalker deleted the jwalker/monitoring-otel branch May 4, 2023 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants