-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39898: [C++] Add support for OpenTelemetry logging #39905
Conversation
88ee5bd
to
266a26c
Compare
785e1b1
to
20dbb39
Compare
bb26b4b
to
ec6dcbd
Compare
ec6dcbd
to
d571b33
Compare
I've recently updated the PR description, so check that out for a rundown on all the high-level changes. Let me know if anything needs clarification. As mentioned, I've added some temporary log statements to Flight/FlightSQL for demonstrative purposes. You can output the logs to the console in JSON format by navigating to the test directory and running these commands $ export ARROW_LOGGING_BACKEND=arrow_otlp_stderr
$ ./arrow-flight-sql-test --gtest_filter=TestFlightSqlServer.* For example, here's the output of
|
cpp/src/arrow/telemetry/logging.cc
Outdated
} | ||
|
||
otel::sdk::resource::Resource MakeResource(const ServiceAttributes& service_attributes) { | ||
// TODO: We could also include process info... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could we get to a decision on whether to include these attributes? It seems like it could be useful information but, since users would typically be expected to correlate logs with traces, could this also be superfluous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since users would typically be expected to correlate logs with traces, could this also be superfluous?
To clarify, is this because there would already be process info associated with the trace?
For the record, I don't think this is an ideal place to handle resource creation now that I'm looking at the big picture. This creates and sets attributes for a global OTel resource that could be shared by both a LoggerProvider
and TracerProvider
- but it's currently hidden in the logging implementation exclusively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, is this because there would already be process info associated with the trace?
Yep, that was my thought.
And your point about sharing makes sense to me. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Libraries shouldn't be setting this info. It should be configured once at the top level by the ultimate application. Arrow is just weird because in PyArrow you don't have a way to set it, but I wouldn't want to push this on C++ users.
@lidavidm Somewhat related question: I've noticed that |
It probably shouldn't. But problems like this are why I favor just opening up the gRPC part of Flight so that the user can configure everything themselves... |
7bbd51f
to
174ba4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I started a lot of this work, but we should more seriously consider exposing the underlying gRPC bits and letting applications configure everything themselves instead of trying to wrap a larger and larger API surface - it will be easier for us and easier for developers, and it means we can properly defer the maintenance to gRPC/OpenTelemetry
-DWITH_OTLP_HTTP=ON | ||
-DWITH_OTLP_GRPC=OFF | ||
# Disabled because it seemed to cause linking errors. May be worth a closer look. | ||
-DWITH_FUNC_TESTS=OFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we want tests in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I think we can just delete the comment and leave this option off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wouldn't want them, but it's enabled by default (in this version, at least).
@@ -63,6 +64,7 @@ arrow::Result<FlightDescriptor> GetFlightDescriptorForCommand( | |||
arrow::Result<std::unique_ptr<FlightInfo>> GetFlightInfoForCommand( | |||
FlightSqlClient* client, const FlightCallOptions& options, | |||
const google::protobuf::Message& command) { | |||
ARROW_FLIGHT_OTELLOG_SQL_CLIENT(INFO, "[Example message] func=", __func__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't removed from the PR?
@@ -928,6 +929,7 @@ class GrpcClientImpl : public internal::ClientTransport { | |||
Status GetFlightInfo(const FlightCallOptions& options, | |||
const FlightDescriptor& descriptor, | |||
std::unique_ptr<FlightInfo>* info) override { | |||
ARROW_FLIGHT_OTELLOG_CLIENT(INFO, "[Example message] func=", __func__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm wary of having any logs in the library in the first place - I don't want to be in a position where we accidentally log a sensitive payload
cpp/src/arrow/util/logger.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we plan to replace GLog with this facade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I attempted to design it so that it could replace the existing logging system if we wanted to go that route.
-DWITH_OTLP_HTTP=ON | ||
-DWITH_OTLP_GRPC=OFF | ||
# Disabled because it seemed to cause linking errors. May be worth a closer look. | ||
-DWITH_FUNC_TESTS=OFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I think we can just delete the comment and leave this option off
-DWITH_FUNC_TESTS=OFF | ||
# These options are slated for removal in v1.14 and their features are deemed stable | ||
# as of v1.13. However, setting their corresponding ENABLE_* macros in headers seems | ||
# finicky - resulting in build failures or ABI-related runtime errors during HTTP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what you mean by setting these macros? IIUC they should be defined just by linking to opentelemetry_api
https://github.com/open-telemetry/opentelemetry-cpp/pull/2435/files#diff-1186bce31e69b2154d84fff38d88b12365327134f5b3ed31e734889784a948dbL128-L131
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked into it a bit more. The issue seems to be that the macros do get defined for opentelemetry-cpp
, but they're not exposed to Arrow - which results in an ABI mismatch in the affected headers. I'm pretty sure we're linking against opentelemetry_api
just fine because manually defining the options in ThirdPartyToolchain.cmake
(around here) solves the issue, e.g:
target_compile_definitions(opentelemetry-cpp::api
INTERFACE
ENABLE_HTTP_SSL_PREVIEW
ENABLE_OTLP_HTTP_SSL_PREVIEW
ENABLE_HTTP_SSL_TLS_PREVIEW
ENABLE_OTLP_HTTP_SSL_TLS_PREVIEW)
I haven't found a way to get CMake to automatically pass these options to us, though.
Alternatively, we could further bump the version to 1.14 or 1.15 where this becomes a nonissue. v.1.15 seems to work interchangeably from what I've tested.
Removes the ability to pass attributes/events to the OTel logging APIs along with many of the OTel logger class' extra methods. We may choose to support some of these features at a later date.
85fde99
to
a5955f6
Compare
a5955f6
to
c45b759
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. I don't really want to wrap so much of a third party API but given the constraints we impose on ourselves we have no real choice.
It seems that this breaks CI: https://github.com/apache/arrow/actions/runs/9453191775/job/26038038570#step:7:3293
Could you open a new issue and take a look at this? |
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit f2057c5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 42 possible false positives for unstable benchmarks that are known to sometimes produce them. |
I've meet the problem and created an issue: #42104 |
I'm not sure why this was merged with some CI jobs explicitly failing on the telemetry tests. |
I've opened a PR that should address the CI failures and leftover example logs #42122 |
…2122) ### Rationale for this change Follow-up to #39905, as some issues popped after merging. ### What changes are included in this PR? - Fixes tests failing due to the `ARROW_LOGGING_BACKEND` env var not being defined - Removes example log statements accidentally left in PR ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: #42104 Authored-by: benibus <bpharks@gmx.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
Supporting OTel logs will help us improve diagnostics/debugging where OTel tracing is utilized.
What changes are included in this PR?
Primary changes:
opentelemetry-cpp
version to 1.13.0 to access the stable logs SDKARROW_TELEMETRY
module for these additions (and in anticipation of future OpenMetrics support)telemetry::Logger
class that wraps an OTel logger and an API for creating loggers via OTel's global logger providerSome auxiliary, but significant additions:
ArrowLog
APIs that aims to be more flexible and configurable - which the OTel loggers currently utilize. Notable details:util::Logger
class that enables creating custom loggers.LoggerRegistry
for global access to an arbitrary number of loggersARROW_LOG_*
macros that take individual loggers as a parameter. Additionally, they can be stripped at compile time based on a minimum log levelArrowLogLevel::ARROW_TRACE
as the lowest log level to mirror the equivalent OTel enumsNOTE: I've added some log statements that utilize the OTel facilities to Flight/FlightSQL - which are driven by the FlightSQL server tests. These are for demonstrative purposes only and I intend to remove them prior to merge
Are these changes tested?
Yes (although the OTel-specific tests are currently light)
Are there any user-facing changes?
This will introduce new APIs that are likely to be public.