Distributed tracing is heavily used in our environment to troubleshoot problems in distributed deployments, and it has long proven to be a powerful feature that helps us find and solve many complicated problems.
Most of the work implemented in our own branch has been contributed back to the community.
The main features now in the master branch are as follows:
Trace on cluster DDLs
Trace a query on local node and its sub-queries on remote node
Trace async or sync INSERT on distributed table
Trace queries from HTTP/TCP/GRPC
Propagate tracing context to downstream servers via URL engine
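For readers unfamiliar with how a tracing context travels between servers, here is a minimal sketch of the W3C Trace Context `traceparent` header, which is the usual carrier for this kind of propagation over HTTP. The helper names `make_traceparent` and `parse_traceparent` are hypothetical and exist only for illustration; they are not part of ClickHouse.

```python
import re
import secrets

# W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>"
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent header value (hypothetical helper)."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"             # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid according to the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }
```

A client would typically send the value as an HTTP header, for example `{"traceparent": make_traceparent()}`, and a downstream server would parse it and create child spans under the same `trace_id`.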
But this feature is still marked as experimental.
From the community's perspective, I think it's time for us to make a plan to promote it to a production-ready feature.
Before that, here are some things I can think of that need to be completed:
Standardize the attribute names as defined in the OpenTelemetry specification.
Attribute names can currently be defined in any way. It would be better to standardize some of them according to the current specification so that the logs can be handled easily by external visualization tools. This is NOT backward compatible.
Investigate the root cause of Abort in OpenTelemetry::SpanHolder::finish() #49185
Even though it occurs in the Debug build, it indicates that the same condition may lead to incorrect logs in the Release build.
Add trace_id column to system.query_log
This will clearly show in the query log whether a query is traced or not. It can then be used to search or join the opentelemetry_span_logs table.
Propagate the tracing context to remote S3
Some S3-compatible remote storage systems support distributed tracing. Propagating the context to them would let us troubleshoot problems between ClickHouse and the underlying S3 storage.
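To illustrate what the proposed `trace_id` column in the query log would enable, here is a rough in-memory sketch of the lookup it makes possible: grouping span rows under the query that produced them. The row shapes are assumptions. The `trace_id` field on query-log rows is the proposed column and does not exist yet, untraced queries are assumed to carry an empty or missing value, and `spans_by_query` is a hypothetical helper.

```python
from collections import defaultdict


def spans_by_query(query_log_rows, span_rows):
    """Map each traced query_id to the spans sharing its trace_id.

    Assumption: query-log rows carry the proposed `trace_id` column,
    which is empty or None when the query was not traced.
    """
    spans = defaultdict(list)
    for span in span_rows:
        spans[span["trace_id"]].append(span)

    result = {}
    for row in query_log_rows:
        trace_id = row.get("trace_id")
        if not trace_id:  # query was not traced, nothing to join
            continue
        result[row["query_id"]] = spans.get(trace_id, [])
    return result
```

In practice the same lookup would be a join on `trace_id` between the query log and the span log tables; the sketch only shows why having the column in the query log makes that join straightforward.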
See: Add OpenTelemetry Support to Materialized View #41672
What do you think? @alexey-milovidov