-
Notifications
You must be signed in to change notification settings - Fork 4.5k
stats/opentelemetry: record retry attempts from clientStream #8342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
stats/opentelemetry: record retry attempts from clientStream #8342
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8342 +/- ##
==========================================
- Coverage 82.27% 82.26% -0.02%
==========================================
Files 413 414 +1
Lines 40430 40437 +7
==========================================
Hits 33264 33264
- Misses 5796 5803 +7
Partials 1370 1370
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we had separate tests for retries. This change should only affect that test. This change shouldn't affect tests which are doing only single attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinothkumarr227 have you tested this to ensure its working correctly? I was under impression that TestTraceSpan_WithRetriesAndNameResolutionDelay will need changes for expected values. I think its not working as expected because you are not setting the count back to ctx after incrementing.
// increment previous rpc attempts applicable for next attempt | ||
atomic.AddUint32(&ai.previousRPCAttempts, 1) | ||
} | ||
if rs.Client { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to stop setting the retry attributes in the server side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes — per gRPC OpenTelemetry tracing proposal (A72), retry information is tracked only on the client side. On the server side, there is a single span per RPC (Recv..), and no per-attempt retries are recorded. Therefore, we do not set retry-related attributes like previous-rpc-attempts or transparent-retry on server spans.
stats/opentelemetry/trace.go
Outdated
// Increment retry count for the next attempt if not a transparent | ||
// retry. | ||
if !rs.IsTransparentRetryAttempt && rs.Client { | ||
ci := getCallInfo(ai.ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to get ci
again. The reason to use the pointer is that we can increment ai.previousRPCAttempts
and the change gets reflected in ci.previousRPCAttempts
automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I’ve removed the redundant getCallInfo call.
span.SetAttributes(attrs...) | ||
// Increment retry count for the next attempt if not a transparent | ||
// retry. | ||
if !rs.IsTransparentRetryAttempt && rs.Client { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to increment attempts for only non-transparent retries? Why did we stop incrementing the counter for transparent retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only increment the attempts count for non-transparent retries because, per the gRPC OpenTelemetry tracing spec (A72), the previous-rpc-attempts.
Fixes: #8299
RELEASE NOTES:
grpc.previous-rpc-attempts
) are now recorded as span attributes for non-transparent client retries.