-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/opentelemetry: add trace event for name resolution delay #8074
base: master
Are you sure you want to change the base?
stats/opentelemetry: add trace event for name resolution delay #8074
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8074 +/- ##
==========================================
- Coverage 82.25% 82.00% -0.26%
==========================================
Files 393 410 +17
Lines 39143 40248 +1105
==========================================
+ Hits 32197 33004 +807
- Misses 5616 5869 +253
- Partials 1330 1375 +45
🚀 New features to boost your workflow:
|
stats/opentelemetry/e2e_test.go
Outdated
}, | ||
events: []trace.Event{ | ||
{ | ||
Name: "Delayed LB pick complete", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is interesting. So, LB policy provided a new picker as well. Is this always the case when the rpc block on name resolution delay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfawley cc
// event is logged only once using an atomic flag, even across retries. | ||
// Add a "Delayed name resolution complete" event to the call span | ||
// if there was name resolution delay. In case of multiple retry attempts, | ||
// ensure that event is added only once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: period (.) at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return nil | ||
} | ||
|
||
func setupRetryStubServer(t *testing.T, metricsOptions *opentelemetry.MetricsOptions, traceOptions *experimental.TraceOptions) *stubserver.StubServer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add docstring similar to setupStubServer
summarizing the logic of retries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stats/opentelemetry/e2e_test.go
Outdated
// "Delayed name resolution complete" event is recorded in the call trace span | ||
// only once if any of the retry attempt encountered a delay in name resolution | ||
func (s) TestSpan_WithRetriesAndNameResolutionDelay(t *testing.T) { | ||
func TestTraceSpan_WithRetriesAndNameResolutionDelay(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add (s) back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stats/opentelemetry/e2e_test.go
Outdated
if err != nil { | ||
t.Logf("FullDuplexCall error: %v", err) | ||
rpcError <- err | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't return here. Let it continue as we want to fail gracefully below if error is received.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stats/opentelemetry/e2e_test.go
Outdated
return | ||
} | ||
|
||
rpcError <- nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to set nil? rpcError will just be appended with next error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stats/opentelemetry/e2e_test.go
Outdated
found := false | ||
for _, span := range spans { | ||
if span.Name == want.name && span.SpanKind.String() == want.spanKind { | ||
found = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need found? Looks like you are breaking the first time itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable found was used to track whether a specific span name or match occurred during the loop execution. I have now renamed the variable to match to align with the following check: if !match { t.Errorf("Span not found: %s (kind: %s)", want.name, want.spanKind) }
@@ -148,6 +149,11 @@ type callInfo struct { | |||
target string | |||
|
|||
method string | |||
// nameResolutionEventAdded is an atomic flag that ensures the name | |||
// resolution delay event is added to the call span only once. If a retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
....name resolution delay event is added to the OpenTelemetry trace
call span only once....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for second sentence. Can just say "only once across retry attempts"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -25,6 +25,7 @@ import ( | |||
otelinternaltracing "google.golang.org/grpc/stats/opentelemetry/internal/tracing" | |||
) | |||
|
|||
const delayedResolutionEventName = "Delayed name resolution complete" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be single block
const (
delayedResolutionEventName = "Delayed name resolution complete"
tracerName = "grpc-go"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
rpcError <- fmt.Errorf("UnaryCall failed: %w", err) | ||
}() | ||
|
||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't create another go routine. Make both Unary call and Streaming call in one go routine. Otherwise there can be unnecessary flakiness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented this by wrapping both waiting operations in separate select blocks, The errors are now stored in these channels and handled within their respective select blocks.
stats/opentelemetry/e2e_test.go
Outdated
}) | ||
|
||
select { | ||
case err := <-rpcError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wouldn't ensure if streaming and unary calls have finished. We need to wait until both are finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stats/opentelemetry: add trace event for name resolution delay.
RELEASE NOTES: None