Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Add 'grpc.client.deadline' tag to track gRPC client's deadline value #2200

Merged

Conversation

nvh0412
Copy link
Contributor

@nvh0412 nvh0412 commented Aug 2, 2022

What does this PR do?
resolve #2199

Motivation
I'm using Datadog tracing to methodically trace data from the whole of our internal system (Web server, gRPC, ...). As I use multi-threaded servers to run our gRPC server, hence I'm facing many DeadlineExceed exceptions from the gRPC client ruby because the thread took so long to finish its job, and the timeout setting isn't enough.

Despite talking about the solution, I want to know the value of client timeout when it sends the request to the gRPC server. In this way, I would know how many redundant thread executions are while its client call is terminated because of timeout.

So I'm adding grpc.client.deadline tag to track gRPC client's deadline value

@nvh0412 nvh0412 force-pushed the chore/add-a-new-tag-for-client-deadline branch 3 times, most recently from 7e27dc6 to 08e664e Compare August 2, 2022 16:39
@nvh0412 nvh0412 marked this pull request as ready for review August 2, 2022 16:51
@nvh0412 nvh0412 requested a review from a team August 2, 2022 16:51
@TonyCTHsu
Copy link
Contributor

Hi @nvh0412 , Thanks for the contribution!

Could you provide a positive example from spec which verify the deadline has been set correctly with a value (ActiveCall#deadline would return a Time object).

Also please rebase the branch to the latest master to avoid failing spec in old Rubies. 😄

@nvh0412 nvh0412 force-pushed the chore/add-a-new-tag-for-client-deadline branch 2 times, most recently from 9c08280 to f9c23a4 Compare August 3, 2022 15:44
@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 3, 2022

Thank you @TonyCTHsu , addressed.

@TonyCTHsu
Copy link
Contributor

I was wondering whether information is accurate enough to diagnose the DeadlineExceed exception.

What would a good general default value for grpc client deadline?

I am asking because when Time object serialized by to_s, it is returning 2022-08-03 17:51:59 +0200.
This is only providing the resolution in seconds. I assumed that the resolution to milliseconds would be more useful?

@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 3, 2022

Fair enough, the Time object's serialised value might also affect the Datadog query in our Dashboard. Milliseconds would be better, thanks for suggesting me @TonyCTHsu, addressing.

@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 3, 2022

Please help me review this PR again, thank you.

@codecov-commenter
Copy link

Codecov Report

Merging #2200 (60d0e4f) into master (8c9806d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2200   +/-   ##
=======================================
  Coverage   97.48%   97.48%           
=======================================
  Files        1043     1043           
  Lines       54429    54438    +9     
=======================================
+ Hits        53058    53068   +10     
+ Misses       1371     1370    -1     
Impacted Files Coverage Δ
...tracing/contrib/grpc/datadog_interceptor/client.rb 97.72% <100.00%> (+0.29%) ⬆️
lib/datadog/tracing/contrib/grpc/ext.rb 100.00% <100.00%> (ø)
...ng/contrib/grpc/datadog_interceptor/client_spec.rb 100.00% <100.00%> (ø)
...adog/tracing/contrib/sidekiq/client_tracer_spec.rb 100.00% <0.00%> (+2.08%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@@ -44,6 +44,9 @@ def annotate!(trace, span, metadata, call)
host, _port = find_host_port(call)
span.set_tag(Tracing::Metadata::Ext::TAG_PEER_HOSTNAME, host) if host

deadline = find_deadline(call)
span.set_tag(Ext::TAG_CLIENT_DEADLINE, (deadline.to_time.to_f * 1000).to_i) if deadline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @nvh0412, could you move the time conversion logic of deadline (from object to number) to the find_deadline method?

I think it looks cleaner if the caller here only has to set the deadline, like so:

Suggested change
span.set_tag(Ext::TAG_CLIENT_DEADLINE, (deadline.to_time.to_f * 1000).to_i) if deadline
span.set_tag(Ext::TAG_CLIENT_DEADLINE, deadline) if deadline

@nvh0412 nvh0412 force-pushed the chore/add-a-new-tag-for-client-deadline branch from f7499de to 39823d8 Compare August 4, 2022 01:52
@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 4, 2022

Hi @TonyCTHsu @marcotc I addressed your feedbacks, also squashed all commit to one, that's why I have to do a push force. Please help me review this PR again, thank you

@TonyCTHsu
Copy link
Contributor

TonyCTHsu commented Aug 4, 2022

👋 @nvh0412
According grpc. There is no true answer for a good timeout. But it seems to be legit that millisecond resolution would be helpful.

There is no single answer to “What is a good deadline/timeout value?”. Your service might be as simple as the Greeter in our quick start guides, in which case 100 ms would be fine. Your service might be as complex as a globally-distributed and strongly consistent database. The deadline for a client query will be different from how long they should wait for you to drop their table.

I took a deeper dive into grpc , and yes. The deadline value could be a Time or TimeSpec object. When user provides a positive number, it would ends up to become a Time object. It is also possible providing TimeSpec and nil which return TimeSpec.

When I tried to write test examples for TimeSpec, I realized that there are three TimeSpec object, referenced by TimeConsts::ZERO , TimeConsts::INFINITE_FUTURE and TimeConsts::INFINITE_PAST. All three of them are referring to value back in 1960s. I wonder whether those TimeSpec values are important/useful to your case?

For Time serialization,

irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.now.utc.iso8601(3)
=> "2022-08-04T17:24:09.271Z"

This this would provide a human readable and interchangable format, which string comparison should also work for dashboard query.

if TimeSpec is not important for visibility. Perhaps we don't need to tag it. But if it is, consider placing a tag grpc.client.deadline.typewith [TIMESTAMP, INFINITE_FUTURE, INFINITE_PAST, ZERO] and grpc.client.deadline.timestamp with serialized time. This would avoid confusion of ill-formed data of those.

@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 5, 2022

I don't think TimeConsts::ZERO, TimeConsts:: INFINITE_FUTURE, TimeConsts::INFINITE_PAST are valuable for us. Also, I just figured out that if we tag this deadline on the client interceptor, its value is always Time ruby, TimeSpec only appear on Server interceptor.

So I do believe we can come up with that we don't need to handle TimeSpec type in this PR because we only change the client interceptor. But I will address the code that ensures we won't tag the deadline if it's TimeSpec.

As for the serialized value of Time object, it's a fifty-fifty split milisecond number or a string value (iso8601 format). Because the millisecond value would help others who want to calculate/compare with other values easier than a string value, I believe. (i.e duration value), so let's keep it as millisecond for now. Wdyt? @TonyCTHsu

@TonyCTHsu
Copy link
Contributor

If TimeSpec is not interested, let's just skip it.

I believe I could approve the change after iso8601 format serialization 😄 .

@nvh0412
Copy link
Contributor Author

nvh0412 commented Aug 5, 2022

Addressed @TonyCTHsu , there are also some flaky tests on the master branch https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb?branch=master.

Copy link
Contributor

@TonyCTHsu TonyCTHsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvh0412 Thanks for the great work! 😘

@TonyCTHsu TonyCTHsu merged commit a468c29 into DataDog:master Aug 8, 2022
@github-actions github-actions bot added this to the 0.54.3 milestone Aug 8, 2022
@TonyCTHsu TonyCTHsu modified the milestones: 0.54.3, 1.4.0 Aug 8, 2022
@nvh0412 nvh0412 deleted the chore/add-a-new-tag-for-client-deadline branch August 13, 2022 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for gRPC client timeout value
4 participants