Skip to content

add time in queue#1481

Merged
tylerbenson merged 3 commits into
masterfrom
ziquan/kafka_time_in_queue
May 21, 2020
Merged

add time in queue#1481
tylerbenson merged 3 commits into
masterfrom
ziquan/kafka_time_in_queue

Conversation

@ziquanmiao
Copy link
Copy Markdown
Contributor

Adds visibility on how long transactions spend in a kafka queue
submits the time difference between the start of the consumer span and the record's timestamp to generate an estimation of time spent in queue

@ziquanmiao ziquanmiao requested a review from a team as a code owner May 21, 2020 01:30
Copy link
Copy Markdown
Contributor

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but we should floor wall clock based durations at zero. This also doesn't necessarily measure time in the queue; it depends on whether the timestamp type is CreationTime (timestamp taken at sender) or LogAppendTime (timestamp taken at broker). It's only really the time spent in the queue if the timestamp is a log append time. There should be different tag names based on this, e.g. "record.total_time" if it's a creation timestamp. As you're probably aware log.message.timestamp.type may be configured at the topic level, so traced applications could see very different values for this tag otherwise.


final long produceTime = record.timestamp();
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
span.setTag("record.time_in_queue_milliseconds", consumeTime - produceTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both timestamps are wall clock based, this duration can be negative. If that would break things elsewhere, it should be guarded with a Math.max(consumeTime - produceTime, 0L)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with using max. Also, can we rename to record.queue_time_ms? I think spelling it out is unnecessarily verbose.


final long produceTime = record.timestamp();
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
span.setTag("record.time_in_queue_milliseconds", consumeTime - produceTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add "record.time_in_queue_milliseconds" to DDTags we can avoid materialising its UTF-8 encoding during serialisation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we haven't really done this to all the integrations... for example, see line 62-63. Let's not worry about it in this PR.

Copy link
Copy Markdown
Contributor

@tylerbenson tylerbenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @ziquanmiao! A few changes below... You also need to update KafkaClientTest and KafkaStreamsTest with the new tag.


final long produceTime = record.timestamp();
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
span.setTag("record.time_in_queue_milliseconds", consumeTime - produceTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we haven't really done this to all the integrations... for example, see line 62-63. Let's not worry about it in this PR.


final long produceTime = record.timestamp();
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
span.setTag("record.time_in_queue_milliseconds", consumeTime - produceTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with using max. Also, can we rename to record.queue_time_ms? I think spelling it out is unnecessarily verbose.

span.setTag("offset", record.offset());

final long produceTime = record.timestamp();
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getLocalRootSpan().getStartTime());
final long consumeTime = TimeUnit.NANOSECONDS.toMillis(span.getStartTime());

Semantically, the span passed into the method is the one representing the consuming of the message. Besides, it will always be the root span anyway.

@tylerbenson
Copy link
Copy Markdown
Contributor

@richardstartin I don't think we'd be able to distinguish which kind of timestamp is represented. Perhaps it would be better to use something more generic like record.age.ms?

@richardstartin
Copy link
Copy Markdown
Contributor

@tylerbenson to distinguish just look at the timestamp type on the consumer record :)

@richardstartin
Copy link
Copy Markdown
Contributor

@ziquanmiao I've updated the tests and applied some of the suggestions made. We can forget about the timestamp type for the time being. Just be aware if becomes pertinent to know whether it's a broker or sender timestamp, we could do something about it.

Copy link
Copy Markdown
Contributor

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tylerbenson tylerbenson merged commit d964f18 into master May 21, 2020
@tylerbenson tylerbenson deleted the ziquan/kafka_time_in_queue branch May 21, 2020 20:00
@github-actions github-actions Bot added this to the 0.53.0 milestone May 21, 2020
iNikem pushed a commit to open-telemetry/opentelemetry-java-instrumentation that referenced this pull request Jun 2, 2020
* Add time in queue (DataDog/dd-trace-java#1481)

* Minor upgrades (DataDog/dd-trace-java#1495)

* Allow user to disable kafka time in queue tag (DataDog/dd-trace-java#1487)

* Replace Set<Integer> with BitSet for HTTP statuses (DataDog/dd-trace-java#1496)

* Register WeakMapProvider earlier in AgentInstaller (DataDog/dd-trace-java#1480)

* Update codenarc (DataDog/dd-trace-java#1500)

Co-authored-by: Tyler Benson <tyler.benson@datadoghq.com>
Co-authored-by: Nikolay Martynov <mar.kolya@gmail.com>
Co-authored-by: Richard Startin <richard.startin@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants