Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timestamp field to SegmentMetadataEvent #16613

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

maytasm
Copy link
Contributor

@maytasm maytasm commented Jun 15, 2024

Add timestamp field to SegmentMetadataEvent

Description

Every other metrics emitted by Druid has a timestamp field. The fact that the SegmentMetadataEvent metric does not have a timestamp field can cause problem for the metric system like Druid. If you are ingesting Druid's emitted to Druid, then timestamp is very important. If you are running Druid version earlier than 27 (which doesn't have #14413), then ingestion will fail if there is no timestamp. While if you are running Druid version later than 27, you may be dropping these no timestamp metrics. (Note that the above is assuming you do not have missingValue set for your Druid's supervisor that is ingesting Druid emitted metrics).
Anyway, let's just add a timestamp so that all metrics are consistent.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@maytasm maytasm force-pushed the add_timestamp_to_SegmentMetadataEvent branch from 9dd0457 to 07e23b3 Compare June 15, 2024 04:19
@@ -85,7 +86,7 @@ public SegmentMetadataEvent(
)
{
this.dataSource = dataSource;
this.createdTime = createdTime;
this.createdTime = createdTime != null ? createdTime : DateTimes.nowUtc();
Copy link
Contributor

@kfaraz kfaraz Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think this field is ever passed as null in the current usages of this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is never passed as null right now. The current and only caller of the SegmentMetadataEvent.create pass DateTimes.nowUtc() but I want to be extra safe. This is in case some new user comes along and call SegmentMetadataEvent.create without setting the createdTime (either though using the SegmentMetadataEvent.create without passing eventTime or calling the SegmentMetadataEvent constructor without setting createdTime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it makes sense to ensure that the timestamp is always non-null.
But if it is important for the field to be non-null, then we should be more explicit about the value we use for it. We should probably avoid silently converting a null to "now" (though there are other places in the Druid code that do it 😅 ).

So I would advise to do the following:

  • Retain the public constructor but do a validation of this field thus forcing callers to pass a non-null value.
  • Remove arg eventTime from method .create() and always use DateTime.nowUtc().
  • Add a javadoc to both the constructor and create() clarifying this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing something similar to what ServiceMetricEvent does, which most other metrics use. This is what @gianm suggested at https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1715638010942329?thread_ts=1715634244.240639&cid=C0309C9L90D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, this comment was not a blocker anyway. We can proceed with this merge and cleanup usage of both ServiceMetricEvent and SegmentMetadataEvent later.

@@ -104,6 +105,7 @@ public EventMap toMap()

return EventMap.builder()
.put(FEED, getFeed())
.put("timestamp", createdTime.toString())
Copy link
Contributor

@kfaraz kfaraz Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to set the timestamp field, but if we are emitting this with the SegmentMetadataEvent, we can probably stop emitting the createdTime since it contains the same value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The createdTime may already be used by whoever/whatever is consuming this metric. I want to make sure we do not create regression and not remove/modified any existing fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I assumed that you retained it for backwards compat.

But I think it is fine to make a breaking change in this case as I don't think there would be a lot of downstream applications relying on this particular event. At least I have not come across any major use cases. Moreover, as you point out in the description, this event was already behaving differently than the other emitted events. So it's okay to fix that up.

Although, if you feel differently and have come across applications that depend on the createdTime field, we can retain it while marking it as deprecated so that we can get rid of it eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not come across applications that depend on the createdTime field but I am also not the person who added (or even use) this metric haha

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. In that case, I would advise we get rid of the extra field since I couldn't find in the docs either. So I don't think we are required to have backward compatibility in this case.

But if you want to go ahead with this PR, we can do that later too.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending comments are non-blockers.

@kfaraz
Copy link
Contributor

kfaraz commented Jun 21, 2024

@maytasm , there are some test failures. Please take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants