
Consider null inputRows and parse errors as unparseable during realtime ingestion. #1506

Merged: 1 commit merged into apache:master on Jul 13, 2015

Conversation

@gianm (Contributor) commented Jul 9, 2015

Consider null inputRows as unparseable during realtime ingestion. Also, harmonize exception handling between the RealtimeIndexTask and the RealtimeManager.

Related to #1350

Inline comment on the diff:

```java
      continue;
    }
  }
  catch (Exception e) {
```
Member: should this be ParseException?

@gianm (Author): Sure, I will change this to ParseException, both here and in the RealtimeManager (which has been using Exception). ParseException is probably better, since we want to log and continue on formatting problems, but not on transient failures like network glitches.
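For concreteness, a hedged sketch of the distinction being described: parse failures are counted and skipped, while anything else propagates. The classes below are local stand-ins written for this sketch, not Druid's actual ParseException or firehose API.

```java
// Local stand-in for a parse failure; not Druid's actual ParseException class.
class ParseException extends RuntimeException {
  ParseException(String message, Throwable cause) {
    super(message, cause);
  }
}

class RowHandlingSketch {
  private long unparseableCount = 0;

  /**
   * Returns the parsed row, or null if the event should simply be counted as
   * unparseable and skipped. Anything other than ParseException is NOT caught
   * here, so transient failures such as network glitches still bubble up and
   * fail the task instead of being silently dropped.
   */
  Object tryParse(String rawEvent) {
    try {
      return parse(rawEvent);
    }
    catch (ParseException e) {
      unparseableCount++; // formatting problem: count it and keep going
      return null;
    }
  }

  private Object parse(String rawEvent) {
    if (rawEvent == null || rawEvent.isEmpty()) {
      throw new ParseException("unparseable event", null);
    }
    return rawEvent; // placeholder for real row parsing
  }
}
```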

@nishantmonu51 (Member): A simple test to verify the behaviour would be nice.

@gianm (Author) commented Jul 9, 2015

Agreed that a test would be nice; I could add one at some point before the PR is merged, i.e., Soon™.

For now, though, does the general approach seem reasonable? Especially in the context of #1350?

@gianm changed the title from "Consider null inputRows as unparseable during realtime ingestion." to "Consider null inputRows and parse errors as unparseable during realtime ingestion." on Jul 9, 2015
@nishantmonu51 (Member): LGTM.

Inline comment on the diff (@@ -252,8 +252,14 @@ public void run()):

```java
try {
  try {
    inputRow = firehose.nextRow();

    if (inputRow == null) {
      log.debug("thrown away null input row, considering unparseable");
```
Contributor: Is it possible to give a row # in the debug log?

@gianm (Author): Like how many inputRows have been read up until this one? Yes, but would that actually be helpful? I would think it's not, and what you'd actually want is the Kafka partition and offset. But there isn't really a way to get that at this point in the code, since it's buried inside the firehose.

That could potentially be solved by having Firehoses return InputRowAndMetadata (with two methods: getRow and getMetadata), where the metadata is something we can toString in this error message. But that's a Firehose interface change, which is tough to do since it's an external API.
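To make that suggestion concrete, a hypothetical sketch of what such an interface could look like; InputRowAndMetadata does not exist in Druid, and every name below is invented for illustration.

```java
// Hypothetical sketch only: neither interface exists in Druid, and the names
// are invented to illustrate the "row plus metadata" idea discussed above.
interface InputRowAndMetadata<RowType, MetadataType> {
  RowType getRow();

  // Whatever the firehose can attach, e.g. a Kafka partition/offset pair,
  // purely so "unparseable row" log lines can say where the event came from.
  MetadataType getMetadata();
}

interface MetadataAwareFirehose<RowType, MetadataType> {
  boolean hasMore();

  // Would replace the bare nextRow() call, which is why it would count as an
  // external Firehose interface change.
  InputRowAndMetadata<RowType, MetadataType> nextRowAndMetadata();
}
```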

Contributor: That also seems like it would add a lot of overhead for what is arguably a corner case.

Ultimately it would be nice if "where did my failure happen" had some sort of breadcrumb here, but given the nature of ingestion it probably makes more sense to have a validation layer well before this point in the data stream (i.e., not in Druid). Any failures here could then be assumed to be a fault in the communication, making this a safeguard against catastrophic behavior on rare events.

Does that sound like a correct statement?

@gianm (Author): I definitely think validation pre-Druid is the way to go, so I agree with the statement that unparseable events should be rare and probably indicate something wrong with the data pipeline.

@fjy (Contributor) commented Jul 9, 2015

👍

Inline comment on the diff:

```java
    plumber.persist(firehose.commit());
    nextFlush = new DateTime().plus(intermediatePersistPeriod).getMillis();
  }

  continue;
```
Contributor: Does this change make it possible to try and persist an empty event sequence? If I recall correctly, that throws an error.

Contributor: Example: a long time until the first event, the first event is null, and the plumber tries to flush without having any events.

@gianm (Author): No, because there's a `continue;` after a null inputRow is found, so the loop will go directly back to firehose.hasMore() + firehose.nextRow() before trying to persist anything.
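A stripped-down sketch of the loop shape being described (local stand-in types, not the actual RealtimeIndexTask/RealtimeManager code), showing why the continue on a null row returns to hasMore()/nextRow() before the persist branch can run.

```java
// Stand-in types for illustration; not Druid's Firehose/Plumber interfaces.
interface Row {}

interface FirehoseLike {
  boolean hasMore();
  Row nextRow();       // may return null for an unparseable event
  Runnable commit();
}

interface PlumberLike {
  void persist(Runnable commitRunnable);
}

class IngestionLoopSketch {
  void run(FirehoseLike firehose, PlumberLike plumber, long persistPeriodMillis) {
    long nextFlush = System.currentTimeMillis() + persistPeriodMillis;

    while (firehose.hasMore()) {
      final Row inputRow = firehose.nextRow();

      if (inputRow == null) {
        // Unparseable: jump straight back to hasMore()/nextRow().
        // The persist branch below is never reached for this iteration,
        // so a null first row cannot trigger a flush of an empty sequence.
        continue;
      }

      index(inputRow);

      if (System.currentTimeMillis() > nextFlush) {
        plumber.persist(firehose.commit());
        nextFlush = System.currentTimeMillis() + persistPeriodMillis;
      }
    }
  }

  private void index(Row row) {
    // placeholder for adding the row to the in-memory sink
  }
}
```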

Contributor: Yep, missed it. Thanks!

@drcrallen (Contributor): Is this covered in a test somewhere?

Commit: Consider null inputRows and parse errors as unparseable during realtime ingestion.

Also, harmonize exception handling between the RealtimeIndexTask and the RealtimeManager.
Conditions other than null inputRows and parse errors bubble up in both.
@gianm (Author) commented Jul 12, 2015

Added a test.

@nishantmonu51 (Member): +1, once Travis is green :)

@himanshug (Contributor): Closing/reopening to rebuild; the build failure seems unrelated.

@himanshug closed this Jul 13, 2015
@himanshug reopened this Jul 13, 2015
@himanshug added a commit that referenced this pull request on Jul 13, 2015: Consider null inputRows and parse errors as unparseable during realtime ingestion.
@himanshug merged commit 725086c into apache:master Jul 13, 2015
@xvrl (Member) commented Jul 14, 2015

@gianm I think @cheddar had some comments in #1350 that were related; did those get addressed here?

@gianm (Author) commented Jul 14, 2015

@xvrl I'm actually not totally sure what @cheddar had in mind.

@xvrl (Member) commented Jul 14, 2015

@gianm ok, maybe you and @cheddar can chat and figure out whether #1350 is still relevant?

seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020
@gianm deleted the realtime-plumber-nulls branch on September 23, 2022