Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-1701 simple JSON mapping #1327

Closed
wants to merge 1 commit into from

Conversation

kristopherkane
Copy link

No description provided.

} catch (ParseException e) {
this.collector.reportError(e);
this.collector.fail(tuple);
}
Get get = hBaseClient.constructGetRequests(rowKey, projectionCriteria);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the rowKey(tuple) call throws ParseException should the rest of the method be skipped too? Can it do anything useful without a valid rowkey?

@dossett
Copy link
Contributor

dossett commented Apr 11, 2016

There is an interesting issue here. If the tuple contains malformed JSON such that rowKey(tuple) throws a ParseException then the HBase bolt will never process that message, so a retry seems fruitless. When a similar situation happens in HiveBolt it acknowledges the tuple, logs the error, and discards the message (https://github.com/apache/storm/blob/master/external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java#L129-L132).

Something similar might be nice here with a new InvalidTuple exception into which each implementation of HBaseMapper can wrap its specific "this tuple will never work" exceptions. That would also have the nice benefit of keeping an implementation specific detail (throws ParseException) out of the declaration of the very generic HBaseMapper and make it easier for users to implement their own mappers and have these errors handled correctly.

This approach could be extended to any bolt that uses the mapper strategy (e.g. ElasticSearch) or could have its own "will never work" errors, e.g. the Avro bolt receiving non-avro messages.

@kristopherkane
Copy link
Author

You're right. There is no sense in replaying the same broken input. I'm conflicted on what the right approach is to notify the topology owner of corrupt JSON or a missing row key. I don't think it is worthy of a collector.reportError() and am wondering if it should be INFO or WARN logging.

I've got the the changes you have suggested complete minus the desired notification. How does this go with the GitHub integration? Squash on my fork and this PR is automatically updated?

@HeartSaVioR
Copy link
Contributor

@kristopherkane Yes, squash and push with force option.

bipinprasad pushed a commit to bipinprasad/storm that referenced this pull request Oct 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants