Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-236] Add a ControlMessage injector as a RecordStreamProcessor #2090

Closed
wants to merge 5 commits into from

Conversation

htran1
Copy link
Contributor

@htran1 htran1 commented Sep 5, 2017

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Add a ControlMessage injector as a RecordStreamProcessor.
    A ControlMessageInjector inspects an incoming record and can inject ControlMessages based on the content of the incoming record.

One use case for this is the injection of MetadataUpdateControlMessages when the latest schema has been updated to trigger the update of the schema used by other constructs downstream from the ControlMessageInjector.

Long running jobs are more likely to encounter issues due to the schema being updated after the start of the job.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@htran1
Copy link
Contributor Author

htran1 commented Sep 5, 2017

@ibuenros, please review.

@zxcware
Copy link
Contributor

zxcware commented Sep 7, 2017

Hi @htran1, could you add more description in the PR or the jira ticket about the motivation and general idea of control message injector? This can help reviewers obtain a direction of what to look for in terms of design and implementation.

public class GlobalMetadata<S> implements Copyable<GlobalMetadata<S>> {
@Getter
private S schema;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to provide more functional methods:

/** Generate a copy of this object with a new schema. */
public GlobalMetadata<S> withSchema(S newSchema);

That way, when we add new attributes to GlobalMetadata, converters will require no change and still copy the correct metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, added a builder.

.getSchema(), workUnitState));
out = new MetadataUpdateControlMessage<SO, DO>(this.outputGlobalMetadata);
}

getMessageHandler().handleMessage((ControlMessage) in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this receive in or out? If the decision is for in, then I would prefer this line be before the MetadataUpdateControlMessage block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, moved it before. I think in should be the one to be handled so that the handler gets a chance to handle the control message without modifications. The handler can choose to do schema conversion if it needs it.

* @param workUnitState a {@link WorkUnitState} object carrying configuration properties
* @return an initialized {@link ControlMessageInjector} instance
*/
public ControlMessageInjector<SI, DI> init(WorkUnitState workUnitState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be protected instead? Same with all methods except processStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

this.converter = closer.register(new MultiConverter(converters));

// can't have both record stream processors and converter lists configured
if (!this.recordStreamProcessors.isEmpty() && !converters.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it a bit clearer, can you do something like Preconditions.checkState(!this.recordStreamProcessor.isEmpty() ^ !this.converters.isEmpty(), "Mesasge...");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to call the closer here and both empty is valid so ^ won't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added precondition check wrapped by a try catch block.

if (in instanceof MetadataUpdateControlMessage) {
setInputGlobalMetadata(((MetadataUpdateControlMessage) in).getGlobalMetadata(),
workUnitState);
out = new MetadataUpdateControlMessage<SI, DI>(this.inputGlobalMetadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a new output message? It seems to be identical to the input message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setInputGlobalMetadata() can set this.inputGlobalMetadata to something other than the input metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed inputGlobalMetadata from this class so that there is no confusion over the propagation of metadata. A ControlMessageInjector can store a modified copy of the metadata, but it won't be able to pass that change along to the next construct. Only converters should change schemas.


// update the output schema with the new input schema from the MetadataUpdateControlMessage
if (in instanceof MetadataUpdateControlMessage) {
this.outputGlobalMetadata = GlobalMetadata.<SI, SO>builderWithInput(inputStream.getGlobalMetadata(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be GlobalMetadata.builderWithInput(in.getMetadata, Optional.of...)? (instead of inputStream)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there are no other fields, so it either works. The original idea was to start with the inputStream GlobalMetadata and then overlay from in. But I think the suggestion of using in.getGlobalMetadata() is good since the copy will be handled in builderWithInput.

@ibuenros
Copy link
Contributor

+1

@asfgit asfgit closed this in 3a03573 Sep 14, 2017
jack-moseley pushed a commit to jack-moseley/gobblin that referenced this pull request Sep 26, 2017
zxliucmu pushed a commit to zxliucmu/incubator-gobblin that referenced this pull request Nov 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants