Skip to content

[HUDI-1122] Introduce a kafka implementation of hoodie write commit ca…#1886

Merged
yanghua merged 2 commits intoapache:masterfrom
wangxianghu:HUDI-1122
Aug 20, 2020
Merged

[HUDI-1122] Introduce a kafka implementation of hoodie write commit ca…#1886
yanghua merged 2 commits intoapache:masterfrom
wangxianghu:HUDI-1122

Conversation

@wangxianghu
Copy link
Contributor

…llback

Tips

What is the purpose of the pull request

Introduce a kafka implementation of hoodie write commit callback

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@wangxianghu
Copy link
Contributor Author

@yanghua please take a look when free

@wangxianghu wangxianghu force-pushed the HUDI-1122 branch 2 times, most recently from 916cf08 to 6cc6bd5 Compare July 29, 2020 13:58
@yanghua yanghua self-assigned this Jul 30, 2020
@yanghua yanghua self-requested a review July 30, 2020 01:31
Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mathieu1124 Thanks for your contribution. Left some comments you can consider. Did you test this feature in your local?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not want to add the comments for @param and @return. Just remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not want to add the comments for @param and @return. Just remove them.

yes, will remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking one thing: the config options of the callback feature for Kafka are left in hudi-client, while the implementation hosts in hudi-utilities, if it's suitable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking one thing: the config options of the callback feature for Kafka are left in hudi-client, while the implementation hosts in hudi-utilities, if it's suitable.

It is truly a problem. since kafka callback implementation is placed in hudi-utilities, the config should be there too. will move it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use try-with-resource or try-catch block to prevent leaking resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use try-with-resource or try-catch block to prevent leaking resources?

yes, will do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, moving the content of this class into HoodieWriteCommitKafkaCallback is reasonable. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, moving the content of this class into HoodieWriteCommitKafkaCallback is reasonable. WDYT?

I once hesitated here, kind of messy to host in one class. But it is reasonable, will move

@wangxianghu
Copy link
Contributor Author

@Mathieu1124 Thanks for your contribution. Left some comments you can consider. Did you test this feature in your local?

Yes, I tested it in my local, it works good.

@wangxianghu wangxianghu force-pushed the HUDI-1122 branch 2 times, most recently from c6cf2ed to 4fd9416 Compare August 4, 2020 15:12
Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mathieu1124 Thanks for addressing my concerns. Left one comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users configure http callback? Why we only consider Kafka callback here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users configure http callback? Why we only consider Kafka callback here?

the config of http callback will be Initialized when HoodieWriteClient set up, while kafka's can't. the kafka's config can only be set up in hudi-utilities module, so the logic above come up.

@wangxianghu
Copy link
Contributor Author

Hi @vinothchandar, I was wondering can we move this implement to hudi-client module just like the way all the implementations of metrics does.

put this implementation in hudi-utilities, the Spark API(hudi-spark moudle) can not benefit from it. besides, in the future, when the flink engine merged in , it can not use it either.
WDYT?

@wangxianghu
Copy link
Contributor Author

wangxianghu commented Aug 7, 2020

@yanghua VC seems busy, do you have any other concern about this pr ?
if it is ok, can we merge this first, and file a new pr if VC agree to move this to hudi-client in the future :)

@vinothchandar
Copy link
Member

As the RM sent out the note, we are only landing small PRs and release blockers, so we can keep master stable for cutting the 0.6.0 RC. Apologize for the inconvenience.

@vinothchandar
Copy link
Member

I was wondering can we move this implement to hudi-client module just like the way all the implementations of metrics does.

I think we can move this down the line. hudi-client or hudi-spark talking a direct dependency on kafka does not feel that clean to me. May be file a follow up JIRA?

@wangxianghu
Copy link
Contributor Author

I was wondering can we move this implement to hudi-client module just like the way all the implementations of metrics does.

I think we can move this down the line. hudi-client or hudi-spark talking a direct dependency on kafka does not feel that clean to me. May be file a follow up JIRA?

ok

@wangxianghu wangxianghu force-pushed the HUDI-1122 branch 2 times, most recently from 3a549b3 to 79f73ba Compare August 9, 2020 09:37
Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangxianghu Left some new comments.

*/
public static String convertToJsonString(Object obj) {
// convert to json
ObjectMapper mapper = new ObjectMapper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this variable static to be shared in the whole class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

producer.send(record);
LOG.info(String.format("Send callback message %s succeed", callbackMsg));
} catch (Exception e) {
LOG.error("Send kafka callback msg failed : " + e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.error("Send kafka callback msg failed : ",e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

kafkaProducerProps.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");

LOG.info("Callback kafka producer init with configs: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using LOG.debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

props.getProperty(TABLE_NAME),
callbackMsg);
} else {
return new ProducerRecord<String, String>(topic,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we do not need to break the lines here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangxianghu Thanks for addressing my concerns. LGTM, will merge it if CI is OK.

@yanghua yanghua changed the title [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca… [HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… Aug 17, 2020
@yanghua
Copy link
Contributor

yanghua commented Aug 20, 2020

I was wondering can we move this implement to hudi-client module just like the way all the implementations of metrics does.

I think we can move this down the line. hudi-client or hudi-spark talking a direct dependency on kafka does not feel that clean to me. May be file a follow up JIRA?

@wangxianghu Please follow vc's suggestion, file a Jira issue to track the future work.

@yanghua yanghua merged commit b883b6d into apache:master Aug 20, 2020
@wangxianghu
Copy link
Contributor Author

hudi-client or hudi-spark talking a direct dependency on kafka

@yanghua thanks for your time. The ticket filed here: https://issues.apache.org/jira/browse/HUDI-1207

@wangxianghu wangxianghu deleted the HUDI-1122 branch August 28, 2020 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants