[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source #1267

bvaradar · 2020-01-21T23:47:05Z

What is the purpose of the pull request

Update documentation to add deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source

Brief change log

[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source

bvaradar · 2020-01-21T23:47:52Z

@vinothchandar @bhasudha @lamber-ken : Please take a look when you get a chance.

vinothchandar

Some content comments. Please merge once you have made another pass.

docs/_docs/2_6_deployment.md

vinothchandar · 2020-01-22T15:59:57Z

+
+### Spark Datasource Writer Jobs
+
+As described in [Writing Data](/docs/writing_data.html#datasource-writer), you can use spark datasource to ingest to hudi table. This mechanism allows you to ingest any spark dataframe in Hudi format. Hudi Spark DataSource also supports spark streaming to ingest a streaming source to Hudi table. For Merge On Read table types, inline compaction is turned on by default which runs after every ingestion run. The compaction frequency can be changed by setting the property "hoodie.compact.inline.max.delta.commits". 


Reminds me that we should have async compaction working for Spark streaming as well. Maybe file a JIRA if you agree?

Agree. https://jira.apache.org/jira/browse/HUDI-575
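To make the setting discussed in this thread concrete, here is a minimal PySpark-style sketch of a datasource write to a Merge On Read table that changes the inline compaction frequency. Only `hoodie.compact.inline.max.delta.commits` comes from the diff above. The helper names, the example value, and the other option keys are assumptions for illustration and should be checked against the configuration reference of the Hudi version in use.

```python
def hudi_mor_write_options(table_name, record_key, precombine_field,
                           max_delta_commits):
    """Build writer options for a Merge On Read table with inline compaction.

    Hypothetical helper. Apart from the compaction-frequency property quoted
    in the PR, the option keys below are assumptions to verify per Hudi version.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be at least 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Inline compaction is described in the guide as on by default for
        # Merge On Read; it is set explicitly here only for readability.
        "hoodie.compact.inline": "true",
        # Property named in the guide: controls how often compaction runs.
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


def write_to_hudi(df, base_path, options):
    """Append a Spark DataFrame to the Hudi table at base_path."""
    (df.write
       .format("org.apache.hudi")
       .options(**options)
       .mode("append")
       .save(base_path))
```

With a real SparkSession and the Hudi bundle on the classpath, this could be used as `write_to_hudi(df, "/tmp/hudi/trips", hudi_mor_write_options("trips", "uuid", "ts", 4))`, where the path, table name, and field names are placeholders.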

bhasudha

Looks good to me. Just a minor comment: it would be useful to link to an example command for each of the DeltaStreamer modes, as @vinothchandar mentioned.

bvaradar

@bhasudha Also added examples for each deployment model.

bvaradar requested a review from vinothchandar January 21, 2020 23:47

bvaradar requested a review from bhasudha January 21, 2020 23:48

bvaradar assigned vinothchandar Jan 21, 2020

vinothchandar approved these changes Jan 22, 2020

bhasudha approved these changes Jan 23, 2020

[HUDI-403] Publish deployment guide for writing to Hudi using HoodieD…

1c5df3d

…eltaStreamer and Spark Data Source

bvaradar force-pushed the hudi-403 branch from 7afab7b to 1c5df3d Compare January 23, 2020 23:45

bvaradar commented Jan 23, 2020

bvaradar merged commit 41754bb into apache:asf-site Jan 23, 2020
