[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source #1267
@vinothchandar @bhasudha @lamber-ken: Please take a look when you get a chance.
Some content comments. Please merge once you have made another pass.
### Spark Datasource Writer Jobs
As described in [Writing Data](/docs/writing_data.html#datasource-writer), you can use the Spark datasource to ingest into a Hudi table. This mechanism allows you to ingest any Spark DataFrame in Hudi format. The Hudi Spark DataSource also supports Spark streaming to ingest a streaming source into a Hudi table. For Merge On Read table types, inline compaction is turned on by default and runs after every ingestion run. The compaction frequency can be changed by setting the property "hoodie.compact.inline.max.delta.commits".
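
A minimal PySpark-style sketch of the paragraph above may help: it builds writer options for a Merge On Read table and sets the inline compaction frequency. The helper names, table name, and field names are hypothetical. The option keys other than "hoodie.compact.inline.max.delta.commits" are assumptions based on common Hudi datasource configs and may differ between Hudi versions, so check them against the release you deploy.

```python
def hudi_mor_write_options(table_name, record_key, precombine_field,
                           partition_field, max_delta_commits):
    """Build datasource writer options for a Merge On Read table.

    Key names are assumptions; verify them against your Hudi version.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Run compaction inline, once every `max_delta_commits` delta commits.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


def write_to_hudi(df, base_path, options):
    """Append a Spark DataFrame to the Hudi table stored at base_path."""
    (df.write.format("org.apache.hudi")
        .options(**options)
        .mode("append")
        .save(base_path))
```

With a real SparkSession and the Hudi bundle on the classpath, usage would look like `write_to_hudi(df, "/tmp/hudi/trips", hudi_mor_write_options("trips", "uuid", "ts", "region", 3))`, where the path and column names are placeholders.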
Reminds me that we should have async compaction working for Spark streaming as well. Maybe file a JIRA if you agree?
Looks good to me. Just a minor comment: it would be useful to link to an example command for each of the modes in DeltaStreamer, like @vinothchandar mentioned.
…eltaStreamer and Spark Data Source
@bhasudha Also added examples for each deployment model.
What is the purpose of the pull request
Update documentation to add deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source
Brief change log
[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source