[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source #1267

Merged
bvaradar merged 1 commit into apache:asf-site on Jan 23, 2020

Conversation

bvaradar
Contributor

What is the purpose of the pull request

Update documentation to add deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source

Brief change log

[HUDI-403] Publish deployment guide for writing to Hudi using HoodieDeltaStreamer and Spark Data Source

@bvaradar
Contributor Author

bvaradar commented Jan 21, 2020

@vinothchandar @bhasudha @lamber-ken : Please take a look when you get a chance.

Member

@vinothchandar vinothchandar left a comment

Some content comments. Please merge once you have made another pass.

docs/_docs/2_6_deployment.md
docs/_docs/2_6_deployment.md (outdated)

### Spark Datasource Writer Jobs

As described in [Writing Data](/docs/writing_data.html#datasource-writer), you can use the Spark datasource to ingest into a Hudi table. This mechanism allows you to ingest any Spark DataFrame in Hudi format. The Hudi Spark DataSource also supports Spark streaming, so a streaming source can be ingested into a Hudi table. For the Merge On Read table type, inline compaction is turned on by default and runs after every ingestion run. The compaction frequency can be changed by setting the property "hoodie.compact.inline.max.delta.commits".
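
For illustration only, the compaction setting described above might be passed through the Spark datasource roughly as follows. This is a minimal PySpark-style sketch, not the guide's own example: the helper names, the field names in the usage note, and the default of 5 delta commits are hypothetical, and the table-type key shown is the one used by newer Hudi releases (older releases used a different key), so check it against the version you run.

```python
from typing import Dict


def hudi_mor_write_options(
    table_name: str,
    record_key_field: str,
    precombine_field: str,
    partition_path_field: str,
    max_delta_commits: int = 5,  # illustrative value, not a documented default
) -> Dict[str, str]:
    """Build datasource options for an upsert into a Merge On Read table.

    Hypothetical helper: it only assembles the option map; nothing is written.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be at least 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Assumption: key name as used by newer Hudi releases.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_path_field,
        # Inline compaction: compact once this many delta commits accumulate.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


def write_to_hudi(df, base_path: str, options: Dict[str, str]) -> None:
    """Append a Spark DataFrame to the Hudi table at base_path.

    Assumes `df` is a pyspark.sql.DataFrame and the Hudi Spark bundle is on
    the classpath; neither is checked here.
    """
    df.write.format("org.apache.hudi").options(**options).mode("append").save(base_path)
```

A call such as `write_to_hudi(df, base_path, hudi_mor_write_options("trips", "uuid", "ts", "partitionpath", max_delta_commits=3))` would then request compaction after every third delta commit; the table and field names here are placeholders.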

This reminds me that we should have async compaction working for Spark streaming as well. Maybe file a JIRA if you agree?

docs/_docs/2_6_deployment.md (outdated)
Contributor

@bhasudha bhasudha left a comment

Looks good to me. Just a minor comment: it would be useful to link to an example command for each of the DeltaStreamer modes, as @vinothchandar mentioned.

Contributor Author

@bvaradar bvaradar left a comment

@bhasudha Also added examples for each deployment model.

@bvaradar bvaradar merged commit 41754bb into apache:asf-site Jan 23, 2020

3 participants