
feat: add new Hudi demo app hudi-notebooks#14023

Merged
xushiyan merged 4 commits into apache:master from deepakpanda93:br_docker_notebooks on Oct 8, 2025

Conversation

@deepakpanda93
Collaborator

Describe the issue this Pull Request addresses

This PR adds initial support for running Apache Hudi with PySpark inside Jupyter notebooks. While Hudi already supports PySpark, notebook environments like Jupyter require additional handling for:

  • Spark session setup with Hudi-specific JARs
  • Common pitfalls when using Hudi SQL procedures from spark.sql() in Python notebooks

This improvement aims to make it easier for developers to experiment with Hudi inside Jupyter notebooks.
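
For readers unfamiliar with the session setup mentioned above, here is a minimal sketch of what a Hudi-enabled Spark session configuration can look like in a notebook. It is not the utility code from this PR: `hudi_spark_conf`, `build_session`, and the bundle coordinates in `HUDI_BUNDLE` are hypothetical names and values chosen for illustration, and the bundle must be matched to your own Spark and Scala versions. The configuration keys follow the Hudi Spark quick start guide.

```python
from typing import Dict, Optional

# Assumption: example bundle coordinates only. Pick the artifact that matches
# the Spark and Scala versions of your notebook environment.
HUDI_BUNDLE = "org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.2"


def hudi_spark_conf(bundle: str = HUDI_BUNDLE) -> Dict[str, str]:
    """Return the Spark settings commonly used to enable Hudi."""
    return {
        # Downloads the Hudi bundle JAR when the session starts.
        "spark.jars.packages": bundle,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # Enables Hudi SQL syntax, including CALL procedures.
        "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
        "spark.kryo.registrator": "org.apache.spark.HoodieSparkKryoRegistrar",
    }


def build_session(app_name: str = "hudi-notebook",
                  conf: Optional[Dict[str, str]] = None):
    """Create (or reuse) a SparkSession with the Hudi settings applied."""
    # Imported lazily so the config helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or hudi_spark_conf()).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell, `spark = build_session()` would then be the only setup line needed. Because `getOrCreate()` reuses an existing session, these settings only take effect if no session was started earlier in the same kernel.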

Summary and Changelog

  • Adds utility functions for initializing Hudi within Jupyter notebooks
  • Clarifies the usage of Hudi's SQL procedures inside notebook cells
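
As an illustration of calling Hudi SQL procedures from Python, the sketch below renders a `CALL` statement from keyword arguments. The PR description does not enumerate the pitfalls it has in mind, so this only shows one plausible approach: `call_procedure_sql` and `_sql_literal` are hypothetical helpers, not part of Hudi or of this PR, and `show_commits` with `table` and `limit` is used as an example procedure.

```python
def _sql_literal(value) -> str:
    """Render a Python value as a SQL literal for a procedure argument."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Refuse values that would need escaping rather than guess at the rules.
    if "'" in text or "\\" in text:
        raise ValueError(f"unsupported characters in argument: {text!r}")
    return f"'{text}'"


def call_procedure_sql(name: str, **args) -> str:
    """Build a `CALL name(key => value, ...)` statement using named arguments."""
    rendered = ", ".join(f"{key} => {_sql_literal(val)}" for key, val in args.items())
    return f"CALL {name}({rendered})"


statement = call_procedure_sql("show_commits", table="trips", limit=10)
print(statement)  # CALL show_commits(table => 'trips', limit => 10)
```

With a Hudi-enabled session, the statement would be run as `spark.sql(statement).show()`. Two things are easy to miss in a notebook: `spark.sql()` returns a DataFrame, so nothing is displayed until `.show()` (or similar) is called, and the `CALL` syntax only parses when `HoodieSparkSessionExtension` is configured in `spark.sql.extensions`.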

Impact

This change improves the developer experience for PySpark and Jupyter users who work with Apache Hudi. There is no change to core APIs or storage format, and no impact on runtime behavior.

Risk Level

none

Documentation Update

  • Yes. Jupyter-compatible examples and usage notes are planned.
  • A follow-up PR will be submitted to asf-site under the "Getting Started" or "How-To Guides" section.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:XL PR with lines of changes > 1000 label Oct 1, 2025
@deepakpanda93 deepakpanda93 marked this pull request as draft October 1, 2025 07:24
@hudi-bot
Collaborator

hudi-bot commented Oct 7, 2025

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Member

@xushiyan xushiyan left a comment


lgtm on a skim. this PR is self-contained within hudi-notebooks/ not affecting other code paths. expect more future iterations on the content

@xushiyan xushiyan marked this pull request as ready for review October 8, 2025 02:42
@xushiyan
Member

xushiyan commented Oct 8, 2025

@deepakpanda93 please research how to add tests for notebooks so the content can be verified by CI. this can be done in the next PR, specifically for adding tests to CI. it can be a separate job in CI just to test notebooks
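
One lightweight option for such a CI job, sketched here under assumptions and not taken from this PR, is a syntax-only smoke check that needs no Spark runtime: read each `.ipynb` file as JSON, extract the code cells, and compile them. The function names are hypothetical. The check does not execute cells, so it cannot verify results; full execution would need a notebook runner and a Spark environment.

```python
import json
from typing import Dict, List, Tuple


def code_cells(notebook: Dict) -> List[str]:
    """Return the source of every code cell in a parsed .ipynb document."""
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        cells.append("".join(source) if isinstance(source, list) else source)
    return cells


def syntax_errors(notebook: Dict) -> List[Tuple[int, str]]:
    """Compile each code cell and return (code cell index, message) on failure."""
    errors = []
    for index, source in enumerate(code_cells(notebook)):
        # Cell magics such as %%sql make the whole cell non-Python: skip it.
        if source.lstrip().startswith("%%"):
            continue
        # Drop line magics (%...) and shell escapes (!...) before compiling.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        try:
            compile("\n".join(lines), f"<cell {index}>", "exec")
        except SyntaxError as exc:
            errors.append((index, str(exc.msg)))
    return errors


def check_file(path: str) -> List[Tuple[int, str]]:
    """Load one notebook from disk and report its syntax errors."""
    with open(path, encoding="utf-8") as handle:
        return syntax_errors(json.load(handle))
```

A CI step could call `check_file` on every notebook under `hudi-notebooks/` and fail the job if any list is non-empty.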

@xushiyan xushiyan changed the title from "feat: Add support for jupyter notebooks" to "feat: add new Hudi demo app hudi-notebooks" Oct 8, 2025
@xushiyan xushiyan merged commit ab23ac6 into apache:master Oct 8, 2025
68 of 72 checks passed

Labels

size:XL PR with lines of changes > 1000


3 participants