docs(blog): new blog for Hudi NBCC #17613

Merged
xushiyan merged 1 commit into apache:asf-site from xushiyan:nbcc-blog on Dec 16, 2025
@xushiyan (Member)

No description provided.

Copilot AI review requested due to automatic review settings December 16, 2025 22:27
@xushiyan xushiyan merged commit 280efa6 into apache:asf-site Dec 16, 2025
3 checks passed

Copilot AI left a comment

Pull request overview

This PR adds a new blog post about Apache Hudi's Non-Blocking Concurrency Control (NBCC) feature, explaining how it eliminates retry storms for concurrent writers and maximizes throughput compared to the traditional Optimistic Concurrency Control (OCC) approach.

Key changes:

  • New blog post with detailed technical explanation of NBCC
  • Three PNG image files for diagrams (p3, p4, p6)
  • Configuration examples and use case recommendations

Reviewed changes

Copilot reviewed 1 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `website/blog/2025-12-16-maximizing-throughput-nbcc.md` | Blog post content explaining NBCC concepts, design, and usage |
| `website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png` | Diagram showing record key to file group mapping |
| `website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png` | Diagram illustrating completion time ordering |
| `website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png` | Diagram showing NBCC compaction process |


Picture this scenario: your streaming pipeline ingests clickstream data every minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at midnight, scanning across thousands of partitions to purge user records—also touching data files the ingestion pipeline is actively writing to. By 3 AM, you get paged—the deletion job has failed repeatedly, burning compute resources while the ingestion writer keeps winning the race to commit.

![p1-occ-retries](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png)

Copilot AI Dec 16, 2025

Missing image file "p1-occ-retries.png". The blog post references this image on line 22, but only three PNG files are included in the PR (p3-recordkey-filegroup.png, p4-completion-time.png, and p6-nbcc-compaction.png). The images p1-occ-retries.png, p2-nbcc-overview.png, and p5-truetime.gif are referenced but not included.

Suggested change:
Before: `![p1-occ-retries](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png)`
After: `*(Diagram: OCC retries stacking up under heavy contention between long-running and short-running writers.)*`

NBCC avoids conflicts by design: let every writer append updates to Hudi’s log files in the Merge-on-Read (MOR) table, then let readers or mergers follow the serialization order based on write completion time. Let's say there are two writers, both updating a record concurrently. Under NBCC, each writer produces its own log file containing the update. Since there's no file contention, there's nothing to conflict on. At read time or during compaction, Hudi follows the write completion time and processes the associated log files in the proper order.
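The ordering idea in the paragraph above can be sketched in a few lines of Python. This is only an illustrative model, not Hudi's implementation: the `LogFile` class, its `completion_time` field, and `merge_by_completion_time` are hypothetical names, and "the later completion time wins" is a simplifying assumption about how updates to the same record key are resolved.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    """Hypothetical stand-in for one writer's log file in a file group."""
    writer: str
    completion_time: int  # assumed: a comparable timestamp set when the write completes
    updates: dict         # record key -> new value


def merge_by_completion_time(base_records, log_files):
    """Apply log files in completion-time order on top of the base records.

    Each writer appended to its own log file, so nothing was contended at
    write time; the order is decided only here, at read or compaction time.
    """
    merged = dict(base_records)  # copy so the base is left untouched
    for log in sorted(log_files, key=lambda f: f.completion_time):
        merged.update(log.updates)  # simplification: later completion wins
    return merged


# Two writers update the same record "k1" concurrently.
base = {"k1": "v0"}
logs = [
    LogFile("writer-2", completion_time=20, updates={"k1": "from-w2"}),
    LogFile("writer-1", completion_time=10, updates={"k1": "from-w1", "k2": "x"}),
]
print(merge_by_completion_time(base, logs))  # {'k1': 'from-w2', 'k2': 'x'}
```

Neither writer had to retry or abort in this model; the reader or compactor alone resolves the order, which is the property the paragraph describes.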

![p2-nbcc-overview](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png)

Copilot AI Dec 16, 2025

Missing image file "p2-nbcc-overview.png". This image is referenced on line 32 but not included in the PR.

Suggested change:
Before: `![p2-nbcc-overview](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png)`
After: `![p2-nbcc-overview](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png)`

Hudi solves this with a TrueTime-like mechanism inspired by [Google Spanner](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf):

![p5-truetime](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif)

Copilot AI Dec 16, 2025

Missing image file "p5-truetime.gif". This image is referenced on line 70 but not included in the PR.

Suggested change:
Before: `![p5-truetime](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif)`
After: `![p5-truetime](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.png)`
