docs(blog): new blog for Hudi NBCC #17613
Pull request overview
This PR adds a new blog post about Apache Hudi's Non-Blocking Concurrency Control (NBCC) feature, explaining how it eliminates retry storms for concurrent writers and maximizes throughput compared to the traditional Optimistic Concurrency Control (OCC) approach.
Key changes:
- New blog post with detailed technical explanation of NBCC
- Three PNG image files for diagrams (p3, p4, p6)
- Configuration examples and use case recommendations
Reviewed changes
Copilot reviewed 1 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| website/blog/2025-12-16-maximizing-throughput-nbcc.md | Blog post content explaining NBCC concepts, design, and usage |
| website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png | Diagram showing record key to file group mapping |
| website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png | Diagram illustrating completion time ordering |
| website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png | Diagram showing NBCC compaction process |
Picture this scenario: your streaming pipeline ingests clickstream data every minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at midnight, scanning across thousands of partitions to purge user records, and it touches data files the ingestion pipeline is actively writing to. By 3 AM, you get paged: the deletion job has failed repeatedly, burning compute resources while the ingestion writer keeps winning the race to commit.
![p1-occ-retries](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png)
Missing image file "p1-occ-retries.png". The blog post references this image on line 22, but only three PNG files are included in the PR (p3-recordkey-filegroup.png, p4-completion-time.png, and p6-nbcc-compaction.png). The images p1-occ-retries.png, p2-nbcc-overview.png, and p5-truetime.gif are referenced but not included.
|  | |
*(Diagram: OCC retries stacking up under heavy contention between long-running and short-running writers.)*
NBCC avoids conflicts by design: let every writer append updates to Hudi’s log files in the Merge-on-Read (MOR) table, then let readers or mergers follow the serialization order based on write completion time. Let's say there are two writers, both updating a record concurrently. Under NBCC, each writer produces its own log file containing the update. Since there's no file contention, there's nothing to conflict on. At read time or during compaction, Hudi follows the write completion time and processes the associated log files in the proper order.
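The ordering rule in the paragraph above can be illustrated with a small sketch. This is not Hudi's reader or compactor code: `LogFile`, `merge_by_completion_time`, the integer timestamps, and the plain last-write-wins merge are all hypothetical simplifications chosen only to show why completion time, rather than start time, decides which update survives.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    """Hypothetical stand-in for one writer's log file in a file group."""
    writer: str
    start_time: int        # when the writer began (illustrative integer clock)
    completion_time: int   # when the write finished
    updates: dict          # record key -> new value


def merge_by_completion_time(base, log_files):
    """Apply log files to the base records in completion-time order.

    Simplified last-write-wins merge; an assumption for illustration only.
    """
    merged = dict(base)
    for log in sorted(log_files, key=lambda f: f.completion_time):
        merged.update(log.updates)
    return merged


base = {"user-1": "v0"}
logs = [
    # The long-running job started first but completed last.
    LogFile("deletion-job", start_time=1, completion_time=9,
            updates={"user-1": "deleted"}),
    LogFile("ingestion", start_time=3, completion_time=5,
            updates={"user-1": "v1"}),
]

result = merge_by_completion_time(base, logs)
print(result)
```

Neither writer had to retry: both log files are kept, and the merge step alone decides the outcome. In this toy run the deletion job's update is applied last because it completed last, even though it started first.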
![p2-nbcc-overview](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png)
Missing image file "p2-nbcc-overview.png". This image is referenced on line 32 but not included in the PR.
|  | |
|  |
|
|
||
Hudi solves this with a TrueTime-like mechanism inspired by [Google Spanner](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf):
![truetime](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif)
Missing image file "p5-truetime.gif". This image is referenced on line 70 but not included in the PR.
|  | |
|  |