fix: Enhance hudi-azure-bundle#18472
Conversation
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review — One code reuse issue: URI parsing and container validation logic is duplicated between readObject() and writeObject() methods.
| } else { | ||
| logger.error("Error reading JSON config file: {}", filePath, e); | ||
| } | ||
| return Option.empty(); |
There was a problem hiding this comment.
🤖 nit: this URI parsing and container validation (lines 288–297) is duplicated from readObject(). Could you extract into a private helper method?
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for contributing! The overall structure of AzureStorageLockClient is clean and follows the S3/GCS pattern well. There's one functional bug worth addressing in the ETag handling before merging: the quote-stripping in readCurrentLockFile creates an inconsistency with the ETag format expected by BlobRequestConditions.setIfMatch(), which will break the "take over expired lock" scenario. Details in the inline comment.
| String eTag = response.getHeaders().getValue("ETag"); | ||
| if (eTag != null) { | ||
| // Azure returns ETags wrapped in quotes, remove them | ||
| eTag = eTag.replaceAll("^\"|\"$", ""); |
There was a problem hiding this comment.
🤖 This quote-stripping creates an ETag format inconsistency that breaks conditional writes for the expired-lock takeover path. BlockBlobItem.getETag() (used in createOrUpdateLockFileInternal) returns the ETag WITH surrounding double-quotes (e.g. "0x8D4A"), which is exactly what BlobRequestConditions.setIfMatch() expects — it passes the value directly to the If-Match header, so the quotes must be present for a valid HTTP conditional request. By stripping them here, any setIfMatch call using an ETag sourced from readCurrentLockFile sends If-Match: 0x8D4A (unquoted), while a call using an ETag from a write sends If-Match: "0x8D4A" (correctly quoted). Removing the replaceAll and keeping the raw header value would align both paths and avoid this failure mode.
260619a to
e2d8a0a
Compare
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR turns hudi-azure-bundle into a proper shaded fat jar with Netty/Reactor relocation, mirroring the AWS and GCP bundles. One question on a build-config divergence from the sibling bundles worth a second look. Please take a look at the inline comment, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of spots where a short explanatory comment would save future readers some head-scratching.
| <resource> | ||
| <directory>src/main/resources</directory> | ||
| </resource> | ||
| <resource> |
There was a problem hiding this comment.
🤖 Is there a reason to add src/test/resources to the main <resources> block? Maven's standard layout already exposes it for the test phase, and adding it here will package any test resources into the production shaded jar. The sibling hudi-aws-bundle and hudi-gcp-bundle only declare src/main/resources. Could you drop this entry unless something specifically requires it?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| <directory>src/main/resources</directory> | ||
| </resource> | ||
| <resource> | ||
| <directory>src/test/resources</directory> |
There was a problem hiding this comment.
🤖 nit: including src/test/resources in the main build resources is unusual — could you add a comment explaining why it's needed here (e.g. which runtime config files live there that the bundle requires)?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
There was a problem hiding this comment.
Good catch. This should be removed.
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
| <directory>src/main/resources</directory> | ||
| </resource> | ||
| <resource> | ||
| <directory>src/test/resources</directory> |
There was a problem hiding this comment.
Good catch. This should be removed.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18472 +/- ##
============================================
+ Coverage 68.14% 68.20% +0.05%
- Complexity 29105 29241 +136
============================================
Files 2518 2525 +7
Lines 141221 141667 +446
Branches 17534 17588 +54
============================================
+ Hits 96237 96625 +388
- Misses 37068 37088 +20
- Partials 7916 7954 +38
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Describe the issue this Pull Request addresses
#18471
When running Hudi Spark jobs on Azure (ADLS Gen2), the Azure Storage SDK's Netty and Reactor dependencies conflict with Spark's bundled Netty, causing runtime
NoSuchMethodErrorandStacklessClosedChannelExceptionduring lock acquisition. Specifically,reactor-netty-httpcallsHttpClientCodec.<init>(HttpDecoderConfig, boolean, boolean)— a constructor that only exists in Netty 4.1.94+ — but Spark's older NettyHttpClientCodecis loaded instead. This makes the Azure-basedStorageBasedLockProvider(added in #17951) unusable in Spark environments.Additionally,
hudi-azure-bundleas it stands on master does not include reactor-netty, the Azure identity transitive deps, or Netty/Reactor relocations, so users still have to manage those jars manually on the Spark classpath.Summary and Changelog
This PR is purely additive on top of master's existing
hudi-azure-bundleskeleton. All changes are inpackaging/hudi-azure-bundle/pom.xml:com.nimbusds:*andnet.minidev:*— Azure identity transitive deps required forDefaultAzureCredentialand related auth providers.io.projectreactor.netty:*— the actual reactor-netty HTTP client used by the Azure SDK.org.reactivestreams:reactive-streams— required by Reactor.io.netty.*→org.apache.hudi.io.netty.*io.projectreactor.*→org.apache.hudi.io.projectreactor.*reactor.*→org.apache.hudi.reactor.*org.reactivestreams.*→org.apache.hudi.org.reactivestreams.*Note: this PR was rebased onto current master. The original
AzureStorageLockClientcommits were dropped because master already has an implementation of that class via #17951; this PR now contains only the bundle module additions.Impact
StorageBasedLockProviderto work reliably on Azure/ADLS Gen2 in Spark environments.reactor-netty,reactor-core,reactive-streams, andnetty-resolver-dnsjars on the Spark classpath — everything is self-contained and relocated in the bundle.Risk Level
Low
packaging/hudi-azure-bundle/pom.xml; no source changes.Documentation Update
none
Contributor's checklist