fix(common): normalize hudi table base path in StorageBasedLockProvider#18817
Conversation
…iders
The implicit-key lock providers - DynamoDBBasedImplicitPartitionKeyLockProvider
and ZookeeperBasedImplicitBasePathLockProvider - hash the hudi table base path
to derive a DynamoDB partition key / Zookeeper znode for the lock. The hash
function (XXH64) is avalanche: any byte-level difference in the input produces
a completely different output. Today the only normalization applied before
hashing is s3aToS3 (s3a:// -> s3://). Trailing slashes, repeated slashes, and
surrounding whitespace pass through unchanged.
When two writers for the same hudi table disagree on those benign formatting
details - e.g. one engine supplies "s3://bucket/table" while another supplies
"s3://bucket/table/" - they end up acquiring different lock rows / znodes and
lose mutual exclusion, even though they are targeting the same table. Two
writers that should serialize can then write concurrently and corrupt the
hudi timeline.
Fix: introduce FSUtils.normalizeBasePathForLocking() as the single source of
truth for canonicalization before hashing:
1. Reject null / empty / whitespace-only basePath.
2. trim() surrounding whitespace.
3. Apply existing s3aToS3 (case-insensitive s3a:// -> s3://).
4. Strip all trailing '/' then add exactly one.
Both implicit-key lock providers now route through it. The Dynamo provider
gains a small public static derivePartitionKey(String) so the formula is
testable without a DynamoDB client.
Inner consecutive slashes (s3://b//x vs s3://b/x) are intentionally NOT
collapsed - they can resolve to a legitimate S3 key.
Tests:
- TestFSUtils#testNormalizeBasePathForLocking exercises the normalization
rules directly: trailing slash, multi-slash, whitespace, s3a/s3 schemes,
inner-slash preservation, null/empty rejection.
- TestDynamoDBBasedImplicitPartitionKeyLockProvider verifies that all
trailing-slash, multi-slash, whitespace, and s3a variants of the same
base path produce the same DynamoDB partition key.
- TestZookeeperBasedImplicitBasePathLockProvider verifies the same
invariant for the Zookeeper lock base path.
Compatibility note: locks taken under the previous (no-trailing-slash or
whitespace-sensitive) form effectively orphan at deploy time, since the new
code looks at a different lock row / znode for the same logical table.
Deploys should be coordinated across all writers that share a hudi table,
or accept a brief writer quiesce while rolling out.
- normalizeBasePathForLocking: reject scheme-only inputs (s3://, s3a:///) and all-slash inputs that strip to nothing meaningful to hash - DynamoDBBasedImplicitPartitionKeyLockProvider#derivePartitionKey: switch String.format to SLF4J parameterized logging (consistent with the sibling ZK provider); javadoc note explaining the static helper accepts raw input (super-constructor ordering precludes using the instance field) - ZookeeperBasedImplicitBasePathLockProvider#getLockBasePath: same javadoc note for symmetry - ITTestDynamoDBBasedLockProvider#testAcquireLock: compare against the normalized (trailing-slash) form of the hash input — the previous assertion would have failed after the canonicalization change - TestFSUtils: cover s3:///, s3a:///, all-slash, and a special-char path
The private field stores the post-normalization basePath; rename it so the name reflects that storage shape, matching what the constructor actually assigns.
Two writers for the same Hudi table must derive the same lockFilePath or they lose mutual exclusion. Today the path is built directly from the raw config basePath; while StoragePath absorbs trailing-slash drift, scheme-case drift (s3a:// vs s3://) and surrounding whitespace do route to different lock files. Route the basePath through FSUtils.normalizeBasePathForLocking (the same helper used by the implicit-key LPs from the parent commit — single canonicalization API across all lock providers) before deriving lockFilePath. Rename the stored field to normalizedHudiTableBasePath so the name reflects what is stored. Tests: 3 end-to-end invariance tests on lockFilePath (scheme drift, whitespace, trailing slash). The normalization helper itself is covered by TestFSUtils#testNormalizeBasePathForLocking from the parent commit.
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR completes the basePath canonicalization story by routing StorageBasedLockProvider through FSUtils.normalizeBasePathForLocking, matching the pattern already applied to the implicit-key providers. The helper is well-tested (edge cases for scheme, whitespace, trailing slashes, and scheme-only/all-slash rejection), derivePartitionKey is a nice testability extraction with clear Javadoc on why the instance field can't be used at that point, and parseBucketAndPath in the underlying S3/GCS lock clients preserves audit-config and lock-object identity across the scheme change. The hash-based rollout concern for DynamoDB/Zookeeper providers was already raised and acknowledged on #18814 so I won't reopen it here. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One minor naming suggestion below — the new normalizedHudiTableBasePath field name carries redundant context across the three lock provider classes; otherwise the code is clean and well-documented.
cc @yihua
| private final Option<HoodieLockMetrics> hoodieLockMetrics; | ||
| private Option<AuditService> auditService; | ||
| private final String basePath; | ||
| private final String normalizedHudiTableBasePath; |
There was a problem hiding this comment.
🤖 nit: normalizedHudiTableBasePath feels a bit verbose — within a class that's exclusively a Hudi table lock provider, the HudiTable infix is redundant context. Could you shorten to normalizedBasePath? Same pattern applies in ZookeeperBasedImplicitBasePathLockProvider and DynamoDBBasedImplicitPartitionKeyLockProvider.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18817 +/- ##
=========================================
Coverage 68.92% 68.92%
- Complexity 29076 29086 +10
=========================================
Files 2509 2509
Lines 139470 139485 +15
Branches 17117 17120 +3
=========================================
+ Hits 96130 96144 +14
- Misses 35584 35588 +4
+ Partials 7756 7753 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Describe the issue this Pull Request addresses
Follow-up to #18814, which introduced
FSUtils.normalizeBasePathForLockingfor the implicit-key lock providers.StorageBasedLockProviderderiveslockFilePathdirectly from the raw `hoodie.base.path` config value:```java
this.basePath = config.getHudiTableBasePath();
String lockFolderPath = StorageLockClient.getLockFolderPath(basePath); // new StoragePath(basePath, ".locks")
this.lockFilePath = new StoragePath(lockFolderPath, DEFAULT_TABLE_LOCK_FILE_NAME).toString();
```
`StoragePath` absorbs trailing-slash drift, so `s3://b/t` and `s3://b/t/` already produce the same lock file. But two other classes of benign basePath drift do not get absorbed and route the writers to different lock files:
Two writers that disagree on those benign formatting details acquire different lock files and lose mutual exclusion. Same class of bug as #18814, lower severity because the divergence is a visible second lock file rather than a silent hash split, but still a correctness issue.
Summary and Changelog
Route the raw basePath through `FSUtils.normalizeBasePathForLocking` (the helper introduced in #18814) before building `lockFilePath`. Using the same canonicalization API as the implicit-key LPs keeps the contract for "what counts as the same Hudi table" consistent across all lock providers.
Also rename the stored field `basePath` → `normalizedHudiTableBasePath` so the name reflects what's actually stored.
Impact
Risk Level
low
The change only affects how raw config strings are canonicalized before being fed to existing path-derivation code. Existing trailing-slash callers will see no change in derived lockFilePath (`StoragePath` already absorbed that). The only callers whose lock file moves are those that supplied an `s3a://...` basePath or surrounding whitespace.
Mitigation:
Documentation Update
none — no user-facing config or website change.
Contributor's checklist
Note: This PR is stacked on top of #18814 — its diff currently includes the parent PR's changes until #18814 merges. The StorageBasedLockProvider-specific commit is the last one in the branch.