fix(utils): include subpath in resolveCustomerSecretsName to prevent credential collisions#1577
…credential collisions
Subpath sites on the same domain (e.g. nba.com/kings, nba.com/lakers) were sharing a single Secrets Manager key because only the hostname was used. Now the full path is included so each subpath site gets its own secret key.
Backward compatible: root-domain sites produce the same key as before.
Fixes LLMO-4186
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
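The collision described in this commit message can be reproduced with a few lines. This is a minimal sketch of the pre-fix derivation, based on the removed line quoted later in this thread; the function name `legacyCustomerKey` is hypothetical and only the customer portion of the secret name is shown.

```javascript
// Pre-fix behaviour (sketch): only the host participates in the key,
// so the path of the site is silently ignored.
function legacyCustomerKey(baseURL) {
  return new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
}

const kings = legacyCustomerKey('https://nba.com/kings');
const lakers = legacyCustomerKey('https://nba.com/lakers');

// Both subpath sites end up with the same customer key.
console.log(kings, lakers, kings === lakers);
```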
alinarublea
left a comment
Code review
The fix is correct and backward compatible. Two small observations:
url.host vs url.hostname
The implementation uses url.host (which includes the port, e.g. nba.com:8080) rather than url.hostname. This matches the pre-existing behavior so it is not a regression, but generateDataFolder in spacecat-api-service (the companion fix for LLMO-4186) uses url.hostname. Worth aligning the two for consistency, or at minimum documenting the difference.
Missing test case for nested paths
https://nba.com/us/kings → nba_com_us_kings is the expected behavior (correct), but there is no test covering nested subpaths. A single extra assertion in the new test would document this and guard against future regressions.
Neither is a blocker — the core fix is solid.
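The url.host vs url.hostname observation is easy to check in isolation. The sketch below uses only the standard `URL` class; the `sanitize` helper is a hypothetical stand-in for the per-character replacement used by the function under review.

```javascript
const url = new URL('https://nba.com:8080/kings');

const hostWithPort = url.host;     // includes the non-default port
const hostnameOnly = url.hostname; // port is never included

// Hypothetical stand-in for the sanitizer used in helpers.js.
const sanitize = (s) => s.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();

const keyFromHost = sanitize(hostWithPort);
const keyFromHostname = sanitize(hostnameOnly);

console.log(keyFromHost);     // the port leaks into the key
console.log(keyFromHostname); // identity key without the port
```

With `url.host`, the same site reached on two different ports would resolve to two different keys, which is why aligning on `url.hostname` matters for an identity key.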
- Switch from url.host to url.hostname for consistency with generateDataFolder
- Add test for nested subpaths (e.g. /us/kings)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR will trigger a patch release when merged.
dzehnder
left a comment
Hey @alinarublea,
Thanks for the focused fix and the upfront self-review. The core change is correct and the tests cover the headline scenarios. The cluster of concerns below is mostly about what happens to credentials that already exist under the old key shape, plus a few normalization edges that the new path-based key does not fully close.
Strengths
- The fix correctly closes a real cross-tenant credential collision: distinct subpath customers on the same hostname were sharing one Secrets Manager key (`packages/spacecat-shared-utils/src/helpers.js:54`).
- Backward compatibility for root-domain sites is preserved by the `pathSuffix ? ... : host` ternary; the `new URL('https://nba.com/').pathname === '/'` case is correctly handled by the `^\/|\/$` strip.
- Switching from `url.host` to `url.hostname` was the right correctness move - a port should not be part of an identity key.
- The new tests in `packages/spacecat-shared-utils/test/helpers.test.js:93-117` cover the primary collision scenario, root trailing-slash idempotence, and nested subpaths. The pre-existing 'resolves the customer secrets name correctly with valid inputs' test (line 65) implicitly anchors the "no path - same as before" claim.
- Both points raised in the self-review (host vs hostname, nested-path test) were addressed before review was requested. Clean change discipline.
Issues
Important (Should Fix)
1. Migration impact for existing subpath customers is unaddressed (`packages/spacecat-shared-utils/src/helpers.js:54`)
   The PR description says "Backward compatible," but this is only true for sites with no path. Any customer whose `baseURL` already carries a path had its credentials written under `customer-secrets/<host>/<ver>`. After this PR ships, the same `baseURL` resolves to `customer-secrets/<host>_<path>/<ver>`, a key that does not exist in Secrets Manager. The downstream consumers (`spacecat-shared-google-client`, `spacecat-shared-ims-client`, `spacecat-shared-content-client`, plus `spacecat-auth-service` Google/SharePoint handlers) will start failing those lookups. `content-client` swallows the lookup error at `log.debug` and silently degrades; IMS auth raises and breaks page authentication.
   Two paths forward, pick one:
   - Confirm in the PR description (and ideally in LLMO-4186) that no production Site has a non-empty path in `baseURL` today, so the "backward compatible" claim is in practice "backward compatible for all current sites."
   - If subpath sites do exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
2. Path-segment separator collision narrows but does not close the original bug class (`packages/spacecat-shared-utils/src/helpers.js:53`)
   Both `/` and any non-alphanumeric character collapse to `_`. As a result:
   - `nba.com/us/kings` and `nba.com/us-kings` both resolve to `nba_com_us_kings`.
   - `nba.com/us/kings` and `nba.com/us_kings` also collide.
   - `nba.com/foo bar` and `nba.com/foo%20bar` collide on `_`-runs.
   Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because `baseURL` is server-stored from `site.getBaseURL()`, but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output (e.g. join host and path-segments with a delimiter that never collapses), or hash the `pathname` (e.g. first 12 hex of sha256) and append. Both are small and bounded to this function.
3. JSDoc does not document the new behavior (`packages/spacecat-shared-utils/src/helpers.js:41-46`)
   The doc was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with `_`, lowercased) and combined with `_`. Sites with the same hostname but different paths resolve to distinct secrets." A worked example of the format (`/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>`) would also help the next person debugging "why is this customer's secret missing." Consider adding a `@since` note recording the format change so the breadcrumb is discoverable.
Minor (Nice to Have)
1. Repeated-slash path normalization (`packages/spacecat-shared-utils/src/helpers.js:53`)
   The strip-slash regex `/^\/|\/$/g` is a single-character alternation - it strips at most one leading and one trailing slash. `nba.com//kings` (sometimes produced by URL builders) yields `_kings`, different from `nba.com/kings` -> `kings`. Same risk class as the original bug, less likely to be hit in practice. Either collapse with `^\/+|\/+$`, or normalize the path uniformly with `pathname.split('/').filter(Boolean).join('_').replace(/[^a-zA-Z0-9_]/g, '_').toLowerCase()` (which also handles dot-segments).
2. Missing subpath trailing-slash idempotence test (`packages/spacecat-shared-utils/test/helpers.test.js`)
   The trailing-slash idempotence test only covers root URLs. The same property should hold for subpaths and is not asserted:
   expect(resolveCustomerSecretsName('https://nba.com/kings', ctx))
     .to.equal(resolveCustomerSecretsName('https://nba.com/kings/', ctx));
   A future refactor of the path regex could silently break this.
3. Path case-folding is undocumented (`packages/spacecat-shared-utils/src/helpers.js:53`)
   `nba.com/Kings` and `nba.com/kings` collapse to the same key because of `.toLowerCase()` on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate normalization choice. A one-line comment on the `.toLowerCase()` (or a test) makes the intent visible to a future contributor who might "fix" it.
Recommendations
- Consider extracting the host+path-to-key transformation into a small named helper (`urlToCustomerSlug` or similar) with focused unit tests covering the normalization edges (encoded chars, repeated slashes, query/fragment, mixed case). This is load-bearing logic; isolating it pays for itself the next time someone needs to extend it.
- Add a length guard. AWS Secrets Manager caps secret names at 512 chars; the basePath consumes ~46. Deep subpaths could plausibly approach that ceiling. A length check would surface this earlier than a Secrets Manager API rejection at runtime.
- Forward-looking: the canonical place to enforce the customer-identity contract is the Site model. A future `site.getCustomerSecretId()` would shrink the surface area where the convention can drift across the four downstream consumers.
Out of scope, worth tracking
- The companion `generateDataFolder` fix in `spacecat-api-service` is correctly excluded here, but worth tracking that the two functions encode the same identity concept using compatible conventions (this PR uses `_`, `generateDataFolder` reportedly uses `-`). A drift between the two reproduces the original bug class on the data-folder side.
- Downstream packages (`spacecat-shared-google-client`, `spacecat-shared-ims-client`, `spacecat-shared-content-client`, plus `spacecat-auth-service` handlers) will need a coordinated bump of the `@adobe/spacecat-shared-utils` dependency so a subpath site's auth flow does not write to the new key while a different service still reads the old one.
Assessment
Ready to merge? With fixes.
Reasoning: the bug fix is correct, minimal, and well-tested for the targeted regression. The main blocker is the migration story - the PR description should explicitly confirm whether any in-production Site already has a subpath baseURL, since the answer determines whether this is a clean ship or a coordinated cutover. The JSDoc update and the separator-collision narrowing are small and worth landing in this PR.
Next Steps
- Resolve the migration question first - either confirm "no subpath sites in prod today" in the PR description / Jira, or put a coordinated cutover plan in place.
- Tighten the separator collision (Important item 2) and update the JSDoc (Important item 3).
- The three Minor items can land here or in a small follow-up.
const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
customer = pathSuffix ? `${host}_${pathSuffix}` : host;
} catch {
Important - Migration impact for existing subpath customers
The PR description says "Backward compatible," but that holds only for sites with no path. Any customer whose baseURL already carries a path had its credentials written under customer-secrets/<host>/<ver>. After this ships, the same URL resolves to customer-secrets/<host>_<path>/<ver> - a key that does not exist in Secrets Manager. Downstream consumers (spacecat-shared-google-client, spacecat-shared-ims-client, spacecat-shared-content-client, plus spacecat-auth-service Google/SharePoint handlers) will start failing those lookups; content-client swallows the error at log.debug and silently degrades, while IMS auth raises and breaks page authentication.
Pick one:
- Confirm in the PR description (and LLMO-4186) that no production Site has a non-empty path in `baseURL` today, so the "backward compatible" claim is in practice "backward compatible for all current sites."
- If subpath sites already exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
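The transitional dual-read mentioned in the second option could look roughly like the sketch below. Everything here is an assumption for illustration: `readCustomerSecret`, `fetchSecret`, the in-memory `store`, and the key strings are hypothetical, and `fetchSecret` is assumed to reject when a key does not exist. A real caller would likely want to fall back only on a "not found" error rather than on any failure.

```javascript
// Hypothetical dual-read: try the new path-aware key first, then the
// legacy hostname-only key, logging a warning when the fallback is used.
async function readCustomerSecret(newKey, legacyKey, fetchSecret, log = console) {
  try {
    return await fetchSecret(newKey);
  } catch (err) {
    // Root-domain sites produce identical keys, so there is nothing to fall back to.
    if (newKey === legacyKey) throw err;
    log.warn(`Secret ${newKey} not found, falling back to legacy key ${legacyKey}`);
    return fetchSecret(legacyKey);
  }
}

// In-memory stand-in for Secrets Manager: only the legacy key exists.
const store = new Map([['customer-secrets/nba_com/v1', { token: 'legacy' }]]);
const fakeFetch = async (key) => {
  if (!store.has(key)) throw new Error(`not found: ${key}`);
  return store.get(key);
};
const warnings = [];
const quietLog = { warn: (msg) => warnings.push(msg) };

readCustomerSecret(
  'customer-secrets/nba_com_kings/v1',
  'customer-secrets/nba_com/v1',
  fakeFetch,
  quietLog,
).then((secret) => console.log(secret.token, warnings.length));
```

Keeping the fallback in one helper makes it straightforward to delete after the release cycle in which the rename is completed.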
const url = new URL(baseURL);
const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
customer = pathSuffix ? `${host}_${pathSuffix}` : host;
Important - Separator collision narrows but does not close the original bug class
Both / and any non-alphanumeric character collapse to _. As a result:
- `nba.com/us/kings` and `nba.com/us-kings` both resolve to `nba_com_us_kings`
- `nba.com/us/kings` and `nba.com/us_kings` also collide
- `nba.com/foo bar` and `nba.com/foo%20bar` collide on `_`-runs
Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because baseURL is server-stored from site.getBaseURL(), but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output, or hash the pathname (e.g. first 12 hex of sha256) and append. Both are small and bounded to this function.
Minor (same line) - Repeated-slash normalization
The strip-slash regex /^\/|\/$/g strips at most one leading and one trailing slash. nba.com//kings yields _kings, different from nba.com/kings -> kings. Either collapse with ^\/+|\/+$, or normalize uniformly with pathname.split('/').filter(Boolean).join('_') (which also handles dot-segments).
Minor (same line) - Path case-folding is undocumented
nba.com/Kings and nba.com/kings collapse to the same key because of .toLowerCase() on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate choice. A one-line comment on the .toLowerCase() (or a test) makes the intent visible to a future contributor who might "fix" it.
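The repeated-slash note above can be checked directly. This sketch compares the single-slash strip with the segment-based normalization quoted in the review; the helper names `stripOnce` and `bySegments` are hypothetical.

```javascript
// Strip at most one leading and one trailing slash, then sanitize per character.
const stripOnce = (pathname) => pathname
  .replace(/^\/|\/$/g, '')
  .replace(/[^a-zA-Z0-9]/g, '_')
  .toLowerCase();

// Split into segments and drop empty ones, so runs of slashes disappear.
const bySegments = (pathname) => pathname
  .split('/')
  .filter(Boolean)
  .join('_')
  .replace(/[^a-zA-Z0-9_]/g, '_')
  .toLowerCase();

const single = new URL('https://nba.com/kings').pathname;
const doubled = new URL('https://nba.com//kings').pathname;

console.log(stripOnce(single), stripOnce(doubled));   // differ: the extra slash survives as '_'
console.log(bySegments(single), bySegments(doubled)); // identical
```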
@@ -47,7 +47,10 @@ export function resolveCustomerSecretsName(baseURL, ctx) {
const basePath = '/helix-deploy/spacecat-services/customer-secrets';
Important - JSDoc does not document the new behavior
The existing JSDoc (just above this line) was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with _, lowercased) and combined with _. Sites with the same hostname but different paths resolve to distinct secrets."
A worked example of the format (/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>) would also help the next person debugging "why is this customer's secret missing." Consider adding a @since note recording the format change so the breadcrumb is discoverable.
- Split pathname into segments and join with '__' so 'nba.com/us/kings' and
'nba.com/us-kings' produce distinct keys
- Use split('/').filter(Boolean) to handle repeated slashes correctly
- Update JSDoc with key format and examples
- Add trailing-slash idempotence test for subpaths
- Add separator-collision test
Addresses review feedback on LLMO-4186
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the thorough review @dzehnder - all points addressed in the latest commits:
Migration impact: No production sites currently have a non-empty path in […]
Separator collision (important #2): Fixed. Switched to […]
Repeated-slash normalization (minor #1): Fixed as a side-effect of the segment split - […]
JSDoc (important #3): Updated with format description, delimiter rationale, and worked examples.
Subpath trailing-slash idempotence test (minor #2): Added.
The companion PR (spacecat-api-service#2315 […]
…olding intent
- decodeURIComponent each path segment before sanitizing so %20 and equivalent decoded forms map to the same key
- Collapse consecutive _ runs after sanitizing (e.g. %20%20 -> single _)
- Document deliberate case-folding of path segments in JSDoc
- Add tests for percent-encoded paths, uppercase paths, and double slashes
Addresses remaining review feedback on LLMO-4186
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addressing the remaining items:
Minor #3 (case-folding undocumented): Added explicit note to JSDoc - "paths are case-folded deliberately so /Kings and /kings map to the same key" - and a test asserting this.
decodeURIComponent (parallel with companion PR): Added percent-decode for each segment before sanitizing, with a safe fallback on malformed encoding. Added test for […]
Consecutive […]
Double-slash test: Added - […]
All 1043 tests pass.
dzehnder
left a comment
Hey @alinarublea,
Thanks for the quick turnaround on the prior round - the new commit is well-targeted and clearly tries to close the collision class properly. One residual issue: the new JSDoc makes a categorical injectivity claim that the sanitizer does not actually deliver, and the new collision test is too narrow to catch it. Both are fixable with a one-line regex tweak plus a couple of extra assertions.
Strengths
Previously flagged issues now addressed:
- Path-segment separator collision (prior Important): the new per-segment sanitization joined by `__` eliminates the original `nba.com/us/kings` vs `nba.com/us-kings` collision (`packages/spacecat-shared-utils/src/helpers.js:62-67`).
- JSDoc (prior Important): the docstring now describes the format, the per-segment normalization, the `__` delimiter, and gives three worked examples for root, single-segment, and nested-segment URLs (`packages/spacecat-shared-utils/src/helpers.js:40-52`).
- Repeated-slash normalization (prior Minor): `split('/').filter(Boolean)` cleanly handles `nba.com//kings` and similar, as a side effect.
- Subpath trailing-slash idempotence test (prior Minor): dedicated test added at `packages/spacecat-shared-utils/test/helpers.test.js:113-117`.
- Path case-folding (prior Minor): now documented in the expanded JSDoc.
- Migration impact for existing subpath customers (prior Important): prior concern resolved by author response - subpath support is new functionality from LLMO-4186, no existing sites carry a non-empty path. The new format has no migration footprint inside spacecat-shared today.
Issues
Important (Should Fix)
1. Injectivity claim is false; the new approach still admits collisions on consecutive non-alphanumeric inputs (`packages/spacecat-shared-utils/src/helpers.js:62-67`, JSDoc at `:43-46`)
   The JSDoc states: "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This is not actually true. The sanitizer `replace(/[^a-zA-Z0-9]/g, '_')` is per-character and does not collapse runs, and `_` itself is non-alphanumeric so it is preserved 1:1. Two consecutive non-alphanumeric characters in a single path segment therefore produce `__` inside the segment, indistinguishable from a segment boundary.
   Verified collisions (all produce the same final secret name):
   https://nba.com/us..kings -> nba_com__us__kings
   https://nba.com/us/kings -> nba_com__us__kings COLLIDE
   https://nba.com/us-_kings -> nba_com__us__kings COLLIDE
   https://nba.com/us__kings -> nba_com__us__kings COLLIDE (literal __ in path)
   https://nba.com//us//kings -> nba_com__us__kings COLLIDE (filter(Boolean) collapses runs of /)
   Plus intra-segment, all single-segment paths whose two halves are separated by any non-alphanumeric character collide on `us_kings`:
   nba.com/us-kings
   nba.com/us_kings
   nba.com/us.kings
   nba.com/us kings
   -> nba_com__us_kings
   Why it matters: this is the same security class the PR is meant to close - two distinct `baseURL`s mapping to one Secrets Manager key. Severity is bounded today by the admin-only `createSite` gate and the absence of production subpath sites, so I am calling this Important rather than Critical. But: the PR's whole premise is "close the collision class" and the JSDoc explicitly promises injectivity, so shipping a docstring guarantee that the code does not honor sets up a foot-gun for the next contributor and for any caller that builds on the contract.
   The new test at `packages/spacecat-shared-utils/test/helpers.test.js:127-132` only asserts `nba.com/us/kings != nba.com/us-kings`. None of the residual collisions above are covered, so the test does not actually pin the property the JSDoc claims.
   Pick one fix:
   1. Tighten the sanitizer (smallest change, recommended): collapse runs of non-alphanumeric to a single `_` per segment. One-character regex tweak:
      const segments = url.pathname.split('/').filter(Boolean)
        .map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());
      With `+`, `us..kings` becomes `us_kings`, and joined-with-`__` paths can no longer match a single-segment input. The JSDoc claim becomes architecturally true.
   2. Or weaken the JSDoc to "best-effort distinctness for typical baseURL shapes; pathological inputs may collide." Worst option for a credentials-routing function.
   3. Or hash the canonical form (`host`, `segments`) and use the hash as the secret-name suffix. Eliminates the injectivity question entirely. Heavier change.
   Whichever path, also extend the separator-collision test to pin the property:
   expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx))
     .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
   expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx))
     .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
Recommendations
- Adopt the `+` regex change above. It is a one-character edit, makes the docstring accurate, preserves all currently-passing tests, and locks the collision class shut.
- After this PR merges, raise the same fix on the companion `generateDataFolder` (in `spacecat-api-service`) - it has the identical residual collision with `--` instead of `__`. Out of scope here; worth a follow-up ticket so the two functions stay aligned on the injectivity property.
- The `__` delimiter doubles per-segment overhead and brings the AWS Secrets Manager 512-char name limit a bit closer. Realistic baseURLs are nowhere near that ceiling, but if the team ever wants the limit to be a non-issue, the hash-based key suggestion above is the cleanest answer. Not for this PR.
Out of scope, worth tracking
- The companion fix in `spacecat-api-service` carries the same residual collision class (`--` separator with the same regex). One mention here, no separate finding.
Assessment
Ready to merge? With one fix.
Reasoning: every prior finding is genuinely addressed and the design is the right shape. The remaining issue is that the JSDoc advertises injectivity that the sanitizer does not provide, and the new test does not catch the residual collisions. The fix is a one-character regex change plus two additional assertions - small enough to land here rather than defer.
Next Steps
- Apply the `[^a-zA-Z0-9]+` (collapse-runs) regex change in the per-segment sanitizer.
- Add the two adversarial assertions to the separator-collision test.
- The recommendations are optional.
try {
customer = new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const url = new URL(baseURL);
const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
Important - Injectivity claim is false; collision class still open
The new JSDoc (line 43-46) states "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This isn't true. The sanitizer is per-character and _ is non-alphanumeric so two adjacent non-alphanumerics in one segment produce __ inside the segment, indistinguishable from a segment boundary.
Verified collisions (all produce nba_com__us__kings):
https://nba.com/us/kings
https://nba.com/us..kings (consecutive `..`)
https://nba.com/us-_kings (mixed `-_`)
https://nba.com/us__kings (literal `__` in path)
https://nba.com//us//kings (filter(Boolean) collapses runs of `/`)
Plus intra-segment: us-kings, us_kings, us.kings, us kings all collapse to us_kings.
Severity is bounded today by the admin-only createSite gate and the absence of production subpath sites, so this stays Important rather than Critical. But the PR's whole purpose is to close this collision class on a credentials function, and shipping a JSDoc guarantee that the code does not honor sets up a foot-gun.
The new test at helpers.test.js:127-132 only asserts us/kings != us-kings; none of the collisions above are covered.
One-line fix (recommended): collapse runs of non-alphanumeric to a single _ per segment:
const segments = url.pathname.split('/').filter(Boolean)
  .map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());
The + makes us..kings -> us_kings (single underscore), so joined-with-__ paths can no longer match a single-segment input. JSDoc claim becomes architecturally true; all current tests still pass.
Add adversarial assertions to lock it in:
expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx))
  .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx))
  .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
The companion generateDataFolder in spacecat-api-service has the identical residual collision with --; worth a follow-up there so the two functions stay aligned on the injectivity property.
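The difference between the per-character and collapse-runs sanitizers discussed in this comment can be seen side by side. This is a minimal sketch operating on path strings only; `perChar`, `collapseRuns`, and `pathKey` are hypothetical names, and the host portion of the key is left out.

```javascript
// Per-character replacement: every non-alphanumeric becomes its own '_'.
const perChar = (seg) => seg.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
// Collapse-runs replacement: a run of non-alphanumerics becomes a single '_'.
const collapseRuns = (seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();

// Join sanitized, non-empty segments with the '__' delimiter.
const pathKey = (pathname, sanitize) => pathname
  .split('/')
  .filter(Boolean)
  .map(sanitize)
  .join('__');

// Per-character: a single segment 'us..kings' looks like two segments.
console.log(pathKey('/us/kings', perChar), pathKey('/us..kings', perChar));

// Collapse-runs: a sanitized segment can no longer contain '__'.
console.log(pathKey('/us/kings', collapseRuns), pathKey('/us..kings', collapseRuns));
```

Note that collapse-runs only protects the segment boundary: `us..kings`, `us__kings`, and `us-kings` still share the single-segment key `us_kings`.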
The per-segment sanitizer now uses /[^a-zA-Z0-9]+/g (with +) to collapse consecutive non-alphanumeric characters into a single underscore in one pass, removing the separate replace(/_+/g, '_') step. This makes the JSDoc injectivity claim architecturally accurate: a sanitized segment can never contain __, so the __ segment delimiter is unambiguous.
Added three new tests:
- 'us..kings' (dots) does not collide with 'us/kings' (two segments)
- 'us__kings' (literal double underscore) does not collide with 'us/kings'
- Malformed percent-encoding (%ZZ) is handled without throwing (covers the decodeURIComponent catch branch).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dzehnder
left a comment
Hey @alinarublea,
Thanks for the iteration - the collapse-runs change closes the path-segment collision class cleanly and the adversarial tests do exactly what they need to. Verified by re-running each test pair. One residual issue surfaced once we walked the full attack chain: the same + quantifier was not applied to the hostname regex, which leaves a host-vs-path collision class open and makes the JSDoc's categorical claim still technically false.
Strengths
Previously flagged issues now addressed:
- Path-segment injectivity is now genuinely closed. The `replace(/[^a-zA-Z0-9]+/g, '_')` quantifier guarantees a sanitized segment cannot contain `__`, so segments joined with `__` are uniquely decomposable. Verified: the two-segment path `us/kings` produces a customer string distinct from the single-segment forms `us..kings`, `us__kings`, and `us-kings`.
- Adversarial tests at `packages/spacecat-shared-utils/test/helpers.test.js:131-142` lock in the property for `us..kings != us/kings` and `us__kings != us/kings`.
- Malformed percent-encoding is correctly handled - the inner `try/catch` around `decodeURIComponent` falls back to the raw segment, and there is a regression test (`%ZZkings` does not throw).
- JSDoc is thorough, documents the case-folding decision deliberately, and gives concrete worked examples.
Issues
Important (Should Fix)
Hostname regex still per-character; host-vs-path collision class remains (packages/spacecat-shared-utils/src/helpers.js:64)
The path regex was strengthened with + (the change requested last round), but the hostname regex on line 64 was not updated:
const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); // no +
So consecutive non-alphanumeric characters in a hostname produce __ runs - the same delimiter used to join host with path segments. This collapses the host/path boundary. Verified by running the function:
https://nba.com..foo -> nba_com__foo
https://nba.com/foo -> nba_com__foo COLLIDE
https://a..b -> a__b
https://a/b -> a__b COLLIDE
The JSDoc claim at lines 43-46 ("the double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide") therefore still does not hold across the full key.
Severity bounded by: (a) onboarding is admin-gated, (b) DNS will not resolve hostnames with consecutive empty labels (nba.com..foo), so reaching this in production requires either a manually-constructed baseURL with a typo or a corrupted normalization upstream. That keeps it at Important rather than Critical, but the same collision class the PR was filed to close still exists at the host/path boundary; closing it now (greenfield, no live subpath sites) is materially cheaper than discovering it post-merge.
Two acceptable shapes, pick one:
1. Mirror the path regex so the host also collapses runs:
   const host = url.hostname.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();
   Add a regression test: `resolveCustomerSecretsName('https://nba.com..foo', ctx)` must not equal `resolveCustomerSecretsName('https://nba.com/foo', ctx)`. Note that this introduces a different same-host collision (`a..b.com` and `a.b.com` both -> `a_b_com`), which is acceptable because well-formed hostnames do not contain consecutive non-alphanumeric characters anyway.
2. Scope the JSDoc claim so the documented invariant matches what the code provides:
   "Distinct path tuples on the same hostname cannot collide. Hostnames are sanitized per-character so adjacent separators in a hostname can produce `__`. Callers must pass normalized hostnames."
   Plus a one-line comment at the regex explaining the asymmetry, so the next reader does not "fix" it and silently break path-side injectivity.
Recommend option 1: it is a one-character change, eliminates the host-vs-path class entirely, and keeps the JSDoc claim honest.
Minor (Nice to Have)
- No length guard. The function does not bound output length. AWS Secrets Manager rejects names over 512 bytes; a deeply-nested path with long segments will fail at write time rather than at onboarding. Admin-gated, so this is a self-DoS rather than a tenant-impacting concern. Consider truncating or rejecting `customer` over a sane bound (e.g. 200 chars to leave headroom for basePath and version), or fold this into a future shared `siteIdentityComponents` helper.
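A length guard along the lines of this Minor item might look like the sketch below. It is an assumption-laden illustration: `assertSecretNameFits` and the `version` parameter are hypothetical, the 512 limit is the figure quoted in this review, and the name layout follows the `<basePath>/<customer>/<version>` format described earlier in the thread. Sanitized names are ASCII, so character length and byte length coincide here.

```javascript
const MAX_SECRET_NAME_LENGTH = 512; // limit quoted in the review
const BASE_PATH = '/helix-deploy/spacecat-services/customer-secrets';

// Hypothetical guard: build the full secret name and reject it early
// instead of waiting for Secrets Manager to refuse it at write time.
function assertSecretNameFits(customer, version) {
  const name = `${BASE_PATH}/${customer}/${version}`;
  if (name.length > MAX_SECRET_NAME_LENGTH) {
    throw new Error(`Secret name is ${name.length} chars, limit is ${MAX_SECRET_NAME_LENGTH}`);
  }
  return name;
}

console.log(assertSecretNameFits('nba_com__kings', 'v1'));

try {
  assertSecretNameFits('a'.repeat(600), 'v1');
} catch (err) {
  console.log(err.message);
}
```

Rejecting is the more conservative choice in this sketch; truncation would reintroduce a collision risk between long paths that share a prefix.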
Recommendations
- Pick option 1 above and add the one-line regression test. After this PR ships, the same host-side fix should land on the companion `generateDataFolder` in `spacecat-api-service` so the two functions stay aligned on the injectivity property.
Assessment
Ready to merge? With one fix.
Reasoning: Every prior path-side finding is genuinely addressed and the algorithm is the right shape. The remaining issue is asymmetric: path was tightened with +, host was not. The fix is one character; without it, the JSDoc still overstates the guarantee. After applying it (or scoping the JSDoc), this is good to go.
Next Steps
- Update the hostname regex (or scope the JSDoc) so the injectivity claim matches the code.
- Add the regression test for the host-vs-path class.
- The Minor length-guard item is optional and can land separately.
let customer;
try {
customer = new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const url = new URL(baseURL);
Important - Hostname regex was not strengthened with +; host-vs-path collision class remains
The path regex (line 71) now uses [^a-zA-Z0-9]+ (collapse runs), but the hostname regex on this line still uses [^a-zA-Z0-9] (per-char). That asymmetry means consecutive non-alphanumeric characters in a hostname produce __ runs, the same delimiter used to join host with path segments. Verified by running the function:
https://nba.com..foo -> nba_com__foo
https://nba.com/foo -> nba_com__foo COLLIDE
https://a..b -> a__b
https://a/b -> a__b COLLIDE
The JSDoc claim at lines 43-46 ("the double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide") therefore still doesn't hold across the full key.
Severity bounded by: (a) onboarding is admin-gated, (b) DNS will not resolve hostnames with consecutive empty labels. But this is the same collision class the PR was filed to close, just at the host/path boundary.
Recommended fix (one character):
const host = url.hostname.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();
And add a regression test: f('https://nba.com..foo', ctx) !== f('https://nba.com/foo', ctx).
Alternative: scope the JSDoc claim and add a comment here explaining the asymmetry is deliberate. Either resolves the inconsistency.
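The host/path asymmetry described in this comment can be illustrated without involving URL parsing. The sketch below takes the hostname and pathname as plain strings; `customerKey`, `perChar`, and `collapseRuns` are hypothetical names, and path segments always use the collapse-runs sanitizer, as in the state of the PR under review.

```javascript
const perChar = (s) => s.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const collapseRuns = (s) => s.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();

// Host is sanitized with the supplied function; path segments always collapse runs.
function customerKey(hostname, pathname, sanitizeHost) {
  const host = sanitizeHost(hostname);
  const segments = pathname.split('/').filter(Boolean).map(collapseRuns);
  return [host, ...segments].join('__');
}

// Per-character host sanitizing: '..' in the host mimics the host/path delimiter.
console.log(customerKey('nba.com..foo', '/', perChar), customerKey('nba.com', '/foo', perChar));

// Collapse-runs on the host as well: the two inputs now map to different keys.
console.log(customerKey('nba.com..foo', '/', collapseRuns), customerKey('nba.com', '/foo', collapseRuns));
```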
…ions
Apply + quantifier to the hostname regex so consecutive non-alphanumeric characters (e.g. nba.com..foo) collapse to a single underscore, matching the already-applied run-collapse on path segments. Without this, a hostname like nba.com..foo produced nba_com__foo, colliding with the key for nba.com/foo.
Add regression test to pin the fix.
…ntation
- Restructure to isolate URL-parse errors from business logic (prevent unrelated errors being masked as 'Invalid baseURL')
- Add protocol/hostname guard: reject non-http(s) and empty-hostname URLs
- Extract inline sanitize() with leading/trailing _ trim and filter(Boolean) on segments, preventing degenerate all-underscore segments from artifacts
- Update JSDoc: scope the injectivity claim honestly (punctuation-only segment variants still collide, document this), add @since with LLMO-4186
- Fix percent-encoded segment test to use property assertion (equal to its literal-unicode counterpart) rather than pinning a lossy output value
- Add: port-equality test, http/https validation tests
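Putting the steps listed in this commit message together, the key derivation might take roughly the following shape. This is a sketch reconstructed from the descriptions in this thread, not the code in helpers.js: `customerSlug`, `safeDecode`, and the error messages are hypothetical, and only the customer portion of the secret name is produced.

```javascript
// Collapse runs of non-alphanumerics to '_', trim leading/trailing '_', lowercase.
const sanitize = (s) => s
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Percent-decode a segment, falling back to the raw text on malformed encoding.
const safeDecode = (segment) => {
  try {
    return decodeURIComponent(segment);
  } catch {
    return segment;
  }
};

function customerSlug(baseURL) {
  let url;
  try {
    url = new URL(baseURL); // only parse errors are caught here
  } catch {
    throw new Error('Invalid baseURL');
  }
  if (!['http:', 'https:'].includes(url.protocol) || !url.hostname) {
    throw new Error('Invalid baseURL: expected an http(s) URL with a hostname');
  }
  const host = sanitize(url.hostname);
  const segments = url.pathname
    .split('/')
    .filter(Boolean)                      // drops empty segments from repeated slashes
    .map((seg) => sanitize(safeDecode(seg)))
    .filter(Boolean);                     // drops segments that sanitize to nothing
  return [host, ...segments].join('__');
}

console.log(customerSlug('https://nba.com/'));
console.log(customerSlug('https://nba.com/us/kings'));
console.log(customerSlug('https://nba.com/us-kings'));
```

Because a sanitized host or segment can never contain `__` or be empty, the joined key splits back into the same components, while punctuation-only variants inside one segment still share a key, as the commit message acknowledges.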
dzehnder
left a comment
Hey @alinarublea,
Thanks for the iteration since the last round. The host-vs-path collision class is genuinely closed and the JSDoc rewrite is honest about the residual lossy-sanitization limit. Three Important items came up from the new commits: an injectivity claim that the empty-segment drop quietly invalidates, a unicode-normalization gap that admits the same kind of silent collision the PR is closing elsewhere, and the new protocol guard duplicating an existing in-package predicate.
Strengths
Previously flagged issues now addressed:
- Hostname collapse-runs (prior Important, review #3): the `sanitize` helper applies `replace(/[^a-zA-Z0-9]+/g, '_')` plus leading/trailing underscore trim to BOTH host and path segments, and the new test `does not collide hostname with consecutive dots and a subpath with the same text` pins the property correctly. `https://nba.com..foo` -> `nba_com_foo` and `https://nba.com/foo` -> `nba_com__foo` are now distinct.
- JSDoc punctuation-collision admission (prior Minor framing): the rewrite explicitly documents that `us-kings` and `us_kings` collide on the same sanitized segment as "an inherent limitation of lossy sanitization," with an `@since 2.x` LLMO-4186 note. Honest engineering and the right posture for a load-bearing key.
- Encoded-vs-literal equivalence test: the percent-encoded test now asserts `https://nba.com/k%C3%B6nig` resolves identically to `https://nba.com/könig`, which pins the semantic invariant rather than a brittle expected string. Plus the regex sanity check catches host-prefix regressions.
- Defense-in-depth on protocol degeneration: before this PR, `new URL('javascript:alert(1)')`, `data:`, `blob:`, and similar parsed with empty hostname and would all have produced a single shared key. The new throw closes that class. (Layering concern below in Important findings, but the underlying improvement is real.)
- Caller compatibility verified: every internal caller of `resolveCustomerSecretsName` (content-client, ims-client, google-client, auth-service google/sharepoint handlers, autofix-worker) passes `site.getBaseURL()`. The `Site` model validates `baseURL` via `isValidUrl`, which only accepts http(s). The new throw is a dead branch in practice for known callers, so the behavior change is backward-compatible.
Issues
Important (Should Fix)
- JSDoc injectivity claim is still false because of the new `.filter(Boolean)` (`packages/spacecat-shared-utils/src/helpers.js:43-46`, source at `:80-91`).

  The new doc states: "URLs that differ in path structure (different number of segments, or segments with different alphanumeric content) produce distinct keys." This is wrong. The new commit added trim-edges plus `.filter(Boolean)` after sanitize, so a path segment consisting entirely of non-alphanumeric characters reduces to empty and is silently dropped. As a result, two URLs with different segment counts produce the same key:

  - `https://nba.com/-/foo` -> `nba_com__foo` == `https://nba.com/foo` -> `nba_com__foo`
  - `https://nba.com/foo/-/` -> `nba_com__foo` == `https://nba.com/foo`
  - `https://nba.com/----` -> `nba_com` == `https://nba.com` (host-only)

  This is a NEW collision class introduced by this commit (the prior code did not trim or drop empty segments). The author's prior pushback ("no production sites carry a non-empty path") still bounds the blast radius, but the `@since 2.x` JSDoc is now part of a documented contract that callers may rely on, and the "different number of segments produce distinct keys" claim is provably wrong.

  Pick one:

  - Weaken the JSDoc to acknowledge that segments which sanitize to empty are dropped (extending the existing "lossy sanitization" caveat).
  - Or keep an empty-but-present marker (don't drop empty segments; emit a single `_` placeholder) so segment count is preserved.

  Add an adversarial test pinning whichever behavior you choose, e.g. `https://nba.com/-/foo` !== `https://nba.com/foo` (or ===, per choice).

- Unicode NFC vs NFD inputs produce different keys with no test or doc note (`packages/spacecat-shared-utils/src/helpers.js:84`, the `sanitize` helper).

  The new percent-encoded test uses NFC for both literal and encoded forms, so it passes. NFD is not exercised. `decodeURIComponent` does no unicode normalization, so visually-identical inputs in different forms produce different sanitized keys:

  - `https://nba.com/k%C3%B6nig` (NFC `ö` = U+00F6) -> `nba_com__k_nig`
  - `https://nba.com/ko%CC%88nig` (NFD `o` + U+0308 combining diaeresis) -> `nba_com__ko_nig`

  A copy-paste of the same brand name from two different sources (browser address bar vs macOS filesystem; iOS clipboard) lands in different secret buckets. This is the same class of silent collision the PR is closing in the opposite direction (different keys for the same logical tenant). One-line fix: add `.normalize('NFC')` to `sanitize` before the regex. Removes the entire NFC/NFD ambiguity. Also add one regression test pinning the equivalence.

- `throw` on non-http(s) duplicates `isValidUrl` and lives at the wrong layer (`packages/spacecat-shared-utils/src/helpers.js:69-71`).

  The new guard:

  `if (!url.hostname || !['http:', 'https:'].includes(url.protocol)) { throw new Error('Invalid baseURL: must be an http(s) URL with a hostname'); }`

  reimplements `isValidUrl` from `packages/spacecat-shared-utils/src/functions.js:208`, which already does `url.protocol === 'http:' || url.protocol === 'https:'` after parse. The well-trodden caller pattern is `composeBaseURL(domain) -> site.baseURL -> resolveCustomerSecretsName(site.getBaseURL(), ctx)` - the function is a key-derivation leaf; protocol enforcement is an input-contract concern centralized upstream.

  Pick one:

  - Drop the throw and rely on `composeBaseURL` + `isValidUrl` upstream.
  - Or keep the throw but call `isValidUrl(baseURL)` so the package has one definition of "valid baseURL." Lower-risk localized fix. The hostname check is fine to keep as a defensive assertion since `new URL('http://')` parses but yields empty hostname.

  Worth noting: the throw is safely dead-branch given `Site` model validation, so this is a layering/duplication finding rather than a runtime defect.
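For the first finding above, the two options can be compared side by side. This is only a sketch: `sanitize`, `dropEmpty`, `keepPlaceholder`, and `toKey` are hypothetical names modelled on the behavior described in this review, not the code in `helpers.js`.

```javascript
// Hypothetical sanitizer mirroring the one described in the review:
// collapse non-alphanumeric runs to `_`, trim edge underscores, lowercase.
const sanitize = (s) => s
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Option A: drop segments that sanitize to empty (the behavior under review).
const dropEmpty = (pathname) => pathname
  .split('/')
  .filter(Boolean)
  .map(sanitize)
  .filter(Boolean);

// Option B: keep a single `_` placeholder so the segment count is preserved.
const keepPlaceholder = (pathname) => pathname
  .split('/')
  .filter(Boolean)
  .map((seg) => sanitize(seg) || '_');

// Join sanitized host and segments with the `__` delimiter.
const toKey = (host, segments) => [sanitize(host), ...segments].join('__');

const droppedA = toKey('nba.com', dropEmpty('/-/foo'));
const droppedB = toKey('nba.com', dropEmpty('/foo'));
const keptA = toKey('nba.com', keepPlaceholder('/-/foo'));
const keptB = toKey('nba.com', keepPlaceholder('/foo'));

console.log({ droppedA, droppedB, keptA, keptB });
```

Under option A the two paths share a key; under option B they do not, at the cost of longer underscore runs in the key. Whichever is chosen, the adversarial test should pin it.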
Minor (Nice to Have)
- `@since 2.x` is not a real version (`packages/spacecat-shared-utils/src/helpers.js:46`). Current `spacecat-shared-utils` is `1.113.0`, and there are no other `@since` tags in this package to mirror. `2.x` is a range, not a version. Either drop the tag or replace with the LLMO-4186 reference inline.
- IPv6 hostname literals collapse to a non-injective key. `URL` parses `https://[::1]/kings` with `url.hostname === '::1'`, which `sanitize` reduces to empty after trim. The final key starts with the leading `__` from the path-join, so multiple distinct IPv6 hosts (`[::1]`, `[::2]`, `[fe80::1]`) all reduce to the same key prefix. Almost certainly unreachable for production customer baseURLs, but the new hostname guard accepts these inputs without throwing. Either add an IPv4-or-FQDN-only validator (probably overkill), or note it in the JSDoc punctuation-collision paragraph.
- Symmetric port assertion. The new port test only checks port-vs-no-port equivalence. Add the converse `same-port-different-path` direction (`nba.com:8443/kings` vs `nba.com:8443/lakers` -> different) to pin that the path delimiter still works once a port is present. One line.
Recommendations
- The lossy-sanitization framing in the JSDoc names the symptom; the architectural choice is "lossy sanitize chosen over hashing." A future iteration could append a short content-derived suffix (first 8 hex of `sha256("${host}/${decodedPath}")`) so collisions become numerically improbable rather than syntactically guaranteed. Out of scope for this PR; track as a follow-up.
- The new validation duplicates `isValidUrl`; consider whether `Site.baseURL` write-path validation should also reject `url.username || url.password` (today, a userinfo-bearing baseURL passes `isValidUrl` and could collide with a legitimate baseURL on the same hostname). Strictly defense-in-depth on the admin-write path, not a current vulnerability. Out of scope here.
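The hashing idea in the first recommendation could look roughly like the sketch below. `keyWithSuffix` and its key layout are assumptions for illustration only; appending a suffix would change every existing key, so this is not a drop-in change.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical sanitizer, as described earlier in this review.
const sanitize = (s) => s
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Build a readable key and append the first 8 hex characters of a SHA-256
// digest over the unsanitized `${host}/${decodedPath}` string.
function keyWithSuffix(host, decodedPath) {
  const readable = [host, ...decodedPath.split('/').filter(Boolean)]
    .map(sanitize)
    .filter(Boolean)
    .join('__');
  const suffix = createHash('sha256')
    .update(`${host}/${decodedPath}`)
    .digest('hex')
    .slice(0, 8);
  return `${readable}_${suffix}`;
}

// `us-kings` and `us_kings` share a readable part but get different digests.
const dashed = keyWithSuffix('nba.com', 'us-kings');
const underscored = keyWithSuffix('nba.com', 'us_kings');
console.log(dashed, underscored);
```

The readable prefix stays greppable while the suffix separates inputs that the lossy sanitizer would otherwise merge.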
Out of scope, worth tracking
- PR #1577 (this) and `adobe/spacecat-api-service#2315` now carry near-identical sanitize logic with different separators (`_`/`__` vs `-`/`--`). A shared `siteIdentityComponents(baseURL)` helper in `spacecat-shared-utils` would keep the two repos from drifting on the URL-to-identifier contract. Mentioning once.
- Length guard against AWS Secrets Manager 512-char name limit. Not currently load-bearing for realistic baseURLs.
Assessment
Ready to merge? With fixes.
Reasoning: the prior Important (host-vs-path collision) is genuinely closed and the new tests pin the right invariants. The three new Important findings are all small, localized fixes: a JSDoc tweak (or a one-line code change to preserve segment count), a one-line .normalize('NFC') plus one regression test, and a one-line isValidUrl swap. All three are within budget for this PR. After they land, this is approve as-is; the architecture and security posture are sound.
Next Steps
- Pick the JSDoc-vs-empty-segment-marker decision and add the adversarial test.
- Add `.normalize('NFC')` to `sanitize` and a regression test for the NFD form.
- Route the protocol guard through `isValidUrl`, or drop it.
- Minor items (`@since` cleanup, IPv6 note, symmetric port test) can land here or in a follow-up.
| * The hostname (per RFC 1035, case-insensitive) and each URL path segment are | ||
| * percent-decoded and individually sanitized: runs of non-alphanumeric characters | ||
| * are replaced with a single `_`, leading/trailing `_` are trimmed, and the | ||
| * result is lowercased. Segments that reduce to empty after sanitization are |
Important: this JSDoc claim ("URLs that differ in path structure (different number of segments, or segments with different alphanumeric content) produce distinct keys") is provably wrong because of the new .filter(Boolean) at line ~89. A path segment that sanitizes to empty (e.g. /-/, /---) is silently dropped, so:
- `https://nba.com/-/foo` -> `nba_com__foo` == `https://nba.com/foo`
- `https://nba.com/----` -> `nba_com` == `https://nba.com`
New collision class introduced by this commit (the prior code did not trim/drop). Pick: weaken the doc to admit the empty-segment drop, OR preserve segment count by emitting a single _ placeholder for empty segments. Add an adversarial test pinning the chosen behavior.
| const sanitize = (s) => s.replace(/[^a-zA-Z0-9]+/g, '_').replace(/^_+|_+$/g, '').toLowerCase(); | ||
| const host = sanitize(url.hostname); | ||
| const segments = url.pathname.split('/').filter(Boolean) | ||
| .map((seg) => { |
Important: NFC vs NFD unicode forms produce different keys today. decodeURIComponent doesn't normalize. Same logical brand pasted from two sources lands in different secret buckets:
- `https://nba.com/k%C3%B6nig` (NFC `ö` = U+00F6) -> `nba_com__k_nig`
- `https://nba.com/ko%CC%88nig` (NFD `o` + U+0308) -> `nba_com__ko_nig`
This is the same class of silent collision the PR is closing elsewhere, in the opposite direction (different keys for the same logical tenant). One-line fix: add .normalize('NFC') to sanitize before the regex. Plus one regression test asserting NFC/NFD equivalence.
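A small sketch of the normalization gap and the suggested fix, assuming a sanitizer shaped like the one in this PR. `sanitize` and `sanitizeNfc` are illustrative names; `String.prototype.normalize` and `decodeURIComponent` are standard built-ins.

```javascript
// Hypothetical sanitizer without normalization.
const sanitize = (s) => s
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Same sanitizer, but canonicalizing to NFC before the regex runs.
const sanitizeNfc = (s) => sanitize(s.normalize('NFC'));

const nfc = decodeURIComponent('k%C3%B6nig'); // precomposed U+00F6
const nfd = decodeURIComponent('ko%CC%88nig'); // "o" followed by combining U+0308

// Without normalization the two visually identical segments diverge.
console.log(sanitize(nfc), sanitize(nfd));

// With NFC normalization they converge on one sanitized segment.
console.log(sanitizeNfc(nfc), sanitizeNfc(nfd));
```

The regression test the comment asks for would assert the second pair is equal.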
| * only, causing subpath sites on the same domain to share a secret. LLMO-4186. | ||
| */ | ||
| export function resolveCustomerSecretsName(baseURL, ctx) { | ||
| const basePath = '/helix-deploy/spacecat-services/customer-secrets'; |
Important: the new if (!url.hostname || !['http:','https:'].includes(url.protocol)) reimplements isValidUrl from packages/spacecat-shared-utils/src/functions.js:208. Two concerns: (a) duplicate logic in the same package; (b) the throw lives at the leaf of a key-derivation function rather than at the input boundary (composeBaseURL / Site model write-path / isValidUrl).
Pick: drop the throw and rely on composeBaseURL + isValidUrl upstream, OR call isValidUrl(baseURL) here so the package has one valid-baseURL predicate. The hostname-empty check is fine to keep as a defensive assertion (new URL('http://') parses but yields empty hostname).
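The second option might be sketched as follows. The `isValidUrl` here is a local stand-in written only so the example runs on its own; the real predicate lives in `functions.js` and may differ. `parseBaseURL` is a hypothetical helper name.

```javascript
// Stand-in for the package's isValidUrl (assumption: it accepts only http(s) URLs).
function isValidUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

// Hypothetical guard: one shared predicate, one error message for invalid input,
// plus the derivation-specific empty-hostname assertion.
function parseBaseURL(baseURL) {
  if (!isValidUrl(baseURL)) {
    throw new Error('Invalid baseURL: must be an http(s) URL');
  }
  const url = new URL(baseURL);
  if (!url.hostname) {
    throw new Error('Invalid baseURL: hostname is empty');
  }
  return url;
}

console.log(parseBaseURL('https://nba.com/kings').hostname);
```

Routing through the shared predicate means a later change to what counts as a valid baseURL only has to be made in one place.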
danieljchuser
left a comment
Hey @alinarublea,
Thanks for the iteration - the degenerate-host guard in the latest commit is a real improvement, and the em-dash/since cleanup shows good attention to prior feedback. The core fix is solid: the __ delimiter, run-collapse sanitizer, and upstream Site model validation close the original cross-tenant credential collision class. Four Important items worth addressing and two Minor items, but none are blockers given the bounded blast radius.
Strengths
- The degenerate-host guard at `helpers.js:77` (`if (!host)`) closes the last reachable path where `URL` parsing accepts a hostname that sanitizes to empty. Defense-in-depth is appropriate even with `Site` validation upstream.
- The `try/catch` around `decodeURIComponent` at `helpers.js:88-92` is correctly bounded (catches only `URIError`), commented, and falls back to the raw segment. Does not silently swallow programmer errors.
- Single-pass `[^a-zA-Z0-9]+` regex at `helpers.js:79` makes the `__` delimiter unambiguous by construction. Collision class closed at the regex level.
- Test additions at `helpers.test.js:93-203` cover the adversarial collision classes (`us..kings` vs `us/kings`, `us__kings` vs `us/kings`, `nba.com..foo` vs `nba.com/foo`), port equivalence, protocol errors, and degenerate hostname. Solid regression net.
- Previously flagged issues now addressed: em-dashes removed, `@since` precision improved, degenerate-host guard added.
Issues
Important (Should Fix)
- Hostname sanitizer regression extends beyond subpath sites - `helpers.js:79`

  The regex changed from `[^a-zA-Z0-9]` (per-char) to `[^a-zA-Z0-9]+` (run-collapse) plus leading/trailing trim. This affects ALL hostnames with consecutive or leading/trailing non-alphanumerics, not just subpath sites:

  - `xn--fiq228c.com` (IDN/punycode) -> old `xn__fiq228c_com`, new `xn_fiq228c_com`
  - `foo..bar.com` -> old `foo__bar_com`, new `foo_bar_com`
  - `_dmarc.example.com` -> old `_dmarc_example_com`, new `dmarc_example_com`

  The "no production subpath sites" justification only addresses the subpath dimension. Before merging, run a one-off pass over the production site list to confirm no hostname falls into these categories and document the result on the PR.

- JSDoc injectivity claim still overstated - `helpers.js:47-50`

  "URLs that differ in path structure (different number of segments...) produce distinct keys" remains false. `.filter(Boolean)` at line 91 drops segments that sanitize to empty, so `/foo` and `/-/foo` both produce `host__foo`. The doc only calls out the punctuation-vs-underscore collision, not the empty-segment collapse.

  Fix: add one sentence: "Segments that contain only non-alphanumeric characters sanitize to empty and are dropped, so paths differing only by such segments produce the same key (e.g. `/foo` and `/-/foo`)."

- NFC vs NFD unicode normalization gap - `helpers.js:88`

  `decodeURIComponent` returns whatever bytes the URL carried. NFC precomposed `ö` (U+00F6) and NFD decomposed `o` + U+0308 sanitize to different keys for visually identical URLs. One-line fix: add `.normalize('NFC')` after `decodeURIComponent` before sanitize.

- Protocol/hostname guard duplicates `isValidUrl` - `helpers.js:69-71`

  `isValidUrl` lives in the same package (`functions.js`) and already validates http(s) + hostname. This function open-codes the same check with three near-identical "Invalid baseURL" error strings. Two sources of truth will drift when `isValidUrl` evolves.

  Fix: call `isValidUrl(baseURL)` first and throw a single error, then `new URL` for parsing. Keep the empty-after-sanitize host throw as the only inline guard (derivation-specific).
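The one-off production pass requested in the first finding could be as small as the sketch below. It assumes the old hostname key was the per-char replace plus lowercasing and the new one is the run-collapse plus trim; `findChangedHosts` and the sample list are hypothetical, and a real audit would feed in the actual site hostnames.

```javascript
// Assumed old behavior: one `_` per non-alphanumeric character.
const oldHostKey = (h) => h.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();

// Assumed new behavior: collapse runs, trim edge underscores.
const newHostKey = (h) => h
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Return only the hostnames whose key changes between the two versions.
function findChangedHosts(hostnames) {
  return hostnames
    .map((hostname) => ({
      hostname,
      oldKey: oldHostKey(hostname),
      newKey: newHostKey(hostname),
    }))
    .filter(({ oldKey, newKey }) => oldKey !== newKey);
}

// Illustrative input only; not real production data.
const sample = ['nba.com', 'xn--fiq228c.com', 'foo..bar.com', '_dmarc.example.com'];
console.table(findChangedHosts(sample));
```

An empty result over the real site list would confirm that no existing root-domain secret changes its key.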
Minor (Nice to Have)
-
- Test for malformed percent-encoding pins non-throw but not resulting key - `helpers.test.js:172-175`. Add an explicit `.to.equal(...)` assertion to lock the raw-fallback contract, not just the non-throw property.
- No positive `http://` test - `helpers.test.js:93-203`. Every successful-resolution test uses `https://`. The protocol guard accepts both, but the `http://` branch is unexercised. One assertion would close the gap.
Recommendations
- Add a defensive max-length check on `customer` before delegating to `resolveSecretsName`. AWS Secrets Manager imposes a 512-char name limit; deep pathnames could exceed it silently and surface as runtime AccessDenied.
- Promote the empty catch on `decodeURIComponent` to at least `log.debug` so silent key-routing changes from malformed percent-encoding are recoverable from logs.
- Consider extracting `sanitize` to a named, exported helper. The companion change in api-service#2315 reimplements the same logic with `-`/`--` separators; a shared primitive would prevent drift.
Out of scope, worth tracking
- Companion PR `adobe/spacecat-api-service#2315` derives the same conceptual identity with `-`/`--` separators. A shared `sanitizeUrlToSlug(baseURL, { separator })` utility would prevent the two from drifting. Cheapest to converge now before both formats calcify across callers.
- Backward-compat claim in the PR description applies to root-domain sites only. Any existing subpath site that shared a key will miss its secret after deployment. Worth a deployment runbook entry covering affected sites and rollback plan.
Assessment
Approved - the cross-tenant credential collision class is correctly closed and the security posture is sound. The findings above are worth addressing but none are blockers given the bounded blast radius (no production subpath sites, admin-gated onboarding, Site model validation upstream). The hostname sanitizer regression (finding 1) is the most important to verify with a production audit before deploying.
| throw new Error('Invalid baseURL: must be a valid URL'); | ||
| } | ||
| if (!url.hostname || !['http:', 'https:'].includes(url.protocol)) { | ||
| throw new Error('Invalid baseURL: must be an http(s) URL with a hostname'); |
Important: the regex change from per-char to run-collapse affects ALL hostnames with consecutive non-alphanumerics, not just subpath sites. Punycode (xn--) prefixes, double-dot hostnames, and leading-underscore hostnames all get different keys. Run a one-off pass over the production site list to confirm no hostname falls into these categories before deploying.
| * are replaced with a single `_`, leading/trailing `_` are trimmed, and the | ||
| * result is lowercased. Segments that reduce to empty after sanitization are | ||
| * dropped. Sanitized parts are joined with `__` as the path-segment delimiter. | ||
| * |
Important: this claim remains false for the empty-segment case. .filter(Boolean) at line 91 drops segments that sanitize to empty, so /foo and /-/foo both produce host__foo. Add a sentence noting that segments containing only non-alphanumeric characters are dropped.
| } | ||
| const segments = url.pathname.split('/').filter(Boolean) | ||
| .map((seg) => { | ||
| let decoded = seg; |
Important: decodeURIComponent does no Unicode normalization. NFC precomposed and NFD decomposed forms of the same character produce different sanitized keys. One-line fix: decoded = decodeURIComponent(seg).normalize('NFC') before sanitize.
## [@adobe/spacecat-shared-utils-v1.115.1](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.115.0...@adobe/spacecat-shared-utils-v1.115.1) (2026-05-06)

### Bug Fixes

* **utils:** include subpath in resolveCustomerSecretsName to prevent credential collisions ([#1577](#1577)) ([db95707](db95707))
|
🎉 This PR is included in version @adobe/spacecat-shared-utils-v1.115.1 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Summary
- Subpath sites on the same domain (e.g. `nba.com/kings` and `nba.com/lakers`) were colliding on a single Secrets Manager key because `resolveCustomerSecretsName` used only the hostname

Fixes LLMO-4186
Test plan
🤖 Generated with Claude Code