
fix(utils): include subpath in resolveCustomerSecretsName to prevent credential collisions #1577

Merged
alinarublea merged 9 commits into main from fix/subpath-customer-secrets-name
May 6, 2026

Conversation

@alinarublea
Contributor

Summary

  • Subpath sites on the same domain (e.g. nba.com/kings and nba.com/lakers) were colliding on a single Secrets Manager key because resolveCustomerSecretsName used only the hostname
  • Now includes the URL path in the secret key, so each subpath site gets its own isolated secret
  • Backward compatible: root-domain sites (no path) produce the same key as before
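
To make the collision concrete, the snippet below applies the pre-fix expression from this PR's diff (host only) to two subpath URLs. The wrapper name `legacyCustomer` is illustrative and does not appear in the codebase.

```javascript
// Pre-fix behavior: only the host participates in the key, so the path is ignored.
// `legacyCustomer` is a hypothetical wrapper around the original expression.
const legacyCustomer = (baseURL) => new URL(baseURL).host
  .replace(/[^a-zA-Z0-9]/g, '_')
  .toLowerCase();

console.log(legacyCustomer('https://nba.com/kings'));  // nba_com
console.log(legacyCustomer('https://nba.com/lakers')); // nba_com
```

Both sites resolve to the same customer string and therefore to the same secret.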

Fixes LLMO-4186

Test plan

  • New unit tests verify subpath sites produce distinct secret names
  • Existing tests confirm root-domain behavior is unchanged
  • 100% line/branch coverage maintained

🤖 Generated with Claude Code

…credential collisions

Subpath sites on the same domain (e.g. nba.com/kings, nba.com/lakers) were
sharing a single Secrets Manager key because only the hostname was used.
Now the full path is included so each subpath site gets its own secret key.
Backward compatible: root-domain sites produce the same key as before.

Fixes LLMO-4186

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor Author

@alinarublea alinarublea left a comment


Code review

The fix is correct and backward compatible. Two small observations:

url.host vs url.hostname
The implementation uses url.host (which includes the port, e.g. nba.com:8080) rather than url.hostname. This matches the pre-existing behavior so it is not a regression, but generateDataFolder in spacecat-api-service (the companion fix for LLMO-4186) uses url.hostname. Worth aligning the two for consistency, or at minimum documenting the difference.

Missing test case for nested paths
https://nba.com/us/kings -> nba_com_us_kings is the expected behavior (correct), but there is no test covering nested subpaths. A single extra assertion in the new test would document this and guard against future regressions.

Neither is a blocker — the core fix is solid.

- Switch from url.host to url.hostname for consistency with generateDataFolder
- Add test for nested subpaths (e.g. /us/kings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 4, 2026

This PR will trigger a patch release when merged.

Contributor

@dzehnder dzehnder left a comment


Hey @alinarublea,

Thanks for the focused fix and the upfront self-review. The core change is correct and the tests cover the headline scenarios. The cluster of concerns below is mostly about what happens to credentials that already exist under the old key shape, plus a few normalization edges that the new path-based key does not fully close.

Strengths

  • The fix correctly closes a real cross-tenant credential collision: distinct subpath customers on the same hostname were sharing one Secrets Manager key (packages/spacecat-shared-utils/src/helpers.js:54).
  • Backward compatibility for root-domain sites is preserved by the pathSuffix ? ... : host ternary; the new URL('https://nba.com/').pathname === '/' case is correctly handled by the ^\/|\/$ strip.
  • Switching from url.host to url.hostname was the right correctness move - a port should not be part of an identity key.
  • The new tests in packages/spacecat-shared-utils/test/helpers.test.js:93-117 cover the primary collision scenario, root trailing-slash idempotence, and nested subpaths. The pre-existing 'resolves the customer secrets name correctly with valid inputs' test (line 65) implicitly anchors the "no path - same as before" claim.
  • Both points raised in the self-review (host vs hostname, nested-path test) were addressed before review was requested. Clean change discipline.

Issues

Important (Should Fix)

  1. Migration impact for existing subpath customers is unaddressed (packages/spacecat-shared-utils/src/helpers.js:54)

    The PR description says "Backward compatible," but this is only true for sites with no path. Any customer whose baseURL already carries a path had its credentials written under customer-secrets/<host>/<ver>. After this PR ships, the same baseURL resolves to customer-secrets/<host>_<path>/<ver>, a key that does not exist in Secrets Manager. The downstream consumers (spacecat-shared-google-client, spacecat-shared-ims-client, spacecat-shared-content-client, plus spacecat-auth-service Google/SharePoint handlers) will start failing those lookups. content-client swallows the lookup error at log.debug and silently degrades; IMS auth raises and breaks page authentication.

    Two paths forward, pick one:

    • Confirm in the PR description (and ideally in LLMO-4186) that no production Site has a non-empty path in baseURL today, so the "backward compatible" claim is in practice "backward compatible for all current sites."
    • If subpath sites do exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
  2. Path-segment separator collision narrows but does not close the original bug class (packages/spacecat-shared-utils/src/helpers.js:53)

    Both / and any non-alphanumeric character collapse to _. As a result:

    • nba.com/us/kings and nba.com/us-kings both resolve to nba_com_us_kings.
    • nba.com/us/kings and nba.com/us_kings also collide.
    • nba.com/foo bar and nba.com/foo%20bar collide on _-runs.

    Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because baseURL is server-stored from site.getBaseURL(), but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output (e.g. join host and path-segments with a delimiter that never collapses), or hash the pathname (e.g. first 12 hex of sha256) and append. Both are small and bounded to this function.

  3. JSDoc does not document the new behavior (packages/spacecat-shared-utils/src/helpers.js:41-46)

    The doc was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with _, lowercased) and combined with _. Sites with the same hostname but different paths resolve to distinct secrets." A worked example of the format (/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>) would also help the next person debugging "why is this customer's secret missing." Consider adding a @since note recording the format change so the breadcrumb is discoverable.

Minor (Nice to Have)

  1. Repeated-slash path normalization (packages/spacecat-shared-utils/src/helpers.js:53)

    The strip-slash regex /^\/|\/$/g is a single-character alternation - it strips at most one leading and one trailing slash. nba.com//kings (sometimes produced by URL builders) yields _kings, different from nba.com/kings -> kings. Same risk class as the original bug, less likely to be hit in practice. Either collapse with ^\/+|\/+$, or normalize the path uniformly with pathname.split('/').filter(Boolean).join('_').replace(/[^a-zA-Z0-9_]/g, '_').toLowerCase() (which also handles dot-segments).

  2. Missing subpath trailing-slash idempotence test (packages/spacecat-shared-utils/test/helpers.test.js)

    The trailing-slash idempotence test only covers root URLs. The same property should hold for subpaths and is not asserted:

    expect(resolveCustomerSecretsName('https://nba.com/kings', ctx))
      .to.equal(resolveCustomerSecretsName('https://nba.com/kings/', ctx));

    A future refactor of the path regex could silently break this.

  3. Path case-folding is undocumented (packages/spacecat-shared-utils/src/helpers.js:53)

    nba.com/Kings and nba.com/kings collapse to the same key because of .toLowerCase() on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate normalization choice. A one-line comment on the .toLowerCase() (or a test) makes the intent visible to a future contributor who might "fix" it.

Recommendations

  • Consider extracting the host+path-to-key transformation into a small named helper (urlToCustomerSlug or similar) with focused unit tests covering the normalization edges (encoded chars, repeated slashes, query/fragment, mixed case). This is load-bearing logic; isolating it pays for itself the next time someone needs to extend it.
  • Add a length guard. AWS Secrets Manager caps secret names at 512 chars; the basePath consumes ~46. Deep subpaths could plausibly approach that ceiling. A length check would surface this earlier than a Secrets Manager API rejection at runtime.
  • Forward-looking: the canonical place to enforce the customer-identity contract is the Site model. A future site.getCustomerSecretId() would shrink the surface area where the convention can drift across the four downstream consumers.

Out of scope, worth tracking

  • The companion generateDataFolder fix in spacecat-api-service is correctly excluded here, but worth tracking that the two functions encode the same identity concept using compatible conventions (this PR uses _, generateDataFolder reportedly uses -). A drift between the two reproduces the original bug class on the data-folder side.
  • Downstream packages (spacecat-shared-google-client, spacecat-shared-ims-client, spacecat-shared-content-client, plus spacecat-auth-service handlers) will need a coordinated bump of the @adobe/spacecat-shared-utils dependency so a subpath site's auth flow does not write to the new key while a different service still reads the old one.

Assessment

Ready to merge? With fixes.

Reasoning: the bug fix is correct, minimal, and well-tested for the targeted regression. The main blocker is the migration story - the PR description should explicitly confirm whether any in-production Site already has a subpath baseURL, since the answer determines whether this is a clean ship or a coordinated cutover. The JSDoc update and the separator-collision narrowing are small and worth landing in this PR.

Next Steps

  1. Resolve the migration question first - either confirm "no subpath sites in prod today" in the PR description / Jira, or put a coordinated cutover plan in place.
  2. Tighten the separator collision (Important item 2) and update the JSDoc (Important item 3).
  3. The three Minor items can land here or in a small follow-up.

const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
customer = pathSuffix ? `${host}_${pathSuffix}` : host;
} catch {
Contributor


Important - Migration impact for existing subpath customers

The PR description says "Backward compatible," but that holds only for sites with no path. Any customer whose baseURL already carries a path had its credentials written under customer-secrets/<host>/<ver>. After this ships, the same URL resolves to customer-secrets/<host>_<path>/<ver> - a key that does not exist in Secrets Manager. Downstream consumers (spacecat-shared-google-client, spacecat-shared-ims-client, spacecat-shared-content-client, plus spacecat-auth-service Google/SharePoint handlers) will start failing those lookups; content-client swallows the error at log.debug and silently degrades, while IMS auth raises and breaks page authentication.

Pick one:

  • Confirm in the PR description (and LLMO-4186) that no production Site has a non-empty path in baseURL today, so the "backward compatible" claim is in practice "backward compatible for all current sites."
  • If subpath sites already exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
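
A minimal sketch of the transitional dual-read option, assuming a caller-supplied async `getSecret(name)` that rejects when the secret is missing. `readCustomerSecret`, `getSecret`, and the two name parameters are hypothetical and not part of any of the packages named above.

```javascript
// Hypothetical transitional dual-read: try the new path-aware key first,
// then fall back to the legacy hostname-only key for one release cycle.
// `getSecret` stands in for whatever Secrets Manager client the caller uses.
async function readCustomerSecret(getSecret, newName, legacyName, log = console) {
  try {
    return await getSecret(newName);
  } catch (err) {
    if (newName === legacyName) {
      // Root-domain site: both keys are identical, so there is nothing to fall back to.
      throw err;
    }
    log.warn(`Secret ${newName} not found, falling back to legacy key ${legacyName}`);
    return getSecret(legacyName);
  }
}
```

This sketch falls back on any error from the first lookup. A real caller would probably want to fall back only on a "not found" error, so that throttling or permission failures are not masked.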

const url = new URL(baseURL);
const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
customer = pathSuffix ? `${host}_${pathSuffix}` : host;
Contributor


Important - Separator collision narrows but does not close the original bug class

Both / and any non-alphanumeric character collapse to _. As a result:

  • nba.com/us/kings and nba.com/us-kings both resolve to nba_com_us_kings
  • nba.com/us/kings and nba.com/us_kings also collide
  • nba.com/foo bar and nba.com/foo%20bar collide on _-runs

Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because baseURL is server-stored from site.getBaseURL(), but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output, or hash the pathname (e.g. first 12 hex of sha256) and append. Both are small and bounded to this function.
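
One possible shape for the hash option, as a sketch only. `customerSlugWithHash` is a hypothetical name, and the choice to hash the slash-trimmed pathname is an assumption made here so that trailing slashes stay idempotent. A 12-hex-character prefix carries 48 bits, so collisions become improbable rather than impossible.

```javascript
const { createHash } = require('node:crypto');

// Sketch only: append a short digest of the pathname to the sanitized host.
// `customerSlugWithHash` is a hypothetical helper, not part of the PR.
function customerSlugWithHash(baseURL) {
  const url = new URL(baseURL);
  const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
  // Trim leading/trailing slashes so '/kings' and '/kings/' hash identically.
  const path = url.pathname.replace(/^\/+|\/+$/g, '');
  if (!path) {
    return host; // root-domain sites keep the legacy hostname-only key
  }
  const digest = createHash('sha256').update(path).digest('hex').slice(0, 12);
  return `${host}_${digest}`;
}
```

The trade-off is readability: the key no longer shows which subpath it belongs to, which makes debugging a missing secret harder.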

Minor (same line) - Repeated-slash normalization

The strip-slash regex /^\/|\/$/g strips at most one leading and one trailing slash. nba.com//kings yields _kings, different from nba.com/kings -> kings. Either collapse with ^\/+|\/+$, or normalize uniformly with pathname.split('/').filter(Boolean).join('_') (which also handles dot-segments).

Minor (same line) - Path case-folding is undocumented

nba.com/Kings and nba.com/kings collapse to the same key because of .toLowerCase() on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate choice. A one-line comment on the .toLowerCase() (or a test) makes the intent visible to a future contributor who might "fix" it.

@@ -47,7 +47,10 @@ export function resolveCustomerSecretsName(baseURL, ctx) {
const basePath = '/helix-deploy/spacecat-services/customer-secrets';
Contributor


Important - JSDoc does not document the new behavior

The existing JSDoc (just above this line) was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with _, lowercased) and combined with _. Sites with the same hostname but different paths resolve to distinct secrets."

A worked example of the format (/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>) would also help the next person debugging "why is this customer's secret missing." Consider adding a @since note recording the format change so the breadcrumb is discoverable.

- Split pathname into segments and join with '__' so 'nba.com/us/kings' and
  'nba.com/us-kings' produce distinct keys
- Use split('/').filter(Boolean) to handle repeated slashes correctly
- Update JSDoc with key format and examples
- Add trailing-slash idempotence test for subpaths
- Add separator-collision test

Addresses review feedback on LLMO-4186

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alinarublea
Contributor Author

Thanks for the thorough review @dzehnder — all points addressed in the latest commits:

Migration impact: No production sites currently have a non-empty path in baseURL — subpath support is new functionality introduced by LLMO-4186. The "backward compatible" claim holds for all existing sites.

Separator collision (important #2): Fixed. Switched to split('/').filter(Boolean) on the pathname, and segments are now joined with __ (double underscore). Since each sanitized segment can only contain single underscores, __ can never appear in a segment and acts as an unambiguous delimiter. nba.com/us/kings -> nba_com__us__kings and nba.com/us-kings -> nba_com__us_kings are distinct. Added a test asserting this.

Repeated-slash normalization (minor #1): Fixed as a side-effect of the segment split — split('/').filter(Boolean) discards empty segments, so //kings and /kings both yield ['kings'].
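
The segment split described in this reply can be sketched as follows. This is a reconstruction from the description, not the PR's actual diff. `customerSlug` is a hypothetical name, and the per-character sanitizer is the one in use at this point in the thread.

```javascript
// Reconstruction of the approach described in this reply, for illustration only.
function customerSlug(baseURL) {
  const url = new URL(baseURL);
  const sanitize = (s) => s.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
  const host = sanitize(url.hostname);
  // filter(Boolean) drops the empty strings produced by leading, trailing,
  // and repeated slashes.
  const segments = url.pathname.split('/').filter(Boolean).map(sanitize);
  return [host, ...segments].join('__');
}

console.log(customerSlug('https://nba.com/us/kings')); // nba_com__us__kings
console.log(customerSlug('https://nba.com/us-kings')); // nba_com__us_kings
console.log(customerSlug('https://nba.com//kings'));   // nba_com__kings
console.log(customerSlug('https://nba.com/'));         // nba_com
```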

JSDoc (important #3): Updated with format description, delimiter rationale, and worked examples.

Subpath trailing-slash idempotence test (minor #2): Added.

The companion PR (spacecat-api-service#2315 generateDataFolder) received the same treatment with -- as the segment delimiter, for consistency.

@alinarublea alinarublea requested a review from dzehnder May 4, 2026 13:18
…olding intent

- decodeURIComponent each path segment before sanitizing so %20 and
  equivalent decoded forms map to the same key
- Collapse consecutive _ runs after sanitizing (e.g. %20%20 -> single _)
- Document deliberate case-folding of path segments in JSDoc
- Add tests for percent-encoded paths, uppercase paths, and double slashes

Addresses remaining review feedback on LLMO-4186

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alinarublea
Contributor Author

Addressing the remaining items:

Minor #3 (case-folding undocumented): Added explicit note to JSDoc — "paths are case-folded deliberately so /Kings and /kings map to the same key" — and a test asserting this.

decodeURIComponent (parallel with companion PR): Added percent-decode for each segment before sanitizing, with a safe fallback on malformed encoding. Added test for %C3%B6 (ö).

Consecutive _ collapse: Added .replace(/_+/g, '_') so e.g. %20%20 (two encoded spaces) produces one _ rather than two.

Double-slash test: Added — https://nba.com//kings equals https://nba.com/kings via filter(Boolean).
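
A sketch of the per-segment normalization described in this reply: decode with a safe fallback, sanitize, collapse underscore runs, and lowercase. `normalizeSegment` is a hypothetical helper name, and the exact order of operations in the PR is assumed from the description above.

```javascript
// Illustration only: one path segment in, one sanitized segment out.
function normalizeSegment(segment) {
  let decoded;
  try {
    decoded = decodeURIComponent(segment);
  } catch {
    decoded = segment; // malformed encoding such as '%ZZ': keep the raw segment
  }
  return decoded
    .replace(/[^a-zA-Z0-9]/g, '_') // per-character sanitization
    .replace(/_+/g, '_')           // collapse runs into a single underscore
    .toLowerCase();                // deliberate case-folding
}

console.log(normalizeSegment('foo%20%20bar')); // foo_bar
console.log(normalizeSegment('Kings'));        // kings
console.log(normalizeSegment('%ZZkings'));     // _zzkings
```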

All 1043 tests pass.

Contributor

@dzehnder dzehnder left a comment


Hey @alinarublea,

Thanks for the quick turnaround on the prior round - the new commit is well-targeted and clearly tries to close the collision class properly. One residual issue: the new JSDoc makes a categorical injectivity claim that the sanitizer does not actually deliver, and the new collision test is too narrow to catch it. Both are fixable with a one-line regex tweak plus a couple of extra assertions.

Strengths

Previously flagged issues now addressed:

  • Path-segment separator collision (prior Important): the new per-segment sanitization joined by __ eliminates the original nba.com/us/kings vs nba.com/us-kings collision (packages/spacecat-shared-utils/src/helpers.js:62-67).
  • JSDoc (prior Important): the docstring now describes the format, the per-segment normalization, the __ delimiter, and gives three worked examples for root, single-segment, and nested-segment URLs (packages/spacecat-shared-utils/src/helpers.js:40-52).
  • Repeated-slash normalization (prior Minor): split('/').filter(Boolean) cleanly handles nba.com//kings and similar, as a side effect.
  • Subpath trailing-slash idempotence test (prior Minor): dedicated test added at packages/spacecat-shared-utils/test/helpers.test.js:113-117.
  • Path case-folding (prior Minor): now documented in the expanded JSDoc.
  • Migration impact for existing subpath customers (prior Important): prior concern resolved by author response - subpath support is new functionality from LLMO-4186, no existing sites carry a non-empty path. The new format has no migration footprint inside spacecat-shared today.

Issues

Important (Should Fix)

  1. Injectivity claim is false; the new approach still admits collisions on consecutive non-alphanumeric inputs (packages/spacecat-shared-utils/src/helpers.js:62-67, JSDoc at :43-46)

    The JSDoc states: "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This is not actually true. The sanitizer replace(/[^a-zA-Z0-9]/g, '_') is per-character and does not collapse runs, and _ itself is non-alphanumeric so it is preserved 1:1. Two consecutive non-alphanumeric characters in a single path segment therefore produce __ inside the segment, indistinguishable from a segment boundary.

    Verified collisions (all produce the same final secret name):

    https://nba.com/us..kings    -> nba_com__us__kings
    https://nba.com/us/kings     -> nba_com__us__kings   COLLIDE
    https://nba.com/us-_kings    -> nba_com__us__kings   COLLIDE
    https://nba.com/us__kings    -> nba_com__us__kings   COLLIDE (literal __ in path)
    https://nba.com//us//kings   -> nba_com__us__kings   COLLIDE (filter(Boolean) collapses runs of /)
    

    In addition, within a single segment, all paths whose two halves are separated by any one non-alphanumeric character collide on us_kings:

    nba.com/us-kings   nba.com/us_kings   nba.com/us.kings   nba.com/us kings   -> nba_com__us_kings
    

    Why it matters: this is the same security class the PR is meant to close - two distinct baseURLs mapping to one Secrets Manager key. Severity is bounded today by the admin-only createSite gate and the absence of production subpath sites, so I am calling this Important rather than Critical. But: the PR's whole premise is "close the collision class" and the JSDoc explicitly promises injectivity, so shipping a docstring guarantee that the code does not honor sets up a foot-gun for the next contributor and for any caller that builds on the contract.

    The new test at packages/spacecat-shared-utils/test/helpers.test.js:127-132 only asserts nba.com/us/kings != nba.com/us-kings. None of the residual collisions above are covered, so the test does not actually pin the property the JSDoc claims.

    Pick one fix:

    • Tighten the sanitizer (smallest change, recommended): collapse runs of non-alphanumeric to a single _ per segment. One-character regex tweak:

      const segments = url.pathname.split('/').filter(Boolean)
        .map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());

      With +, us..kings becomes us_kings, and joined-with-__ paths can no longer match a single-segment input. The JSDoc claim becomes architecturally true.

    • Or weaken the JSDoc to "best-effort distinctness for typical baseURL shapes; pathological inputs may collide." Worst option for a credentials-routing function.

    • Or hash the canonical form (host, segments) and use the hash as the secret-name suffix. Eliminates the injectivity question entirely. Heavier change.

    Whichever path, also extend the separator-collision test to pin the property:

    expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx))
      .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
    expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx))
      .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));

Recommendations

  • Adopt the + regex change above. It is a one-character edit, makes the docstring accurate, preserves all currently-passing tests, and locks the collision class shut.
  • After this PR merges, raise the same fix on the companion generateDataFolder (in spacecat-api-service) - it has the identical residual collision with -- instead of __. Out of scope here; worth a follow-up ticket so the two functions stay aligned on the injectivity property.
  • The __ delimiter doubles per-segment overhead and brings the AWS Secrets Manager 512-char name limit a bit closer. Realistic baseURLs are nowhere near that ceiling, but if the team ever wants the limit to be a non-issue, the hash-based key suggestion above is the cleanest answer. Not for this PR.

Out of scope, worth tracking

  • The companion fix in spacecat-api-service carries the same residual collision class (-- separator with the same regex). One mention here, no separate finding.

Assessment

Ready to merge? With one fix.

Reasoning: every prior finding is genuinely addressed and the design is the right shape. The remaining issue is that the JSDoc advertises injectivity that the sanitizer does not provide, and the new test does not catch the residual collisions. The fix is a one-character regex change plus two additional assertions - small enough to land here rather than defer.

Next Steps

  1. Apply the [^a-zA-Z0-9]+ (collapse-runs) regex change in the per-segment sanitizer.
  2. Add the two adversarial assertions to the separator-collision test.
  3. The recommendations are optional.

  try {
-   customer = new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
+   const url = new URL(baseURL);
+   const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
Contributor


Important - Injectivity claim is false; collision class still open

The new JSDoc (lines 43-46) states "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This isn't true. The sanitizer is per-character, and _ is itself non-alphanumeric, so two adjacent non-alphanumerics in one segment produce __ inside the segment, indistinguishable from a segment boundary.

Verified collisions (all produce nba_com__us__kings):

https://nba.com/us/kings
https://nba.com/us..kings   (consecutive `..`)
https://nba.com/us-_kings   (mixed `-_`)
https://nba.com/us__kings   (literal `__` in path)
https://nba.com//us//kings  (filter(Boolean) collapses runs of `/`)

Plus intra-segment: us-kings, us_kings, us.kings, us kings all collapse to us_kings.

Severity is bounded today by the admin-only createSite gate and the absence of production subpath sites, so this stays Important rather than Critical. But the PR's whole purpose is to close this collision class on a credentials function, and shipping a JSDoc guarantee that the code does not honor sets up a foot-gun.

The new test at helpers.test.js:127-132 only asserts us/kings != us-kings; none of the collisions above are covered.

One-line fix (recommended): collapse runs of non-alphanumeric to a single _ per segment:

const segments = url.pathname.split('/').filter(Boolean)
  .map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());

The + makes us..kings -> us_kings (single underscore), so joined-with-__ paths can no longer match a single-segment input. JSDoc claim becomes architecturally true; all current tests still pass.

Add adversarial assertions to lock it in:

expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx))
  .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx))
  .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));

The companion generateDataFolder in spacecat-api-service has the identical residual collision with --; worth a follow-up there so the two functions stay aligned on the injectivity property.
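
The difference between the two sanitizers can be checked side by side. In the sketch below, `slug` is a hypothetical helper that takes the segment regex as a parameter so both variants can be compared. It omits the percent-decoding step for brevity.

```javascript
// Illustration only: same key construction, parameterized by the segment regex.
function slug(baseURL, segmentPattern) {
  const url = new URL(baseURL);
  const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
  const segments = url.pathname.split('/').filter(Boolean)
    .map((seg) => seg.replace(segmentPattern, '_').toLowerCase());
  return [host, ...segments].join('__');
}

const perChar = /[^a-zA-Z0-9]/g;       // one underscore per offending character
const collapseRuns = /[^a-zA-Z0-9]+/g; // one underscore per run

// Per-character: 'us..kings' becomes 'us__kings', which reads as two segments.
console.log(slug('https://nba.com/us..kings', perChar)
  === slug('https://nba.com/us/kings', perChar)); // true

// Collapse-runs: 'us..kings' becomes 'us_kings', so the keys differ.
console.log(slug('https://nba.com/us..kings', collapseRuns)
  === slug('https://nba.com/us/kings', collapseRuns)); // false
```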

The per-segment sanitizer now uses /[^a-zA-Z0-9]+/g (with +) to collapse
consecutive non-alphanumeric characters into a single underscore in one
pass, removing the separate replace(/_+/g, '_') step. This makes the
JSDoc injectivity claim architecturally accurate: a sanitized segment
can never contain __, so the __ segment delimiter is unambiguous.

Added three new tests:
- 'us..kings' (dots) does not collide with 'us/kings' (two segments)
- 'us__kings' (literal double underscore) does not collide with 'us/kings'
- Malformed percent-encoding (%ZZ) is handled without throwing (covers
  the decodeURIComponent catch branch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@dzehnder dzehnder left a comment


Hey @alinarublea,

Thanks for the iteration - the collapse-runs change closes the path-segment collision class cleanly and the adversarial tests do exactly what they need to. Verified by re-running each test pair. One residual issue surfaced once we walked the full attack chain: the same + quantifier was not applied to the hostname regex, which leaves a host-vs-path collision class open and makes the JSDoc's categorical claim still technically false.

Strengths

Previously flagged issues now addressed:

  • Path-segment injectivity is now genuinely closed. The replace(/[^a-zA-Z0-9]+/g, '_') quantifier guarantees a sanitized segment cannot contain __, so segments joined with __ are uniquely decomposable. Verified: us..kings, us/kings, us__kings, us-kings all produce distinct customer strings.
  • Adversarial tests at packages/spacecat-shared-utils/test/helpers.test.js:131-142 lock in the property for us..kings != us/kings and us__kings != us/kings.
  • Malformed percent-encoding is correctly handled - the inner try/catch around decodeURIComponent falls back to the raw segment, and there is a regression test (%ZZkings does not throw).
  • JSDoc is thorough, documents the case-folding decision deliberately, and gives concrete worked examples.

Issues

Important (Should Fix)

Hostname regex still per-character; host-vs-path collision class remains (packages/spacecat-shared-utils/src/helpers.js:64)

The path regex was strengthened with + (the change requested last round), but the hostname regex on line 64 was not updated:

const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();   // no +

So consecutive non-alphanumeric characters in a hostname produce __ runs - the same delimiter used to join host with path segments. This collapses the host/path boundary. Verified by running the function:

https://nba.com..foo  -> nba_com__foo
https://nba.com/foo   -> nba_com__foo   COLLIDE
https://a..b          -> a__b
https://a/b           -> a__b           COLLIDE

The JSDoc claim at lines 43-46 ("the double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide") therefore still does not hold across the full key.

Severity bounded by: (a) onboarding is admin-gated, (b) DNS will not resolve hostnames with consecutive empty labels (nba.com..foo), so reaching this in production requires either a manually-constructed baseURL with a typo or a corrupted normalization upstream. That keeps it at Important rather than Critical, but the same collision class the PR was filed to close still exists at the host/path boundary; closing it now (greenfield, no live subpath sites) is materially cheaper than discovering it post-merge.

Two acceptable shapes, pick one:

  1. Mirror the path regex so the host also collapses runs:

    const host = url.hostname.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();

    Add a regression test: resolveCustomerSecretsName('https://nba.com..foo', ctx) must not equal resolveCustomerSecretsName('https://nba.com/foo', ctx). Note that this introduces a different same-host collision (a..b.com and a.b.com both -> a_b_com), which is acceptable because well-formed hostnames do not contain consecutive non-alphanumeric characters anyway.

  2. Scope the JSDoc claim so the documented invariant matches what the code provides:

    "Distinct path tuples on the same hostname cannot collide. Hostnames are sanitized per-character so adjacent separators in a hostname can produce __. Callers must pass normalized hostnames."

    Plus a one-line comment at the regex explaining the asymmetry, so the next reader does not "fix" it and silently break path-side injectivity.

Recommend option 1: it is a one-character change, eliminates the host-vs-path class entirely, and keeps the JSDoc claim honest.
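
A sketch of option 1 with the regression check, under the same assumptions about the surrounding code (the helper name `deriveKeyFixed` is hypothetical and percent-decoding is left out). It also shows the accepted trade-off for malformed hostnames.

```javascript
// Hypothetical helper: the same run-collapsing regex is applied to host and path.
function deriveKeyFixed(baseURL) {
  const url = new URL(baseURL);
  const collapse = (s) => s.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();
  const host = collapse(url.hostname);
  const segments = url.pathname.split('/').filter(Boolean).map(collapse);
  return [host, ...segments].join('__');
}

// The host/path boundary is preserved:
console.log(deriveKeyFixed('https://nba.com..foo')); // nba_com_foo
console.log(deriveKeyFixed('https://nba.com/foo'));  // nba_com__foo

// Accepted trade-off: malformed hostnames with repeated separators now share a key.
console.log(deriveKeyFixed('https://a..b.com') === deriveKeyFixed('https://a.b.com')); // true
```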

Minor (Nice to Have)

  • No length guard. The function does not bound output length. AWS Secrets Manager rejects names over 512 characters; a deeply-nested path with long segments will fail at write time rather than at onboarding. Admin-gated, so this is a self-DoS rather than a tenant-impacting concern. Consider truncating or rejecting customer over a sane bound (e.g. 200 chars to leave headroom for basePath and version), or fold this into a future shared siteIdentityComponents helper.
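
One possible shape for such a guard, as a sketch only. The helper name `assertCustomerKeyLength` and the constant are hypothetical, and the 200-character bound is the headroom suggested above rather than a limit enforced by AWS.

```javascript
// Hypothetical bound: leaves room for basePath and version within the
// Secrets Manager name limit.
const MAX_CUSTOMER_KEY_LENGTH = 200;

// Rejects over-long derived keys at onboarding time instead of at write time.
function assertCustomerKeyLength(customer, max = MAX_CUSTOMER_KEY_LENGTH) {
  if (customer.length > max) {
    throw new Error(
      `Invalid baseURL: derived secret key is ${customer.length} characters, limit is ${max}`,
    );
  }
  return customer;
}
```

Rejecting rather than truncating keeps the key derivation injective for the inputs it accepts; truncation would reintroduce collisions between long paths that share a prefix.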

Recommendations

  • Pick option 1 above and add the one-line regression test. After this PR ships, the same host-side fix should land on the companion generateDataFolder in spacecat-api-service so the two functions stay aligned on the injectivity property.

Assessment

Ready to merge? With one fix.

Reasoning: Every prior path-side finding is genuinely addressed and the algorithm is the right shape. The remaining issue is asymmetric: path was tightened with +, host was not. The fix is one character; without it, the JSDoc still overstates the guarantee. After applying it (or scoping the JSDoc), this is good to go.

Next Steps

  1. Update the hostname regex (or scope the JSDoc) so the injectivity claim matches the code.
  2. Add the regression test for the host-vs-path class.
  3. The Minor length-guard item is optional and can land separately.

let customer;
try {
customer = new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
const url = new URL(baseURL);

Important - Hostname regex was not strengthened with +; host-vs-path collision class remains

The path regex (line 71) now uses [^a-zA-Z0-9]+ (collapse runs), but the hostname regex on this line still uses [^a-zA-Z0-9] (per-char). That asymmetry means consecutive non-alphanumeric characters in a hostname produce __ runs, the same delimiter used to join host with path segments. Verified by running the function:

https://nba.com..foo -> nba_com__foo
https://nba.com/foo  -> nba_com__foo   COLLIDE
https://a..b         -> a__b
https://a/b          -> a__b           COLLIDE

The JSDoc claim at lines 43-46 ("the double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide") therefore still doesn't hold across the full key.

Severity bounded by: (a) onboarding is admin-gated, (b) DNS will not resolve hostnames with consecutive empty labels. But this is the same collision class the PR was filed to close, just at the host/path boundary.

Recommended fix (one character):

const host = url.hostname.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase();

And add a regression test: f('https://nba.com..foo', ctx) !== f('https://nba.com/foo', ctx).

Alternative: scope the JSDoc claim and add a comment here explaining the asymmetry is deliberate. Either resolves the inconsistency.

…ions

Apply + quantifier to the hostname regex so consecutive non-alphanumeric
characters (e.g. nba.com..foo) collapse to a single underscore, matching
the already-applied run-collapse on path segments. Without this, a hostname
like nba.com..foo produced nba_com__foo, colliding with the key for
nba.com/foo. Add regression test to pin the fix.
@alinarublea alinarublea requested a review from dzehnder May 5, 2026 12:18
…ntation

- Restructure to isolate URL-parse errors from business logic (prevent
  unrelated errors being masked as 'Invalid baseURL')
- Add protocol/hostname guard: reject non-http(s) and empty-hostname URLs
- Extract inline sanitize() with leading/trailing _ trim and filter(Boolean)
  on segments, preventing degenerate all-underscore segments from artifacts
- Update JSDoc: scope the injectivity claim honestly (punctuation-only
  segment variants still collide, document this), add @since with LLMO-4186
- Fix percent-encoded segment test to use property assertion (equal to
  its literal-unicode counterpart) rather than pinning a lossy output value
- Add: port-equality test, http/https validation tests

@dzehnder dzehnder left a comment

Hey @alinarublea,

Thanks for the iteration since the last round. The host-vs-path collision class is genuinely closed and the JSDoc rewrite is honest about the residual lossy-sanitization limit. Three Important items came up from the new commits: an injectivity claim that the empty-segment drop quietly invalidates, a unicode-normalization gap that admits the same kind of silent collision the PR is closing elsewhere, and the new protocol guard duplicating an existing in-package predicate.

Strengths

Previously flagged issues now addressed:

  • Hostname collapse-runs (prior Important, review #3): the sanitize helper applies replace(/[^a-zA-Z0-9]+/g, '_') plus leading/trailing underscore trim to BOTH host and path segments, and the new test "does not collide hostname with consecutive dots and a subpath with the same text" pins the property correctly. https://nba.com..foo -> nba_com_foo and https://nba.com/foo -> nba_com__foo are now distinct.
  • JSDoc punctuation-collision admission (prior Minor framing): the rewrite explicitly documents that us-kings and us_kings collide on the same sanitized segment as "an inherent limitation of lossy sanitization," with an @since 2.x LLMO-4186 note. Honest engineering and the right posture for a load-bearing key.
  • Encoded-vs-literal equivalence test: the percent-encoded test now asserts https://nba.com/k%C3%B6nig resolves identically to https://nba.com/könig, which pins the semantic invariant rather than a brittle expected string. Plus the regex sanity check catches host-prefix regressions.
  • Defense-in-depth on protocol degeneration: before this PR, new URL('javascript:alert(1)'), data:, blob:, and similar parsed with empty hostname and would all have produced a single shared key. The new throw closes that class. (Layering concern below in Important findings, but the underlying improvement is real.)
  • Caller compatibility verified: every internal caller of resolveCustomerSecretsName (content-client, ims-client, google-client, auth-service google/sharepoint handlers, autofix-worker) passes site.getBaseURL(). The Site model validates baseURL via isValidUrl, which only accepts http(s). The new throw is a dead branch in practice for known callers, so the behavior change is backward-compatible.

Issues

Important (Should Fix)

  1. JSDoc injectivity claim is still false because of the new .filter(Boolean) (packages/spacecat-shared-utils/src/helpers.js:43-46, source at :80-91).

    The new doc states: "URLs that differ in path structure (different number of segments, or segments with different alphanumeric content) produce distinct keys." This is wrong. The new commit added trim-edges plus .filter(Boolean) after sanitize, so a path segment consisting entirely of non-alphanumeric characters reduces to empty and is silently dropped. As a result, two URLs with different segment counts produce the same key:

    • https://nba.com/-/foo -> nba_com__foo == https://nba.com/foo -> nba_com__foo
    • https://nba.com/foo/-/ -> nba_com__foo == https://nba.com/foo
    • https://nba.com/--- -> nba_com == https://nba.com (host-only)

    This is a NEW collision class introduced by this commit (the prior code did not trim or drop empty segments). The author's prior pushback ("no production sites carry a non-empty path") still bounds the blast radius, but the @since 2.x JSDoc is now part of a documented contract that callers may rely on, and the "different number of segments produce distinct keys" claim is provably wrong.

    Pick one:

    • Weaken the JSDoc to acknowledge that segments which sanitize to empty are dropped (extending the existing "lossy sanitization" caveat).
    • Or keep an empty-but-present marker (don't drop empty segments; emit a single _ placeholder) so segment count is preserved.

    Add an adversarial test pinning whichever behavior you choose, e.g. https://nba.com/-/foo !== https://nba.com/foo (or ===, per choice).

  2. Unicode NFC vs NFD inputs produce different keys with no test or doc note (packages/spacecat-shared-utils/src/helpers.js:84, the sanitize helper).

    The new percent-encoded test uses NFC for both literal and encoded forms, so it passes. NFD is not exercised. decodeURIComponent does no unicode normalization, so visually-identical inputs in different forms produce different sanitized keys:

    • https://nba.com/k%C3%B6nig (NFC ö = U+00F6) -> nba_com__k_nig
    • https://nba.com/ko%CC%88nig (NFD o + U+0308 combining diaeresis) -> nba_com__ko_nig

    A copy-paste of the same brand name from two different sources (browser address bar vs macOS filesystem; iOS clipboard) lands in different secret buckets. This is the inverse of the silent collision the PR is closing: instead of two tenants sharing one key, one logical tenant is split across two keys. One-line fix: add .normalize('NFC') to sanitize before the regex, which removes the NFC/NFD ambiguity entirely. Also add one regression test pinning the equivalence.

  3. throw on non-http(s) duplicates isValidUrl and lives at the wrong layer (packages/spacecat-shared-utils/src/helpers.js:69-71).

    The new guard:

    if (!url.hostname || !['http:', 'https:'].includes(url.protocol)) {
      throw new Error('Invalid baseURL: must be an http(s) URL with a hostname');
    }

    reimplements isValidUrl from packages/spacecat-shared-utils/src/functions.js:208, which already does url.protocol === 'http:' || url.protocol === 'https:' after parse. The well-trodden caller pattern is composeBaseURL(domain) -> site.baseURL -> resolveCustomerSecretsName(site.getBaseURL(), ctx) - the function is a key-derivation leaf; protocol enforcement is an input-contract concern centralized upstream.

    Pick one:

    • Drop the throw and rely on composeBaseURL + isValidUrl upstream.
    • Or keep the throw but call isValidUrl(baseURL) so the package has one definition of "valid baseURL." Lower-risk localized fix. The hostname check is fine to keep as a defensive assertion since new URL('http://') parses but yields empty hostname.

    Worth noting: the throw is safely dead-branch given Site model validation, so this is a layering/duplication finding rather than a runtime defect.

Minor (Nice to Have)

  1. @since 2.x is not a real version (packages/spacecat-shared-utils/src/helpers.js:46). Current spacecat-shared-utils is 1.113.0, and there are no other @since tags in this package to mirror. 2.x is a range, not a version. Either drop the tag or replace with the LLMO-4186 reference inline.

  2. IPv6 hostname literals collapse to a non-injective key. URL parses https://[::1]/kings with url.hostname === '[::1]' (brackets included), which sanitize reduces to 1 once the brackets and colons are collapsed and trimmed. The colon structure of the address is lost, so distinct IPv6 hosts can reduce to the same key prefix (for example [::1] and [1::] both become 1). Almost certainly unreachable for production customer baseURLs, but the new hostname guard accepts these inputs without throwing. Either add an IPv4-or-FQDN-only validator (probably overkill), or note it in the JSDoc punctuation-collision paragraph.

  3. Symmetric port assertion. The new port test only checks port-vs-no-port equivalence. Add the converse same-port-different-path direction (nba.com:8443/kings vs nba.com:8443/lakers -> different) to pin that the path delimiter still works once a port is present. One line.

Recommendations

  • The lossy-sanitization framing in the JSDoc names the symptom; the architectural choice is "lossy sanitize chosen over hashing." A future iteration could append a short content-derived suffix (first 8 hex of sha256("${host}/${decodedPath}")) so collisions become numerically improbable rather than syntactically guaranteed. Out of scope for this PR; track as a follow-up.
  • The new validation duplicates isValidUrl; consider whether Site.baseURL write-path validation should also reject url.username || url.password (today, a userinfo-bearing baseURL passes isValidUrl and could collide with a legitimate baseURL on the same hostname). Strictly defense-in-depth on the admin-write path, not a current vulnerability. Out of scope here.

Out of scope, worth tracking

  • PR #1577 (this) and adobe/spacecat-api-service#2315 now carry near-identical sanitize logic with different separators (_/__ vs -/--). A shared siteIdentityComponents(baseURL) helper in spacecat-shared-utils would keep the two repos from drifting on the URL-to-identifier contract. Mentioning once.
  • Length guard against AWS Secrets Manager 512-char name limit. Not currently load-bearing for realistic baseURLs.

Assessment

Ready to merge? With fixes.

Reasoning: the prior Important (host-vs-path collision) is genuinely closed and the new tests pin the right invariants. The three new Important findings are all small, localized fixes: a JSDoc tweak (or a one-line code change to preserve segment count), a one-line .normalize('NFC') plus one regression test, and a one-line isValidUrl swap. All three are within budget for this PR. After they land, this can be approved as-is; the architecture and security posture are sound.

Next Steps

  1. Pick the JSDoc-vs-empty-segment-marker decision and add the adversarial test.
  2. Add .normalize('NFC') to sanitize and a regression test for the NFD form.
  3. Route the protocol guard through isValidUrl, or drop it.
  4. Minor items (@since cleanup, IPv6 note, symmetric port test) can land here or in a follow-up.

* The hostname (per RFC 1035, case-insensitive) and each URL path segment are
* percent-decoded and individually sanitized: runs of non-alphanumeric characters
* are replaced with a single `_`, leading/trailing `_` are trimmed, and the
* result is lowercased. Segments that reduce to empty after sanitization are

Important: this JSDoc claim ("URLs that differ in path structure (different number of segments, or segments with different alphanumeric content) produce distinct keys") is provably wrong because of the new .filter(Boolean) at line ~89. A path segment that sanitizes to empty (e.g. /-/, /---) is silently dropped, so:

  • https://nba.com/-/foo -> nba_com__foo == https://nba.com/foo
  • https://nba.com/--- -> nba_com == https://nba.com

New collision class introduced by this commit (the prior code did not trim/drop). Pick: weaken the doc to admit the empty-segment drop, OR preserve segment count by emitting a single _ placeholder for empty segments. Add an adversarial test pinning the chosen behavior.
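
To make the two choices concrete, a sketch of both behaviors side by side. The function names are hypothetical, percent-decoding is omitted for brevity, and `sanitize` is copied from the line under review; whether the placeholder variant stays unambiguous for every input has not been verified here.

```javascript
const sanitize = (s) => s
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Current behavior: segments that sanitize to empty are dropped.
function keyDroppingEmpty(baseURL) {
  const url = new URL(baseURL);
  const segments = url.pathname.split('/').filter(Boolean).map(sanitize).filter(Boolean);
  return [sanitize(url.hostname), ...segments].join('__');
}

// Alternative: keep a single "_" placeholder so the segment count is preserved.
function keyWithPlaceholder(baseURL) {
  const url = new URL(baseURL);
  const segments = url.pathname.split('/').filter(Boolean).map((seg) => sanitize(seg) || '_');
  return [sanitize(url.hostname), ...segments].join('__');
}

console.log(keyDroppingEmpty('https://nba.com/-/foo'));   // nba_com__foo (same as /foo)
console.log(keyWithPlaceholder('https://nba.com/-/foo')); // nba_com_____foo
```

Either function can back the adversarial test; the assertion flips between equality and inequality depending on which behavior is kept.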

const sanitize = (s) => s.replace(/[^a-zA-Z0-9]+/g, '_').replace(/^_+|_+$/g, '').toLowerCase();
const host = sanitize(url.hostname);
const segments = url.pathname.split('/').filter(Boolean)
.map((seg) => {

Important: NFC vs NFD unicode forms produce different keys today. decodeURIComponent doesn't normalize. Same logical brand pasted from two sources lands in different secret buckets:

  • https://nba.com/k%C3%B6nig (NFC ö = U+00F6) -> nba_com__k_nig
  • https://nba.com/ko%CC%88nig (NFD o + U+0308) -> nba_com__ko_nig

This is the inverse of the silent collision the PR is closing elsewhere: instead of two tenants sharing one key, one logical tenant is split across two keys. One-line fix: add .normalize('NFC') to sanitize before the regex. Plus one regression test asserting NFC/NFD equivalence.
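
A sketch of the suggested change and the equivalence it would pin. `sanitize` mirrors the helper under review with `.normalize('NFC')` added; applying it directly to decoded segments, rather than going through the full function, is a simplification for illustration.

```javascript
// sanitize with NFC normalization applied before the regex.
const sanitize = (s) => s
  .normalize('NFC')
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

const nfc = sanitize(decodeURIComponent('k%C3%B6nig'));  // precomposed U+00F6
const nfd = sanitize(decodeURIComponent('ko%CC%88nig')); // o + combining U+0308

console.log(nfc, nfd, nfc === nfd); // k_nig k_nig true
```

Without the `normalize` call the second input sanitizes to `ko_nig`, which is the split described above.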

* only, causing subpath sites on the same domain to share a secret. LLMO-4186.
*/
export function resolveCustomerSecretsName(baseURL, ctx) {
const basePath = '/helix-deploy/spacecat-services/customer-secrets';

Important: the new if (!url.hostname || !['http:','https:'].includes(url.protocol)) reimplements isValidUrl from packages/spacecat-shared-utils/src/functions.js:208. Two concerns: (a) duplicate logic in the same package; (b) the throw lives at the leaf of a key-derivation function rather than at the input boundary (composeBaseURL / Site model write-path / isValidUrl).

Pick: drop the throw and rely on composeBaseURL + isValidUrl upstream, OR call isValidUrl(baseURL) here so the package has one valid-baseURL predicate. The hostname-empty check is fine to keep as a defensive assertion (new URL('http://') parses but yields empty hostname).
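
A sketch of the second option. The local `isValidUrl` below is a stand-in that reproduces only the http(s) protocol check quoted in this review; the real function in functions.js may check more, and `parseBaseURL` is a hypothetical name.

```javascript
// Stand-in for the package's isValidUrl (assumption: http(s) check after parse).
function isValidUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

// Hypothetical wrapper: one predicate decides validity, one error message.
function parseBaseURL(baseURL) {
  if (!isValidUrl(baseURL)) {
    throw new Error('Invalid baseURL: must be a valid http(s) URL');
  }
  const url = new URL(baseURL);
  if (!url.hostname) {
    // Defensive assertion kept inline, as suggested above.
    throw new Error('Invalid baseURL: must have a hostname');
  }
  return url;
}
```

With this shape, any future tightening of isValidUrl applies to the key derivation automatically.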

- Replace U+2014 em-dashes in JSDoc with plain hyphens (project style)
- Change @since 2.x to @since next (version not yet cut)
- Guard against hostnames that sanitize to empty string (e.g. https://---/)
  and add a corresponding test
@alinarublea alinarublea requested a review from dzehnder May 5, 2026 13:54

@danieljchuser danieljchuser left a comment

Hey @alinarublea,

Thanks for the iteration - the degenerate-host guard in the latest commit is a real improvement, and the em-dash/since cleanup shows good attention to prior feedback. The core fix is solid: the __ delimiter, run-collapse sanitizer, and upstream Site model validation close the original cross-tenant credential collision class. Four Important items worth addressing and two Minor items, but none are blockers given the bounded blast radius.

Strengths

  • The degenerate-host guard at helpers.js:77 (if (!host)) closes the last reachable path where URL parsing accepts a hostname that sanitizes to empty. Defense-in-depth is appropriate even with Site validation upstream.
  • The try/catch around decodeURIComponent at helpers.js:88-92 is correctly bounded (catches only URIError), commented, and falls back to the raw segment. Does not silently swallow programmer errors.
  • Single-pass [^a-zA-Z0-9]+ regex at helpers.js:79 makes the __ delimiter unambiguous by construction. Collision class closed at the regex level.
  • Test additions at helpers.test.js:93-203 cover the adversarial collision classes (us..kings vs us/kings, us__kings vs us/kings, nba.com..foo vs nba.com/foo), port equivalence, protocol errors, and degenerate hostname. Solid regression net.
  • Previously flagged issues now addressed: em-dashes removed, @since precision improved, degenerate-host guard added.

Issues

Important (Should Fix)

  1. Hostname sanitizer regression extends beyond subpath sites - helpers.js:79

    The regex changed from [^a-zA-Z0-9] (per-char) to [^a-zA-Z0-9]+ (run-collapse) plus leading/trailing trim. This affects ALL hostnames with consecutive or leading/trailing non-alphanumerics, not just subpath sites:

    • xn--fiq228c.com (IDN/punycode) -> old xn__fiq228c_com, new xn_fiq228c_com
    • foo..bar.com -> old foo__bar_com, new foo_bar_com
    • _dmarc.example.com -> old _dmarc_example_com, new dmarc_example_com

    The "no production subpath sites" justification only addresses the subpath dimension. Before merging, run a one-off pass over the production site list to confirm no hostname falls into these categories and document the result on the PR.

  2. JSDoc injectivity claim still overstated - helpers.js:47-50

    "URLs that differ in path structure (different number of segments...) produce distinct keys" remains false. .filter(Boolean) at line 91 drops segments that sanitize to empty, so /foo and /-/foo both produce host__foo. The doc only calls out the punctuation-vs-underscore collision, not the empty-segment collapse.

    Fix: add one sentence: "Segments that contain only non-alphanumeric characters sanitize to empty and are dropped, so paths differing only by such segments produce the same key (e.g. /foo and /-/foo)."

  3. NFC vs NFD unicode normalization gap - helpers.js:88

    decodeURIComponent returns the decoded code points exactly as the URL carried them, with no normalization. NFC precomposed ö (U+00F6) and NFD decomposed o + U+0308 sanitize to different keys for visually identical URLs. One-line fix: add .normalize('NFC') after decodeURIComponent before sanitize.

  4. Protocol/hostname guard duplicates isValidUrl - helpers.js:69-71

    isValidUrl lives in the same package (functions.js) and already validates http(s) + hostname. This function open-codes the same check with three near-identical "Invalid baseURL" error strings. Two sources of truth will drift when isValidUrl evolves.

    Fix: call isValidUrl(baseURL) first and throw a single error, then new URL for parsing. Keep the empty-after-sanitize host throw as the only inline guard (derivation-specific).

Minor (Nice to Have)

  1. Test for malformed percent-encoding pins non-throw but not resulting key - helpers.test.js:172-175. Add an explicit .to.equal(...) assertion to lock the raw-fallback contract, not just the non-throw property.

  2. No positive http:// test - helpers.test.js:93-203. Every successful-resolution test uses https://. The protocol guard accepts both, but the http:// branch is unexercised. One assertion would close the gap.
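
For Minor item 1, a sketch of the fallback behavior as described in the Strengths section, with the kind of equality check that would pin it. `decodeSegment` is a hypothetical name; the actual code inlines this logic in the segment map.

```javascript
// Decode a path segment, falling back to the raw text on malformed escapes.
function decodeSegment(seg) {
  try {
    return decodeURIComponent(seg);
  } catch (e) {
    if (e instanceof URIError) return seg; // malformed percent-encoding: keep raw
    throw e; // anything else is a programmer error and should surface
  }
}

console.log(decodeSegment('k%C3%B6nig')); // könig
console.log(decodeSegment('bad%zz'));     // bad%zz (raw fallback)
```

Asserting the returned value, rather than only that no exception is thrown, locks in the raw-fallback contract.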

Recommendations

  • Add a defensive max-length check on customer before delegating to resolveSecretsName. AWS Secrets Manager imposes a 512-char name limit; deep pathnames could exceed it silently and surface as runtime AccessDenied.
  • Promote the empty catch on decodeURIComponent to at least log.debug so silent key-routing changes from malformed percent-encoding are recoverable from logs.
  • Consider extracting sanitize to a named, exported helper. The companion change in api-service#2315 reimplements the same logic with -/-- separators; a shared primitive would prevent drift.

Out of scope, worth tracking

  • Companion PR adobe/spacecat-api-service#2315 derives the same conceptual identity with -/-- separators. A shared sanitizeUrlToSlug(baseURL, { separator }) utility would prevent the two from drifting. Cheapest to converge now before both formats calcify across callers.
  • Backward-compat claim in the PR description applies to root-domain sites only. Any existing subpath site that shared a key will miss its secret after deployment. Worth a deployment runbook entry covering affected sites and rollback plan.
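
A rough sketch of what the shared utility mentioned above could look like. Only the name `sanitizeUrlToSlug` and the `separator` option come from this comment; the body is an assumption, it omits percent-decoding and NFC normalization, and it does not guard against a hostname that sanitizes to empty.

```javascript
// Hypothetical shared primitive: the same derivation, parameterized by separator.
function sanitizeUrlToSlug(baseURL, { separator = '_' } = {}) {
  if (separator !== '_' && separator !== '-') {
    throw new Error('separator must be "_" or "-"');
  }
  const sanitize = (s) => s
    .split(/[^a-zA-Z0-9]+/) // runs of non-alphanumerics become split points
    .filter(Boolean)        // drops empty edges, i.e. trims leading/trailing runs
    .join(separator)
    .toLowerCase();
  const url = new URL(baseURL);
  const parts = [url.hostname, ...url.pathname.split('/')].map(sanitize).filter(Boolean);
  return parts.join(separator.repeat(2)); // doubled separator marks part boundaries
}

console.log(sanitizeUrlToSlug('https://nba.com/kings'));                     // nba_com__kings
console.log(sanitizeUrlToSlug('https://nba.com/kings', { separator: '-' })); // nba-com--kings
```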

Assessment

Approved - the cross-tenant credential collision class is correctly closed and the security posture is sound. The findings above are worth addressing but none are blockers given the bounded blast radius (no production subpath sites, admin-gated onboarding, Site model validation upstream). The hostname sanitizer regression (finding 1) is the most important to verify with a production audit before deploying.

throw new Error('Invalid baseURL: must be a valid URL');
}
if (!url.hostname || !['http:', 'https:'].includes(url.protocol)) {
throw new Error('Invalid baseURL: must be an http(s) URL with a hostname');

Important: the regex change from per-char to run-collapse affects ALL hostnames with consecutive non-alphanumerics, not just subpath sites. Punycode (xn--) prefixes, double-dot hostnames, and leading-underscore hostnames all get different keys. Run a one-off pass over the production site list to confirm no hostname falls into these categories before deploying.
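
The one-off pass could be as small as the sketch below. The function names are hypothetical, the input is assumed to be a list of bare hostnames without ports, and the sample list is illustrative rather than production data.

```javascript
// Old behavior: per-character replacement.
const oldHostKey = (host) => host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();

// New behavior: run-collapse plus leading/trailing underscore trim.
const newHostKey = (host) => host
  .replace(/[^a-zA-Z0-9]+/g, '_')
  .replace(/^_+|_+$/g, '')
  .toLowerCase();

// Returns only the hostnames whose derived key changes under the new sanitizer.
function findChangedHostKeys(hostnames) {
  return hostnames
    .map((host) => ({ host, before: oldHostKey(host), after: newHostKey(host) }))
    .filter(({ before, after }) => before !== after);
}

const sample = ['nba.com', 'xn--fiq228c.com', '_dmarc.example.com'];
console.log(findChangedHostKeys(sample));
```

An empty result over the production list would be the evidence to record on the PR.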

* are replaced with a single `_`, leading/trailing `_` are trimmed, and the
* result is lowercased. Segments that reduce to empty after sanitization are
* dropped. Sanitized parts are joined with `__` as the path-segment delimiter.
*

Important: this claim remains false for the empty-segment case. .filter(Boolean) at line 91 drops segments that sanitize to empty, so /foo and /-/foo both produce host__foo. Add a sentence noting that segments containing only non-alphanumeric characters are dropped.

}
const segments = url.pathname.split('/').filter(Boolean)
.map((seg) => {
let decoded = seg;

Important: decodeURIComponent does no Unicode normalization. NFC precomposed and NFD decomposed forms of the same character produce different sanitized keys. One-line fix: decoded = decodeURIComponent(seg).normalize('NFC') before sanitize.

@alinarublea alinarublea merged commit db95707 into main May 6, 2026
5 checks passed
@alinarublea alinarublea deleted the fix/subpath-customer-secrets-name branch May 6, 2026 10:17
solaris007 pushed a commit that referenced this pull request May 6, 2026
## [@adobe/spacecat-shared-utils-v1.115.1](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.115.0...@adobe/spacecat-shared-utils-v1.115.1) (2026-05-06)

### Bug Fixes

* **utils:** include subpath in resolveCustomerSecretsName to prevent credential collisions ([#1577](#1577)) ([db95707](db95707))
@solaris007

🎉 This PR is included in version @adobe/spacecat-shared-utils-v1.115.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
