Skip to content

fix: url map 404 on different host (#35268)#35345

Open
gortiz-dotcms wants to merge 11 commits intomainfrom
issue-35268-urlmapped-returning-404
Open

fix: url map 404 on different host (#35268)#35345
gortiz-dotcms wants to merge 11 commits intomainfrom
issue-35268-urlmapped-returning-404

Conversation

@gortiz-dotcms
Copy link
Copy Markdown
Member

@gortiz-dotcms gortiz-dotcms commented Apr 16, 2026

Summary

Fixes the runtime 404 reported in #35268, which occurs when a URL-mapped content item lives on a different site (siteB) from the one receiving the request (siteA).

What happened

The typical setup:

  • siteA has a list page that queries and renders content from multiple sites, including siteB.
  • Clicking a siteB content item navigates to siteA's URL-mapped detail page (e.g. https://siteA.com/museum_decade/1920).
  • The content type's URL pattern and detail page are both configured on siteA (dotCMS does not allow a global detail page path by design, so the detail page cannot be shared globally).

The runtime 404 was caused by URLMapAPIImpl.buildContentQuery(), which restricted the Elasticsearch query to content on the current host and SYSTEM_HOST:

+(conhost:siteA_id OR conhost:SYSTEM_HOST)

Content created on siteB has conhost=siteB_id, so it was never found by the query. Because getContentlet() returned null, isUrlPattern() returned false, and processURLMap() was never even reached — making the previous fix at getDetailPageUri() irrelevant for this scenario.

What was done

buildContentQuery() now accepts a restrictToHost flag and the content lookup uses a two-step strategy:

  1. First query — restricted to current site + SYSTEM_HOST (existing behavior, no performance impact for the common case).
  2. Fallback query — if the first returns nothing, retries without any host restriction so cross-site content can be found.

Once the siteB content is resolved, getDetailPageUri() correctly returns siteA's configured detail page (since the content type is registered on siteA), and the page renders normally.

An integration test (processURLMap_contentOnDifferentSite_shouldResolveViaFallback) was added to URLMapAPIImplTest to cover this exact scenario.

Test plan

  • URLMapAPIImplTest#processURLMap_contentOnDifferentSite_shouldResolveViaFallback — new test: content on siteB is resolved when requested from siteA
  • Full URLMapAPIImplTest suite — all existing tests must continue to pass
./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false -Dit.test=URLMapAPIImplTest

Closes #35268

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 16, 2026

Claude finished @gortiz-dotcms's task in 4m 6s —— View job


🔍 dotCMS Backend Review

[🔴 Critical] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:308-313

The unrestricted fallback query (restrictToHost=false) removes the conhost: filter entirely, so any published content matching the URL-map pattern across all sites in the installation is eligible to be returned. The downstream checkContentPermission call (line 328) only verifies that the requesting user has READ on the returned contentlet — it does not verify that the contentlet belongs to a site reachable from the requesting site. In a multi-tenant deployment where content is world-readable, a request on site-A.com can silently resolve and render content from site-B.com, site-C.com, or any staging site whose URL-map field values match the requested URI.

if (contentletSearches.isEmpty()) {
    contentletSearches =
        ContentUtils.pull(this.buildContentQuery(matches, contentType, context, false),
            0, 2, "score", this.wuserAPI.getSystemUser(), true);
    // No conhost post-filter — any tenant's content can be returned
}

💡 After the fallback resolves results, post-filter by owning site: keep only contentlets whose conhost equals context.getHost().getIdentifier() or Host.SYSTEM_HOST. If nothing passes, return null rather than surfacing another tenant's content.


[🟠 High] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:101-105

The request-attribute cache key is built by raw string concatenation with | as separator. urlMapContext.getUri() is user-supplied, so a URI containing | can produce a key that collides with a different (uri, host, lang, mode) tuple. Because checkContentPermission is skipped on cache hits (lines 109–112), a carefully crafted URI can cause the wrong contentlet — potentially one the caller has no direct permission to access — to be returned without a permission check.

final String cacheKey = REQUEST_CACHE_KEY
        + urlMapContext.getUri() + "|"          // attacker-controlled
        + urlMapContext.getHost().getIdentifier() + "|"
        + urlMapContext.getLanguageId() + "|"
        + urlMapContext.getMode().name();

💡 Use a separator that cannot appear in a URI (e.g., NUL char \0 is prohibited by RFC 3986) or hash the tuple: String.valueOf(Objects.hash(uri, hostId, langId, mode)).


[🟠 High] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:52

CONTENTLET_NOT_FOUND is a static final instance of the fully mutable Contentlet class. Contentlet exposes dozens of public setters. Any code path that obtains this sentinel via request.getAttribute(cacheKey) before the identity guard at line 111 and calls a setter will silently corrupt it for all subsequent requests — the identity check cached == CONTENTLET_NOT_FOUND still passes even after mutation, but callers introspecting the sentinel's field map see corrupted data.

private static final Contentlet CONTENTLET_NOT_FOUND = new Contentlet();
// Contentlet.setInode(), setHost(), setStringProperty(), ... all available

💡 Use an opaque, non-Contentlet marker (private static final Object CONTENTLET_NOT_FOUND = new Object()) and store the cache value as a typed wrapper, or use Optional<Contentlet> as the attribute value and eliminate the sentinel entirely.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:111

The request-attribute value is cast to Contentlet without an instanceof guard. If any other code ever stores a non-Contentlet value under the same key prefix, this throws a ClassCastException with no context. The sentinel identity check protects only the CONTENTLET_NOT_FOUND case.

return cached == CONTENTLET_NOT_FOUND ? null : (Contentlet) cached;

💡 Add else if (cached instanceof Contentlet) return (Contentlet) cached; and fall through to re-resolve if the type is unexpected.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:400-401

context.getHost().getIdentifier() is appended into the Lucene query without passing through ESUtils.escapeExcludingSlashIncludingSpace(). Host identifiers are normally UUIDs (safe), but the method accepts whatever is stored in the Host object's property map, so a non-UUID identifier (e.g., from a test or a schema migration) would corrupt the query or bypass the host filter silently.

query.append(" +(conhost:")
     .append(context.getHost().getIdentifier())   // not escaped
     .append(" OR conhost:")

💡 Wrap with ESUtils.escapeExcludingSlashIncludingSpace(context.getHost().getIdentifier()).


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:395

String concatenation inside a StringBuilder.append() call creates an implicit intermediate String, negating the purpose of using StringBuilder.

.append(" +" + ESMappingConstants.VARIANT + ":")

💡 Replace with .append(" +").append(ESMappingConstants.VARIANT).append(":") — three separate appends.


Next steps

  • 🔴 Fix locally and push — these need your judgment
  • 🟠 / 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in URLMapAPIImpl.java
  • Every new push triggers a fresh review automatically

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Apr 16, 2026
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 16, 2026

🔍 dotCMS Backend Review

[🟠 High] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:308-313

The unrestricted fallback query (restrictToHost=false) removes all conhost filtering and runs as the system user, bypassing per-tenant boundaries. Any published contentlet of the matching content type across any site in the installation can be returned if its URL-map field values happen to match the requested URI. The downstream checkContentPermission call only verifies READ on the final contentlet — it does not verify that the content is reachable from the requesting site. In multi-tenant deployments with public READ grants at the repo root, a request on Tenant A can silently surface and render content from Tenant B, C, or an internal staging site.

contentletSearches =
    ContentUtils.pull(this.buildContentQuery(matches, contentType, context, false), 0, 2,
        "score", this.wuserAPI.getSystemUser(), true);
// No conhost filter → all tenants searched as system user

💡 After the fallback resolves a contentlet, post-filter by owning site: verify contentlet.getHost() equals the requesting site's identifier or Host.SYSTEM_HOST. Or restrict the fallback to content types whose detailPage identifier is explicitly registered on the current site.


[🟠 High] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:51

CONTENTLET_NOT_FOUND is a static final instance of the fully mutable Contentlet class. Contentlet exposes dozens of public setters (setInode(), setLanguageId(), setStringProperty(), setBinary(), etc.). Any code path that reads this sentinel from the request attribute map and calls a setter will silently corrupt it for all subsequent requests on any thread, since the identity check cached == CONTENTLET_NOT_FOUND still passes after mutation.

private static final Contentlet CONTENTLET_NOT_FOUND = new Contentlet();
// public setInode(), setLanguageId(), setProperty(), setBinary(), ... all available

💡 Use an opaque, non-Contentlet marker: private static final Object CONTENTLET_NOT_FOUND = new Object(); — the identity check continues to work without the mutability risk.


[🟠 High] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:32,107

This PR introduces javax.servlet.http.HttpServletRequest and HttpServletRequestThreadLocal into URLMapAPIImpl, which is core business logic. While HttpServletRequestThreadLocal appears in other *APIImpl classes in the codebase, directly coupling a business-logic class to the servlet API makes it impossible to call in non-servlet contexts (batch jobs, CLI, background tasks) without a live request — and the if (request != null) guard silently disables the cache in those contexts rather than failing loudly.

import javax.servlet.http.HttpServletRequest;
...
final HttpServletRequest request = HttpServletRequestThreadLocal.INSTANCE.getRequest();

💡 Move request-scope caching to the caller (the filter or servlet that invokes isUrlPattern/processURLMap), or encapsulate behind a RequestScopeCache interface injected as a dependency so the business class is not coupled to the servlet container.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/cms/urlmap/URLMapAPIImpl.java:101-105

The request-attribute cache key is built by concatenating the raw, user-supplied URI with | as a separator. A URI containing | can produce a key that collides with a different URI+host combination. Because the permission check (checkContentPermission) is skipped for cache hits (lines 109-112), a cache key collision can cause the wrong contentlet — potentially one the caller is not permitted to see — to be returned from cache without a permission check.

final String cacheKey = REQUEST_CACHE_KEY
        + urlMapContext.getUri() + "|"       // URI is attacker-controlled
        + urlMapContext.getHost().getIdentifier() + "|"
        + urlMapContext.getLanguageId() + "|"
        + urlMapContext.getMode().name();

💡 Use a separator that cannot appear in a URI (e.g. NUL char is prohibited by RFC 3986), or hash the tuple: String.valueOf(Objects.hash(uri, hostId, langId, mode)).


Next steps

  • 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in URLMapAPIImpl.java
  • Every new push triggers a fresh review automatically

gortiz-dotcms and others added 10 commits April 16, 2026 11:50
…allback, log hardening

- [Critical] Fix getDetailPageUri to fall back to the configured detail-page identifier
  when the current site has no page at the same path (e.g. detail page lives on a global
  host). Previously returned Optional.empty(), causing processURLMap to return empty and
  yield a 404 even when the detail page was accessible.

- [High] Add request-scoped cache to getContentlet(UrlMapContext) so that isUrlPattern()
  and processURLMap(), which are both called on the same HTTP request, share a single
  ES lookup. Without the cache, each call issued up to 2 ES queries with the cross-site
  fallback; now the second call reuses the first result via a request attribute.

- [Medium] Sanitize the Host header value before writing it to the debug log to prevent
  log-forging via newline injection (replaceAll on CR/LF/TAB).

- [Medium] Change test assertion in processURLMap_contentOnDifferentSite_shouldResolveViaFallback
  from getName()/title comparison to getIdentifier() for an unambiguous identity check
  that is not vulnerable to score-ranking non-determinism.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Content types scoped to a specific site may not reliably allow content
creation on a different site. The cross-site fallback test now registers
the URL-mapped content type on SYSTEM_HOST (idiomatic for cross-site URL
mapping), while the content item and the request still come from different
sites — so the host-restricted query still misses the content and the
fallback is exercised as intended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TestDataUtils.getNewsContent uses IndexPolicy.FORCE which writes
immediately but does not wait for ES to make the document queryable.
The fallback (site-agnostic) query may therefore see no results.

Publishing the content triggers IndexPolicy.WAIT_FOR, ensuring the
document is committed and searchable before the assertion runs.
LIVE mode is used in the UrlMapContext to match the published version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gortiz-dotcms gortiz-dotcms marked this pull request as ready for review April 20, 2026 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

URL mapping returns 404 at runtime when detail page uses custom Page content type (follow-up to #35149)

1 participant