fix: add flag to enable the use of rendered HTML title (#35153) #35265

Merged
gortiz-dotcms merged 1 commit into main from issue-35153-wrong-page-title-site-search on Apr 8, 2026

Conversation

@gortiz-dotcms
Member

@gortiz-dotcms gortiz-dotcms commented Apr 8, 2026

Problem

For non–URL-mapped HTML pages, site search was unconditionally overwriting the title parsed from the rendered HTML (via Tika metadata) with page.getTitle() — the CMS Page Title field from page properties.

Legacy site search (StaticHTMLPageBundler) indexed Tika metadata from the bundled HTML file, so the indexed title followed the rendered <title> tag (as produced by theme / html_head.vtl / dotSeo). URL-mapped pages already preserved this behavior through a conditional fallback. Normal HTML pages did not.

This meant partners and clients who built pages assuming site search matched the rendered <title> (which often differs from the CMS Title field due to SEO themes) would get unexpected results after migrating to the current publisher.

Fixes #35153

What Was Done

Introduced a backward-compatible opt-in config flag SITE_SEARCH_USE_HTML_TITLE (default false) in ESSiteSearchPublisher.processHTMLPageAsContent().

Before: the title was always set from page.getTitle(), discarding whatever Tika parsed from the rendered HTML.

After:

  • SITE_SEARCH_USE_HTML_TITLE=false (default): behavior is identical to today — page.getTitle() is always used. No impact on existing installs.
  • SITE_SEARCH_USE_HTML_TITLE=true: the rendered <title> from Tika metadata is used when present, falling back to page.getTitle() only when absent. This matches the existing logic already in place for URL-mapped pages (processUrlMap) and the legacy bundler behavior.

The change is localized to a single conditional block in ESSiteSearchPublisher. No schema changes, no new APIs, no behavioral change unless the flag is explicitly enabled.

Note: Changing this flag requires a full site search re-index to take effect. Incremental indexing will not backfill pages that have not changed.
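The title selection described above can be sketched as follows. This is a minimal illustration, not the code from ESSiteSearchPublisher: the class and method names (TitleSelectionSketch, chooseTitle) are hypothetical, a plain boolean stands in for Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false), and a simple blank check stands in for UtilMethods.isSet.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not part of dotCMS.
class TitleSelectionSketch {

    /**
     * Picks the title to index for a non-URL-mapped HTML page.
     *
     * @param useHtmlTitle stand-in for the SITE_SEARCH_USE_HTML_TITLE flag
     * @param tikaMeta     stand-in for the metadata map parsed from the rendered HTML
     * @param cmsTitle     stand-in for page.getTitle()
     */
    static String chooseTitle(boolean useHtmlTitle,
                              Map<String, Serializable> tikaMeta,
                              String cmsTitle) {
        Serializable raw = tikaMeta.get("title");
        String htmlTitle = raw instanceof String ? (String) raw : null;
        // Assumption: a null or blank value counts as "absent".
        boolean htmlTitlePresent = htmlTitle != null && !htmlTitle.trim().isEmpty();
        if (useHtmlTitle && htmlTitlePresent) {
            return htmlTitle;   // flag on and rendered <title> present
        }
        return cmsTitle;        // flag off, or no rendered <title> to use
    }

    public static void main(String[] args) {
        Map<String, Serializable> meta = new HashMap<>();
        meta.put("title", "HTML_TITLE_IN_HEAD_ONLY");
        System.out.println(chooseTitle(false, meta, "CMS_TITLE_FOR_SEARCH_TEST"));
        System.out.println(chooseTitle(true, meta, "CMS_TITLE_FOR_SEARCH_TEST"));
        System.out.println(chooseTitle(true, new HashMap<>(), "CMS_TITLE_FOR_SEARCH_TEST"));
    }
}
```

With the flag off, the CMS title always wins. With the flag on, the CMS title is only a fallback for pages whose rendered HTML has no usable title.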

Test Plan

  • Create a non–URL-mapped HTML page. Set Page Title (page properties) to CMS_TITLE_FOR_SEARCH_TEST.
  • Assign a template that hardcodes <title>HTML_TITLE_IN_HEAD_ONLY</title> in <head>. Publish the page.
  • Verify via View Source that the live URL renders <title>HTML_TITLE_IN_HEAD_ONLY</title>.
  • Run a full site search index with SITE_SEARCH_USE_HTML_TITLE=false (default). Confirm search result title is CMS_TITLE_FOR_SEARCH_TEST.
  • Set SITE_SEARCH_USE_HTML_TITLE=true, re-run full index. Confirm search result title is HTML_TITLE_IN_HEAD_ONLY.
  • Verify URL-mapped pages are unaffected by the flag (they follow their own existing path).

🤖 Generated with Claude Code

This PR fixes: #35153

@claude
Contributor

claude bot commented Apr 8, 2026

Claude finished @gortiz-dotcms's task in 4m 27s


dotCMS Backend Review

  • Gather context and read conventions
  • Get Java diff
  • Run security review
  • Run database review
  • Run Java standards review
  • Run REST API review
  • Consolidate findings and post review

🔍 dotCMS Backend Review

[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchPublisher.java:418

The unguarded cast (String) map.get("title") can throw ClassCastException at runtime. map is typed Map<String, Serializable>, so map.get("title") returns Serializable. If Tika ever stores the title as a non-String Serializable (unlikely but possible), the cast throws at index time and the exception is silently swallowed by the surrounding catch (Exception e) block. The title value is also fetched from the map twice: once for the isSet() guard and once for setTitle().

// line 418 — unchecked cast before isSet()
&& UtilMethods.isSet((String) map.get("title"))) {
    res.setTitle((String) map.get("title"));   // line 419 — duplicate map lookup

💡 Extract with an instanceof guard (Java 11-compatible):

final Serializable rawTitle = map.get("title");
final String htmlTitle = rawTitle instanceof String ? (String) rawTitle : null;
if (Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false)
        && UtilMethods.isSet(htmlTitle)) {
    res.setTitle(htmlTitle);
} else {
    res.setTitle(page.getTitle());
}
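To make the failure mode concrete, the sketch below contrasts the direct cast with the instanceof guard in isolation. The class name CastGuardSketch and the String[] value are hypothetical. Whether Tika would ever put such a value under "title" is not confirmed by this PR, so the array only stands in for "some non-String Serializable".

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of ESSiteSearchPublisher.
class CastGuardSketch {

    // Direct cast: compiles, but fails at runtime for any non-String value.
    static String unsafeTitle(Map<String, Serializable> map) {
        return (String) map.get("title");
    }

    // Guarded lookup: one map access, null for anything that is not a String.
    static String safeTitle(Map<String, Serializable> map) {
        Serializable raw = map.get("title");
        return raw instanceof String ? (String) raw : null;
    }

    public static void main(String[] args) {
        Map<String, Serializable> map = new HashMap<>();
        // Illustrative non-String value; arrays are Serializable.
        map.put("title", new String[] {"First", "Second"});

        try {
            unsafeTitle(map);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("guarded lookup returned: " + safeTitle(map));
    }
}
```

Returning null from the guarded lookup lets the existing isSet() check route the page to the page.getTitle() fallback instead of aborting the indexing of that page.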

[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchPublisher.java:417

Property name SITE_SEARCH_USE_HTML_TITLE uses legacy SCREAMING_SNAKE_CASE. Per dotCMS Java standards, new config properties must follow hierarchical domain-driven naming. The canonical key should be site.search.use-html-title — the environment variable override (DOT_SITE_SEARCH_USE_HTML_TITLE) resolves the same either way.

if (Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false)

💡 Rename to: Config.getBooleanProperty("site.search.use-html-title", false)
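The claim that both spellings share one environment variable can be illustrated with a small normalization sketch. This is an assumption about how the mapping behaves, written for illustration only. It is not the dotCMS Config implementation, and the names EnvKeySketch and toEnvKey are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of a property-key to environment-variable mapping.
class EnvKeySketch {

    // Assumed rule: prefix with DOT_, upper-case, turn '.' and '-' into '_'.
    static String toEnvKey(String propertyKey) {
        return "DOT_" + propertyKey
                .toUpperCase(Locale.ROOT)
                .replace('.', '_')
                .replace('-', '_');
    }

    public static void main(String[] args) {
        System.out.println(toEnvKey("SITE_SEARCH_USE_HTML_TITLE"));
        System.out.println(toEnvKey("site.search.use-html-title"));
    }
}
```

Under that assumed rule both keys normalize to DOT_SITE_SEARCH_USE_HTML_TITLE, so a rename would not change how deployments override the flag through the environment.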


Next steps

  • 🟡 You can ask me to handle these mechanical fixes inline: @claude fix unchecked cast and property naming in ESSiteSearchPublisher.java
  • Every new push triggers a fresh review automatically

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Apr 8, 2026
@claude
Contributor

claude bot commented Apr 8, 2026

Rollback Safety Analysis

  • Read rollback-unsafe categories reference
  • Retrieve PR diff
  • Analyze against all unsafe categories

Result: ✅ Safe To Rollback

The single changed file — ESSiteSearchPublisher.java — introduces a new Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false) guard around the title-selection logic in processHTMLPageAsContent(). The change was checked against every category in the reference document:

| Category | Verdict |
|---|---|
| C-1 Structural Data Model Change | ✅ No DB schema touched |
| C-2 Elasticsearch Mapping Change | ✅ No mapping change; same title field, different runtime value only |
| C-3 Content JSON Model Version Bump | ✅ Not touched |
| C-4 DROP TABLE / DROP Column | ✅ Not present |
| H-1 One-Way Data Migration | ✅ No data transformation; flag defaults to false (old behavior) |
| H-2 RENAME TABLE / COLUMN | ✅ Not present |
| H-3 PK Restructuring | ✅ Not present |
| H-4 New Content Type Field | ✅ Not present |
| H-5 Storage Provider Change | ✅ Not present |
| H-6 DROP PROCEDURE/FUNCTION | ✅ Not present |
| H-7 NOT NULL column without default | ✅ Not present |
| M-1 Column Type Change | ✅ Not present |
| M-2 Push Publishing Bundle Format | ✅ Bundle XML format unchanged |
| M-3 REST / GraphQL API Contract | ✅ No API surface changed |
| M-4 OSGi Plugin API | ✅ No public interfaces changed |

Why this is safe: The change is purely behavioral and opt-in via a config flag that defaults to false. Rolling back to N-1 simply reverts to always using page.getTitle() for site search indexing. Any title values indexed while the flag was enabled would be refreshed on the next full site search reindex, since incremental indexing does not revisit unchanged pages. No backup or manual DB intervention is required.

@claude
Contributor

claude bot commented Apr 8, 2026

dotCMS Backend Review

[Medium] ESSiteSearchPublisher.java:418 - Unchecked cast ClassCastException risk

[Medium] ESSiteSearchPublisher.java:417 - Legacy property naming

@gortiz-dotcms gortiz-dotcms added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit af997cb Apr 8, 2026
50 checks passed
@gortiz-dotcms gortiz-dotcms deleted the issue-35153-wrong-page-title-site-search branch April 8, 2026 23:00
Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Development

Successfully merging this pull request may close these issues.

Site search: normal HTML pages use CMS Page Title instead of rendered <title>; request backward-compatible option

2 participants