fix: add flag to enable the use of rendered HTML title (#35153) #35265
gortiz-dotcms merged 1 commit into main
Claude finished @gortiz-dotcms's task in 4m 27s
🔍 dotCMS Backend Review

[🟡 Medium]
    // line 418: unchecked cast before isSet()
    && UtilMethods.isSet((String) map.get("title"))) {
        res.setTitle((String) map.get("title")); // line 419: duplicate map lookup

💡 Extract the value once and check its type before casting:
    final Serializable rawTitle = map.get("title");
    final String htmlTitle = rawTitle instanceof String ? (String) rawTitle : null;
    if (Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false)
            && UtilMethods.isSet(htmlTitle)) {
        res.setTitle(htmlTitle);
    } else {
        res.setTitle(page.getTitle());
    }

[🟡 Medium]
    if (Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false)

💡 Rename to:

Next steps
Rollback Safety Analysis
Result: ✅ Safe To Rollback The single changed file —
Why this is safe: The change is purely behavioral and opt-in via a config flag that defaults to
Problem
For non-URL-mapped HTML pages, site search was unconditionally overwriting the title parsed from the rendered HTML (via Tika metadata) with page.getTitle(), the CMS Page Title field from page properties.

Legacy site search (StaticHTMLPageBundler) indexed Tika metadata from the bundled HTML file, so the indexed title followed the rendered <title> tag (as produced by theme / html_head.vtl / dotSeo). URL-mapped pages already preserved this behavior through a conditional fallback. Normal HTML pages did not.

This meant partners and clients who built pages assuming site search matched the rendered <title> (which often differs from the CMS Title field due to SEO themes) would get unexpected results after migrating to the current publisher.

Fixes #35153
What Was Done
Introduced a backward-compatible opt-in config flag SITE_SEARCH_USE_HTML_TITLE (default false) in ESSiteSearchPublisher.processHTMLPageAsContent().

Before: the title was always set from page.getTitle(), discarding whatever Tika parsed from the rendered HTML.

After:
- SITE_SEARCH_USE_HTML_TITLE=false (default): behavior is identical to today. page.getTitle() is always used. No impact on existing installs.
- SITE_SEARCH_USE_HTML_TITLE=true: the rendered <title> from Tika metadata is used when present, falling back to page.getTitle() only when absent. This matches the existing logic already in place for URL-mapped pages (processUrlMap) and the legacy bundler behavior.

The change is localized to a single conditional block in ESSiteSearchPublisher. No schema changes, no new APIs, no behavioral change unless the flag is explicitly enabled.

Test Plan
- Create an HTML page whose CMS title is CMS_TITLE_FOR_SEARCH_TEST.
- Render <title>HTML_TITLE_IN_HEAD_ONLY</title> in <head>. Publish the page.
- Confirm the rendered page contains <title>HTML_TITLE_IN_HEAD_ONLY</title>.
- With SITE_SEARCH_USE_HTML_TITLE=false (default): confirm search result title is CMS_TITLE_FOR_SEARCH_TEST.
- With SITE_SEARCH_USE_HTML_TITLE=true, re-run full index. Confirm search result title is HTML_TITLE_IN_HEAD_ONLY.

🤖 Generated with Claude Code
This PR fixes: #35153
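
For reference, the title-selection rule described under What Was Done, combined with the type check suggested in the review, can be sketched without any dotCMS dependencies. This is only an illustration under stated assumptions: TitleResolver and resolveTitle are hypothetical names, the boolean parameter stands in for Config.getBooleanProperty("SITE_SEARCH_USE_HTML_TITLE", false), the map stands in for the Tika metadata, and the blank check is a rough approximation of UtilMethods.isSet, not its actual implementation.

```java
import java.io.Serializable;
import java.util.Map;

/** Hypothetical stand-alone helper for illustration; not part of dotCMS. */
final class TitleResolver {

    private TitleResolver() {
    }

    /**
     * Picks the title to index for a page.
     *
     * @param useHtmlTitle assumed stand-in for the SITE_SEARCH_USE_HTML_TITLE flag
     * @param metadata     assumed stand-in for the Tika metadata map
     * @param cmsTitle     assumed stand-in for page.getTitle()
     */
    static String resolveTitle(final boolean useHtmlTitle,
                               final Map<String, Serializable> metadata,
                               final String cmsTitle) {
        // Single lookup, then a type check instead of an unchecked cast.
        final Serializable rawTitle = metadata.get("title");
        final String htmlTitle = rawTitle instanceof String ? (String) rawTitle : null;

        // Rough approximation of UtilMethods.isSet: non-null and not blank.
        final boolean htmlTitleSet = htmlTitle != null && !htmlTitle.trim().isEmpty();

        // Flag off (default): always the CMS title.
        // Flag on: rendered HTML title when present, CMS title otherwise.
        return useHtmlTitle && htmlTitleSet ? htmlTitle : cmsTitle;
    }
}
```

With a metadata map containing title=HTML_TITLE_IN_HEAD_ONLY and a CMS title of CMS_TITLE_FOR_SEARCH_TEST, calling resolveTitle(false, ...) returns the CMS title and resolveTitle(true, ...) returns the HTML title, mirroring the two confirmation steps in the test plan. A non-String or missing metadata value falls back to the CMS title instead of throwing a ClassCastException.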