Skip to content

feat(opensearch) #34610: vendor-neutral DotIndexWindowLimitException replaces QueryPhaseExecutionException#35827

Closed
fabrizzio-dotCMS wants to merge 5 commits into
mainfrom
feat/opensearch-neutral-index-window-limit-34610
Closed

feat(opensearch) #34610: vendor-neutral DotIndexWindowLimitException replaces QueryPhaseExecutionException#35827
fabrizzio-dotCMS wants to merge 5 commits into
mainfrom
feat/opensearch-neutral-index-window-limit-34610

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

Summary

Introduces DotIndexWindowLimitException extends DotRuntimeException as the vendor-neutral signal for result-window overflow (max_result_window exceeded, default 10 000 in both ES and OS).

Problem

WorkflowAPIImpl.findContentletsToProcess() previously caught QueryPhaseExecutionException — an Elasticsearch-specific type — to detect when pagination exceeded the index window. This kept an ES vendor import in business logic and had no equivalent for the OpenSearch path.

Additionally, both factory implementations (ContentFactoryIndexOperationsES and ContentFactoryIndexOperationsOS) were swallowing the vendor exception silently and returning ERROR_HIT (empty result), so the original typed check in WorkflowAPIImpl was never triggered. The caller saw an empty list with no diagnostic information.

Solution

The detection now happens at the factory layer, which is the only place that knows the vendor-specific exception types:

Provider Detection strategy
Elasticsearch instanceof QueryPhaseExecutionException — typed, no string matching
OpenSearch e.error().type().equals("illegal_argument_exception") + reason contains "Result window" — structured error field, not free-text

When detected, both implementations throw DotIndexWindowLimitException instead of silently returning empty results. WorkflowAPIImpl now catches it by type with no vendor imports.

Changes

File Change
DotIndexWindowLimitException (new) Vendor-neutral DotRuntimeException subclass
ContentFactoryIndexOperationsES Detect QueryPhaseExecutionException, throw neutral exception
ContentFactoryIndexOperationsOS Detect result-window OpenSearchException, throw neutral exception
WorkflowAPIImpl Replace instanceof QueryPhaseExecutionException with catch (DotIndexWindowLimitException e); remove ES vendor import

Test plan

  • ./mvnw compile -pl :dotcms-core -DskipTests — no compilation errors
  • grep -r "QueryPhaseExecutionException" dotCMS/src/main/java/com/dotmarketing/portlets/workflows/ returns empty
  • grep -r "org.elasticsearch" dotCMS/src/main/java/com/dotmarketing/portlets/workflows/business/WorkflowAPIImpl.java returns empty

Closes #34610

🤖 Generated with Claude Code

…utral DotIndexWindowLimitException

Introduce DotIndexWindowLimitException (extends DotRuntimeException) as a
vendor-neutral signal for result-window overflow (max_result_window exceeded).

Both factory implementations now detect the vendor-specific error and throw
the neutral exception before it escapes the factory layer:
- ES: instanceof QueryPhaseExecutionException (typed check, no string matching)
- OS: e.error().type() == "illegal_argument_exception" + reason contains
  "Result window" (structured error field, not free-text message parsing)

WorkflowAPIImpl.findContentletsToProcess() now catches
DotIndexWindowLimitException directly — no more string matching, no more
ES vendor import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…om PR #34610

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fabrizzio-dotCMS added a commit that referenced this pull request May 25, 2026
Remove @ESCoupled annotation and extra blank line — WorkflowAPIImpl
is excluded from this PR. Vendor-neutral exception handling tracked
separately in #35827.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WorkflowAPIImpl is not part of this PR. Restore to exact origin/main state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix dead instanceof branch in ContentFactoryIndexOperationsES: QPE does not
  extend ElasticsearchStatusException|IndexNotFoundException|SearchPhaseExecutionException,
  so the old check was unreachable. Now mirrors OS: check instanceof
  ElasticsearchStatusException + message.contains("Result window")
- Add Logger.debug before DotIndexWindowLimitException throw in both ES and OS
  factories so operators retain the failing-index breadcrumb
- Re-include WorkflowAPIImpl: replace QueryPhaseExecutionException instanceof
  (ES vendor type) with DotIndexWindowLimitException catch — the stated goal
  of this PR; remove org.elasticsearch import
- Add DotIndexWindowLimitException(String index, Throwable) constructor so
  factories can embed the offending index name in the exception message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dotCMS dotCMS deleted a comment from claude Bot May 26, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 26, 2026

Claude finished @fabrizzio-dotCMS's task in 2m 38s —— View job


PR Review

  • Read changed files
  • Analyze vendor-neutral exception design
  • Check ES & OS detection logic
  • Check WorkflowAPIImpl integration
  • Post review

Findings

1. PR description is out of date and contradicts the code (medium)
The body states ES detection uses instanceof QueryPhaseExecutionException — "typed, no string matching". The actual code (ContentFactoryIndexOperationsES.java:108-115) catches ElasticsearchStatusException | IndexNotFoundException | SearchPhaseExecutionException and then does instanceof ElasticsearchStatusException && e.getMessage().contains("Result window") — same string-match approach used on the OS side. Commit 971fed63 even calls out why the typed approach was abandoned (QPE didn't extend any of the caught types, so the original branch was dead). The PR body should be updated before merge so the table doesn't mislead future readers/auditors.

2. Behavior change is broader than "fixes WorkflowAPIImpl" (HIGH — please audit)
Pre-PR, both factories swallowed the window-overflow error and returned ERROR_HIT (empty). Now they throw DotIndexWindowLimitException (a DotRuntimeException). That's a contract change for every caller of ContentletAPI.search(...) / indexSearch(...) / searchHits(...), not just workflows. Direct callers that previously got an empty list and continued will now see an unchecked runtime exception bubble up. Grep shows many such sites — none of them catch the new type:

  • PersonaAPIImpl.java:280,339,519
  • KeyValueAPIImpl.java:179,214
  • HTMLPageAssetAPIImpl.java:395,565
  • BrowserAPIImpl.java:512
  • BinaryExporterServlet.java:311
  • PublisherAPIImpl.java:330,336
  • ContentFromEmailJob.java:318,336
  • ContentsWebAPI.java:877
  • RelationshipUtil.java:104
  • PublishAuditUtil.java:106
  • MacroCacheRefresherJob.java:85
  • PushPublishigDependencyProcesor.java:1162
  • WorkflowHelper.java:191

Any of these triggered with offset ≥ 10 000 will now throw to their caller instead of returning empty. Some of these are background jobs / servlets / Velocity tools where the previous silent-empty was arguably wrong, but flipping it to a DotRuntimeException for the whole codebase deserves at least a one-paragraph "impact" section in the PR body, and ideally a quick check that the high-traffic paths (BrowserAPIImpl, HTMLPageAssetAPIImpl, Velocity) tolerate the new exception. If the safer migration is to keep returning ERROR_HIT for non-workflow paths and add a separate explicit "paginated" entrypoint that throws, that would isolate the contract change.

3. String-match on "Result window" is the only thing distinguishing this from "any illegal_argument / status error" (medium)
Both branches collapse to: catch a broad parent exception, then .contains("Result window"). That literal comes from server-side error text and is not part of any documented contract. An ES/OS version bump that re-phrases the message — or a translated cluster — would silently break detection and revert callers to the "empty list" path again, exactly the bug this PR is fixing. Worth either (a) adding a regression test that asserts the typed exception is thrown when the server replies with the canonical message, or (b) leaving a code comment marking the string as a known fragility so a future upgrader knows to re-verify.

4. No tests added (medium)
Given (2) and (3), at minimum a unit test that feeds the factories a mocked exception with the expected error().type() / message and asserts DotIndexWindowLimitException is thrown would protect both the detection conditions and the upstream WorkflowAPIImpl catch. The "Test plan" checklist in the PR body is grep + compile only.

5. Import ordering in WorkflowAPIImpl.java (nit)
Line 62 inserts DotIndexWindowLimitException between DotDataException and DotDataValidationException, breaking the alphabetical group. Alphabetically it belongs after DotHibernateException (line 64).

import com.dotmarketing.exception.DotDataException;
import com.dotmarketing.exception.DotIndexWindowLimitException;   // ← out of order
import com.dotmarketing.exception.DotDataValidationException;
import com.dotmarketing.exception.DotHibernateException;

Fix this →

6. Single-arg constructor is unused (nit)
DotIndexWindowLimitException(Throwable cause) exists in the new class but neither factory uses it (both go through the (index, cause) variant). Either drop it or keep it intentional — small dead-code surface.

7. Index name in both log line and exception message (nit)
At ContentFactoryIndexOperationsES.java:113-114 and ContentFactoryIndexOperationsOS.java:116-117 the index name is logged at DEBUG and then re-encoded into the exception. Not wrong, just redundant — the exception alone already carries the breadcrumb up the stack.

Net

The local refactor is fine. The non-obvious risk is #2 — the silent-to-throwing flip happens at the factory boundary so it affects everyone, not just the workflow path called out in the title. I'd want either an audit of those call sites confirming they tolerate the throw, or scoping the throw to the workflow entry point. The other items are smaller (description, tests, import order).
Branch

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member Author

Not worth the effort. Will remove the Exception at decommission time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[TASK] Migrate Utilities and Supporting Classes

1 participant