
Add Akka circuit breaker for ElasticSearch calls #105

Merged: DominicBM merged 3 commits into main from es-circuit-breaker on Apr 23, 2026

DominicBM (Contributor) commented Apr 19, 2026

Summary

  • Wraps all ES calls in an Akka CircuitBreaker, constructed in ElasticSearchClient around Http().singleRequest (it initially lived in ElasticSearchResponseHandler and was moved there during review)
  • After maxFailures consecutive failures (call timeouts or failed requests), the breaker opens and immediately returns ElasticSearchResponseFailure for the duration of resetTimeout, giving ES breathing room without continued request pressure
  • Logs state transitions (open / half-open / closed) for production visibility
  • All three parameters are configurable via env vars, so they can be tuned without a code change
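The open / half-open / closed cycle summarized above can be modelled as a small state machine. The sketch below is a hypothetical, single-threaded Scala model and not Akka's implementation.

- `BreakerSketch`, `Breaker`, `run` and `stateAt` are illustrative names only.
- Time is passed in explicitly as nanoseconds so the transitions are deterministic.
- A `false` result from the call stands in for a call timeout or failed request.
- The model lets every half-open call through, whereas Akka admits only the first trial call.

```scala
import scala.concurrent.duration._

// Hypothetical, single-threaded model of the breaker lifecycle; NOT Akka's implementation.
object BreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  final class Breaker(maxFailures: Int, resetTimeout: FiniteDuration, log: String => Unit) {
    private var state: State = Closed
    private var failures = 0  // consecutive failures; reset by any success
    private var openedAt = 0L // nanos timestamp of the most recent transition to Open

    // Open turns into HalfOpen once resetTimeout has elapsed since opening.
    def stateAt(now: Long): State = {
      if (state == Open && now - openedAt >= resetTimeout.toNanos) transition(HalfOpen)
      state
    }

    // `call` returns true on success, false on failure (e.g. a callTimeout).
    // Returns None when short-circuited; `call` is by-name and is then never evaluated.
    def run(now: Long)(call: => Boolean): Option[Boolean] =
      stateAt(now) match {
        case Open => None
        case current =>
          val ok = call
          if (ok) {
            failures = 0
            if (current == HalfOpen) transition(Closed)
          } else {
            failures += 1
            // A failed trial call re-opens at once; otherwise wait for maxFailures in a row.
            if (current == HalfOpen || failures >= maxFailures) {
              openedAt = now
              transition(Open)
            }
          }
          Some(ok)
      }

    private def transition(to: State): Unit = {
      state = to
      log(to match {
        case Open     => "ElasticSearch circuit breaker opened"
        case HalfOpen => "ElasticSearch circuit breaker half-open, testing ES"
        case Closed   => "ElasticSearch circuit breaker closed"
      })
    }
  }
}
```

With maxFailures = 3 and resetTimeout = 20.seconds, the model behaves as follows:

- Three failed calls open the breaker.
- Calls made during the next 20 seconds return None without running.
- The first successful call after that closes the breaker again.
- The opened, half-open and closed messages are logged in that order.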

Configuration (with defaults)

| Env var | Default | Meaning |
|---|---|---|
| ES_CIRCUIT_BREAKER_MAX_FAILURES | 3 | Consecutive failures (including call timeouts) before opening |
| ES_CIRCUIT_BREAKER_CALL_TIMEOUT | 10s | Per-call timeout |
| ES_CIRCUIT_BREAKER_RESET_TIMEOUT | 20s | How long to stay open before allowing a trial call (half-open) |
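The following is a rough sketch of how each row resolves: the env var wins when it is set, otherwise the default applies. It assumes application.conf uses the common HOCON pattern of a default value followed by an optional `${?ES_CIRCUIT_BREAKER_...}` override.

- `BreakerSettingsSketch`, `BreakerSettings` and `resolve` are hypothetical names.
- The PR itself reads these values from application.conf.
- Scala's `Duration` string parser only approximates HOCON's duration syntax.
- The environment is taken as a `Map` to keep the sketch testable; in production you would pass `sys.env`.

```scala
import scala.concurrent.duration._

// Hypothetical helper mirroring "default, overridden by env var if present".
object BreakerSettingsSketch {
  final case class BreakerSettings(
      maxFailures: Int,
      callTimeout: FiniteDuration,
      resetTimeout: FiniteDuration)

  // Parses strings such as "10s" or "20 seconds"; rejects "Inf".
  private def finite(s: String): FiniteDuration = Duration(s.trim) match {
    case fd: FiniteDuration => fd
    case other => throw new IllegalArgumentException(s"expected a finite duration, got $other")
  }

  def resolve(env: Map[String, String]): BreakerSettings = BreakerSettings(
    maxFailures  = env.get("ES_CIRCUIT_BREAKER_MAX_FAILURES").map(_.trim.toInt).getOrElse(3),
    callTimeout  = finite(env.getOrElse("ES_CIRCUIT_BREAKER_CALL_TIMEOUT", "10s")),
    resetTimeout = finite(env.getOrElse("ES_CIRCUIT_BREAKER_RESET_TIMEOUT", "20s"))
  )
}
```

With none of the variables set, `resolve(sys.env)` yields the defaults in the table: 3, 10 seconds and 20 seconds. These are the values that would then be passed to Akka's `CircuitBreaker` constructor.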

Background

Resurrects #34 (shelved in 2022 pending performance testing), ported to the current dpla.api.v2 package. Motivated by recurring ES flakiness under expensive bot-driven queries — the breaker prevents a struggling ES from being hammered further during recovery.

The callTimeout is set to 10s (vs. the original 3s) to avoid tripping on legitimately slow-but-valid queries under normal load.

Test plan

  • Confirm existing tests pass (sbt test)
  • Deploy to staging and verify breaker state transitions appear in logs under simulated ES slowness
  • Tune ES_CIRCUIT_BREAKER_CALL_TIMEOUT if needed based on observed p99 ES latency

🤖 Generated with Claude Code

Add Akka circuit breaker for ElasticSearch calls

Wraps ElasticSearch calls with an Akka CircuitBreaker to protect ES from cascading failures and reduce pressure during ES slowness/outages.

Changes

  • Configuration (src/main/resources/application.conf):

    • Adds elasticSearch.circuitBreaker with:
      • maxFailures (default 3 — override via ES_CIRCUIT_BREAKER_MAX_FAILURES)
      • callTimeout (default 10s — override via ES_CIRCUIT_BREAKER_CALL_TIMEOUT)
      • resetTimeout (default 20s — override via ES_CIRCUIT_BREAKER_RESET_TIMEOUT)
    • Call timeout raised to 10s (from the 3s used in #34) to reduce false trips on slow-but-valid queries.
  • Implementation:

    • CircuitBreaker is constructed in ElasticSearchClient using the new config and logs state transitions:
      • onOpen logs "ElasticSearch circuit breaker opened" (warn)
      • onHalfOpen logs "ElasticSearch circuit breaker half-open, testing ES" (info)
      • onClose logs "ElasticSearch circuit breaker closed" (info)
    • The breaker is passed into per-request session behaviors (search, fetch, multi-fetch, random).
    • Each ES HTTP request is executed as breaker.withCircuitBreaker { Http().singleRequest(request) } wrapped inside the existing withConcurrencyLimit, so when the breaker is open the request is short-circuited and the HTTP request is not sent.
    • ElasticSearchResponseHandler now special-cases CircuitBreakerOpenException and logs "Request rejected: ElasticSearch circuit breaker is open" at warn level; rejected requests are mapped to ElasticSearchResponseFailure as before.
  • Concurrency behavior:

    • Existing semaphore-based concurrency limiter (ES_MAX_CONCURRENT_REQUESTS, default 32) remains in place and continues to cap in-flight ES requests per API instance.
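The nesting described above places the breaker inside the concurrency limiter. A dependency-free sketch can check what that ordering implies. Both wrappers here are hypothetical stand-ins:

- `withConcurrencyLimit` assumes a fail-fast `Semaphore.tryAcquire`. The real limiter may wait for permits instead.
- `withBreaker` mimics only the open-state short-circuit of `breaker.withCircuitBreaker`.

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object LayeringSketch {
  // Hypothetical limiter: fails fast when no permit is free and releases the
  // permit once the inner Future has completed.
  def withConcurrencyLimit[T](permits: Semaphore)(body: => Future[T])(
      implicit ec: ExecutionContext): Future[T] =
    if (!permits.tryAcquire()) Future.failed(new IllegalStateException("too many in-flight ES requests"))
    else {
      // Turn a synchronous throw from `body` into a failed Future so the permit is not leaked.
      val inner = try body catch { case NonFatal(e) => Future.failed(e) }
      // andThen returns a Future that completes only after the release has run.
      inner.andThen { case _ => permits.release() }
    }

  // Stand-in for breaker.withCircuitBreaker: an open breaker never evaluates `body`.
  def withBreaker[T](isOpen: Boolean)(body: => Future[T]): Future[T] =
    if (isOpen) Future.failed(new IllegalStateException("circuit breaker is open")) else body
}
```

Under this ordering, a call rejected by an open breaker still takes a permit briefly. It returns the permit as soon as the already-failed Future completes, and the request function is never invoked. Putting the breaker outside the limiter would avoid taking the permit at all; the cost of the PR's ordering is small either way.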

Environment Variables / Secrets

  • Adds three tunable environment variables (no removals or renames):
    • ES_CIRCUIT_BREAKER_MAX_FAILURES (default 3)
    • ES_CIRCUIT_BREAKER_CALL_TIMEOUT (default 10s)
    • ES_CIRCUIT_BREAKER_RESET_TIMEOUT (default 20s)
  • No new AWS Secrets Manager keys were added or changed.

Deployment & Operational Notes

  • Requires redeploy to take effect: merging to main does not auto-deploy. After merge, trigger the repo's deploy workflow / CodePipeline / ECS deployment (repo uses manual dispatch for deploys).
  • Runtime tuning: monitor staging/production logs for circuit-breaker state transitions and adjust ES_CIRCUIT_BREAKER_CALL_TIMEOUT / ES_CIRCUIT_BREAKER_MAX_FAILURES / ES_CIRCUIT_BREAKER_RESET_TIMEOUT based on observed p99 ES latency and failure characteristics.

Testing

  • Run existing tests (sbt test).
  • In staging, simulate ES slowness to verify breaker opens, half-opens, and closes, and that rejected requests result in ElasticSearchResponseFailure without sending requests to ES.

Impact Summary

  • No database migrations.
  • No changes to public API response shapes or endpoints.
  • No changes to shared infra (CodePipeline/CodeBuild/ECS task definitions/IAM) in this PR — only application code and config.
  • Security: no new credential handling or auth changes introduced. Logs now include circuit-breaker state transitions for visibility.

After maxFailures consecutive failures (call timeouts or failed requests),
the breaker opens and immediately returns ElasticSearchResponseFailure
for resetTimeout, giving ES breathing room to recover without continued
request pressure. All three parameters are overridable via env vars for
production tuning.

Resurrects #34 (shelved in 2022 pending performance testing), ported
to the current dpla.api.v2 package structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
coderabbitai Bot commented Apr 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab7bb642-ead5-4749-8ba0-5686529def97

📥 Commits

Reviewing files that changed from the base of the PR and between 38ad54e and 1908b6a.

📒 Files selected for processing (1)
  • src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala

Walkthrough

Adds an Akka CircuitBreaker for ElasticSearch HTTP calls, a new elasticSearch.circuitBreaker config block, wraps Http requests with the breaker in the ES client, and special-cases logging for CircuitBreakerOpenException in the response handler.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration (src/main/resources/application.conf) | Added elasticSearch.circuitBreaker block with maxFailures, callTimeout, and resetTimeout, each overridable via environment variables. |
| ElasticSearch client (src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala) | Constructs an Akka CircuitBreaker from config, attaches open/half-open/closed logging handlers, threads the breaker into per-session request paths, and wraps Http().singleRequest(...) calls with breaker.withCircuitBreaker. |
| Response handler (src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala) | Special-cases akka.pattern.CircuitBreakerOpenException in ReturnFinalResponse logging (logs a warning when the breaker is open); other error logging unchanged. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ESClient as ElasticSearchClient
    participant Breaker as CircuitBreaker
    participant HTTP as Akka-HTTP
    participant ES as ElasticSearch

    Client->>ESClient: send search request
    ESClient->>Breaker: withCircuitBreaker { perform request }
    Breaker->>HTTP: singleRequest(...)
    HTTP->>ES: HTTP/REST call
    ES-->>HTTP: response
    HTTP-->>Breaker: Future completes
    Breaker-->>ESClient: Future result or short-circuit failure
    ESClient-->>Client: ElasticSearchResponse
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Cursor API fix #92: Also modifies ElasticSearchClient.scala to protect Http().singleRequest call sites (uses a semaphore-based limiter instead of a CircuitBreaker).

Suggested reviewers

  • mdellabitta
🚥 Pre-merge checks: ✅ 4 passed

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and specifically summarizes the main change: adding an Akka circuit breaker for ElasticSearch calls. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot left a review comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala (1)

75-82: ⚠️ Potential issue | 🟠 Major

Circuit breaker wraps an already-started Future, preventing it from protecting ElasticSearch.

The futureHttpResponse passed in ProcessElasticSearchResponse is initiated in ElasticSearchClient (lines 212-223) via Http().singleRequest(request) inside withConcurrencyLimit. By the time this Future reaches the handler, the HTTP request is already in-flight and the semaphore permit is already held.

In the handler (line 76), breaker.withCircuitBreaker(futureHttpResponse) receives an already-evaluated Future. Although CircuitBreaker.withCircuitBreaker uses call-by-name semantics (body: => Future[T]), it can only prevent execution of code that hasn't run yet. Once the Future is evaluated and the HTTP call is underway, the circuit breaker can only:

  1. Fail the Future immediately if the breaker is open (but the request is already sent to ES)
  2. Track the timeout/success for future decisions

This means ES still receives and processes requests even when the circuit breaker is open, defeating its core purpose: reducing pressure on a struggling cluster.

To properly protect ES, the circuit breaker must wrap the Http().singleRequest() call directly in ElasticSearchClient, preventing the HTTP request from being initiated when the breaker is open. This requires either making the breaker accessible from the client or restructuring the code so the HTTP call happens inside the handler after the circuit breaker check.
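The distinction this comment draws can be reproduced without Akka.

- `fakeSingleRequest` is a hypothetical stand-in for `Http().singleRequest(request)` that counts how often it runs.
- `withBreaker` mimics only the open-state behaviour of `breaker.withCircuitBreaker`, whose body parameter is by-name.

Passing a `val` that already holds the Future cannot undo the request. Passing the request expression itself lets an open breaker skip it.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future

object ByNameSketch {
  // Counts how many simulated HTTP requests were actually started.
  val sent = new AtomicInteger(0)

  // Hypothetical stand-in for Http().singleRequest(request): the side effect happens on call.
  def fakeSingleRequest(): Future[String] = {
    sent.incrementAndGet()
    Future.successful("200 OK")
  }

  // Stand-in for breaker.withCircuitBreaker: `body` is by-name, so an open
  // breaker can return a failed Future without ever evaluating it.
  def withBreaker[T](isOpen: Boolean)(body: => Future[T]): Future[T] =
    if (isOpen) Future.failed(new IllegalStateException("circuit breaker is open"))
    else body
}
```

In the flagged version, the equivalent of `fakeSingleRequest()` has already run before the handler sees the Future. Moving the call inside the by-name argument is what makes the open state actually stop traffic to ES.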

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala` around
lines 75 - 82, The circuit breaker is being applied to an already-started Future
in ProcessElasticSearchResponse, so it cannot prevent the HTTP request from
being sent; change the design so the breaker wraps the actual
Http().singleRequest(...) call inside ElasticSearchClient (where the Future is
created in withConcurrencyLimit) instead of wrapping futureHttpResponse in
ElasticSearchResponseHandler; either (a) make the CircuitBreaker instance
available to ElasticSearchClient and call breaker.withCircuitBreaker around
Http().singleRequest(request) before returning the Future, or (b) move the HTTP
invocation into the handler so ProcessElasticSearchResponse executes
breaker.withCircuitBreaker { Http().singleRequest(request) } directly, ensuring
the request is not initiated when the breaker is open. Ensure references:
ProcessElasticSearchResponse, ElasticSearchClient, withConcurrencyLimit,
Http().singleRequest, and breaker.withCircuitBreaker are updated accordingly.
🧹 Nitpick comments (1)
src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala (1)

119-127: Consider distinguishing circuit breaker rejections in logs.

When the breaker is open, the error will be a CircuitBreakerOpenException. Logging this the same as actual ES failures may cause confusion during incident response. Consider logging breaker rejections at a different level or with distinct messaging.

💡 Suggested improvement
```diff
 case ReturnFinalResponse(response, replyTo, error) =>
   // Log error if there is one
   error match {
+    case Some(_: akka.pattern.CircuitBreakerOpenException) =>
+      context.log.warn("Request rejected: ElasticSearch circuit breaker is open")
     case Some(e) =>
       context.log.error(
         "Failed to process ElasticSearch response:", e
       )
     case None => // no-op
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala` around
lines 119 - 127, Logs currently treat all errors in the ReturnFinalResponse
handler the same; update the error handling in ElasticSearchResponseHandler so
that when error is a CircuitBreakerOpenException (or matches its fully qualified
type) you log a distinct message and/or lower severity (e.g., context.log.warn
or context.log.info) indicating a circuit-breaker rejection, otherwise keep the
existing context.log.error for real ES failures; locate the match on error in
the case ReturnFinalResponse(response, replyTo, error) and branch on error.get
(or pattern-match Some(e: CircuitBreakerOpenException) vs Some(e)) to implement
the different messages and levels.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8c42f750-9bbe-4d65-ac46-51adb7600039

📥 Commits

Reviewing files that changed from the base of the PR and between 64d519e and ff68517.

📒 Files selected for processing (2)
  • src/main/resources/application.conf
  • src/main/scala/dpla/api/v2/search/ElasticSearchResponseHandler.scala

Move CircuitBreaker creation from ElasticSearchResponseHandler to
ElasticSearchClient so it wraps Http().singleRequest() directly inside
withConcurrencyLimit. Previously the breaker wrapped an already-started
Future, so it could not prevent requests from being sent to ES when open.
Also distinguish CircuitBreakerOpenException in error log (warn vs error).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DominicBM (Contributor, Author) commented:

Fixed both issues:

Major (outside-diff): Moved CircuitBreaker creation from ElasticSearchResponseHandler to ElasticSearchClient, where it now wraps Http().singleRequest() directly inside withConcurrencyLimit — the breaker check happens before the HTTP request is initiated. When the breaker is open, the request is never sent to ES.

Nitpick: Added a case Some(_: CircuitBreakerOpenException) branch in the ReturnFinalResponse handler that logs at warn level with a distinct message, so open-breaker rejections are distinguishable from actual ES failures.

DominicBM (Contributor, Author) commented:

@coderabbitai review

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

DominicBM merged commit a5feba5 into main on Apr 23, 2026 (5 checks passed).
DominicBM deleted the es-circuit-breaker branch on April 23, 2026 at 12:50.