Collaborator

@monshri monshri commented Jan 9, 2026

📝 This PR addresses the following:

Fix #1931

The previous implementation of the OPA plugin had the following issue:
When the OPA plugin was enabled, each policy evaluation issued a synchronous HTTP request that blocked the entire async worker until the request completed, adding latency to all concurrent requests.
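
To make the problem concrete, here is a minimal sketch (not the plugin's code) of why a synchronous call inside a coroutine stalls every other task on the same event loop. `time.sleep` stands in for the old blocking `requests.post`, and `asyncio.sleep` stands in for an awaited async HTTP call. The helper names are hypothetical.

```python
import asyncio
import contextlib
import time
from typing import Awaitable, Callable


async def ticks_during(work: Callable[[], Awaitable[None]]) -> int:
    """Count how often a background task gets to run while `work()` is in progress."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat task start
    await work()
    observed = ticks  # snapshot before yielding to the loop again
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return observed


async def blocking_policy_check() -> None:
    time.sleep(0.1)  # stand-in for a synchronous requests.post(...)


async def async_policy_check() -> None:
    await asyncio.sleep(0.1)  # stand-in for `await client.post(...)`


if __name__ == "__main__":
    print("ticks during blocking call:", asyncio.run(ticks_during(blocking_policy_check)))
    print("ticks during awaited call: ", asyncio.run(ticks_during(async_policy_check)))
```

The blocking variant never lets the heartbeat run while the call is in flight, which is the same effect concurrent requests would see on a shared worker.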

Fix:

Key improvements:

  • Replaced all synchronous HTTP calls with asynchronous equivalents.
  • Added retry logic with exponential backoff for transient connection failures.
  • Introduced request timeout configuration to prevent indefinite hangs.
  • Enhanced logging to capture retry attempts and HTTP status errors.
        for attempt in range(max_retries):
            try:
                response = await client.post(url, json=payload, timeout=seconds)
                response.raise_for_status()
                logger.info(f"OPA POST succeeded on attempt {attempt + 1}")
                return response
            except (httpx.TimeoutException, httpx.RequestError, httpx.ConnectError):
                logger.warning(f"Retry attempt to connect to OPA server {attempt + 1}")
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(2**attempt)
            except httpx.HTTPStatusError as e:
                logger.error(f"OPA POST HTTP error (attempt {attempt + 1}, status {e.response.status_code}): {e.response.text[:200]}")
  1. Added a configurable timeout and maximum number of retry attempts to the plugin configuration in config.yaml:
    config:
      # Plugin config dict passed to the plugin constructor
      opa_base_url: "http://127.0.0.1:8181/v1/data/"
      opa_client_retries: 3
      opa_client_timeout: 30s
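
Since `opa_client_timeout` is a string such as `30s`, it has to be converted to a number of seconds before it can be passed to an HTTP client. The following is a minimal sketch of one way to do that; `parse_timeout_seconds` is a hypothetical helper, and rejecting malformed values (rather than falling back to a default) is an assumption of this example, not something the PR specifies.

```python
import re

# Compiled once at import time; matches values such as "30s".
_TIMEOUT_PATTERN = re.compile(r"(\d+)s")


def parse_timeout_seconds(value: str) -> float:
    """Convert a duration such as "30s" into seconds, rejecting anything else."""
    match = _TIMEOUT_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected a timeout like '30s', got {value!r}")
    return float(match.group(1))


if __name__ == "__main__":
    print(parse_timeout_seconds("30s"))  # 30.0
```

Using `fullmatch` means a value like `30sec` is rejected; a prefix match with `re.match` would accept it and silently read it as 30 seconds.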

@monshri monshri self-assigned this Jan 9, 2026
@monshri monshri added the enhancement (New feature or request) label Jan 9, 2026
@monshri monshri requested a review from crivetimihai as a code owner January 9, 2026 18:17
@crivetimihai crivetimihai self-assigned this Jan 9, 2026
@monshri monshri requested review from araujof and terylt January 9, 2026 18:52
response: Returns the response received from the OPA server. Raises an error if the communication is not successful.
"""
match = re.match(r"(\d+)s", self.opa_config.opa_client_timeout.strip())
Collaborator

Is this something that could be done once in the init function of the Plugin? This would save processing.

Member

e.g.,

# Parse once in __init__:
  def __init__(self, config: PluginConfig):
      ...
      # Parse timeout once
      match = re.match(r"(\d+)s", self.opa_config.opa_client_timeout.strip())
      self._timeout_seconds = float(match.group(1)) if match else 30.0

Collaborator Author

@monshri monshri Jan 9, 2026

Yes, that's a good point. I will make that change.

Member

@araujof araujof left a comment

Hi Shriti, please look into the recommendations we made.

try:
rsp = requests.post(url, json=payload)
logger.info(f"OPA connection response '{rsp}'")
async with httpx.AsyncClient() as client:
Member

A new httpx.AsyncClient() is created and destroyed for every OPA policy evaluation. This means:

  • Connection establishment overhead on every request
  • No connection reuse/pooling
  • TCP handshake + TLS negotiation repeated for each policy check
  • Under high load with many plugin invocations, this becomes a severe bottleneck

Recommendation:

  # In __init__:
  self._http_client = httpx.AsyncClient(
      timeout=httpx.Timeout(self._timeout_seconds),
      limits=httpx.Limits(max_keepalive_connections=20, max_connections=100),
      http2=True  # Optional: enable HTTP/2 (needs the optional h2 dependency, e.g. pip install "httpx[http2]")
  )

  # Add cleanup method:
  async def cleanup(self):
      await self._http_client.aclose()

Member

Maybe set these as defaults (those are maximums) and allow them to be configured, I think.

Collaborator Author

Done
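
For readers following this thread, here is a minimal sketch of the lifecycle being discussed: one client is created up front, reused for every policy evaluation, and closed once on shutdown. To keep it runnable without a server or third-party packages, it uses a hypothetical `FakeAsyncClient` that exposes the same two methods this sketch needs from `httpx.AsyncClient` (`post` and `aclose`). The class names, the `shutdown` method name, and the `example/allow` policy path are illustrative assumptions, not the plugin's actual code.

```python
import asyncio
from typing import Any, Protocol


class AsyncHTTPClient(Protocol):
    """The two methods this sketch needs; httpx.AsyncClient provides both."""

    async def post(self, url: str, *, json: Any) -> Any: ...

    async def aclose(self) -> None: ...


class FakeAsyncClient:
    """Hypothetical in-memory stand-in so the example runs without a server."""

    def __init__(self) -> None:
        self.posts = 0
        self.closed = False

    async def post(self, url: str, *, json: Any) -> dict:
        if self.closed:
            raise RuntimeError("client already closed")
        self.posts += 1
        return {"result": True, "url": url}

    async def aclose(self) -> None:
        self.closed = True


class PolicyEvaluator:
    """Holds one long-lived client instead of opening a new one per request."""

    def __init__(self, client: AsyncHTTPClient, base_url: str) -> None:
        self._client = client
        self._base_url = base_url.rstrip("/")

    async def evaluate(self, policy_path: str, payload: dict) -> Any:
        return await self._client.post(f"{self._base_url}/{policy_path}", json=payload)

    async def shutdown(self) -> None:
        # Close the shared client exactly once, when the owner goes away.
        await self._client.aclose()


async def demo() -> FakeAsyncClient:
    client = FakeAsyncClient()
    evaluator = PolicyEvaluator(client, "http://127.0.0.1:8181/v1/data/")
    try:
        # Concurrent evaluations share the same client (and, with a real
        # HTTP client, its connection pool).
        checks = (evaluator.evaluate("example/allow", {"input": {"n": i}}) for i in range(5))
        await asyncio.gather(*checks)
    finally:
        await evaluator.shutdown()
    return client


if __name__ == "__main__":
    used = asyncio.run(demo())
    print(f"posts={used.posts} closed={used.closed}")
```

Passing the client in from outside also makes the evaluator easy to test, since a fake can be swapped in as done here.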

logger.warning(f"Retry attempt to connect to OPA server {attempt + 1}")
if attempt == max_retries - 1:
    raise
await asyncio.sleep(2**attempt)
Member

await asyncio.sleep(2**attempt) grows exponentially without limit:

  • Attempt 0: 1 second
  • Attempt 1: 2 seconds
  • Attempt 2: 4 seconds
  • Attempt 3: 8 seconds (if max_retries increased)

This could cause excessive delays if max_retries is increased.

Recommendation:

  # Add jitter and cap max delay
  import random
  delay = min(2**attempt, 10) + random.uniform(0, 1)
  await asyncio.sleep(delay)

Collaborator Author

Sure, makes sense. Will make that change.

Collaborator Author

Done
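
As a rough illustration of the capped backoff with jitter discussed in this thread, the sketch below separates the delay calculation from a generic retry loop. It is not the merged implementation: `backoff_delay` and `retry_async` are hypothetical names, the built-in `ConnectionError` and `TimeoutError` stand in for the httpx exception types, and the injectable `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, cap: float = 10.0, jitter: float = 1.0) -> float:
    """Delay before the next retry: min(2**attempt, cap) plus up to `jitter` seconds."""
    return min(2.0**attempt, cap) + random.uniform(0.0, jitter)


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `operation()` up to `max_attempts` times, sleeping between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the cap at 10 seconds, raising the attempt count only adds at most about 11 seconds per extra retry instead of doubling the wait each time, and the jitter keeps many workers from retrying in lockstep.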

Member

araujof commented Jan 9, 2026

Nice job on this PR. As noted in the inline comments, a few suggestions:

  1. Connection Pooling: Consider reusing a single httpx.AsyncClient instance instead of creating new ones per request:
  # In __init__:
  self.http_client = httpx.AsyncClient(timeout=httpx.Timeout(30.0))

  # In _evaluate_opa_policy:
  rsp = await self._post_with_retry(client=self.http_client, url=url, payload=payload, ...)

  # Add cleanup method:
  async def cleanup(self):
      await self.http_client.aclose()

This would improve performance by avoiding connection establishment overhead on each OPA request.

  2. Regex Compilation: The timeout regex is compiled on every call (line 136):
  match = re.match(r"(\d+)s", self.opa_config.opa_client_timeout.strip())

It could be compiled once in __init__:

  # In __init__:
  self._timeout_pattern = re.compile(r"(\d+)s")

  # In _post_with_retry:
  match = self._timeout_pattern.match(self.opa_config.opa_client_timeout.strip())

Member

@araujof araujof left a comment

LGTM

@araujof araujof added the performance (Performance related items) and plugins labels Jan 10, 2026
@araujof araujof changed the title from "OPA plugin Optimization: Replace Synchronous HTTP Calls to Asynchronous" to "feat: Replace Synchronous HTTP Calls with Asynchronous in OPA plugin" Jan 10, 2026
@araujof araujof changed the title from "feat: Replace Synchronous HTTP Calls with Asynchronous in OPA plugin" to "feat: replace Synchronous HTTP Calls with Asynchronous in OPA plugin" Jan 10, 2026
monshri and others added 7 commits January 11, 2026 17:31
…to opa

Signed-off-by: Shriti Priya <shritip@ibm.com>
- Rename cleanup() to shutdown() to match plugin framework interface
- Fix config key from opa_client_max_retries to opa_client_retries

The plugin framework calls shutdown() for cleanup, not cleanup().
The config.yaml key must match the schema field name.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the fix/opa_plugin_performance_optimization branch from d9ad3d4 to ace14e3 Compare January 11, 2026 18:01
- Pass keepalive_expiry to httpx.Limits (was parsed but unused)
- Add Pydantic Field validation for retries >= 1 (prevents 0/negative)
- Add validation for max_keepalive and max_connections >= 1

Addresses review feedback about config knobs having no effect
and retries=0 causing silent failures.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
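
To illustrate why the `>= 1` checks matter: with a retry loop of the form `for attempt in range(max_retries)`, a value of 0 means the loop body never runs, so no request is sent and no error is raised. The commit uses Pydantic `Field` validation; the sketch below uses a standard-library dataclass as a stand-in so it runs without dependencies. The class name, the default values, and the exact spelling of the pool-limit fields are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OPAClientSettings:
    """Hypothetical settings object; the real plugin uses a Pydantic model."""

    opa_client_retries: int = 3  # total attempts: 1 = single attempt, 3 = up to 3
    max_keepalive: int = 20      # assumed field name and default
    max_connections: int = 100   # assumed field name and default

    def __post_init__(self) -> None:
        # Reject 0 and negative values up front instead of failing silently later.
        for name in ("opa_client_retries", "max_keepalive", "max_connections"):
            value = getattr(self, name)
            if value < 1:
                raise ValueError(f"{name} must be >= 1, got {value}")


if __name__ == "__main__":
    print(OPAClientSettings())
    try:
        OPAClientSettings(opa_client_retries=0)
    except ValueError as exc:
        print("rejected:", exc)
```
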
- Add httpx>=0.28.0 to pyproject.toml dependencies (was missing)
- Clarify retries field: 1=single attempt, 3=up to 3 attempts

The httpx library is now explicitly declared since the plugin
switched from requests to httpx for async HTTP calls.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai merged commit 3a3ef5b into IBM:main Jan 11, 2026
51 checks passed
Development

Successfully merging this pull request may close these issues.

Optimize OPA plugin: Replace synchronous requests with async httpx client
