Skip to content

Conversation

@chandrasekharan-zipstack
Copy link
Contributor

@chandrasekharan-zipstack chandrasekharan-zipstack commented Sep 9, 2025

What

  • Created generic retry decorator factory supporting multiple service prefixes with configurable exception types
  • Enhanced platform service calls with improved logging and structured backoff details
  • Preserved existing sophisticated error detection logic for OSError errno codes
  • Bumped SDK version to v0.78.0

Why

  • Platform service calls were failing immediately on connection errors without any retry attempts
  • Better logging and observability for retry attempts and failures

How

  • Created generic factory: create_retry_decorator for flexible retry configurations
  • Environment configuration: Prefix-based env vars allow different retry configs per service

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

No, this PR should not break any existing features because:

  • Same error handling: Errors are still raised after exhausting retries, preserving existing error handling
  • Conservative defaults: Default retry configuration remains the same (3 retries with exponential backoff)
  • Preserved error detection: All existing OSError errno codes and HTTP status code logic maintained
  • Transparent to callers: The retry mechanism maintains the same API and behavior

Database Migrations

  • N/A

Dependencies Versions / Env Variables

Environment variables (per service prefix, e.g., PLATFORM_SERVICE, PROMPT_SERVICE):

  • {PREFIX}_MAX_RETRIES: Maximum retry attempts (default: 3)
  • {PREFIX}_MAX_TIME: Maximum total time in seconds (default: 60) (please advice if this should be more @ritwik-g)
  • {PREFIX}_BASE_DELAY: Initial delay in seconds (default: 1.0)
  • {PREFIX}_MULTIPLIER: Backoff multiplier (default: 2.0)
  • {PREFIX}_JITTER: Enable/disable jitter (default: true)

Examples:

  • Platform service: PLATFORM_SERVICE_MAX_RETRIES=5
  • Custom service: API_SERVICE_MAX_RETRIES=2, API_SERVICE_BASE_DELAY=2.0

Relevant Docs

  • N/A

Related Issues or PRs

Notes on Testing

  • Tested locally by using it in the backend
  • Didn't explicitly check in tools but it should work similarly

Screenshots

image

Checklist

I have read and understood the Contribution Guidelines.

…l backoff retry mechanism for platform service connections

- Implemented retry_utils module with configurable retry behavior
- Added @retry_on_connection_error decorator to platform service calls
- Supports exponential backoff with jitter to prevent thundering herd
- Configuration via environment variables for max retries, delays, and backoff factor
- Automatically retries on ConnectionError and errno 111 (Connection refused)
- Improved error handling and logging for better debugging
- Bumped SDK version to v0.78.0
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 9, 2025

Warning

Rate limit exceeded

@chandrasekharan-zipstack has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 55 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 2039147 and 987f11f.

📒 Files selected for processing (1)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)

Summary by CodeRabbit

  • New Features
    • Automatic retries with exponential backoff for platform and prompt service calls.
    • Retry behavior is configurable via environment variables (retries, time limits, delays, jitter).
  • Improvements
    • More resilient network handling with clearer, user-friendly error messages.
    • Enhanced logging of retry attempts and failures.
  • Chores
    • Version updated to v0.78.0.

Summary by CodeRabbit

  • New Features

    • Automatic retries with exponential backoff for platform and prompt service requests and adapter configuration fetches.
    • Clearer error messages and logging on connection issues, with guidance when retries are exhausted.
    • Retry behavior configurable via environment variables (retries, max time, base delay, multiplier, jitter). No API signature changes required.
  • Chores

    • Bumped SDK version to v0.78.0.

Walkthrough

Adds a retry utilities module and applies retry decorators to platform, prompt, and adapter service-call methods, updates error handling and logging, and bumps SDK version from v0.77.1 to v0.78.0. No public API signatures changed.

Changes

Cohort / File(s) Summary of Changes
Version bump
src/unstract/sdk/__init__.py
Updated __version__ from "v0.77.1" to "v0.78.0".
Retry utilities (new module)
src/unstract/sdk/utils/retry_utils.py
New retry helpers: is_retryable_error, backoff calculation, retry_with_exponential_backoff, create_retry_decorator, and preconfigured decorators retry_platform_service_call, retry_prompt_service_call. Reads env vars for config, supports jitter and logging.
Platform helper
src/unstract/sdk/platform.py
Added module logger, imported LogLevel and retry utils, decorated PlatformHelper._call_service with @retry_platform_service_call, replaced direct process exit with logging/tool error emission and raising ConnectionError, and updated docstring to mention retries and env config.
Adapter integration
src/unstract/sdk/adapter.py
Added module logger, imported retry decorator, applied @retry_platform_service_call to ToolAdapter._get_adapter_configuration, removed explicit ConnectionError handling inside the method (relying on decorator), preserved HTTPError re-raising, and wrapped public get_adapter_config call to raise SdkError on connection issues.
Prompt service integration
src/unstract/sdk/prompt.py
Imported and applied @retry_prompt_service_call to _call_service, updated docstring and import ordering.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Adapter as ToolAdapter
  participant Platform as PlatformHelper
  participant Prompt as PromptHelper
  participant Retry as Retry Decorator
  participant Svc as Remote Service

  rect rgb(245,245,255)
    note over Adapter,Platform: Adapter configuration fetch
    Client->>Adapter: get_adapter_config()
    Adapter->>Retry: _get_adapter_configuration()
    loop Retry attempts (env-configurable backoff)
      Retry->>Svc: HTTP request
      alt Success
        Svc-->>Retry: 2xx response
        Retry-->>Adapter: result
      else Retryable error
        Svc--x Retry: network/timeout/5xx
        Retry-->>Retry: wait(backoff+jitter)
      else Exhausted
        Retry-->>Adapter: raise ConnectionError
      end
    end
  end

  rect rgb(245,255,245)
    note over Platform: Generic platform call
    Client->>Platform: _call_service(...)
    Platform->>Retry: _call_service()
    loop Retry attempts
      Retry->>Svc: HTTP request
      alt Success
        Svc-->>Retry: 2xx
        Retry-->>Platform: result
      else Retryable error
        Svc--x Retry: transient failure
        Retry-->>Retry: wait/backoff
      else Give up
        Retry-->>Platform: raise ConnectionError
        Platform-->>Client: propagate ConnectionError (after logging/emitting tool error)
      end
    end
  end

  rect rgb(255,245,245)
    note over Prompt: Prompt service call
    Client->>Prompt: _call_service(...)
    Prompt->>Retry: _call_service()
    loop Retry attempts
      Retry->>Svc: HTTP request
      alt Success
        Svc-->>Retry: 2xx
        Retry-->>Prompt: result
      else Retryable error
        Svc--x Retry: transient failure
        Retry-->>Retry: wait/backoff
      else Exhausted
        Retry-->>Prompt: raise error
        Prompt-->>Client: propagate error
      end
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title concisely and accurately summarizes the main change introduced by this PR, specifically the addition of an exponential backoff retry mechanism for platform service connections, and follows the project’s prefix and tagging conventions without extraneous details.
Description Check ✅ Passed The description fully adheres to the repository template by providing clearly delineated “What,” “Why,” and “How” sections along with Relevant Docs, Related Issues or PRs, Dependencies Versions / Env Variables, Notes on Testing, Screenshots, and a Checklist, offering comprehensive and structured information for reviewers.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chandrasekharan-zipstack chandrasekharan-zipstack changed the title [FEAT] Added exponential backoff retry mechanism for platform service connections UN-2793 [FEAT] Added exponential backoff retry mechanism for platform service connections Sep 9, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/unstract/sdk/__init__.py (1)

1-6: Add missing Git tag and release notes for v0.78.0
Packaging is configured to pull the version from src/unstract/sdk/__init__.py, so the SDK version is already in sync.

  • Create and push the Git tag v0.78.0
  • Draft or update the release notes on GitHub for v0.78.0
src/unstract/sdk/platform.py (1)

130-135: Add explicit timeouts to external requests.
Without timeouts, calls can hang indefinitely; with retries this compounds latency.

@@
-                response = requests.post(
-                    url=url, json=payload, params=params, headers=req_headers
-                )
+                response = requests.post(
+                    url=url, json=payload, params=params, headers=req_headers, timeout=REQUEST_TIMEOUT
+                )
@@
-                response = requests.get(url=url, params=params, headers=req_headers)
+                response = requests.get(url=url, params=params, headers=req_headers, timeout=REQUEST_TIMEOUT)

Add a module-level default near the logger:

@@
 logger = logging.getLogger(__name__)
+REQUEST_TIMEOUT: tuple[float, float] = (3.05, 30.0)  # (connect, read) seconds
src/unstract/sdk/adapter.py (1)

66-67: Add explicit timeouts to the platform request.
Prevents indefinite hangs and bounds total retry latency.

-            response = requests.get(url, headers=headers, params=query_params)
+            response = requests.get(
+                url, headers=headers, params=query_params, timeout=REQUEST_TIMEOUT
+            )

Add a module-level default near the logger:

@@
 logger = logging.getLogger(__name__)
+REQUEST_TIMEOUT: tuple[float, float] = (3.05, 30.0)  # (connect, read) seconds
🧹 Nitpick comments (7)
src/unstract/sdk/utils/retry_utils.py (2)

121-201: Improve failure logging and formatting; align with linter hints.

Use logger.exception on final failure to capture traceback and avoid f-string coercions in logging calls.

-                        logger_instance.error(
-                            f"Failed '{func.__name__}' after {attempt + 1} attempt(s). "
-                            f"Error: {str(e)}"
-                        )
+                        logger_instance.exception(
+                            "Failed '%s' after %d attempt(s). Error: %s",
+                            func.__name__, attempt + 1, e
+                        )
@@
-                    logger_instance.warning(
-                        f"Retryable error in '{func.__name__}' "
-                        f"(attempt {attempt + 1}/{max_attempts}). "
-                        f"Error: {str(e)}. Retrying in {delay:.2f} seconds..."
-                    )
+                    logger_instance.warning(
+                        "Retryable error in '%s' (attempt %d/%d). Error: %s. Retrying in %.2f seconds...",
+                        func.__name__, attempt + 1, max_attempts, e, delay
+                    )

55-61: More robust env parsing for boolean jitter.

Accept common truthy values ("1", "yes", "on", "true") and falsy equivalents.

-            jitter=os.getenv(f"{prefix}_RETRY_JITTER", "true").lower() == "true",
+            jitter=os.getenv(f"{prefix}_RETRY_JITTER", "true").strip().lower()
+            in {"1", "true", "yes", "on"},
src/unstract/sdk/platform.py (3)

94-107: Docstring should mention retry configurability.
Add the env knobs so users know how to tune retries.

@@
-        This method automatically retries on connection errors with exponential backoff.
+        This method automatically retries on connection errors with exponential backoff.
+        Retry behavior is configurable via env:
+          - PLATFORM_SERVICE_MAX_RETRIES (default: 3)
+          - PLATFORM_SERVICE_INITIAL_DELAY (default: 1.0s)
+          - PLATFORM_SERVICE_MAX_DELAY (default: 60.0s)
+          - PLATFORM_SERVICE_BACKOFF_FACTOR (default: 2.0)
+          - PLATFORM_SERVICE_RETRY_JITTER (default: true)

140-146: Use logging.exception and avoid emitting user-facing error on every retry.
Log the stack once per failure; let the decorator own retry-attempt logs to reduce noise.

-            logger.error(f"Connection error to platform service: {connect_err}")
-            msg = (
-                "Unable to connect to platform service. Might attempt to retry, "
-                "please contact admin if the retries fail."
-            )
-            self.tool.stream_log(msg, level=LogLevel.ERROR)
+            logger.exception("Connection error to platform service: %s", connect_err)
+            msg = (
+                "Unable to connect to platform service. Will retry with exponential backoff; "
+                "please contact admin if retries ultimately fail."
+            )
             raise ConnectionError(msg) from connect_err

If user-facing logs are required, emit them once after retries are exhausted (in the decorator).


148-159: Harden error extraction and reuse helper for consistency.
Guard JSON parsing and prefer AdapterUtils-style extraction to avoid JSONDecodeError and inconsistent messages.

@@
-            content_type = response.headers.get("Content-Type", "").lower()
-            if MimeType.JSON in content_type:
-                response_json = response.json()
-                if "error" in response_json:
-                    error_message = response_json["error"]
-            elif response.text:
-                error_message = response.text
+            content_type = response.headers.get("Content-Type", "").lower()
+            try:
+                if MimeType.JSON in content_type:
+                    response_json = response.json()
+                    error_message = response_json.get("error", error_message)
+                elif response.text:
+                    error_message = response.text
+            except ValueError:
+                if response.text:
+                    error_message = response.text

Optional: Factor this via a shared utility (like AdapterUtils.get_msg_from_request_exc) for parity with adapter.py.

src/unstract/sdk/adapter.py (2)

64-65: Use header constants and include request ID for traceability.
Align with platform.py and carry request correlation.

-        headers = {"Authorization": f"Bearer {self.bearer_token}"}
+        headers = {
+            "Authorization": f"Bearer {self.bearer_token}",
+            "X-Request-Id": self.request_id or "",
+        }

If available, prefer RequestHeader.AUTHORIZATION and RequestHeader.REQUEST_ID constants for consistency.


50-55: Docstring: note retry configurability.
Make the env-based controls discoverable here too.

@@
-        retries on connection errors with exponential backoff.
+        retries on connection errors with exponential backoff.
+        Controlled via:
+          PLATFORM_SERVICE_MAX_RETRIES, PLATFORM_SERVICE_INITIAL_DELAY,
+          PLATFORM_SERVICE_MAX_DELAY, PLATFORM_SERVICE_BACKOFF_FACTOR,
+          PLATFORM_SERVICE_RETRY_JITTER.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4afc1f2 and ba4ecf4.

📒 Files selected for processing (5)
  • mypy-errors.txt (1 hunks)
  • src/unstract/sdk/__init__.py (1 hunks)
  • src/unstract/sdk/adapter.py (4 hunks)
  • src/unstract/sdk/platform.py (4 hunks)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
src/unstract/sdk/utils/retry_utils.py

115-115: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


159-159: Consider moving this statement to an else block

(TRY300)


174-177: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


176-176: Use explicit conversion flag

Replace with conversion flag

(RUF010)


188-188: Use explicit conversion flag

Replace with conversion flag

(RUF010)

src/unstract/sdk/adapter.py

82-84: Avoid specifying long messages outside the exception class

(TRY003)


92-92: Avoid specifying long messages outside the exception class

(TRY003)

src/unstract/sdk/platform.py

140-140: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (3)
src/unstract/sdk/__init__.py (1)

1-1: Version bump looks good.

No code-path changes; only the version constant was updated to v0.78.0.

src/unstract/sdk/platform.py (1)

1-1: LGTM: module-level logger setup is appropriate.

Also applies to: 17-17

src/unstract/sdk/adapter.py (1)

2-2: LGTM: logger wiring mirrors platform module.

Also applies to: 15-15

Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/unstract/sdk/utils/retry_utils.py (1)

38-46: Also validate max_delay >= initial_delay.

Prevents confusing configs where the first backoff is immediately capped.

         if backoff_factor <= 0:
             raise ValueError("backoff_factor must be > 0")
+        if max_delay < initial_delay:
+            raise ValueError("max_delay must be >= initial_delay")
🧹 Nitpick comments (5)
src/unstract/sdk/utils/retry_utils.py (5)

61-67: Parse RETRY_JITTER more broadly; friendlier env parsing.

Make boolean parsing tolerant to common truthy values; optional: wrap numeric parsing to raise clear messages.

         return cls(
             max_retries=int(os.getenv(f"{prefix}_MAX_RETRIES", "3")),
             initial_delay=float(os.getenv(f"{prefix}_INITIAL_DELAY", "1.0")),
             max_delay=float(os.getenv(f"{prefix}_MAX_DELAY", "60.0")),
             backoff_factor=float(os.getenv(f"{prefix}_BACKOFF_FACTOR", "2.0")),
-            jitter=os.getenv(f"{prefix}_RETRY_JITTER", "true").lower() == "true",
+            jitter=os.getenv(f"{prefix}_RETRY_JITTER", "true").strip().lower()
+            in {"1", "true", "t", "yes", "y", "on"},
         )

If you want clearer errors for bad numeric envs, I can add small helpers to validate and include the offending var/value in the exception.


180-185: Use logging.exception and lazy formatting; silences TRY400/RUF010 and preserves traceback.

Switch to logger formatting placeholders and include traceback on final failure.

-                            logger_instance.error(
-                                f"Failed '{func.__name__}' after {attempt + 1} attempt(s). "
-                                f"Error: {str(e)}"
-                            )
+                            logger_instance.exception(
+                                "Failed '%s' after %d attempt(s).",
+                                func.__name__, attempt + 1,
+                            )
@@
-                    logger_instance.warning(
-                        f"Retryable error in '{func.__name__}' "
-                        f"(attempt {attempt + 1}/{max_attempts}). "
-                        f"Error: {str(e)}. Retrying in {delay:.2f} seconds..."
-                    )
+                    logger_instance.warning(
+                        "Retryable error in '%s' (attempt %d/%d). Error: %s. Retrying in %.2f seconds...",
+                        func.__name__, attempt + 1, max_attempts, e, delay,
+                    )

Also applies to: 191-195


119-124: Nit: replace en dash and silence non-crypto RNG warning (S311).

Use ASCII hyphen in the comment and mark RNG as non-crypto.

-    # Add jitter if enabled (0–25% positive jitter)
+    # Add jitter if enabled (0-25% positive jitter)
@@
-        delay = base + (base * random.uniform(0.0, 0.25))
+        delay = base + (base * random.uniform(0.0, 0.25))  # noqa: S311 non-crypto jitter

Also applies to: 121-121


127-132: Optional: injectable sleep for testability and faster unit tests.

Allow passing a sleep function (default time.sleep) so tests can stub it out.

 def retry_with_exponential_backoff(
     config: RetryConfig | None = None,
     retryable_exceptions: tuple[type[Exception], ...] | None = None,
     logger_instance: logging.Logger | None = None,
+    sleep_func: Callable[[float], None] | None = None,
 ) -> Callable:
@@
     if logger_instance is None:
         logger_instance = logger
+    if sleep_func is None:
+        sleep_func = time.sleep
@@
-                    time.sleep(delay)
+                    sleep_func(delay)

Also applies to: 145-147, 198-198


85-92: Consider optionally handling HTTP 429 with Retry-After (if applicable to the platform API).

If your platform returns 429 for throttling, optionally retry using the Retry-After header (seconds or HTTP-date), with a cap to max_delay.

Would you like me to confirm the platform’s throttling semantics before adding this? If yes, I can search the service docs or scan the repo for 429 handling.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ba4ecf4 and 8e305d7.

📒 Files selected for processing (1)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
src/unstract/sdk/utils/retry_utils.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


41-41: Avoid specifying long messages outside the exception class

(TRY003)


43-43: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


70-70: Redefinition of unused ConnectionError from line 13

Remove definition: ConnectionError

(F811)


70-70: Redefinition of unused HTTPError from line 13

Remove definition: HTTPError

(F811)


119-119: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)


121-121: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


165-165: Consider moving this statement to an else block

(TRY300)


180-183: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


182-182: Use explicit conversion flag

Replace with conversion flag

(RUF010)


194-194: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (1)
src/unstract/sdk/utils/retry_utils.py (1)

127-206: No async functions decorated with retry_with_exponential_backoff detected
Ran a repository-wide regex search for @retry_with_exponential_backoff on async def and found no matches—no blocking issues present.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/unstract/sdk/adapter.py (1)

66-67: Add client-side timeouts to external call (retries can’t help a hang).

requests.get has no timeout; a stalled socket blocks indefinitely and bypasses the retry logic. Pass a sane connect/read timeout.

Apply:

-            response = requests.get(url, headers=headers, params=query_params)
+            response = requests.get(
+                url, headers=headers, params=query_params, timeout=(5, 30)
+            )
♻️ Duplicate comments (2)
src/unstract/sdk/adapter.py (1)

128-133: Good: ConnectionError now bubbles to the decorator; mapping to SdkError happens at the API boundary.

This resolves the earlier concern about neutralizing the retry decorator. Looks correct.

src/unstract/sdk/utils/retry_utils.py (1)

234-237: Bug: all OSErrors are retried due to the OR logic.

is_retryable = is_retryable or isinstance(e, retryable_exceptions) makes every OSError retryable (including PermissionError), bypassing your errno filter. Remove OSError from retryable_exceptions and rely on is_retryable_error for errno-based checks.

Apply:

     decorator = retry_with_exponential_backoff(
         config=retry_config,
-        retryable_exceptions=(ConnectionError, OSError),
+        retryable_exceptions=(ConnectionError,),
     )
🧹 Nitpick comments (6)
src/unstract/sdk/adapter.py (2)

2-2: Remove unused logger/import.

logging and logger = logging.getLogger(__name__) are not used in this file.

Apply:

-import logging
@@
-logger = logging.getLogger(__name__)

Also applies to: 15-15


91-94: Tighten return type hint.

get_adapter_config always returns a dict or raises; None isn’t returned.

Apply:

-    def get_adapter_config(
-        tool: BaseTool, adapter_instance_id: str
-    ) -> dict[str, Any] | None:
+    def get_adapter_config(
+        tool: BaseTool, adapter_instance_id: str
+    ) -> dict[str, Any]:
src/unstract/sdk/utils/retry_utils.py (4)

180-183: Log final failure with traceback.

Use logging.exception when re-raising to capture stack traces.

Apply:

-                            logger_instance.error(
-                                f"Failed '{func.__name__}' after {attempt + 1} attempt(s). "
-                                f"Error: {str(e)}"
-                            )
+                            logger_instance.exception(
+                                "Failed '%s' after %d attempt(s). Error: %s",
+                                func.__name__, attempt + 1, e,
+                            )

191-195: Prefer logging format params over f-strings (and explicit conversion).

Improves performance and avoids string building when the log level is disabled.

Apply:

-                    logger_instance.warning(
-                        f"Retryable error in '{func.__name__}' "
-                        f"(attempt {attempt + 1}/{max_attempts}). "
-                        f"Error: {str(e)}. Retrying in {delay:.2f} seconds..."
-                    )
+                    logger_instance.warning(
+                        "Retryable error in '%s' (attempt %d/%d). Error: %s. Retrying in %.2f seconds...",
+                        func.__name__, attempt + 1, max_attempts, e, delay,
+                    )

38-45: Validate that max_delay >= initial_delay.

Prevents a permanently clamped delay when misconfigured.

Apply:

         if max_delay <= 0:
             raise ValueError("max_delay must be > 0")
+        if max_delay < initial_delay:
+            raise ValueError("max_delay must be >= initial_delay")

119-124: Minor: fix en-dash and silence non-crypto RNG warning.

Use a hyphen and add a per-line ignore for S311 since jitter isn’t cryptographic.

Apply:

-    # Add jitter if enabled (0–25% positive jitter)
+    # Add jitter if enabled (0-25% positive jitter)
     if config.jitter:
-        delay = base + (base * random.uniform(0.0, 0.25))
+        delay = base + (base * random.uniform(0.0, 0.25))  # noqa: S311
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ba4ecf4 and 565078a.

📒 Files selected for processing (2)
  • src/unstract/sdk/adapter.py (5 hunks)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
src/unstract/sdk/adapter.py

87-87: Avoid specifying long messages outside the exception class

(TRY003)


131-133: Avoid specifying long messages outside the exception class

(TRY003)

src/unstract/sdk/utils/retry_utils.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


41-41: Avoid specifying long messages outside the exception class

(TRY003)


43-43: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


70-70: Redefinition of unused ConnectionError from line 13

Remove definition: ConnectionError

(F811)


70-70: Redefinition of unused HTTPError from line 13

Remove definition: HTTPError

(F811)


119-119: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)


121-121: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


165-165: Consider moving this statement to an else block

(TRY300)


180-183: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


182-182: Use explicit conversion flag

Replace with conversion flag

(RUF010)


194-194: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (1)
src/unstract/sdk/adapter.py (1)

80-87: Confirm intent: catching HTTPError here disables 5xx retries.

Because this block converts HTTPError to SdkError inside the decorated function, the decorator won’t see HTTP 502/503/504 and won’t retry, even though the retry utility considers them retryable. If you intended to retry 5xx, re-raise HTTPError here and translate at the caller after retries.

Option if you want 5xx retries:

-        except HTTPError as e:
-            default_err = (
-                "Error while calling the platform service, please contact the admin."
-            )
-            msg = AdapterUtils.get_msg_from_request_exc(
-                err=e, message_key="error", default_err=default_err
-            )
-            raise SdkError(f"Error retrieving adapter. {msg}") from e
+        except HTTPError:
+            # Let the retry decorator handle transient 5xx; convert after retries at the caller.
+            raise

…onfigurable exceptions

      - Replace custom retry implementation with battle-tested backoff library
      - Add configurable exception types while preserving existing OSError errno logic
      - Implement generic retry decorator factory supporting multiple service prefixes
      - Maintain backward compatibility with existing platform service retry behavior
      - Add comprehensive environment variable configuration for retry parameters
      - Improve logging with structured backoff details and exception context
      - Reduce codebase complexity from 236 lines to ~108 lines (55% reduction)
      - Support custom retry configurations per service via prefix-based env vars
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (7)
pyproject.toml (3)

28-29: Constrain backoff to a safe major range.

Avoid unexpected breaking changes by pinning an upper bound.

-    "backoff>=2.2.1",
+    "backoff>=2.2.1,<3",

117-120: Align linter target with supported Python version.

Package supports >=3.9, but Ruff targets py312. This can mask 3.9-incompatible syntax (e.g., type unions). Recommend aligning.

-[tool.ruff]
-line-length = 90
-target-version = "py312"
+[tool.ruff]
+line-length = 90
+target-version = "py39"

205-205: Mypy config targets 3.12 while package supports 3.9.

To avoid type hints/syntax creeping beyond 3.9 features, consider matching runtime.

-[tool.mypy]
-python_version = "3.12"
+[tool.mypy]
+python_version = "3.9"
src/unstract/sdk/platform.py (2)

133-137: Add request timeouts to avoid indefinite hangs.

Without a timeout, retries won’t trigger on stalled connections.

-                response = requests.post(
-                    url=url, json=payload, params=params, headers=req_headers
-                )
+                response = requests.post(
+                    url=url, json=payload, params=params, headers=req_headers, timeout=30
+                )
@@
-                response = requests.get(url=url, params=params, headers=req_headers)
+                response = requests.get(url=url, params=params, headers=req_headers, timeout=30)

151-163: Prefer e.response over a locally constructed Response() when extracting error details.

Avoids empty headers/body when the exception contains the actual response.

-        except RequestException as e:
-            # Extract error information from the response if available
-            error_message = str(e)
-            content_type = response.headers.get("Content-Type", "").lower()
-            if MimeType.JSON in content_type:
-                response_json = response.json()
+        except RequestException as e:
+            # Extract error information from the response if available
+            error_message = str(e)
+            resp = getattr(e, "response", None) or response
+            content_type = resp.headers.get("Content-Type", "").lower() if resp is not None else ""
+            if resp is not None and MimeType.JSON in content_type:
+                response_json = resp.json()
                 if "error" in response_json:
                     error_message = response_json["error"]
-            elif response.text:
-                error_message = response.text
+            elif resp is not None and resp.text:
+                error_message = resp.text
             self.tool.stream_error_and_exit(
                 f"Error from platform service. {error_message}"
             )
src/unstract/sdk/adapter.py (1)

70-73: Add request timeout for robustness.

Prevents indefinite waits and enables Timeout-based retries when server stalls.

-            response = requests.get(url, headers=headers, params=query_params)
+            response = requests.get(
+                url, headers=headers, params=query_params, timeout=30
+            )
src/unstract/sdk/utils/retry_utils.py (1)

111-118: Prefer logger.error with exc_info over logger.exception with explicit exception text.

Keeps logs clean and avoids redundant exception string formatting.

-        logger_instance.exception(
-            "Giving up after %d retries for %s: %s", tries, prefix, exception
-        )
+        logger_instance.error(
+            "Giving up after %d retries for %s: %s", tries, prefix, exception, exc_info=True
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 565078a and fd4240f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml (1 hunks)
  • src/unstract/sdk/adapter.py (6 hunks)
  • src/unstract/sdk/platform.py (4 hunks)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
src/unstract/sdk/adapter.py

91-91: Avoid specifying long messages outside the exception class

(TRY003)


135-137: Avoid specifying long messages outside the exception class

(TRY003)

src/unstract/sdk/platform.py

143-143: Redundant exception object included in logging.exception call

(TRY401)

🔇 Additional comments (3)
src/unstract/sdk/platform.py (1)

94-103: Decorator usage looks good.

Applying the retry decorator at the method boundary is the right place.

src/unstract/sdk/adapter.py (2)

47-52: Good placement of retry decorator and tightened error mapping.

Letting ConnectionError propagate to the decorator and mapping to SdkError at the call site is correct.


132-137: Caller-side conversion to SdkError looks good.

This preserves retry behavior and surfaces a domain error after exhausting retries.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Copy link
Contributor

@ritwik-g ritwik-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chandrasekharan-zipstack would be great if we can also cover prompt service retry with this.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/unstract/sdk/utils/retry_utils.py (4)

29-51: Include socket/built-in timeouts to cover non-requests flows.

socket.timeout (subclass of TimeoutError) often lacks errno and won’t match your OSError errno set. Add them explicitly to avoid missing retries when not using requests.

@@
-    # Requests connection and timeout errors
-    if isinstance(error, (ConnectionError, Timeout)):
+    # Connection and timeout errors
+    # - requests: ConnectionError, Timeout
+    # - stdlib: TimeoutError, socket.timeout
+    if isinstance(error, (ConnectionError, Timeout, TimeoutError)):
         return True
+
+    # socket.timeout may not carry errno reliably
+    try:
+        import socket  # local import to avoid unconditional dependency
+        if isinstance(error, socket.timeout):
+            return True
+    except Exception:
+        pass

81-92: Harden env parsing; fall back to sane defaults instead of raising.

If envs are malformed (e.g., “abc”), current casts raise at import-time (due to the global retry_platform_service_call). Prefer tolerant parsing with warnings.

-    max_tries = int(os.getenv(f"{prefix}_MAX_RETRIES", "3")) + 1  # +1 for initial attempt
-    max_time = float(os.getenv(f"{prefix}_MAX_TIME", "60"))
-    base = float(os.getenv(f"{prefix}_BASE_DELAY", "1.0"))
-    factor = float(os.getenv(f"{prefix}_MULTIPLIER", "2.0"))
-    use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
+    def _int_env(key: str, default: int) -> int:
+        raw = os.getenv(key)
+        if raw is None:
+            return default
+        try:
+            return int(raw)
+        except ValueError:
+            logger.warning("Invalid %s=%r; falling back to %d", key, raw, default)
+            return default
+
+    def _float_env(key: str, default: float) -> float:
+        raw = os.getenv(key)
+        if raw is None:
+            return default
+        try:
+            return float(raw)
+        except ValueError:
+            logger.warning("Invalid %s=%r; falling back to %.2f", key, raw, default)
+            return default
+
+    def _bool_env(key: str, default: bool) -> bool:
+        raw = os.getenv(key)
+        if raw is None:
+            return default
+        return raw.strip().lower() in {"true", "1", "yes", "on"}
+
+    max_tries = _int_env(f"{prefix}_MAX_RETRIES", 3) + 1  # +1 for initial attempt
+    max_time = _float_env(f"{prefix}_MAX_TIME", 60.0)
+    base = _float_env(f"{prefix}_BASE_DELAY", 1.0)
+    factor = _float_env(f"{prefix}_MULTIPLIER", 2.0)
+    use_jitter = _bool_env(f"{prefix}_JITTER", True)

81-92: Optional: add per-attempt cap via MAX_DELAY and wire through to expo.

This complements MAX_TIME and prevents single sleeps from growing too large under long retry windows.

@@
-    use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
+    use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
         "true",
         "1",
         "yes",
         "on",
     }
+    max_delay_env = os.getenv(f"{prefix}_MAX_DELAY")
+    max_delay = float(max_delay_env) if max_delay_env else None
@@
-    return backoff.on_exception(
-        backoff.expo,
+    return backoff.on_exception(
+        backoff.expo,
         exceptions,  # Use the configurable exceptions
         max_tries=max_tries,
         max_time=max_time,
         base=base,
-        factor=factor,
+        factor=factor,
+        max_value=max_delay,
         jitter=backoff.full_jitter if use_jitter else None,
         giveup=lambda e: not is_retryable_error(e),
         on_backoff=on_backoff_handler,
         on_giveup=on_giveup_handler,
     )

Also applies to: 120-129


102-109: Tweak logging: avoid leaking secrets and fix retry count wording.

Exception strings can include URLs/tokens; also the “retries” count in give-up should be tries - 1.

-        logger_instance.warning(
-            "Retry %d/%d for %s: %s (waiting %.1fs)",
-            tries,
-            max_tries - 1,
-            prefix,
-            exception,
-            wait,
-        )
+        logger_instance.warning(
+            "Retry %d/%d for %s (waiting %.1fs) | exc=%s",
+            tries,
+            max_tries - 1,
+            prefix,
+            wait,
+            type(exception).__name__,
+        )
@@
-        logger_instance.exception(
-            "Giving up after %d retries for %s: %s", tries, prefix, exception
-        )
+        # tries includes the initial attempt; retries = tries - 1
+        logger_instance.error(
+            "Giving up after %d retries for %s | exc=%s",
+            max(0, tries - 1),
+            prefix,
+            type(exception).__name__,
+        )

Also applies to: 116-118

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fd4240f and 4452b09.

📒 Files selected for processing (2)
  • src/unstract/sdk/platform.py (4 hunks)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/unstract/sdk/platform.py
🔇 Additional comments (3)
src/unstract/sdk/utils/retry_utils.py (3)

1-12: Solid migration to backoff; clear public API.

Factory-based decorator with env-driven config and preserved errno logic looks good.


135-141: Docs/variables consistent with PR summary—nice.

Env names align with the PR description and defaults. Good clarity for operators.


121-132: Verify backoff kwargs compatibility with pinned version Manually confirm that your pinned backoff version (>=2.2.1) supports passing base, factor and max_value to backoff.on_exception, and ensure tests cover both retry code paths.

Copy link
Contributor

@gaya3-zipstack gaya3-zipstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/unstract/sdk/prompt.py (2)

189-197: Add a client-side timeout to make retries effective and prevent indefinite hangs.

Without a requests timeout, a call can block forever and never trigger retries. Introduce a timeout parameter (default 30s) and pass it to requests.

Apply these diffs:

@@
-    def _call_service(
+    def _call_service(
         self,
         url_path: str,
         payload: dict[str, Any] | None = None,
         params: dict[str, str] | None = None,
         headers: dict[str, str] | None = None,
         method: str = "POST",
+        timeout: float = 30.0,
     ) -> dict[str, Any]:

And:

@@
-            response = requests.post(
-                url=url, json=payload, params=params, headers=req_headers
-            )
+            response = requests.post(
+                url=url, json=payload, params=params, headers=req_headers, timeout=timeout
+            )
@@
-            response = requests.get(url=url, params=params, headers=req_headers)
+            response = requests.get(url=url, params=params, headers=req_headers, timeout=timeout)

223-229: Pass the timeout to requests to avoid indefinite blocking.

Follow-up to the timeout addition above; ensure both POST and GET pass it.

Apply:

-            response = requests.post(
-                url=url, json=payload, params=params, headers=req_headers
-            )
+            response = requests.post(
+                url=url, json=payload, params=params, headers=req_headers, timeout=timeout
+            )
@@
-            response = requests.get(url=url, params=params, headers=req_headers)
+            response = requests.get(url=url, params=params, headers=req_headers, timeout=timeout)
🧹 Nitpick comments (5)
src/unstract/sdk/prompt.py (1)

189-189: Guard against duplicate effects when retrying POSTs; ensure idempotency via request_id.

Since retries apply to POST endpoints, confirm the server deduplicates via the Request-ID header. If not guaranteed, generate a UUID when request_id is None to make retries safe.

Example (outside this hunk):

# At top of file
import uuid

# In __init__
self.request_id = request_id or str(uuid.uuid4())

Can you confirm the prompt service treats Request-ID as an idempotency key?

src/unstract/sdk/utils/retry_utils.py (4)

29-51: Optionally include HTTP 429 (rate limiting) as retryable.

Many services return 429 for throttling; retrying with backoff is appropriate.

Apply:

-            # Retry on server errors and bad gateway
-            if status_code in [502, 503, 504]:
+            # Retry on throttling and transient server errors
+            if status_code in [429, 502, 503, 504]:
                 return True

Optionally honor Retry-After in a future enhancement.


81-92: Harden env parsing with basic validation/clamping to sane defaults.

Protect against misconfig (negative/zero values) that would create invalid backoff schedules.

Apply:

     max_tries = int(os.getenv(f"{prefix}_MAX_RETRIES", "3")) + 1  # +1 for initial attempt
     max_time = float(os.getenv(f"{prefix}_MAX_TIME", "60"))
     base = float(os.getenv(f"{prefix}_BASE_DELAY", "1.0"))
     factor = float(os.getenv(f"{prefix}_MULTIPLIER", "2.0"))
     use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
         "true",
         "1",
         "yes",
         "on",
     }
+
+    # Clamp/sanitize
+    if max_tries < 1:
+        max_tries = 1
+    if max_time <= 0:
+        max_time = 60.0
+    if base <= 0:
+        base = 1.0
+    if factor <= 0:
+        factor = 2.0

81-92: Expose optional per-attempt max delay (PREFIX_MAX_DELAY) and wire it to expo.

Prevents unbounded waits while still respecting MAX_TIME.

Apply:

@@
-    use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
+    use_jitter = os.getenv(f"{prefix}_JITTER", "true").strip().lower() in {
         "true",
         "1",
         "yes",
         "on",
     }
+    max_value_env = os.getenv(f"{prefix}_MAX_DELAY")
+    max_value = float(max_value_env) if max_value_env else None

120-132: Cap exponential backoff using max_value, if provided.

Wires the new PREFIX_MAX_DELAY into backoff.expo.

Apply:

-    return backoff.on_exception(
-        backoff.expo,
+    return backoff.on_exception(
+        backoff.expo,
         exceptions,  # Use the configurable exceptions
         max_tries=max_tries,
         max_time=max_time,
         base=base,
-        factor=factor,
+        factor=factor,
+        max_value=max_value,
         jitter=backoff.full_jitter if use_jitter else None,
         giveup=lambda e: not is_retryable_error(e),
         on_backoff=on_backoff_handler,
         on_giveup=on_giveup_handler,
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4452b09 and 3649d4a.

📒 Files selected for processing (2)
  • src/unstract/sdk/prompt.py (3 hunks)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🔇 Additional comments (4)
src/unstract/sdk/prompt.py (2)

18-18: LGTM: Using the preconfigured PROMPT_SERVICE retry decorator is appropriate.


201-209: Docstring update aligns with the new env-driven retry behavior.

src/unstract/sdk/utils/retry_utils.py (2)

141-149: LGTM: Pre-instantiated decorators with clear prefixes make adoption easy.


9-11: backoff dependency is pinned to >=2.2.1 in pyproject.toml

@chandrasekharan-zipstack
Copy link
Contributor Author

@ritwik-g I've addressed your comments and added support for prompt service as well

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3649d4a and 5fbc56b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
src/unstract/sdk/utils/retry_utils.py

83-83: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


136-136: Consider moving this statement to an else block

(TRY300)


151-156: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


167-167: Redundant exception object included in logging.exception call

(TRY401)


179-184: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (1)
src/unstract/sdk/utils/retry_utils.py (1)

75-86: Ensure jitter stays within the configured cap.

Right now we clamp the exponential delay before adding jitter, so the final sleep interval can overshoot the configured cap. For example, with MAX_TIME=60, once the exponential term hits 60 the jitter pushes the delay up to 75 s, which immediately trips the delay >= remaining_time guard and aborts retries even though we still had time budget left. Capping after jitter keeps the behaviour aligned with the config.

-    delay = base_delay * (multiplier**attempt)
-
-    # Cap at max delay
-    delay = min(delay, max_delay)
-
-    # Add jitter if enabled (0-25% of delay)
-    if jitter:
-        jitter_amount = delay * random.uniform(0, 0.25)
-        delay += jitter_amount
-
-    return delay
+    delay = base_delay * (multiplier**attempt)
+
+    if jitter:
+        jitter_amount = delay * random.uniform(0, 0.25)
+        delay += jitter_amount
+
+    return min(delay, max_delay)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 31f7497 and 2039147.

📒 Files selected for processing (1)
  • src/unstract/sdk/utils/retry_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
src/unstract/sdk/utils/retry_utils.py

83-83: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


136-136: Consider moving this statement to an else block

(TRY300)


151-156: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


167-167: Redundant exception object included in logging.exception call

(TRY401)


179-184: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


268-268: Avoid specifying long messages outside the exception class

(TRY003)


270-270: Avoid specifying long messages outside the exception class

(TRY003)


272-272: Avoid specifying long messages outside the exception class

(TRY003)


274-274: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
src/unstract/sdk/utils/retry_utils.py (5)

1-14: LGTM!

Module setup and imports are appropriate. The random module is correctly used for backoff jitter (S311 warning is a false positive in this context).


17-53: LGTM!

The retryable error detection is comprehensive and correctly handles connection errors, timeouts, specific HTTP status codes, and OS-level connection failures including errno 111 (ECONNREFUSED).


89-211: Core retry logic is sound.

The exponential backoff implementation correctly handles max retries, max time, retry predicates, and exception filtering. The defensive code at lines 206-207 is good practice even if logically unreachable.


214-289: LGTM! Configuration validation is appropriate.

The environment-driven configuration with validation (lines 267-274) correctly fails fast on invalid values. The TRY003 static analysis hints about exception messages are acceptable in this context where inline validation messages provide clear feedback.


292-306: LGTM! Module-level decorators are properly configured.

The pre-instantiated decorators with clear documentation of environment variables make the API easy to consume. Note the comment from hari-kuriakose about adding unit tests for this library.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
@chandrasekharan-zipstack chandrasekharan-zipstack merged commit 0681f03 into main Oct 7, 2025
2 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the feat/added-retries-for-platform-service-calls branch October 7, 2025 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants