Skip to content

Webhook: Add Fingerprinting to Webhook Payloads#765

Merged
Prajna1999 merged 12 commits intomainfrom
feat/webhook-fingerprinting
Apr 24, 2026
Merged

Webhook: Add Fingerprinting to Webhook Payloads#765
Prajna1999 merged 12 commits intomainfrom
feat/webhook-fingerprinting

Conversation

@Prajna1999
Copy link
Copy Markdown
Collaborator

@Prajna1999 Prajna1999 commented Apr 16, 2026

Summary

Target issue is #766
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features
    • Added a webhook-secret credential provider (project/org-scoped, sensitive/masked). Callback flows now include webhook secrets when available, send compact JSON bytes, and attach signature/timestamp headers if a secret is present.
  • Tests
    • Expanded tests for callback signing, deterministic signatures, headers, and end-to-end job callback behavior (success, failure, signed URL handling, cleanup).
  • Documentation
    • Updated credential docs and examples to include the webhook-secret provider.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a WEBHOOK_SECRET provider and threads project/organization-scoped webhook secrets into callback dispatch. Callbacks are optionally HMAC-SHA256-signed (signature + timestamp headers) and sent as pre-serialized compact JSON bytes to ensure the signed body matches the transmitted payload. (≤50 words)

Changes

Cohort / File(s) Summary
Provider Configuration
backend/app/core/providers.py
Added Provider.WEBHOOK_SECRET and PROVIDER_CONFIGS[Provider.WEBHOOK_SECRET] requiring and marking webhook_secret as sensitive.
Collection Services
backend/app/services/collections/create_collection.py, backend/app/services/collections/delete_collection.py
Resolve webhook_secret via get_webhook_secret(...) and pass it to send_callback(...); normalize callback_url and collection_id handling; failure handler extended to accept organization_id.
Doctransform & LLM Services
backend/app/services/doctransform/job.py, backend/app/services/llm/chain/executor.py, backend/app/services/llm/jobs.py
Fetch webhook secret during job/chain setup and thread it to all send_callback(...) calls; update error handlers and function signatures to accept org/project context where required.
Callback Infrastructure
backend/app/utils.py, backend/app/services/response/callbacks.py, backend/app/services/response/jobs.py
Added sign_webhook_payload and get_webhook_secret; send_callback(...) accepts optional webhook_secret, deterministically serializes compact JSON bytes, computes HMAC-SHA256 over "<timestamp_ms>.<raw_body>", attaches X-Webhook-Signature and X-Webhook-Timestamp, and sends data=raw_body. send_response_callback now requires organization_id and project_id.
Docs
backend/app/api/docs/credentials/create.md, .../update.md, .../update_by_org_project.md
Documented new webhook_secret provider and added examples including a webhook-secret-only credential payload.
Tests — core webhook behavior
backend/app/tests/core/test_callback_ssrf.py
Updated assertions to expect request body sent via data as pre-serialized compact JSON bytes, Content-Type: application/json, and absence of signature headers when no webhook_secret is supplied; added tests for sign_webhook_payload.
Tests — doctransform job behavior
backend/app/tests/services/doctransformer/test_job/test_callbacks.py
New end-to-end unit tests for doctransform job callbacks: success/failure payloads, webhook-secret propagation to send_callback, signed-URL handling, tmpdir cleanup, and failure state persistence.

Sequence Diagram(s)

sequenceDiagram
    participant Service as Service Layer
    participant DB as Provider DB
    participant Signer as HMAC Signer
    participant Webhook as External Webhook

    Service->>DB: get_webhook_secret(project_id, organization_id)
    DB-->>Service: secret or None
    Service->>Service: build callback payload (dict)
    Service->>Signer: compact-serialize(payload) -> raw_body bytes
    alt secret present
        Service->>Signer: sign_webhook_payload(secret, raw_body)
        Signer-->>Service: (signature, timestamp)
        Service->>Webhook: POST callback_url with data=raw_body and headers: Content-Type, X-Webhook-Signature, X-Webhook-Timestamp
    else no secret
        Service->>Webhook: POST callback_url with data=raw_body and header: Content-Type
    end
    Webhook-->>Service: HTTP response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • vprashrex
  • Ayush8923

Poem

🐇 I nibble keys and sign the night,
I stamp each payload snug and tight,
HMAC hops, a timestamp chime,
Webhooks leap in perfect time,
A rabbit's seal — secure delight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Webhook: Add Fingerprinting to Webhook Payloads' accurately captures the primary change—implementing webhook request signing (fingerprinting via HMAC-SHA256 signatures and timestamps) across multiple service files and utilities.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/webhook-fingerprinting

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Prajna1999 Prajna1999 marked this pull request as draft April 16, 2026 10:28
@Prajna1999 Prajna1999 moved this to In Progress in Kaapi-dev Apr 16, 2026
@Prajna1999 Prajna1999 linked an issue Apr 17, 2026 that may be closed by this pull request
@Prajna1999 Prajna1999 removed this from Kaapi-dev Apr 17, 2026
@Prajna1999 Prajna1999 self-assigned this Apr 21, 2026
@Prajna1999 Prajna1999 added the enhancement New feature or request label Apr 21, 2026
@Prajna1999 Prajna1999 moved this to In Review in Kaapi-dev Apr 21, 2026
@Prajna1999 Prajna1999 marked this pull request as ready for review April 21, 2026 11:26
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
backend/app/services/collections/delete_collection.py (2)

241-247: ⚠️ Potential issue | 🟠 Major

Sign successful delete-collection callbacks too.

The success path still calls send_callback(...) without webhook_secret, so successful deletion callbacks remain unsigned while failure callbacks are signed.

🔐 Proposed fix
             if deletion_request.callback_url and collection_job:
                 success_payload = build_success_payload(
                     collection_job=collection_job,
                     collection_id=collection_id,
                 )
-                send_callback(deletion_request.callback_url, success_payload)
+                with Session(engine) as session:
+                    creds = get_provider_credential(
+                        session=session,
+                        org_id=organization_id,
+                        project_id=project_id,
+                        provider="webhook_secret",
+                    )
+                webhook_secret = (
+                    creds.get("webhook_secret") if isinstance(creds, dict) else None
+                )
+                send_callback(
+                    deletion_request.callback_url,
+                    success_payload,
+                    webhook_secret=webhook_secret,
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/collections/delete_collection.py` around lines 241 -
247, The success callback path currently calls
send_callback(deletion_request.callback_url, success_payload) without the
webhook secret, leaving successful delete-collection callbacks unsigned; update
the call to include the webhook secret used for signing (pass the same
webhook_secret used for failure callbacks, e.g. deletion_request.webhook_secret
or the variable used elsewhere) so change the send_callback invocation in the
block that builds success_payload (references: deletion_request,
build_success_payload, collection_job, collection_id, send_callback) to pass the
webhook_secret as the signing argument.

248-282: ⚠️ Potential issue | 🔴 Critical

Add organization_id parameter and fix orphaned exception block.

The code has a syntax error: the except at line 274 is not attached to any try statement. Additionally, the first exception handler (line 248–272) has two issues:

  1. Missing organization_id argument in the _mark_job_failed_and_callback() call (line 251)
  2. The code attempts to use success_payload at line 270, but it's only defined inside a conditional block at lines 242–245 and may not exist when an exception occurs

Remove the webhook callback code (lines 258–272) and the orphaned except block (lines 274–282), then add organization_id to the first _mark_job_failed_and_callback call and raise to propagate the exception.

Proposed fix
         except Exception as err:
             span.record_exception(err)
             span.set_status(trace.Status(trace.StatusCode.ERROR, str(err)))
             _mark_job_failed_and_callback(
+                organization_id=organization_id,
                 project_id=project_id,
                 collection_id=collection_id,
                 job_id=job_uuid,
                 err=err,
                 callback_url=deletion_request.callback_url,
             )
-            with Session(engine) as session:
-                creds = get_provider_credential(
-                    session=session,
-                    org_id=organization_id,
-                    project_id=project_id,
-                    provider="webhook_secret",
-                )
-            webhook_secret = (
-                creds.get("webhook_secret") if isinstance(creds, dict) else None
-            )
-            send_callback(
-                deletion_request.callback_url,
-                success_payload,
-                webhook_secret=webhook_secret,
-            )
-
-
-    except Exception as err:
-        _mark_job_failed_and_callback(
-            organization_id=organization_id,
-            project_id=project_id,
-            collection_id=collection_id,
-            job_id=job_uuid,
-            err=err,
-            callback_url=deletion_request.callback_url,
-        )
+            raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/collections/delete_collection.py` around lines 248 -
282, In the exception handler inside delete_collection.py update the first
_mark_job_failed_and_callback call to include organization_id, remove the
downstream webhook callback code that references send_callback and
success_payload (delete the block that fetches webhook_secret and calls
send_callback), delete the orphaned second except block, and re-raise the caught
exception after recording it (i.e., keep span.record_exception/ set_status, call
_mark_job_failed_and_callback with organization_id, then raise the exception) so
the error is propagated.
backend/app/services/llm/jobs.py (3)

184-208: ⚠️ Potential issue | 🟠 Major

Send the failure webhook only once, with the signature.

Line 197 sends the signed callback, then Lines 202-208 send the same failure callback again without webhook_secret. This can double-trigger client-side webhook handlers and make the second delivery fail signature validation.

Suggested fix
     if callback_url:
         webhook_secret = None
         if organization_id is not None and project_id is not None:
             with Session(engine) as session:
                 creds = get_provider_credential(
                     session=session,
                     org_id=organization_id,
                     project_id=project_id,
                     provider="webhook_secret",
                 )
             webhook_secret = (
                 creds.get("webhook_secret") if isinstance(creds, dict) else None
             )
-        send_callback(
-            callback_url=callback_url,
-            data=callback_response.model_dump(),
-            webhook_secret=webhook_secret,
-        )
         with tracer.start_as_current_span("llm.send_callback") as cb_span:
             cb_span.set_attribute("callback.url", callback_url)
             cb_span.set_attribute("callback.status", "failure")
             send_callback(
                 callback_url=callback_url,
                 data=callback_response.model_dump(),
+                webhook_secret=webhook_secret,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 184 - 208, The code is sending
the failure webhook twice—once with webhook_secret and again without—causing
duplicate deliveries and signature failures; modify the block around
send_callback, webhook_secret, get_provider_credential and the tracer span so
you only call send_callback once for the failure path and include the
webhook_secret (if available) in that single call, remove the redundant
send_callback invocation, and keep the tracer span (llm.send_callback) setting
"callback.url" and "callback.status" around that single send to preserve
tracing.

963-979: ⚠️ Potential issue | 🔴 Critical

Fix duplicate and improperly indented code in the exception handler.

The code at lines 963-978 contains duplicate callback_response and handle_job_error calls with inconsistent indentation. The first version (lines 963-973) is not indented to the except block, and the second (lines 974-978) is over-indented. Additionally, the second version omits the organization_id and project_id parameters, losing critical webhook context.

Keep only the properly indented version with all required parameters inside the except block:

Fix
-        callback_response = APIResponse.failure_response(
-            error="Unexpected error occurred",
-            metadata=request.request_metadata,
-        )
-        return handle_job_error(
-            job_uuid,
-            callback_url_str,
-            callback_response,
-            organization_id=organization_id,
-            project_id=project_id,
-        )
-            callback_response = APIResponse.failure_response(
-                error="Unexpected error occurred",
-                metadata=request.request_metadata,
-            )
-            return handle_job_error(job_uuid, callback_url_str, callback_response)
+            callback_response = APIResponse.failure_response(
+                error="Unexpected error occurred",
+                metadata=request.request_metadata,
+            )
+            return handle_job_error(
+                job_uuid,
+                callback_url_str,
+                callback_response,
+                organization_id=organization_id,
+                project_id=project_id,
+            )
         finally:
             # Ensure task spans are pushed promptly so Sentry dashboards update faster.
             flush_telemetry()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 963 - 979, Remove the
duplicated and mis-indented exception handling code: inside the except block
create a single callback_response using
APIResponse.failure_response(error="Unexpected error occurred",
metadata=request.request_metadata) and call handle_job_error(job_uuid,
callback_url_str, callback_response, organization_id=organization_id,
project_id=project_id); delete the stray duplicate
callback_response/handle_job_error calls outside or with wrong indentation so
only the properly indented version that includes organization_id and project_id
remains.

758-846: ⚠️ Potential issue | 🔴 Critical

Fix critical control flow errors before merging.

The code has three structural issues:

  1. Line 771: callback_response is referenced before assignment (called on callback_response.model_dump() before the if result.success: block creates it).
  2. Line 789: De-indentation creates a syntax error—the statement falls outside the try block with no matching except/finally, causing Python to raise expected 'except' or 'finally' block.
  3. Lines 817–825: Dead code—the success DB update and return statement are unreachable, placed after the except block's return.

These must be fixed by restructuring control flow: move webhook credential retrieval into the success branch (after callback_response is assigned), properly close the try/except/finally blocks, and relocate the DB update into the success branch before the return.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 758 - 846, The code currently
calls callback_response.model_dump() and sends a webhook before
callback_response is created, has mismatched try/except/finally indentation, and
places the JobCrud.update success path after an except making it unreachable;
fix by moving the webhook credential lookup (get_provider_credential) and any
send_callback calls into the success branch after callback_response is assigned
(i.e., inside the if result.success block), ensure the try block is properly
closed with its corresponding except and finally (flush_telemetry) so no
statements fall outside, and relocate the database success update
(JobCrud(session).update with JobUpdate(status=JobStatus.SUCCESS)) and the
success return (callback_response.model_dump()) into the success branch before
any returns so they execute when result.success is true; verify all
send_callback usage calls callback_response that has been created and that
exception handlers return via handle_job_error as intended.
🧹 Nitpick comments (1)
backend/app/tests/core/test_callback_ssrf.py (1)

330-338: Add positive coverage for signed callbacks.

This verifies the unsigned path, but the new feature also needs a test that passes webhook_secret and asserts X-Webhook-Signature / X-Webhook-Timestamp match the exact compact body bytes sent in data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/core/test_callback_ssrf.py` around lines 330 - 338, Add a
positive test for the signed callback path: call the same code that produces
call_kwargs but pass a webhook_secret, compute the compact body bytes as
json.dumps(test_data, separators=(",", ":")).encode(), generate the expected
timestamp and HMAC signature using the webhook_secret over the exact bytes
(including whatever signing format your code uses, e.g., sig =
hmac.new(webhook_secret.encode(), compact_bytes, sha256).hexdigest()), then
assert call_kwargs["data"] equals the compact_bytes and that
call_kwargs["headers"] contains "X-Webhook-Timestamp" equal to the timestamp you
generated and "X-Webhook-Signature" equal to the expected signature; use the
existing variables call_kwargs and test_data to locate where to add this check
(mirror the unsigned assertions but include webhook_secret and the
signature/timestamp assertions).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/core/providers.py`:
- Around line 46-48: The tests in test_credentials.py hardcode the expected
provider set and length (currently asserting {"openai","langfuse"} and len==2),
which breaks now that Provider.WEBHOOK_SECRET is added; update those assertions
to include "webhook_secret" (matching Provider.WEBHOOK_SECRET) or, better,
compute expected providers dynamically from the Provider/ProviderConfig registry
so the test no longer hardcodes the set/length (e.g., derive expectation from
the Provider enum or the providers mapping used in providers.py).

In `@backend/app/services/collections/create_collection.py`:
- Around line 317-332: The code is sending the same success callback twice;
remove the duplicate unsigned send by deleting the new send_callback block that
uses creation_request.callback_url, and instead move the provider credential
lookup (get_provider_credential) and webhook_secret extraction into the existing
success branch where success_payload is already sent (the earlier send_callback
call around success handling). Ensure the existing send_callback call uses the
retrieved webhook_secret when calling send_callback, referencing
creation_request.callback_url, get_provider_credential, webhook_secret, and
success_payload so only a single signed callback is sent.
- Around line 305-332: The except path in execute_job currently builds a success
payload using build_success_payload and logs/sends a "Collection created"
callback even on failure; update the except block (where collection may be
unbound) to either re-raise the exception to be handled by the outer failure
handler OR build and send an explicit failure payload (e.g.,
build_failure_payload or a dict containing collection_job.id, error details from
the caught exception, and status="failed") instead of using
build_success_payload; also replace the success log with an error/exception log
and ensure send_callback is invoked with the failure payload and webhook_secret
(using get_provider_credential) only when creation_request.callback_url is set.
- Around line 334-373: The second "except Exception as err:" is unmatched due to
misplaced code between the try and the handler; fix by consolidating exception
handling in execute_job: move the logging, provider cleanup
(provider.delete(result)), _mark_job_failed call, and callback logic
(build_failure_payload, get_provider_credential/Session, and send_callback) into
a single unified except block for the try that encloses the collection creation,
or alternatively remove the duplicate except and wrap the intervening lines in
the original try; ensure the unique symbols referenced—_mark_job_failed,
build_failure_payload, get_provider_credential, Session/engine, provider.delete,
and send_callback—are executed from that single exception handler so cleanup,
job failure marking, and callback notification always run on errors.

In `@backend/app/utils.py`:
- Around line 479-487: The code logs sensitive values (webhook_secret and
derived signature/timestamp) in the function that builds the webhook payload;
remove those direct logs and replace them with a non-sensitive, prefixed
message. Specifically, in the block where raw_body,
sign_webhook_payload(webhook_secret, raw_body), headers, and logger are used,
stop logging webhook_secret and the signature/timestamp; instead log a single
informational message prefixed with the function name in square brackets (e.g.,
"[function_name] webhook signing performed") or a boolean/DEBUG that signing was
attempted, and if you must show any secret-like value use
mask_string(sensitive_value) per guidelines. Ensure
headers["X-Webhook-Signature"] and headers["X-Webhook-Timestamp"] are still set
from sign_webhook_payload but do not emit their values to logs.

---

Outside diff comments:
In `@backend/app/services/collections/delete_collection.py`:
- Around line 241-247: The success callback path currently calls
send_callback(deletion_request.callback_url, success_payload) without the
webhook secret, leaving successful delete-collection callbacks unsigned; update
the call to include the webhook secret used for signing (pass the same
webhook_secret used for failure callbacks, e.g. deletion_request.webhook_secret
or the variable used elsewhere) so change the send_callback invocation in the
block that builds success_payload (references: deletion_request,
build_success_payload, collection_job, collection_id, send_callback) to pass the
webhook_secret as the signing argument.
- Around line 248-282: In the exception handler inside delete_collection.py
update the first _mark_job_failed_and_callback call to include organization_id,
remove the downstream webhook callback code that references send_callback and
success_payload (delete the block that fetches webhook_secret and calls
send_callback), delete the orphaned second except block, and re-raise the caught
exception after recording it (i.e., keep span.record_exception/ set_status, call
_mark_job_failed_and_callback with organization_id, then raise the exception) so
the error is propagated.

In `@backend/app/services/llm/jobs.py`:
- Around line 184-208: The code is sending the failure webhook twice—once with
webhook_secret and again without—causing duplicate deliveries and signature
failures; modify the block around send_callback, webhook_secret,
get_provider_credential and the tracer span so you only call send_callback once
for the failure path and include the webhook_secret (if available) in that
single call, remove the redundant send_callback invocation, and keep the tracer
span (llm.send_callback) setting "callback.url" and "callback.status" around
that single send to preserve tracing.
- Around line 963-979: Remove the duplicated and mis-indented exception handling
code: inside the except block create a single callback_response using
APIResponse.failure_response(error="Unexpected error occurred",
metadata=request.request_metadata) and call handle_job_error(job_uuid,
callback_url_str, callback_response, organization_id=organization_id,
project_id=project_id); delete the stray duplicate
callback_response/handle_job_error calls outside or with wrong indentation so
only the properly indented version that includes organization_id and project_id
remains.
- Around line 758-846: The code currently calls callback_response.model_dump()
and sends a webhook before callback_response is created, has mismatched
try/except/finally indentation, and places the JobCrud.update success path after
an except making it unreachable; fix by moving the webhook credential lookup
(get_provider_credential) and any send_callback calls into the success branch
after callback_response is assigned (i.e., inside the if result.success block),
ensure the try block is properly closed with its corresponding except and
finally (flush_telemetry) so no statements fall outside, and relocate the
database success update (JobCrud(session).update with
JobUpdate(status=JobStatus.SUCCESS)) and the success return
(callback_response.model_dump()) into the success branch before any returns so
they execute when result.success is true; verify all send_callback usage calls
callback_response that has been created and that exception handlers return via
handle_job_error as intended.

---

Nitpick comments:
In `@backend/app/tests/core/test_callback_ssrf.py`:
- Around line 330-338: Add a positive test for the signed callback path: call
the same code that produces call_kwargs but pass a webhook_secret, compute the
compact body bytes as json.dumps(test_data, separators=(",", ":")).encode(),
generate the expected timestamp and HMAC signature using the webhook_secret over
the exact bytes (including whatever signing format your code uses, e.g., sig =
hmac.new(webhook_secret.encode(), compact_bytes, sha256).hexdigest()), then
assert call_kwargs["data"] equals the compact_bytes and that
call_kwargs["headers"] contains "X-Webhook-Timestamp" equal to the timestamp you
generated and "X-Webhook-Signature" equal to the expected signature; use the
existing variables call_kwargs and test_data to locate where to add this check
(mirror the unsigned assertions but include webhook_secret and the
signature/timestamp assertions).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c07cc7d-204d-4b8f-9bd1-13b07f588937

📥 Commits

Reviewing files that changed from the base of the PR and between d7b3359 and bdd5f0b.

📒 Files selected for processing (10)
  • backend/app/core/providers.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/services/doctransform/job.py
  • backend/app/services/llm/chain/executor.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/response/callbacks.py
  • backend/app/services/response/jobs.py
  • backend/app/tests/core/test_callback_ssrf.py
  • backend/app/utils.py

Comment on lines +46 to +48
Provider.WEBHOOK_SECRET: ProviderConfig(
required_fields=["webhook_secret"], sensitive_fields=["webhook_secret"]
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update credential tests that hardcode the provider set.

Adding Provider.WEBHOOK_SECRET here expands the configured provider list; backend/app/tests/crud/test_credentials.py still asserts exactly {"openai", "langfuse"} and len(...) == 2, so those tests should be updated to include webhook_secret or derive expectations from the provider config.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/providers.py` around lines 46 - 48, The tests in
test_credentials.py hardcode the expected provider set and length (currently
asserting {"openai","langfuse"} and len==2), which breaks now that
Provider.WEBHOOK_SECRET is added; update those assertions to include
"webhook_secret" (matching Provider.WEBHOOK_SECRET) or, better, compute expected
providers dynamically from the Provider/ProviderConfig registry so the test no
longer hardcodes the set/length (e.g., derive expectation from the Provider enum
or the providers mapping used in providers.py).

Comment thread backend/app/services/collections/create_collection.py Outdated
Comment thread backend/app/services/collections/create_collection.py Outdated
Comment thread backend/app/services/collections/create_collection.py Outdated
Comment thread backend/app/utils.py Outdated
Provider.ELEVENLABS: ProviderConfig(
required_fields=["api_key"], sensitive_fields=["api_key"]
),
Provider.WEBHOOK_SECRET: ProviderConfig(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the credentials endpoint doc ..

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/collections/delete_collection.py`:
- Around line 106-115: The webhook secret lookup in _get_webhook_secret can
raise (e.g., from get_provider_credential or DB Session) and that exception can
bubble up and wrongly mark a deletion job as failed; make the lookup best-effort
by catching all exceptions inside _get_webhook_secret and returning None on any
error so callers will send unsigned webhooks rather than failing the job. Update
_get_webhook_secret to wrap the Session/get_provider_credential call in a
try/except (catch Exception), log the error if helpful, and return None when an
exception occurs.

In `@backend/app/services/llm/jobs.py`:
- Around line 176-189: The _get_webhook_secret lookup can raise on DB/decryption
errors and should "fail open" so optional signing doesn't block callbacks or
job-state updates; update _get_webhook_secret to catch any exceptions around the
Session/get_provider_credential call (and any decryption) and return None on
error (optionally log.debug or processLogger.debug the exception), leaving the
rest of the code (e.g., callers like handle_job_error) to proceed without a
secret.

In `@backend/app/utils.py`:
- Around line 479-486: The serialization and signing logic in send_callback
(json.dumps -> raw_body and sign_webhook_payload) can raise exceptions before
the request try/except, so wrap the json.dumps call and the webhook signing
(sign_webhook_payload, assignment of raw_body, headers, signature, timestamp_ms)
inside the same failure handling used for the request: catch exceptions (e.g.,
TypeError/ValueError or a broad Exception), log the error via logger with
context, and return False so malformed/non-serializable callback data stays
inside send_callback’s documented failure contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a1ebed5-faef-42a9-8c9d-81fcae4498dc

📥 Commits

Reviewing files that changed from the base of the PR and between bdd5f0b and 788a1f0.

📒 Files selected for processing (4)
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/services/llm/jobs.py
  • backend/app/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/services/collections/create_collection.py

Comment thread backend/app/services/collections/delete_collection.py Outdated
Comment thread backend/app/services/llm/jobs.py Outdated
Comment thread backend/app/utils.py Outdated
return None


def _get_webhook_secret(project_id: int, organization_id: int) -> str | None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function could be put in a common place since it will be used across multiple services in the services folder

Comment thread backend/app/utils.py Outdated
signature, timestamp_ms = sign_webhook_payload(webhook_secret, raw_body)
headers["X-Webhook-Signature"] = signature
headers["X-Webhook-Timestamp"] = str(timestamp_ms)
logger.info("[send_callback] Callback signed with webhook secret")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two logs comes from this function - 2026-04-23 12:22:49,822 - [kaapi-celery] - [cdeff6a8f76d43fa98808762094d0bba] - INFO - app.utils - [send_callback] Callback signed with webhook secret
2026-04-23 12:22:50,782 - [kaapi-celery] - [cdeff6a8f76d43fa98808762094d0bba] - INFO - app.utils - [send_callback] Callback sent successfully to {callback_url}, maybe this can be merged into one log

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. But top one was for debugging purpose and is not required only. So removing that.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 98.26389% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/utils.py 87.50% 3 Missing ⚠️
backend/app/services/llm/jobs.py 83.33% 1 Missing ⚠️
backend/app/services/response/callbacks.py 50.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
backend/app/utils.py (2)

446-463: ⚠️ Potential issue | 🟠 Major

Make optional webhook-secret lookup fail open.

get_webhook_secret() can raise on DB/decryption/credential lookup failures, and every caller treats signing as optional. Returning None on lookup failure keeps callbacks and job-state transitions from failing because signing credentials are unavailable. This is the same unresolved risk as the earlier per-service helper comment, now centralized here.

🛡️ Proposed fix
 def get_webhook_secret(
     project_id: int | None, organization_id: int | None
 ) -> str | None:
     """Look up the configured webhook signing secret for this project, or None."""
     if project_id is None or organization_id is None:
         return None
     # Imported lazily: app.core.db pulls in app.crud, which imports app.utils,
     # so a top-level import here would deadlock module initialization.
     from app.core.db import engine
 
-    with Session(engine) as session:
-        creds = get_provider_credential(
-            session=session,
-            org_id=organization_id,
-            project_id=project_id,
-            provider="webhook_secret",
-        )
-    return creds.get("webhook_secret") if isinstance(creds, dict) else None
+    try:
+        with Session(engine) as session:
+            creds = get_provider_credential(
+                session=session,
+                org_id=organization_id,
+                project_id=project_id,
+                provider="webhook_secret",
+            )
+        return creds.get("webhook_secret") if isinstance(creds, dict) else None
+    except Exception:
+        logger.warning(
+            "[get_webhook_secret] Failed to load webhook secret; sending unsigned callback",
+            exc_info=True,
+        )
+        return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 446 - 463, get_webhook_secret currently
can raise on DB/decryption/credential lookup failures; wrap the DB call and
credential retrieval (the Session(engine) context and call to
get_provider_credential) in a try/except that catches broad operational errors
(e.g., DB, decryption, or lookup exceptions) and returns None on any failure so
callers treat signing as optional; keep the lazy import of engine and return
creds.get("webhook_secret") if creds is a dict on success, otherwise return
None.

498-529: ⚠️ Potential issue | 🟠 Major

Catch serialization/signing failures inside send_callback.

The try now includes json.dumps(), but the handler still only catches requests.RequestException. A non-JSON-safe payload still raises TypeError before the callback can return False, so callers like failure handlers may skip job-state updates. This duplicates the earlier serialization failure concern, which is still present in the current code.

🛡️ Proposed fix
     try:
         raw_body = json.dumps(data, separators=(",", ":")).encode()
         headers = {"Content-Type": "application/json"}
 
         if webhook_secret:
             signature, timestamp_ms = sign_webhook_payload(webhook_secret, raw_body)
             headers["X-Webhook-Signature"] = signature
             headers["X-Webhook-Timestamp"] = str(timestamp_ms)
             logger.info("[send_callback] Callback signed with webhook secret")
 
         with requests.Session() as session:
             session.trust_env = False  # Ignores environment proxies and other implicit settings for SSRF safety
 
             response = session.post(
                 callback_url,
                 data=raw_body,
                 headers=headers,
                 timeout=(
                     settings.CALLBACK_CONNECT_TIMEOUT,
                     settings.CALLBACK_READ_TIMEOUT,
                 ),
                 allow_redirects=False,
             )
 
             response.raise_for_status()
 
             logger.info(f"[send_callback] Callback sent successfully to {callback_url}")
             return True
 
+    except (TypeError, ValueError) as e:
+        logger.error(
+            f"[send_callback] Failed to serialize/sign callback payload: {str(e)}",
+            exc_info=True,
+        )
+        return False
     except requests.RequestException as e:
         logger.error(f"[send_callback] Callback failed: {str(e)}", exc_info=True)
         return False

Verification:

#!/bin/bash
# Description: Demonstrate that stdlib json.dumps raises TypeError for common payload values.
python - <<'PY'
import json
from uuid import uuid4

try:
    json.dumps({"id": uuid4()})
except TypeError as exc:
    print(type(exc).__name__, str(exc))
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 498 - 529, send_callback currently only
catches requests.RequestException, but json.dumps (and sign_webhook_payload) can
raise TypeError/ValueError during serialization or signing and will escape the
handler; wrap the serialization and signing steps (the json.dumps(...) call and
the sign_webhook_payload(...) call) in the same try/except or broaden the
existing except to also catch TypeError and ValueError (or Exception) so those
failures are logged via logger.error (with exc_info=True) and the function
returns False; locate the logic in send_callback around json.dumps and the
sign_webhook_payload invocation to apply the changed exception handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/docs/credentials/create.md`:
- Around line 45-58: The examples for webhook credentials are invalid and
inconsistent: fix the multi-provider JSON by closing the missing quote on the
"webhook_secret" key and nesting the provider object under "credential" so it
matches the provider-object shape (i.e., "credential": { "webhook_secret": {
"webhook_secret": "..." } }), and update the standalone "For registering Webhook
Secret" example to use the same object shape (credential -> webhook_secret ->
webhook_secret) so both examples are valid JSON and consistent; key symbols to
locate are the "credential" object and the "webhook_secret" key in the shown
examples.

---

Duplicate comments:
In `@backend/app/utils.py`:
- Around line 446-463: get_webhook_secret currently can raise on
DB/decryption/credential lookup failures; wrap the DB call and credential
retrieval (the Session(engine) context and call to get_provider_credential) in a
try/except that catches broad operational errors (e.g., DB, decryption, or
lookup exceptions) and returns None on any failure so callers treat signing as
optional; keep the lazy import of engine and return creds.get("webhook_secret")
if creds is a dict on success, otherwise return None.
- Around line 498-529: send_callback currently only catches
requests.RequestException, but json.dumps (and sign_webhook_payload) can raise
TypeError/ValueError during serialization or signing and will escape the
handler; wrap the serialization and signing steps (the json.dumps(...) call and
the sign_webhook_payload(...) call) in the same try/except or broaden the
existing except to also catch TypeError and ValueError (or Exception) so those
failures are logged via logger.error (with exc_info=True) and the function
returns False; locate the logic in send_callback around json.dumps and the
sign_webhook_payload invocation to apply the changed exception handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ceb16562-719e-4fda-a403-bfe7098f3838

📥 Commits

Reviewing files that changed from the base of the PR and between 788a1f0 and 3b73458.

📒 Files selected for processing (10)
  • backend/app/api/docs/credentials/create.md
  • backend/app/api/docs/credentials/update.md
  • backend/app/api/docs/credentials/update_by_org_project.md
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/services/doctransform/job.py
  • backend/app/services/llm/chain/executor.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/response/callbacks.py
  • backend/app/utils.py
✅ Files skipped from review due to trivial changes (2)
  • backend/app/api/docs/credentials/update_by_org_project.md
  • backend/app/api/docs/credentials/update.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/services/collections/delete_collection.py

Comment on lines +45 to +58
"webhook_secret": {
"webhook_secret: "webhook_secret"
},
}
}
```
#### For registering Webhook Secret
```json
{
"credential":{
"webhook_secret":"your-webhook-secret"
}

}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the webhook credential examples.

The multi-provider JSON is invalid ("webhook_secret: ... is missing the closing key quote), and the standalone example uses a different shape than the provider-object format. This will mislead API users.

📝 Proposed fix
     "langfuse": {
       "public_key": "pk-lf-....",
       "secret_key": "sk-lf-...",
       "host": "https://cloud.langfuse.com"
     },
     "webhook_secret": {
-      "webhook_secret: "webhook_secret"
-    },
+      "webhook_secret": "your-webhook-secret"
+    }
   }
 }

For registering Webhook Secret

{
-  "credential":{
-    "webhook_secret":"your-webhook-secret"
+  "credential": {
+    "webhook_secret": {
+      "webhook_secret": "your-webhook-secret"
+    }
  }
-
}
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @backend/app/api/docs/credentials/create.md around lines 45 - 58, The
examples for webhook credentials are invalid and inconsistent: fix the
multi-provider JSON by closing the missing quote on the "webhook_secret" key and
nesting the provider object under "credential" so it matches the provider-object
shape (i.e., "credential": { "webhook_secret": { "webhook_secret": "..." } }),
and update the standalone "For registering Webhook Secret" example to use the
same object shape (credential -> webhook_secret -> webhook_secret) so both
examples are valid JSON and consistent; key symbols to locate are the
"credential" object and the "webhook_secret" key in the shown examples.


</details>

<!-- fingerprinting:phantom:medusa:ibis -->

<!-- This is an auto-generated comment by CodeRabbit -->

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
backend/app/tests/core/test_callback_ssrf.py (1)

337-346: Add a positive send_callback signing-path test.

This covers the unsigned path, but not that send_callback(..., webhook_secret=...) actually emits X-Webhook-Signature/X-Webhook-Timestamp for the exact transmitted body.

Suggested test coverage
         assert "X-Webhook-Signature" not in call_kwargs["headers"]
         assert "X-Webhook-Timestamp" not in call_kwargs["headers"]
+
+    `@patch`("app.utils.validate_callback_url")
+    `@patch`("requests.Session")
+    def test_callback_adds_signature_headers_when_secret_provided(
+        self, mock_session_class: Any, mock_validate: Any
+    ) -> None:
+        mock_session = MagicMock()
+        mock_response = MagicMock()
+        mock_response.raise_for_status.return_value = None
+        mock_response.iter_content.return_value = [b"test"]
+        mock_session.post.return_value = mock_response
+        mock_session_class.return_value.__enter__.return_value = mock_session
+
+        test_data = {"status": "completed", "result": 42}
+        send_callback(
+            "https://api.example.com/callback",
+            test_data,
+            webhook_secret="secret",
+        )
+
+        call_kwargs = mock_session.post.call_args[1]
+        raw_body = json.dumps(test_data, separators=(",", ":")).encode()
+        timestamp = int(call_kwargs["headers"]["X-Webhook-Timestamp"])
+        expected_signature = hmac.new(
+            b"secret",
+            f"{timestamp}.".encode() + raw_body,
+            hashlib.sha256,
+        ).hexdigest()
+
+        assert call_kwargs["data"] == raw_body
+        assert call_kwargs["headers"]["X-Webhook-Signature"] == expected_signature
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/core/test_callback_ssrf.py` around lines 337 - 346, Add a
positive test that calls send_callback with a webhook_secret and asserts the
signing path: after invoking send_callback (use the same mock_session.post and
test_data as the unsigned test), grab call_kwargs =
mock_session.post.call_args[1], assert that "X-Webhook-Timestamp" and
"X-Webhook-Signature" are present in call_kwargs["headers"], verify the
timestamp is parseable/consistent, and recompute the expected HMAC over the
exact transmitted body (call_kwargs["data"] — which should equal
json.dumps(test_data, separators=(",", ":")).encode()) using the same
webhook_secret and the timestamp format your implementation uses, then assert
the computed signature equals call_kwargs["headers"]["X-Webhook-Signature"] to
ensure send_callback produced the correct signature for the exact bytes sent.
backend/app/tests/services/doctransformer/test_job/test_callbacks.py (3)

107-117: Consider extracting the repeated patch stack into a fixture/context manager.

Every test in this file re-declares the same set of patch(...) entries (Session, send_callback, get_webhook_secret, optional convert_document, TRANSFORMERS) and the mock_session.__enter__/__exit__ wiring. A small helper (e.g., @contextmanager def _patched_job_env(db, *, webhook_secret=None, convert_side_effect=None, storage=None) or a pytest fixture) would significantly cut duplication and make future patching changes (like the new get_webhook_secret) a one-line update across the suite.

Purely a DRY improvement — no behavioral impact.

Also applies to: 151-158, 188-199, 236-250, 284-295, 326-340, 386-393, 429-440, 483-497, 536-550

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py` around
lines 107 - 117, Create a small reusable contextmanager or pytest fixture to
centralize the repeated patch stack used across tests: replace the repeated with
(...) patch blocks that mock Session, send_callback, get_webhook_secret,
TRANSFORMERS (and optional convert_document/storage) and the manual
mock_session.__enter__/__exit__ wiring by implementing e.g. a contextmanager
_patched_job_env(db, *, webhook_secret=None, convert_side_effect=None,
storage=None) that yields the mocked objects; update tests to call this helper
(or use the fixture) and wire return_value/side_effects (for get_webhook_secret
and convert_document) and TRANSFORMERS={"test": MockTestTransformer} there so
future changes require a single edit.

13-13: Prefer built-in tuple over deprecated typing.Tuple.

Ruff flags this as UP035. On Python 3.11+ (per coding guidelines), use the built-in generic directly and drop the typing import.

♻️ Proposed change
-from typing import Tuple

Then update the annotations throughout the file:

-        self, db: Session, test_document: Tuple[Document, Project]
+        self, db: Session, test_document: tuple[Document, Project]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py` at line
13, The import from typing of Tuple is deprecated (UP035); remove "from typing
import Tuple" and replace any use of the typing alias Tuple[...] with the
built-in generic "tuple[...]" in this module (e.g., update function/type
annotations referenced in test_callbacks such as any occurrences of Tuple[int,
str] or Tuple[...]) so the file uses the built-in tuple generics on Python
3.11+.

47-48: Replace deprecated datetime.utcnow() with timezone-aware alternative.

In Python 3.12+, datetime.utcnow() is deprecated in favor of timezone-aware construction. Update to datetime.now(timezone.utc) to align with modern standards and avoid deprecation warnings.

Proposed change
-from datetime import datetime
+from datetime import datetime, timezone
@@
-        inserted_at=datetime.utcnow(),
-        updated_at=datetime.utcnow(),
+        inserted_at=datetime.now(timezone.utc),
+        updated_at=datetime.now(timezone.utc),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py` around
lines 47 - 48, Replace deprecated datetime.utcnow() usages for the inserted_at
and updated_at fields with timezone-aware calls: use datetime.now(timezone.utc)
and ensure timezone is imported (e.g., from datetime import datetime, timezone)
so both inserted_at and updated_at are timezone-aware.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/tests/core/test_callback_ssrf.py`:
- Around line 349-380: Add explicit return type annotations "-> None" to each
test method inside the TestSignWebhookPayload class
(test_returns_hex_signature_and_timestamp,
test_deterministic_with_fixed_timestamp,
test_different_secrets_produce_different_signatures,
test_different_timestamps_produce_different_signatures,
test_signature_matches_manual_hmac) so they conform to the project typing rule;
update each method signature to include the return type while leaving the body
and calls to sign_webhook_payload intact.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py`:
- Around line 382-384: The nested helper function capture should be annotated:
add type hints to its parameters and return type (e.g., def capture(path:
os.PathLike[str] | str, **kw: Any) -> None), and ensure Any and os are imported
or referenced; also apply the identical annotation to the second capture helper
mentioned (the one at the other occurrence that calls real_rmtree and appends
str(path) to removed).
- Around line 315-358: The test name implies ordering but currently only checks
final DB state; modify the test_failure_marks_job_failed_before_callback to
assert the FAILED update occurs before send_callback is invoked by attaching a
side_effect to the patched send_callback that refreshes the job
(db.refresh(job)) and asserts job.status == TransformationStatus.FAILED (or
capture a timestamp/order flag there), then run execute_job.__wrapped__ as
before; alternatively, if you prefer not to test ordering, rename the test to
reflect it only verifies the final job state. Ensure references to
send_callback, execute_job, job, and TransformationStatus.FAILED are used so the
side_effect can observe the job status at callback time.

---

Nitpick comments:
In `@backend/app/tests/core/test_callback_ssrf.py`:
- Around line 337-346: Add a positive test that calls send_callback with a
webhook_secret and asserts the signing path: after invoking send_callback (use
the same mock_session.post and test_data as the unsigned test), grab call_kwargs
= mock_session.post.call_args[1], assert that "X-Webhook-Timestamp" and
"X-Webhook-Signature" are present in call_kwargs["headers"], verify the
timestamp is parseable/consistent, and recompute the expected HMAC over the
exact transmitted body (call_kwargs["data"] — which should equal
json.dumps(test_data, separators=(",", ":")).encode()) using the same
webhook_secret and the timestamp format your implementation uses, then assert
the computed signature equals call_kwargs["headers"]["X-Webhook-Signature"] to
ensure send_callback produced the correct signature for the exact bytes sent.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py`:
- Around line 107-117: Create a small reusable contextmanager or pytest fixture
to centralize the repeated patch stack used across tests: replace the repeated
with (...) patch blocks that mock Session, send_callback, get_webhook_secret,
TRANSFORMERS (and optional convert_document/storage) and the manual
mock_session.__enter__/__exit__ wiring by implementing e.g. a contextmanager
_patched_job_env(db, *, webhook_secret=None, convert_side_effect=None,
storage=None) that yields the mocked objects; update tests to call this helper
(or use the fixture) and wire return_value/side_effects (for get_webhook_secret
and convert_document) and TRANSFORMERS={"test": MockTestTransformer} there so
future changes require a single edit.
- Line 13: The import from typing of Tuple is deprecated (UP035); remove "from
typing import Tuple" and replace any use of the typing alias Tuple[...] with the
built-in generic "tuple[...]" in this module (e.g., update function/type
annotations referenced in test_callbacks such as any occurrences of Tuple[int,
str] or Tuple[...]) so the file uses the built-in tuple generics on Python
3.11+.
- Around line 47-48: Replace deprecated datetime.utcnow() usages for the
inserted_at and updated_at fields with timezone-aware calls: use
datetime.now(timezone.utc) and ensure timezone is imported (e.g., from datetime
import datetime, timezone) so both inserted_at and updated_at are
timezone-aware.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0cf79a3-5cba-4c00-89c5-84b3efb7e98c

📥 Commits

Reviewing files that changed from the base of the PR and between 3b73458 and d14a59e.

📒 Files selected for processing (3)
  • backend/app/tests/core/test_callback_ssrf.py
  • backend/app/tests/services/doctransformer/test_job/test_callbacks.py
  • backend/app/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/utils.py

Comment on lines +349 to +380
class TestSignWebhookPayload:
def test_returns_hex_signature_and_timestamp(self):
sig, ts = sign_webhook_payload("secret", b"body")
assert isinstance(sig, str) and len(sig) == 64 # sha256 hex
assert isinstance(ts, int) and ts > 0

def test_deterministic_with_fixed_timestamp(self):
body = b'{"key":"value"}'
sig1, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
sig2, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
assert sig1 == sig2

def test_different_secrets_produce_different_signatures(self):
body = b"payload"
sig1, _ = sign_webhook_payload("secret-a", body, timestamp_ms=1000)
sig2, _ = sign_webhook_payload("secret-b", body, timestamp_ms=1000)
assert sig1 != sig2

def test_different_timestamps_produce_different_signatures(self):
body = b"payload"
sig1, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
sig2, _ = sign_webhook_payload("secret", body, timestamp_ms=2000)
assert sig1 != sig2

def test_signature_matches_manual_hmac(self):
secret, body, ts = "mysecret", b"hello", 999
expected = hmac.new(
secret.encode(), f"{ts}.".encode() + body, hashlib.sha256
).hexdigest()
sig, returned_ts = sign_webhook_payload(secret, body, timestamp_ms=ts)
assert sig == expected
assert returned_ts == ts
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add return type annotations to the new tests.

The new test methods should declare -> None to match the project typing rule and the surrounding test style. As per coding guidelines, **/*.py: “Always add type hints to all function parameters and return values in Python code”.

Suggested typing fix
 class TestSignWebhookPayload:
-    def test_returns_hex_signature_and_timestamp(self):
+    def test_returns_hex_signature_and_timestamp(self) -> None:
         sig, ts = sign_webhook_payload("secret", b"body")
         assert isinstance(sig, str) and len(sig) == 64  # sha256 hex
         assert isinstance(ts, int) and ts > 0
 
-    def test_deterministic_with_fixed_timestamp(self):
+    def test_deterministic_with_fixed_timestamp(self) -> None:
         body = b'{"key":"value"}'
         sig1, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
         sig2, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
         assert sig1 == sig2
 
-    def test_different_secrets_produce_different_signatures(self):
+    def test_different_secrets_produce_different_signatures(self) -> None:
         body = b"payload"
         sig1, _ = sign_webhook_payload("secret-a", body, timestamp_ms=1000)
         sig2, _ = sign_webhook_payload("secret-b", body, timestamp_ms=1000)
         assert sig1 != sig2
 
-    def test_different_timestamps_produce_different_signatures(self):
+    def test_different_timestamps_produce_different_signatures(self) -> None:
         body = b"payload"
         sig1, _ = sign_webhook_payload("secret", body, timestamp_ms=1000)
         sig2, _ = sign_webhook_payload("secret", body, timestamp_ms=2000)
         assert sig1 != sig2
 
-    def test_signature_matches_manual_hmac(self):
+    def test_signature_matches_manual_hmac(self) -> None:
         secret, body, ts = "mysecret", b"hello", 999
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/core/test_callback_ssrf.py` around lines 349 - 380, Add
explicit return type annotations "-> None" to each test method inside the
TestSignWebhookPayload class (test_returns_hex_signature_and_timestamp,
test_deterministic_with_fixed_timestamp,
test_different_secrets_produce_different_signatures,
test_different_timestamps_produce_different_signatures,
test_signature_matches_manual_hmac) so they conform to the project typing rule;
update each method signature to include the return type while leaving the body
and calls to sign_webhook_payload intact.

Comment on lines +315 to +358
def test_failure_marks_job_failed_before_callback(
self, db: Session, test_document: Tuple[Document, Project]
) -> None:
document, project = test_document
aws = self.setup_aws_s3()
self.create_s3_document_content(aws, document)

job = DocTransformationJobCrud(db, project_id=project.id).create(
DocTransformJobCreate(source_document_id=document.id)
)

with (
patch("app.services.doctransform.job.Session") as mock_session,
patch("app.services.doctransform.job.send_callback"),
patch(
"app.services.doctransform.job.get_webhook_secret", return_value=None
),
patch(
"app.services.doctransform.job.convert_document",
side_effect=RuntimeError("crash"),
),
patch(
"app.services.doctransform.registry.TRANSFORMERS",
{"test": MockTestTransformer},
),
):
mock_session.return_value.__enter__.return_value = db
mock_session.return_value.__exit__.return_value = None

with pytest.raises(RuntimeError):
execute_job.__wrapped__(
project_id=project.id,
job_id=str(job.id),
source_document_id=str(document.id),
transformer_name="test",
target_format="markdown",
task_id=str(uuid4()),
callback_url="https://example.com/webhook",
task_instance=None,
)

db.refresh(job)
assert job.status == TransformationStatus.FAILED
assert "crash" in job.error_message
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test name promises ordering, assertion doesn’t verify it.

test_failure_marks_job_failed_before_callback only asserts the final DB state; it never verifies that the FAILED update happened before send_callback was invoked. If a future refactor reordered those operations, this test would still pass silently. Consider recording observation order via a side effect on the send_callback mock and asserting the job was already FAILED at that moment, or rename the test to reflect what it actually checks.

♻️ Example: verify ordering
-            patch("app.services.doctransform.job.send_callback"),
+            patch("app.services.doctransform.job.send_callback") as mock_send,
@@
+        status_at_callback: list[TransformationStatus] = []
+
+        def record_status(*args, **kwargs):
+            db.refresh(job)
+            status_at_callback.append(job.status)
+
+        mock_send.side_effect = record_status
@@
         db.refresh(job)
         assert job.status == TransformationStatus.FAILED
         assert "crash" in job.error_message
+        assert status_at_callback == [TransformationStatus.FAILED]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py` around
lines 315 - 358, The test name implies ordering but currently only checks final
DB state; modify the test_failure_marks_job_failed_before_callback to assert the
FAILED update occurs before send_callback is invoked by attaching a side_effect
to the patched send_callback that refreshes the job (db.refresh(job)) and
asserts job.status == TransformationStatus.FAILED (or capture a timestamp/order
flag there), then run execute_job.__wrapped__ as before; alternatively, if you
prefer not to test ordering, rename the test to reflect it only verifies the
final job state. Ensure references to send_callback, execute_job, job, and
TransformationStatus.FAILED are used so the side_effect can observe the job
status at callback time.

Comment on lines +382 to +384
def capture(path, **kw):
removed.append(str(path))
real_rmtree(path, **kw)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add type hints to the capture helpers.

As per coding guidelines ("Always add type hints to all function parameters and return values in Python code"), these nested helpers should be annotated.

♻️ Proposed change
-        def capture(path, **kw):
+        def capture(path: str, **kw: object) -> None:
             removed.append(str(path))
             real_rmtree(path, **kw)

Apply identically at Line 425.

Also applies to: 425-427

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/doctransformer/test_job/test_callbacks.py` around
lines 382 - 384, The nested helper function capture should be annotated: add
type hints to its parameters and return type (e.g., def capture(path:
os.PathLike[str] | str, **kw: Any) -> None), and ensure Any and os are imported
or referenced; also apply the identical annotation to the second capture helper
mentioned (the one at the other occurrence that calls real_rmtree and appends
str(path) to removed).

Comment thread backend/app/utils.py
return signature, timestamp_ms


def get_webhook_secret(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: would have been better to keep this the way get openai client or get langfuse client is for consistency.

@Prajna1999 Prajna1999 merged commit 7d4da2e into main Apr 24, 2026
3 checks passed
@Prajna1999 Prajna1999 deleted the feat/webhook-fingerprinting branch April 24, 2026 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Webhook: Implement payload fingerprinting

3 participants