-
Notifications
You must be signed in to change notification settings - Fork 7
improve send_callback function with callback url validation, restricting redirection and limiting response size #474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds SSRF protections and stricter callback handling: new IP/URL validation utilities, runtime validation calls in API endpoints before using callback URLs, removal of an old post_callback helper, and send_callback now streams responses, disables redirects, and enforces a max response size. New tests cover these behaviors. Changes
sequenceDiagram
participant Client as API Endpoint
participant Validator as validate_callback_url
participant DNS as DNS Resolver
participant IPCheck as _is_private_ip
participant Sender as send_callback
participant Remote as External Service
Client->>Validator: validate_callback_url(url)
Validator->>Validator: ensure HTTPS & hostname present
Validator->>DNS: resolve hostname
DNS-->>Validator: return IP list
loop for each IP
Validator->>IPCheck: classify IP (private/loopback/link-local/multicast)
IPCheck-->>Validator: blocked? / reason
end
alt any IP blocked
Validator-->>Client: raise ValueError (SSRF blocked)
else all IPs OK
Validator-->>Client: validation passed
Client->>Sender: send_callback(url, payload)
Sender->>Validator: re-validate URL (safety)
Sender->>Remote: POST (stream=True, allow_redirects=False, timeout=(..,..))
Remote-->>Sender: response stream
Sender->>Sender: enforce CALLBACK_MAX_RESPONSE_SIZE while streaming
alt within limit
Sender-->>Client: True (success)
else exceeded
Sender-->>Client: False (closed early)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
backend/app/core/config.py (1)
121-124: Centralized callback response size limit looks good (consider future configurability).Defining
CALLBACK_MAX_RESPONSE_SIZEhere and using it insend_callbackgives a clear, global control over max response size. If you expect different deployments to need different limits, consider wiring this through env settings later, but the current 1 MiB default is a solid starting point.backend/app/api/routes/llm.py (1)
8-8: Callback URL validation adds SSRF protection; consider surfacing a clean HTTP error to clients.Using
validate_callback_urlbeforestart_jobis the right place to block SSRF-style callbacks. Currently, aValueErrorfromvalidate_callback_urlwill bubble up; unless you already have a globalValueErrorhandler, FastAPI will likely turn this into a 500. To keep the API surface predictable, you may want to wrap this in a small try/except and raise anHTTPException(e.g., 400/422) or convert to anAPIResponse.failure_responseinstead.Example:
from app.utils import APIResponse, validate_callback_url +from fastapi import HTTPException ... - if request.callback_url: - validate_callback_url(str(request.callback_url)) + if request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e))Also applies to: 47-49
backend/app/api/routes/collections.py (1)
29-29: Consistent SSRF checks on collection callbacks; align error semantics with the rest of the API.Adding
validate_callback_urlto bothcreate_collectionanddelete_collectiongives consistent SSRF protection for collection callbacks. As with the LLM route, these calls currently raiseValueErrordirectly; without a global handler this will show up as a 500 to clients. Consider normalizing this to a 4xx viaHTTPExceptionor your standard error envelope so invalid callback URLs are clearly reported as client errors.Pattern sketch:
-from app.utils import APIResponse, load_description, validate_callback_url +from app.utils import APIResponse, load_description, validate_callback_url +from fastapi import HTTPException ... - if request.callback_url: - validate_callback_url(str(request.callback_url)) + if request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e)) ... - if request and request.callback_url: - validate_callback_url(str(request.callback_url)) + if request and request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e))Also applies to: 84-86, 136-138
backend/app/utils.py (2)
342-392:send_callbackhardening is good; consider minor polish and future extensibility.The updated
send_callbacknow:
- Re-validates the URL via
validate_callback_url.- Uses a
requests.Sessioncontext manager.- Disables redirects and enables streaming.
- Applies connect/read timeouts from settings.
- Enforces a max response size via
CALLBACK_MAX_RESPONSE_SIZE.This is a solid improvement for SSRF and DoS resistance. A couple of small follow‑ups you might consider (not blocking):
- If you want invalid callback URLs (from
validate_callback_url) to be logged distinctly from network failures, you could catchValueErrorseparately and log a clearer message before re‑raising or returningFalse.- If you anticipate needing different size limits per deployment, wiring
CALLBACK_MAX_RESPONSE_SIZEfrom env (as mentioned inSettings) would make tuning easier.No changes required for behaviour; these are purely polish/observability ideas.
33-37: ModernizeAPIResponsetype hints to use built‑in generics.Ruff’s hint about
typing.Dictbeing deprecated is valid here:metadata: Optional[Dict[str, Any]] = NoneIn Python 3.11+, you can simplify this to:
- metadata: Optional[Dict[str, Any]] = None + metadata: dict[str, Any] | None = Noneand drop
Dict/Optionalimports if they become unused. Not urgent, but keeps types idiomatic.backend/app/tests/utils/test_callback_ssrf.py (1)
1-383: Comprehensive SSRF and callback tests; only minor nit on patch scope.This test module gives very thorough coverage of
_is_private_ip,validate_callback_url, andsend_callback(including DNS round‑robin, IPv6, metadata endpoints, timeouts, redirects, streaming, and size limits), which is exactly what this change needs.One small optional improvement: for the
validate_callback_urltests you currently patchsocket.getaddrinfo:@patch("socket.getaddrinfo") def test_reject_localhost_by_name(self, mock_getaddrinfo): ...Patching
app.utils.socket.getaddrinfoinstead would narrow the monkey‑patch to the code under test and reduce the chance of surprising interactions with other tests that also usesocket:@patch("app.utils.socket.getaddrinfo") def test_reject_localhost_by_name(self, mock_getaddrinfo): ...Not required, but a bit safer in a larger test suite.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/api/routes/collections.py(3 hunks)backend/app/api/routes/documents.py(4 hunks)backend/app/api/routes/llm.py(2 hunks)backend/app/core/config.py(1 hunks)backend/app/core/util.py(0 hunks)backend/app/tests/utils/test_callback_ssrf.py(1 hunks)backend/app/utils.py(2 hunks)
💤 Files with no reviewable changes (1)
- backend/app/core/util.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/routes/llm.pybackend/app/api/routes/collections.pybackend/app/api/routes/documents.pybackend/app/core/config.pybackend/app/tests/utils/test_callback_ssrf.pybackend/app/utils.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/llm.pybackend/app/api/routes/collections.pybackend/app/api/routes/documents.py
backend/app/core/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/
Files:
backend/app/core/config.py
🧬 Code graph analysis (5)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (1)
validate_callback_url(291-339)
backend/app/api/routes/collections.py (1)
backend/app/utils.py (1)
validate_callback_url(291-339)
backend/app/api/routes/documents.py (1)
backend/app/utils.py (2)
load_description(399-404)validate_callback_url(291-339)
backend/app/tests/utils/test_callback_ssrf.py (1)
backend/app/utils.py (4)
_is_private_ip(268-288)validate_callback_url(291-339)send_callback(342-395)_(408-409)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
.env.example (1)
83-86: Fix .env formatting to follow best practices.The new environment variable has several formatting issues flagged by static analysis:
- Spaces around equal signs (lines 84-86 all have this issue)
- Unquoted value on line 86
- Variable ordering (CALLBACK_MAX_RESPONSE_SIZE should come before CALLBACK_READ_TIMEOUT alphabetically)
While lines 84-85 also have spaces (existing code), it's a good opportunity to fix all three lines to follow .env best practices.
Apply this diff to fix the formatting:
-# Callback Timeouts and size limit(in seconds and MB respectively) -CALLBACK_CONNECT_TIMEOUT = 3 -CALLBACK_READ_TIMEOUT = 10 -CALLBACK_MAX_RESPONSE_SIZE = 1048576 #(1*1024*1024) +# Callback Timeouts and size limit (in seconds and MB respectively) +CALLBACK_CONNECT_TIMEOUT=3 +CALLBACK_MAX_RESPONSE_SIZE="1048576" +CALLBACK_READ_TIMEOUT=10Note: Added space after "limit" in the comment, removed spaces around
=, quoted the numeric value, and reordered alphabetically.backend/app/utils.py (1)
364-364: Remove redundant type cast.The
str()cast is unnecessary sincecallback_urlis already typed asstrin the function signature.- validate_callback_url(str(callback_url)) + validate_callback_url(callback_url)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.env.example(1 hunks)backend/app/api/routes/documents.py(3 hunks)backend/app/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/routes/documents.py
🧰 Additional context used
📓 Path-based instructions (2)
.env.example
📄 CodeRabbit inference engine (CLAUDE.md)
Provide .env.example as the template for .env
Files:
.env.example
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 84-84: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 85-85: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 86-86: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 86-86: [UnorderedKey] The CALLBACK_MAX_RESPONSE_SIZE key should go before the CALLBACK_READ_TIMEOUT key
(UnorderedKey)
[warning] 86-86: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)
268-289: LGTM! Return type annotation has been corrected.The function correctly implements comprehensive IP blocking checks (loopback, link-local, multicast, private, reserved). The return type annotation
tuple[bool, str]now matches the actual return values, addressing the past review comment.
366-389: Excellent SSRF protection measures!The implementation demonstrates strong security practices:
- Pre-request URL validation prevents SSRF attacks
allow_redirects=Falseblocks redirect-based SSRFstream=Truewith chunked reading prevents memory exhaustion- Response size limit (enforced incrementally) protects against DoS via large responses
Note: The response body is read and discarded solely to enforce the size limit. While this consumes bandwidth, it's necessary to prevent attackers from exploiting unlimited response sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/utils.py (1)
291-339: Add DNS resolution timeout and normalize IPv6 addresses before IP checks.The HTTPS-only enforcement and per-IP classification are solid, but there are two gaps:
No timeout on
socket.getaddrinfo(DoS risk).
A slow/malicious DNS server can causevalidate_callback_urlto hang indefinitely. This was already flagged in a previous review and is still unresolved.IPv6 scope IDs may bypass blocking.
getaddrinfocan return addresses likefe80::1%eth0; passing this directly toipaddress.ip_addressraisesValueError, and_is_private_ipthen treats it as “not blocked”.You can address both with something along these lines:
+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError @@ - addr_info = socket.getaddrinfo( - parsed.hostname, - parsed.port or 443, - socket.AF_UNSPEC, - socket.SOCK_STREAM, - ) + # Resolve DNS with a bounded timeout to avoid hanging validation + with ThreadPoolExecutor(max_workers=1) as executor: + future = executor.submit( + socket.getaddrinfo, + parsed.hostname, + parsed.port or 443, + socket.AF_UNSPEC, + socket.SOCK_STREAM, + ) + try: + addr_info = future.result(timeout=5.0) # consider making this configurable + except FuturesTimeoutError: + raise ValueError("DNS resolution timeout for callback URL") @@ - for info in addr_info: - ip_address = info[4][0] - is_blocked, reason = _is_private_ip(ip_address) + for info in addr_info: + raw_ip = info[4][0] + # Strip IPv6 scope IDs like "fe80::1%eth0" before classification + ip_for_check = raw_ip.split("%", 1)[0] + is_blocked, reason = _is_private_ip(ip_for_check) if is_blocked: raise ValueError( - f"Callback URL resolves to {reason} IP address: {ip_address}. " + f"Callback URL resolves to {reason} IP address: {raw_ip}. " f"This IP type is not allowed for callbacks." )This keeps the existing validation semantics but makes the function resilient to DNS-based DoS and correctly handles IPv6 link-local addresses with scope IDs.
🧹 Nitpick comments (2)
backend/app/utils.py (2)
2-10: Use builtindict[...]instead oftyping.Dict(Py3.11 style).New imports look good. To satisfy the Ruff hint and modern typing style, you can drop
Dictand use builtin generics:-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVar @@ class APIResponse(BaseModel, Generic[T]): @@ - metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = NoneAlso applies to: 34-37
268-289: Consider also blocking “unspecified” addresses.The checks for loopback, link-local, multicast, private, and reserved ranges look solid. You may also want to treat
ip_obj.is_unspecified(e.g.0.0.0.0,::) as blocked to avoid callbacks to non-routable/placeholder addresses:- checks = [ + checks = [ (ip_obj.is_loopback, "loopback/localhost"), (ip_obj.is_link_local, "link-local"), (ip_obj.is_multicast, "multicast"), (ip_obj.is_private, "private"), (ip_obj.is_reserved, "reserved"), + (ip_obj.is_unspecified, "unspecified"), ]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/utils.py (2)
268-289:_is_private_iplogic and typing look solid; consider optionally treating “unspecified” as blocked.The function now returns
tuple[bool, str]consistently with its callers, and the set of checks (loopback, link-local, multicast, private, reserved) is a good baseline for callback/IP classification. If you want to be extra strict, you could also treatip_obj.is_unspecifiedas blocked (e.g.,0.0.0.0,::) so that clearly non-routable addresses are rejected up front, but this is optional and not a blocker.
9-9: Optional: modernizeDictusage to built‑indictfor type hints.Ruff’s UP035 hint applies here: for Python 3.11+ you can drop
Dictentirely and rely on PEP‑585 style built‑ins, which you’re already using elsewhere (dict[str, Any]insend_callback):-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVar @@ - metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = None @@ - cls, data: T, metadata: Optional[Dict[str, Any]] = None + cls, data: T, metadata: Optional[dict[str, Any]] = None @@ - metadata: Optional[Dict[str, Any]] = None, + metadata: Optional[dict[str, Any]] = None,This is non‑functional and can be done whenever you next touch this class.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
394-394: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)
2-10: New networking imports are appropriate and used correctly.
ipaddress,socket, andurlparseare all exercised in the new SSRF helpers and keep concerns local to this module; no issues from a dependency or usage standpoint.
291-339:validate_callback_urlprovides a good HTTPS-only, IP-based SSRF guardrail.The URL parsing + HTTPS enforcement + hostname requirement +
getaddrinfoover all resolved addresses, combined with_is_private_ipchecks, gives you reasonable coverage against private/loopback/link-local/multicast/reserved targets and basic DNS‑rebind scenarios. The exception handling is also clear: callers seeValueErrorwith a useful message for both policy rejections and unexpected resolution errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/utils.py (2)
9-9: Use lowercasedictinstead of deprecatedtyping.Dict.For Python 3.9+, the built-in
dictcan be used directly in type annotations. TheDictimport is unused after you've already updatedsend_callbackto usedict[str, Any].-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVarThen update remaining usages in
APIResponse:- metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = None
363-367: Remove unnecessarystr()conversion.The
callback_urlparameter is already typed asstr, so thestr()conversion is redundant.try: - validate_callback_url(str(callback_url)) + validate_callback_url(callback_url) except ValueError as ve:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (2)
backend/app/api/routes/threads.py (1)
send_callback(36-48)backend/app/core/cloud/storage.py (2)
stream(125-127)stream(179-195)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/utils.py (3)
268-289: LGTM!The IP validation logic is well-structured with comprehensive checks covering loopback, link-local, multicast, private, and reserved IP ranges. The return type annotation is correct, and gracefully handling
ValueErrorfor invalid IP strings is appropriate.
291-339: LGTM!The SSRF validation is thorough: enforces HTTPS, validates hostname presence, resolves DNS, and checks all returned IP addresses against the private/reserved ranges. Checking all IPs from
getaddrinfoprovides protection against DNS round-robin attacks.
369-401: LGTM on the SSRF hardening!The implementation addresses all key SSRF vectors:
session.trust_env = Falseprevents proxy-based SSRFallow_redirects=Falseblocks redirect-based attacksstream=Truewith chunked size checking prevents response-based DoS- Pre-request URL validation with IP resolution
Summary
Target issue is #426
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
send_callbackfunction by adding callback url validation to check if the url is a proper public https url and not one of these - private, multicast, link-local (which covers cloud metadata endpoints), localhost/loopback and reservedsession.trust_env=falsefor extra protection as it ignores environment proxiesSummary by CodeRabbit
New Features
Security Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.