feat(security): SSRF guard for OTel export endpoint (CRITICAL-01)#134
Conversation
…AL-01)
Addresses CRITICAL-01 from the security audit: the POST /api/sessions/{id}/
export/otel endpoint accepted a user-supplied URL and made outbound HTTP/gRPC
requests with no IP-range validation. An attacker with API access could target
cloud metadata endpoints (169.254.169.254), internal services, or loopback.
Changes:
- New `crates/rewind-web/src/url_guard.rs`: validates export endpoints by
resolving the hostname and rejecting any IP in a blocked range before the
outbound connection. Covers: RFC 1918 (10/8, 172.16/12, 192.168/16),
link-local (169.254/16, fe80::/10), loopback (127/8, ::1), unspecified,
multicast, broadcast, documentation (192.0.2/24 etc, 2001:db8::/32),
shared-address-space (100.64/10), benchmarking (198.18/15), unique-local
v6 (fc00::/7), and v4-mapped v6 (::ffff:priv). 23 unit tests.
- `api.rs::export_otel` now calls `validate_export_endpoint()` before any
session lookup or outbound request. Returns 400 with SSRF error message.
- 4 new integration tests in api_tests.rs: loopback rejection, cloud
metadata rejection, RFC 1918 rejection, existing 404 test updated to
use a public endpoint IP.
- `docs/security-audit.md` updated: CRITICAL-01 + CRITICAL-02 marked as
fixed (PR #133 + #134), MEDIUM-09 marked as partially fixed, ship order
table updated with status column.
Known limitation: DNS rebinding between validation and the opentelemetry-otlp
library's connection-time re-resolution. Documented inline in url_guard.rs.
Requires upstream API changes to fully close.
This is PR #2 of the 5-PR audit ship order. No version bump (deferred
until all 4 security PRs ship per team decision).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
risjai
left a comment
There was a problem hiding this comment.
Code Review — PR #134: SSRF guard for OTel export (CRITICAL-01)
Verdict: Well-structured guard with strong IP-range coverage. I'd block merge on the octal IP parsing differential and the missing Teredo/6to4 ranges — both are exploitable bypasses. The URL parser also has a class of parser-differential bugs. Details below.
Overview
Adds crates/rewind-web/src/url_guard.rs — a standalone SSRF validation module. The old starts_with("http://") check in api.rs::export_otel is replaced by validate_export_endpoint() which parses the URL, resolves hostnames via ToSocketAddrs, and rejects any resolved IP in a private/reserved range. 23 unit tests, 4 integration tests. The DNS rebinding limitation is documented.
Good scope, good defense-in-depth posture (this runs even though PR #133 now gates the endpoint behind auth).
🔴 Must fix before merge
1. Octal/hex/decimal IP bypass via to_socket_addrs parser differential
When host.parse::<IpAddr>() fails (Rust's parser rejects non-standard forms like 0177.0.0.1, 0x7f000001, 2130706433), the code falls through to to_socket_addrs(), which delegates to the platform's getaddrinfo(3). On some libc implementations:
0177.0.0.1resolves as177.0.0.1(octal digits interpreted as decimal) — passes the guard as public, but the downstream HTTP client (hyper viareqwest) may interpret the octets differently viainet_aton, connecting to127.0.0.1- The safety of the guard depends on
getaddrinfoand the HTTP client agreeing on IP interpretation, which is not guaranteed across OSes or library versions
Fix: After host.parse::<IpAddr>() fails, reject any host that looks like a numeric IP (all digits/dots/hex). Only pass through hostnames with at least one alphabetic character:
fn looks_like_numeric_ip(host: &str) -> bool {
let h = host.strip_prefix("0x").unwrap_or(host);
!h.is_empty() && h.chars().all(|c| c.is_ascii_hexdigit() || c == '.' || c == 'x' || c == 'X')
}
// In validate_export_endpoint, between the IpAddr::parse and to_socket_addrs:
if looks_like_numeric_ip(&host) {
return Err(format!("Endpoint host '{host}' looks like a non-standard IP literal (rejected for SSRF safety)"));
}2. Missing Teredo (2001:0000::/32) and 6to4 (2002::/16) blocks
Both IPv6 transition mechanisms embed a destination IPv4 that a relay/gateway will route to. An attacker can encode 127.0.0.1 or 169.254.169.254 inside a Teredo or 6to4 address:
- Teredo:
2001:0000:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx— the embedded IPv4 is in the last 32 bits (obfuscated/inverted) - 6to4:
2002:7f00:0001::encodes127.0.0.1directly in bits 16-47
These are exploitable on hosts with Teredo/6to4 enabled (common in mixed-stack environments — exactly the K8s scenario this guard targets).
Fix: Add to is_blocked_v6:
|| is_teredo(ip) // 2001:0000::/32 — embeds routable IPv4
|| is_6to4(ip) // 2002::/16 — embeds destination IPv4fn is_teredo(ip: Ipv6Addr) -> bool {
let s = ip.segments();
s[0] == 0x2001 && s[1] == 0x0000
}
fn is_6to4(ip: Ipv6Addr) -> bool {
ip.segments()[0] == 0x2002
}Note: is_documentation_v6 blocks 2001:0db8::/32 but not 2001:0000::/32. These are different allocations; Teredo is the dangerous one.
🟡 Should address
3. URL parser does not reject backslash, whitespace, or percent-encoded authority
parse_host_port does not strip or reject \, \t, \n, \r, or %-encoded characters in the authority component. Classic parser-differential attacks:
http://evil.com\@127.0.0.1/— the guard'srsplit_once('@')extracts127.0.0.1, checks it (blocked!), but some HTTP clients following WHATWG URL parsing treat\as a path separator and connect toevil.comhttp://evil.com%5C@127.0.0.1/— percent-encoded variant of the same
In this specific case the guard would correctly reject the loopback IP, so the parser differential is "opposite direction" (guard is more restrictive than the client). But the mismatch still undermines trust in the guard's invariants — a future change could flip the direction.
Fix: Reject any authority containing \, %, or control characters:
if authority.bytes().any(|b| b == b'\\' || b == b'%' || b < 0x20) {
return None;
}Or, better yet, add the url crate as a dependency (it's already transitively present via reqwest) and use its WHATWG-compliant parser instead of the hand-rolled one. This eliminates the entire class of parser-differential bugs.
4. to_socket_addrs is blocking I/O in an async context
(host.as_str(), port).to_socket_addrs() calls getaddrinfo(3), which is a blocking syscall. In api.rs::export_otel (an async handler), this blocks the tokio worker thread for the duration of DNS resolution. Under load or with a slow resolver, this starves the runtime.
Fix: Wrap in tokio::task::spawn_blocking, or use tokio::net::lookup_host instead:
let addrs: Vec<_> = tokio::net::lookup_host((host.as_str(), port))
.await
.map_err(|e| format!("Failed to resolve '{host}': {e}"))?
.collect();This requires making validate_export_endpoint async. Alternatively, keep it sync and call it from spawn_blocking at the call site in api.rs.
🟢 Nits / optional
-
docs/security-audit.mdis 808 lines added as a new file. But the file already exists on master (added in an earlier session). Confirm this isn't a merge conflict waiting to happen — the system reminder says the file was modified, which suggests it was edited locally but never committed to master. If master already has it, this PR should modify, not create. -
Integration tests duplicate boilerplate. All 4 new integration tests (
test_export_otel_rejects_loopback_endpoint,_cloud_metadata_,_rfc1918_) share identicalAppStatesetup. Extract asetup_otel_test()helper (like the existingsetup()in the file). ~30 LOC saved, easier to maintain. -
parse_host_portstrips query and fragment, but the URL is passed through unchanged. If someone passeshttp://public.example.com/v1/traces#@169.254.169.254, the guard checkspublic.example.com(allowed) but the fragment@169.254.169.254is sent to the HTTP client. This isn't exploitable since HTTP clients strip fragments before sending, but it's worth a comment noting the invariant. -
The
urlcrate is already in the dep tree viareqwest → url 2.x. Using it directly inCargo.tomladds no new code to the binary. The hand-rolled parser works but is a maintenance burden — every time a new bypass vector is found, you're re-implementing URL parsing logic that theurlcrate already handles.
Test coverage
Strong for the IP-range checking. Gaps:
- No test for octal/hex/decimal IP forms — the critical bypass in #1
- No test for Teredo/6to4 addresses — the high bypass in #2
- No test for backslash/percent in authority — #3
- No async DNS test — the
#[ignore]test for public hosts is fine, but there's no test confirming thatto_socket_addrson a numeric-looking non-IP-parseable string is handled correctly e2e_allows_public_hostis#[ignore]— good (CI sandboxes), but consider adding a test with a mocked DNS resolver for deterministic coverage
Security cross-check against the audit
| Audit requirement | Status |
|---|---|
| Reject RFC 1918 | ✅ |
| Reject link-local / cloud metadata (169.254.x.x) | ✅ |
| Reject loopback | ✅ |
| Reject unspecified | ✅ |
| Reject multicast/broadcast | ✅ |
| Reject documentation ranges | ✅ |
| Reject CGNAT/shared (100.64/10) | ✅ |
| Reject benchmarking (198.18/15) | ✅ |
| Resolve hostname before connecting | ✅ |
| Validate ALL resolved IPs (not just first) | ✅ |
| IPv4-mapped IPv6 (::ffff:x.x.x.x) | ✅ |
| IPv4-compatible deprecated (::x.x.x.x) | ✅ |
| Unique-local (fc00::/7) | ✅ |
| Link-local v6 (fe80::/10) | ✅ |
| Teredo (2001:0000::/32) | ❌ Missing — exploitable |
| 6to4 (2002::/16) | ❌ Missing — exploitable |
| Octal/hex IP forms | ❌ Missing — platform-dependent bypass |
| DNS rebinding | |
| Parser differential (backslash/percent) |
Summary
- Block on: Octal/hex IP bypass (#1), Teredo/6to4 ranges (#2)
- Before merge: Authority sanitization (#3), async DNS (#4)
- Follow-up: Replace hand-rolled parser with
urlcrate, add bypass regression test suite
The guard's IP-range coverage is excellent for standard representations — 9 IPv4 categories and 8 IPv6 categories is more thorough than most SSRF guards I've reviewed. The two blocking issues are both "non-standard representation bypasses" which is the #1 class of SSRF guard evasion. Once those are patched, this is solid.
Addresses all blocking and should-fix items from the code review. 🔴 Blockers resolved: 1. Octal/hex/decimal IP bypass: added `looks_like_numeric_ip()` that rejects hosts consisting entirely of hex digits, dots, and x/X (e.g., 0177.0.0.1, 0x7f000001, 2130706433) BEFORE passing to DNS. Closes the parser-differential between Rust's IpAddr::parse (strict) and getaddrinfo(3) (platform-dependent lax parsing). 2. Teredo (2001:0000::/32) and 6to4 (2002::/16) blocks added to is_blocked_v6. Both IPv6 transition mechanisms embed a routable IPv4 that a relay/gateway will connect to. An attacker could encode 127.0.0.1 or 169.254.169.254 inside these addresses. 🟡 Should-fix resolved: 3. Authority sanitization: parse_host_port now rejects any authority containing backslash, percent-encoding, or control characters. Eliminates the entire class of parser-differential attacks where the guard and HTTP client disagree on the host. 4. Async DNS: validate_export_endpoint is now async, using tokio::net::lookup_host instead of std::net::ToSocketAddrs. No longer blocks the tokio worker thread during DNS resolution. New tests: 15 additional (Teredo, 6to4, octal/hex/decimal bypass, backslash, percent, control chars, end-to-end for each bypass class). Total url_guard tests: 38 (was 23). All pass. Clippy clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Pushed 🔴 Blockers resolved:
🟡 Should-fix resolved:
Test count: 38 url_guard unit tests (was 23), 30 api integration tests, all green. Clippy clean. Follow-up (not in this PR): Replace hand-rolled URL parser with |
Follow-up review — fix commit
|
| # | Original ask | Status | Notes |
|---|---|---|---|
| 1 | Octal/hex/decimal IP bypass | ✅ | looks_like_numeric_ip() at url_guard.rs:73-78 rejects non-standard numeric forms before DNS. Tests: rejects_octal_ip_form, rejects_hex_ip_form, rejects_decimal_ip_form, plus 3 end-to-end. |
| 2 | Teredo/6to4 ranges | ✅ | is_teredo and is_6to4 at url_guard.rs:170-182. Both check segment prefixes correctly. Tests: blocks_teredo, blocks_6to4, e2e_rejects_teredo_v6, e2e_rejects_6to4_v6. |
| 3 | Authority sanitization | ✅ | parse_host_port rejects \, %, and < 0x20 at url_guard.rs:202-206. Tests: rejects_backslash_in_authority, rejects_percent_in_authority, rejects_control_chars_in_authority, e2e_rejects_backslash_bypass. |
| 4 | Async DNS | ✅ | validate_export_endpoint is now async, uses tokio::net::lookup_host. Call site in api.rs has .await. All tests migrated to #[tokio::test]. |
New issue in the fix
🟡 looks_like_numeric_ip false-positive on all-hex hostnames
The check h.bytes().all(|b| b.is_ascii_hexdigit() || b == b'.' || b == b'x' || b == b'X') rejects any host composed entirely of hex chars and dots. This means:
cafe.babe→ rejected (all hex digits + dot) — but this is a valid hostnamedead.beef→ rejected0xfeed.example.com→ allowed (contains non-hex alphabetics likel,p)aabb.cc→ rejected (.ccis a real TLD,aabbis all hex)
These are extremely unlikely as OTel collector hostnames, so this is a marginal over-restriction, not a security bypass. But it's worth noting for correctness. A tighter check: require that the string starts with a digit or 0x/0X prefix — real numeric IPs always start with a digit, while hostnames starting with a letter are never ambiguous:
fn looks_like_numeric_ip(host: &str) -> bool {
let first = host.bytes().next().unwrap_or(0);
if !(first.is_ascii_digit()) { return false; }
let h = host.strip_prefix("0x").or_else(|| host.strip_prefix("0X")).unwrap_or(host);
!h.is_empty() && h.bytes().all(|b| b.is_ascii_hexdigit() || b == b'.')
}This still catches 0177.0.0.1, 0x7f000001, 2130706433 but allows cafe.babe. Not a blocker — the current version is safe (over-rejects, never under-rejects).
Verification
- Confirmed
is_teredocheckssegments()[0] == 0x2001 && segments()[1] == 0x0000— correct for 2001:0000::/32 - Confirmed
is_6to4checkssegments()[0] == 0x2002— correct for 2002::/16 - Confirmed
looks_like_numeric_ip("0177.0.0.1")→ true,looks_like_numeric_ip("example.com")→ false - Confirmed authority sanitization fires before userinfo stripping (correct order — backslash in userinfo portion is caught)
- Confirmed
validate_export_endpointreturn type changed fromResult<&str, String>toResult<(), String>— call site doesn't use the returned URL, so this is a clean simplification - Test count: 38 url_guard tests (was 23), all converted to
#[tokio::test]where async
Verdict
Approve. All blocking bypasses closed. The cafe.babe false-positive is over-restriction, not under-restriction — safe to ship and tighten later if anyone hits it. Good work on the turnaround.
Summary
Addresses CRITICAL-01 from the security audit:
POST /api/sessions/{id}/export/otelaccepted a user-suppliedendpointURL and made outbound HTTP/gRPC requests with no IP-range validation. An attacker with API access (now gated by PR #133's auth, but defense-in-depth) could target cloud metadata endpoints (169.254.169.254), internal services, or loopback.Also includes the full
docs/security-audit.md(first commit to master) with CRITICAL-01, CRITICAL-02, and MEDIUM-09 marked as addressed.Implementation
New file:
crates/rewind-web/src/url_guard.rsA standalone SSRF guard module that:
urlcrate dep)ToSocketAddrsand validates all resolved IPsBlocked ranges (comprehensive):
Changed:
crates/rewind-web/src/api.rsThe old validation:
Replaced with:
This runs before session lookup or any outbound connection attempt.
Updated:
docs/security-audit.mdKnown limitation: DNS rebinding
The guard resolves the hostname once and validates the resulting IPs. The downstream
opentelemetry-otlpclient may re-resolve at connection time, so a malicious resolver could theoretically rebind between validation and connection. Fully closing this gap requires a pinned resolver, which isn't supported byopentelemetry-otlp's current API. The remaining window is narrow and documented inline inurl_guard.rs.Tests
23 url_guard unit tests covering:
4 integration tests in
api_tests.rs:test_export_otel_rejects_loopback_endpoint→ 400test_export_otel_rejects_cloud_metadata_endpoint→ 400test_export_otel_rejects_rfc1918_endpoint→ 400 (tests 10.x, 192.168.x, 172.16.x)test_export_otel_returns_404_for_missing_session→ updated to use public IP endpointFull suite: 370+ tests, 0 failures, clippy clean on touched files, UI build + 113 vitest tests pass.
Test plan
cargo test --workspace— all passcargo clippy -p rewind-web --all-targets -- -D warnings— zero errors in url_guard or api_testsnpm run build && npm test— 113/113 UI tests, prod build succeedstest_export_otel_returns_404_for_missing_sessionpasses with public endpointtest_export_otel_no_config_returns_501still passesShip order
query_rawPRAGMA lockdown (HIGH-02)No version bump — deferred until all 4 security PRs ship.
🤖 Generated with Claude Code