feat(security): SSRF guard for OTel export endpoint (CRITICAL-01)#134

Merged: risjai merged 2 commits into master from feat/ssrf-otel-export on Apr 21, 2026

@risjai risjai commented Apr 21, 2026

Summary

Addresses CRITICAL-01 from the security audit: POST /api/sessions/{id}/export/otel accepted a user-supplied endpoint URL and made outbound HTTP/gRPC requests with no IP-range validation. An attacker with API access could target cloud metadata endpoints (169.254.169.254), internal services, or loopback. The endpoint is now gated by PR #133's auth, so this guard is defense-in-depth.

Also includes the full docs/security-audit.md (first commit to master) with CRITICAL-01, CRITICAL-02, and MEDIUM-09 marked as addressed.

Implementation

New file: crates/rewind-web/src/url_guard.rs

A standalone SSRF guard module that:

  1. Parses the URL to extract host and port (minimal parser — no url crate dep)
  2. If the host is an IP literal, validates it directly
  3. If the host is a name, resolves via ToSocketAddrs and validates all resolved IPs
  4. Rejects any IP that falls in a blocked range, returning a user-facing error string
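
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in url_guard.rs: `check_host` is a hypothetical name, the host and port are assumed to be already parsed out of the URL, and the block-list is passed in as a closure so the resolve-then-validate-all flow can be read on its own.

```rust
use std::net::{IpAddr, ToSocketAddrs};

/// Hypothetical helper: validate one already-parsed host against a
/// caller-supplied block predicate (the real range checks are not shown).
fn check_host(
    host: &str,
    port: u16,
    is_blocked: impl Fn(IpAddr) -> bool,
) -> Result<(), String> {
    // Step 2: an IP literal is validated directly, without any DNS lookup.
    if let Ok(ip) = host.parse::<IpAddr>() {
        return if is_blocked(ip) {
            Err(format!("endpoint IP {ip} is in a blocked range"))
        } else {
            Ok(())
        };
    }

    // Step 3: resolve the name and collect every address it maps to.
    let addrs: Vec<_> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("failed to resolve '{host}': {e}"))?
        .collect();
    if addrs.is_empty() {
        return Err(format!("'{host}' resolved to no addresses"));
    }

    // Step 4: a single blocked address is enough to reject the endpoint.
    match addrs.iter().find(|a| is_blocked(a.ip())) {
        Some(bad) => Err(format!("'{host}' resolves to blocked address {}", bad.ip())),
        None => Ok(()),
    }
}
```

Checking every resolved address, rather than only the first, matters because a name can return a mix of public and private records.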

Blocked ranges (comprehensive):

| IPv4 | IPv6 |
|------|------|
| RFC 1918 (10/8, 172.16/12, 192.168/16) | Loopback (::1) |
| Loopback (127/8) | Unspecified (::) |
| Link-local (169.254/16, cloud metadata) | Multicast (ff00::/8) |
| Unspecified (0.0.0.0) | Unique-local (fc00::/7) |
| Multicast (224/4) | Link-local (fe80::/10) |
| Broadcast (255.255.255.255) | v4-mapped (::ffff:priv, recurses into v4 check) |
| Documentation (192.0.2/24, 198.51.100/24, 203.0.113/24) | v4-compatible deprecated (::/96) |
| Shared-address-space / CGNAT (100.64/10) | Documentation (2001:db8::/32) |
| Benchmarking (198.18/15) | |
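
For readers who want to see how such a block-list maps onto the standard library, here is a rough sketch of the ranges listed above. It is an illustration under assumptions, not the contents of url_guard.rs: the function names are chosen for this example, and ranges without a stable std helper (CGNAT, benchmarking, and the IPv6 prefixes) are matched by hand.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_private()                             // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()                     // 127/8
        || ip.is_link_local()                   // 169.254/16 (cloud metadata)
        || ip.is_unspecified()                  // 0.0.0.0
        || ip.is_multicast()                    // 224/4
        || ip.is_broadcast()                    // 255.255.255.255
        || ip.is_documentation()                // 192.0.2/24, 198.51.100/24, 203.0.113/24
        || (o[0] == 100 && (o[1] & 0xc0) == 64) // 100.64/10 (CGNAT)
        || (o[0] == 198 && (o[1] & 0xfe) == 18) // 198.18/15 (benchmarking)
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // v4-mapped (::ffff:a.b.c.d): defer to the IPv4 rules for the embedded address.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let s = ip.segments();
    ip.is_loopback()                            // ::1
        || ip.is_unspecified()                  // ::
        || ip.is_multicast()                    // ff00::/8
        || (s[0] & 0xfe00) == 0xfc00            // fc00::/7 (unique-local)
        || (s[0] & 0xffc0) == 0xfe80            // fe80::/10 (link-local)
        || s[..6].iter().all(|&x| x == 0)       // ::/96 (deprecated v4-compatible)
        || (s[0] == 0x2001 && s[1] == 0x0db8)   // 2001:db8::/32 (documentation)
}

fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

Note that a v4-mapped address is only blocked when the embedded IPv4 is itself blocked, which matches the "::ffff:priv" wording in the table.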

Changed: crates/rewind-web/src/api.rs

The old validation:

if !endpoint.starts_with("http://") && !endpoint.starts_with("https://") { ... }

Replaced with:

crate::url_guard::validate_export_endpoint(&export_endpoint)
    .map_err(|e| (StatusCode::BAD_REQUEST, e))?;

This runs before session lookup or any outbound connection attempt.

Updated: docs/security-audit.md

Known limitation: DNS rebinding

The guard resolves the hostname once and validates the resulting IPs. The downstream opentelemetry-otlp client may re-resolve at connection time, so a malicious resolver could theoretically rebind between validation and connection. Fully closing this gap requires a pinned resolver, which isn't supported by opentelemetry-otlp's current API. The remaining window is narrow and documented inline in url_guard.rs.
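
To make "pinned resolver" concrete, the sketch below shows the general idea in plain std terms: resolve once, validate, and hand the vetted socket addresses to whatever opens the connection, so no second lookup happens. `resolve_pinned` is a hypothetical helper for illustration only. It does not show how to wire this into opentelemetry-otlp, which, as noted above, does not currently expose such a hook.

```rust
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Hypothetical helper: resolve once and return only vetted addresses.
/// The caller is expected to connect to these addresses directly instead of
/// passing the hostname on to a client that would resolve it again.
fn resolve_pinned(
    host: &str,
    port: u16,
    is_blocked: impl Fn(IpAddr) -> bool,
) -> Result<Vec<SocketAddr>, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("failed to resolve '{host}': {e}"))?
        .collect();
    if addrs.is_empty() {
        return Err(format!("'{host}' resolved to no addresses"));
    }
    if let Some(bad) = addrs.iter().find(|a| is_blocked(a.ip())) {
        return Err(format!("'{host}' resolves to blocked address {}", bad.ip()));
    }
    Ok(addrs)
}
```

A caller that controls the socket could then use something like `std::net::TcpStream::connect(&addrs[..])`, keeping the original hostname only for TLS SNI and the Host header. The rebinding window exists precisely because the validated addresses are discarded and the name is looked up a second time.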

Tests

23 url_guard unit tests covering:

  • Every blocked IPv4 range (RFC 1918, link-local, loopback, unspecified, multicast, broadcast, documentation, CGNAT, benchmarking)
  • Every blocked IPv6 range (loopback, unspecified, multicast, unique-local, link-local, v4-mapped, documentation)
  • Public unicast allowed (v4 + v6)
  • URL parser: HTTP/HTTPS ports, explicit ports, IPv6 brackets, userinfo stripping, malformed rejection

4 integration tests in api_tests.rs:

  • test_export_otel_rejects_loopback_endpoint → 400
  • test_export_otel_rejects_cloud_metadata_endpoint → 400
  • test_export_otel_rejects_rfc1918_endpoint → 400 (tests 10.x, 192.168.x, 172.16.x)
  • test_export_otel_returns_404_for_missing_session → updated to use public IP endpoint

Full suite: 370+ tests, 0 failures, clippy clean on touched files, UI build + 113 vitest tests pass.

Test plan

  • cargo test --workspace — all pass
  • cargo clippy -p rewind-web --all-targets -- -D warnings — zero errors in url_guard or api_tests
  • npm run build && npm test — 113/113 UI tests, prod build succeeds
  • Unit: 23 IP-range category tests cover every blocked range
  • Integration: loopback, metadata, RFC 1918 all return 400 before session lookup
  • Existing test_export_otel_returns_404_for_missing_session passes with public endpoint
  • Existing test_export_otel_no_config_returns_501 still passes

Ship order

| # | Fix | Status |
|---|-----|--------|
| 1 | Fail-closed auth (CRITICAL-02) | ✅ PR #133 merged |
| 2 | SSRF guard on OTel export (CRITICAL-01) | ✅ This PR |
| 3 | Blob redaction + hop-by-hop filtering (HIGH-01/06, MEDIUM-06/08) | ⏳ Next |
| 4 | query_raw PRAGMA lockdown (HIGH-02) | ⏳ Planned |

No version bump — deferred until all 4 security PRs ship.

🤖 Generated with Claude Code

…AL-01)

Addresses CRITICAL-01 from the security audit: the
POST /api/sessions/{id}/export/otel endpoint accepted a user-supplied URL and
made outbound HTTP/gRPC requests with no IP-range validation. An attacker with
API access could target cloud metadata endpoints (169.254.169.254), internal
services, or loopback.

Changes:
- New `crates/rewind-web/src/url_guard.rs`: validates export endpoints by
  resolving the hostname and rejecting any IP in a blocked range before the
  outbound connection. Covers: RFC 1918 (10/8, 172.16/12, 192.168/16),
  link-local (169.254/16, fe80::/10), loopback (127/8, ::1), unspecified,
  multicast, broadcast, documentation (192.0.2/24 etc, 2001:db8::/32),
  shared-address-space (100.64/10), benchmarking (198.18/15), unique-local
  v6 (fc00::/7), and v4-mapped v6 (::ffff:priv). 23 unit tests.
- `api.rs::export_otel` now calls `validate_export_endpoint()` before any
  session lookup or outbound request. Returns 400 with SSRF error message.
- 4 new integration tests in api_tests.rs: loopback rejection, cloud
  metadata rejection, RFC 1918 rejection, existing 404 test updated to
  use a public endpoint IP.
- `docs/security-audit.md` updated: CRITICAL-01 + CRITICAL-02 marked as
  fixed (PR #133 + #134), MEDIUM-09 marked as partially fixed, ship order
  table updated with status column.

Known limitation: DNS rebinding between validation and the opentelemetry-otlp
library's connection-time re-resolution. Documented inline in url_guard.rs.
Requires upstream API changes to fully close.

This is PR #2 of the 5-PR audit ship order. No version bump (deferred
until all 4 security PRs ship per team decision).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@risjai risjai left a comment

Code Review — PR #134: SSRF guard for OTel export (CRITICAL-01)

Verdict: Well-structured guard with strong IP-range coverage. I'd block merge on the octal IP parsing differential and the missing Teredo/6to4 ranges — both are exploitable bypasses. The URL parser also has a class of parser-differential bugs. Details below.


Overview

Adds crates/rewind-web/src/url_guard.rs — a standalone SSRF validation module. The old starts_with("http://") check in api.rs::export_otel is replaced by validate_export_endpoint() which parses the URL, resolves hostnames via ToSocketAddrs, and rejects any resolved IP in a private/reserved range. 23 unit tests, 4 integration tests. The DNS rebinding limitation is documented.

Good scope, good defense-in-depth posture (this runs even though PR #133 now gates the endpoint behind auth).


🔴 Must fix before merge

1. Octal/hex/decimal IP bypass via to_socket_addrs parser differential

When host.parse::<IpAddr>() fails (Rust's parser rejects non-standard forms like 0177.0.0.1, 0x7f000001, 2130706433), the code falls through to to_socket_addrs(), which delegates to the platform's getaddrinfo(3). On some libc implementations:

  • 0177.0.0.1 resolves as 177.0.0.1 (octal digits interpreted as decimal) — passes the guard as public, but the downstream HTTP client (hyper via reqwest) may interpret the octets differently via inet_aton, connecting to 127.0.0.1
  • The safety of the guard depends on getaddrinfo and the HTTP client agreeing on IP interpretation, which is not guaranteed across OSes or library versions

Fix: After host.parse::<IpAddr>() fails, reject any host that looks like a numeric IP (all digits/dots/hex). Only pass through hostnames with at least one alphabetic character:

fn looks_like_numeric_ip(host: &str) -> bool {
    let h = host.strip_prefix("0x").unwrap_or(host);
    !h.is_empty() && h.chars().all(|c| c.is_ascii_hexdigit() || c == '.' || c == 'x' || c == 'X')
}

// In validate_export_endpoint, between the IpAddr::parse and to_socket_addrs:
if looks_like_numeric_ip(&host) {
    return Err(format!("Endpoint host '{host}' looks like a non-standard IP literal (rejected for SSRF safety)"));
}
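
To illustrate the differential described in this item, the sketch below contrasts Rust's strict parser with a simplified re-implementation of the classic inet_aton(3) rules (0x prefix means hex, a leading 0 means octal, and one to four dot-separated parts are accepted). `inet_aton_style` is a hypothetical stand-in written for this example. It is not what any particular libc or HTTP client runs, and real implementations differ in edge cases.

```rust
use std::net::Ipv4Addr;

/// Parse one component the lax way: "0x.." is hex, a leading "0" is octal,
/// anything else is decimal.
fn parse_part(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_alphanumeric()) {
        return None;
    }
    if let Some(hex) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        u32::from_str_radix(hex, 16).ok()
    } else if s.len() > 1 && s.starts_with('0') {
        u32::from_str_radix(&s[1..], 8).ok()
    } else {
        s.parse::<u32>().ok()
    }
}

/// Simplified inet_aton-style parsing: 1 to 4 parts, where the last part
/// fills all remaining bytes of the address.
fn inet_aton_style(host: &str) -> Option<Ipv4Addr> {
    let parts: Vec<u32> = host.split('.').map(parse_part).collect::<Option<_>>()?;
    let value = match parts.as_slice() {
        &[a] => a,
        &[a, b] if a <= 0xff && b <= 0x00ff_ffff => (a << 24) | b,
        &[a, b, c] if a <= 0xff && b <= 0xff && c <= 0xffff => (a << 24) | (b << 16) | c,
        &[a, b, c, d] if a <= 0xff && b <= 0xff && c <= 0xff && d <= 0xff => {
            (a << 24) | (b << 16) | (c << 8) | d
        }
        _ => return None,
    };
    Some(Ipv4Addr::from(value))
}
```

With these rules, `0177.0.0.1`, `0x7f000001`, and `2130706433` all map to 127.0.0.1, while `"0177.0.0.1".parse::<Ipv4Addr>()` returns an error. That gap between the two interpretations is what the `looks_like_numeric_ip` check is meant to close.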

2. Missing Teredo (2001:0000::/32) and 6to4 (2002::/16) blocks

Both IPv6 transition mechanisms embed a destination IPv4 that a relay/gateway will route to. An attacker can encode 127.0.0.1 or 169.254.169.254 inside a Teredo or 6to4 address:

  • Teredo: 2001:0000:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx — the embedded IPv4 is in the last 32 bits (obfuscated/inverted)
  • 6to4: 2002:7f00:0001:: encodes 127.0.0.1 directly in bits 16-47

These are exploitable on hosts with Teredo/6to4 enabled (common in mixed-stack environments — exactly the K8s scenario this guard targets).

Fix: Add to is_blocked_v6:

|| is_teredo(ip)    // 2001:0000::/32, embeds routable IPv4
|| is_6to4(ip)      // 2002::/16, embeds destination IPv4

fn is_teredo(ip: Ipv6Addr) -> bool {
    let s = ip.segments();
    s[0] == 0x2001 && s[1] == 0x0000
}

fn is_6to4(ip: Ipv6Addr) -> bool {
    ip.segments()[0] == 0x2002
}

Note: is_documentation_v6 blocks 2001:0db8::/32 but not 2001:0000::/32. These are different allocations; Teredo is the dangerous one.
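
As a worked example of the embedding described in this item, the sketch below recovers the IPv4 address carried inside a 6to4 or Teredo address. `embedded_v4` is a hypothetical helper for illustration. The guard itself only needs the prefix checks, since it rejects both ranges outright.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: extract the IPv4 address embedded in a 6to4 or
/// Teredo address, or None for any other IPv6 address.
fn embedded_v4(ip: Ipv6Addr) -> Option<Ipv4Addr> {
    let s = ip.segments();
    if s[0] == 0x2002 {
        // 6to4 (2002::/16): the IPv4 address sits in bits 16-47 as-is.
        let v = (u32::from(s[1]) << 16) | u32::from(s[2]);
        return Some(Ipv4Addr::from(v));
    }
    if s[0] == 0x2001 && s[1] == 0x0000 {
        // Teredo (2001:0000::/32): the client IPv4 address is the last
        // 32 bits with every bit inverted.
        let v = ((u32::from(s[6]) << 16) | u32::from(s[7])) ^ 0xffff_ffff;
        return Some(Ipv4Addr::from(v));
    }
    None
}
```

For instance, `2002:7f00:0001::` yields 127.0.0.1 and `2002:a9fe:a9fe::1` yields 169.254.169.254, which is why a guard that only inspects the IPv6 form would let these through.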


🟡 Should address

3. URL parser does not reject backslash, whitespace, or percent-encoded authority

parse_host_port does not strip or reject \, \t, \n, \r, or %-encoded characters in the authority component. Classic parser-differential attacks:

  • http://evil.com\@127.0.0.1/ — the guard's rsplit_once('@') extracts 127.0.0.1, checks it (blocked!), but some HTTP clients following WHATWG URL parsing treat \ as a path separator and connect to evil.com
  • http://evil.com%5C@127.0.0.1/ — percent-encoded variant of the same

In this specific case the guard would correctly reject the loopback IP, so the parser differential is "opposite direction" (guard is more restrictive than the client). But the mismatch still undermines trust in the guard's invariants — a future change could flip the direction.

Fix: Reject any authority containing \, %, or control characters:

if authority.bytes().any(|b| b == b'\\' || b == b'%' || b < 0x20) {
    return None;
}

Or, better yet, add the url crate as a dependency (it's already transitively present via reqwest) and use its WHATWG-compliant parser instead of the hand-rolled one. This eliminates the entire class of parser-differential bugs.
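
For context, here is a rough sketch of what a hand-rolled host/port extractor with this sanitization could look like. It is an assumption-laden illustration, not the actual parse_host_port: the name `parse_host_port_sketch`, the default ports, and the exact rejection rules are chosen for this example.

```rust
/// Hypothetical sketch: extract (host, port) from an http(s) URL, refusing
/// authorities that commonly trigger parser differentials.
fn parse_host_port_sketch(url: &str) -> Option<(String, u16)> {
    let (default_port, rest): (u16, &str) = if let Some(r) = url.strip_prefix("https://") {
        (443, r)
    } else if let Some(r) = url.strip_prefix("http://") {
        (80, r)
    } else {
        return None;
    };

    // The authority ends at the first '/', '?', or '#'.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let authority = &rest[..end];

    // Reject backslash, percent-encoding, and control characters up front,
    // before any userinfo stripping.
    if authority.bytes().any(|b| b == b'\\' || b == b'%' || b < 0x20) {
        return None;
    }

    // Drop userinfo: everything up to the last '@'.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, hp)| hp);

    let (host, port) = if let Some(v6) = host_port.strip_prefix('[') {
        // Bracketed IPv6 literal, optionally followed by ":port".
        let (h, after) = v6.split_once(']')?;
        let port = match after.strip_prefix(':') {
            Some(p) => p.parse::<u16>().ok()?,
            None if after.is_empty() => default_port,
            None => return None,
        };
        (h, port)
    } else {
        match host_port.rsplit_once(':') {
            Some((h, p)) => (h, p.parse::<u16>().ok()?),
            None => (host_port, default_port),
        }
    };

    if host.is_empty() {
        return None;
    }
    Some((host.to_string(), port))
}
```

Running the sanitization before the `rsplit_once('@')` step means a backslash hidden in the userinfo portion is caught too. A WHATWG-compliant parser such as the url crate would make most of this unnecessary.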

4. to_socket_addrs is blocking I/O in an async context

(host.as_str(), port).to_socket_addrs() calls getaddrinfo(3), which is a blocking library call. In api.rs::export_otel (an async handler), this blocks the tokio worker thread for the duration of DNS resolution. Under load or with a slow resolver, this can starve the runtime.

Fix: Wrap in tokio::task::spawn_blocking, or use tokio::net::lookup_host instead:

let addrs: Vec<_> = tokio::net::lookup_host((host.as_str(), port))
    .await
    .map_err(|e| format!("Failed to resolve '{host}': {e}"))?
    .collect();

This requires making validate_export_endpoint async. Alternatively, keep it sync and call it from spawn_blocking at the call site in api.rs.


🟢 Nits / optional

  • docs/security-audit.md is 808 lines added as a new file. But the file already exists on master (added in an earlier session). Confirm this isn't a merge conflict waiting to happen — the system reminder says the file was modified, which suggests it was edited locally but never committed to master. If master already has it, this PR should modify, not create.

  • Integration tests duplicate boilerplate. All 4 integration tests touched by this PR (the three rejection tests test_export_otel_rejects_loopback_endpoint, _cloud_metadata_, _rfc1918_, plus the updated 404 test) share identical AppState setup. Extract a setup_otel_test() helper (like the existing setup() in the file). ~30 LOC saved, easier to maintain.

  • parse_host_port strips query and fragment, but the URL is passed through unchanged. If someone passes http://public.example.com/v1/traces#@169.254.169.254, the guard checks public.example.com (allowed) but the fragment @169.254.169.254 is sent to the HTTP client. This isn't exploitable since HTTP clients strip fragments before sending, but it's worth a comment noting the invariant.

  • The url crate is already in the dep tree via reqwest → url 2.x. Using it directly in Cargo.toml adds no new code to the binary. The hand-rolled parser works but is a maintenance burden — every time a new bypass vector is found, you're re-implementing URL parsing logic that the url crate already handles.


Test coverage

Strong for the IP-range checking. Gaps:

  1. No test for octal/hex/decimal IP forms, the critical bypass in #1
  2. No test for Teredo/6to4 addresses, the high-severity bypass in #2
  3. No test for backslash/percent in the authority, the parser differential in #3
  4. No async DNS test: the #[ignore] test for public hosts is fine, but there's no test confirming that to_socket_addrs on a numeric-looking non-IP-parseable string is handled correctly
  5. e2e_allows_public_host is #[ignore], which is good for CI sandboxes, but consider adding a test with a mocked DNS resolver for deterministic coverage

Security cross-check against the audit

| Audit requirement | Status |
|-------------------|--------|
| Reject RFC 1918 | |
| Reject link-local / cloud metadata (169.254.x.x) | |
| Reject loopback | |
| Reject unspecified | |
| Reject multicast/broadcast | |
| Reject documentation ranges | |
| Reject CGNAT/shared (100.64/10) | |
| Reject benchmarking (198.18/15) | |
| Resolve hostname before connecting | |
| Validate ALL resolved IPs (not just first) | |
| IPv4-mapped IPv6 (::ffff:x.x.x.x) | |
| IPv4-compatible deprecated (::x.x.x.x) | |
| Unique-local (fc00::/7) | |
| Link-local v6 (fe80::/10) | |
| Teredo (2001:0000::/32) | ❌ Missing (exploitable) |
| 6to4 (2002::/16) | ❌ Missing (exploitable) |
| Octal/hex IP forms | ❌ Missing (platform-dependent bypass) |
| DNS rebinding | ⚠️ Documented as known limitation |
| Parser differential (backslash/percent) | ⚠️ Not handled |

Summary

  • Block on: Octal/hex IP bypass (#1), Teredo/6to4 ranges (#2)
  • Before merge: Authority sanitization (#3), async DNS (#4)
  • Follow-up: Replace hand-rolled parser with url crate, add bypass regression test suite

The guard's IP-range coverage is excellent for standard representations — 9 IPv4 categories and 8 IPv6 categories is more thorough than most SSRF guards I've reviewed. The two blocking issues are both "non-standard representation bypasses" which is the #1 class of SSRF guard evasion. Once those are patched, this is solid.

Addresses all blocking and should-fix items from the code review.

🔴 Blockers resolved:

1. Octal/hex/decimal IP bypass: added `looks_like_numeric_ip()` that
   rejects hosts consisting entirely of hex digits, dots, and x/X
   (e.g., 0177.0.0.1, 0x7f000001, 2130706433) BEFORE passing to DNS.
   Closes the parser-differential between Rust's IpAddr::parse (strict)
   and getaddrinfo(3) (platform-dependent lax parsing).

2. Teredo (2001:0000::/32) and 6to4 (2002::/16) blocks added to
   is_blocked_v6. Both IPv6 transition mechanisms embed a routable IPv4
   that a relay/gateway will connect to. An attacker could encode
   127.0.0.1 or 169.254.169.254 inside these addresses.

🟡 Should-fix resolved:

3. Authority sanitization: parse_host_port now rejects any authority
   containing backslash, percent-encoding, or control characters.
   Eliminates the entire class of parser-differential attacks where
   the guard and HTTP client disagree on the host.

4. Async DNS: validate_export_endpoint is now async, using
   tokio::net::lookup_host instead of std::net::ToSocketAddrs.
   No longer blocks the tokio worker thread during DNS resolution.

New tests: 15 additional (Teredo, 6to4, octal/hex/decimal bypass,
backslash, percent, control chars, end-to-end for each bypass class).
Total url_guard tests: 38 (was 23). All pass. Clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

risjai commented Apr 21, 2026

Pushed 5bd16fa addressing all blockers + should-fix items.

🔴 Blockers resolved:

| # | Issue | Fix |
|---|-------|-----|
| 1 | Octal/hex/decimal IP bypass | looks_like_numeric_ip() rejects hosts that are all hex-digits/dots/x before DNS resolution. Tests: e2e_rejects_octal_ip_bypass, e2e_rejects_hex_ip_bypass, e2e_rejects_decimal_ip_bypass |
| 2 | Missing Teredo/6to4 | is_teredo() blocks 2001:0000::/32, is_6to4() blocks 2002::/16. Tests: blocks_teredo, blocks_6to4, e2e_rejects_teredo_v6, e2e_rejects_6to4_v6 |

🟡 Should-fix resolved:

| # | Issue | Fix |
|---|-------|-----|
| 3 | Backslash/percent/control chars in authority | parse_host_port now rejects \, %, and bytes < 0x20 in the authority. Tests: rejects_backslash_in_authority, rejects_percent_in_authority, rejects_control_chars_in_authority, e2e_rejects_backslash_bypass |
| 4 | Blocking DNS | validate_export_endpoint is now async, using tokio::net::lookup_host instead of std::net::ToSocketAddrs |

Test count: 38 url_guard unit tests (was 23), 30 api integration tests, all green. Clippy clean.

Follow-up (not in this PR): Replace hand-rolled URL parser with url crate — reviewer is right that it's already transitively present and would eliminate the parser-differential class entirely. Scoping as a separate PR to keep this one focused on the IP-range guard.


risjai commented Apr 21, 2026

Follow-up review — fix commit 5bd16fa7

All 4 review items addressed. One minor false-positive in the new check, otherwise approve.

Status vs original review

| # | Original ask | Status | Notes |
|---|--------------|--------|-------|
| 1 | Octal/hex/decimal IP bypass | | looks_like_numeric_ip() at url_guard.rs:73-78 rejects non-standard numeric forms before DNS. Tests: rejects_octal_ip_form, rejects_hex_ip_form, rejects_decimal_ip_form, plus 3 end-to-end. |
| 2 | Teredo/6to4 ranges | | is_teredo and is_6to4 at url_guard.rs:170-182. Both check segment prefixes correctly. Tests: blocks_teredo, blocks_6to4, e2e_rejects_teredo_v6, e2e_rejects_6to4_v6. |
| 3 | Authority sanitization | | parse_host_port rejects \, %, and < 0x20 at url_guard.rs:202-206. Tests: rejects_backslash_in_authority, rejects_percent_in_authority, rejects_control_chars_in_authority, e2e_rejects_backslash_bypass. |
| 4 | Async DNS | | validate_export_endpoint is now async, uses tokio::net::lookup_host. Call site in api.rs has .await. All tests migrated to #[tokio::test]. |

New issue in the fix

🟡 looks_like_numeric_ip false-positive on all-hex hostnames

The check h.bytes().all(|b| b.is_ascii_hexdigit() || b == b'.' || b == b'x' || b == b'X') rejects any host composed entirely of hex chars and dots. This means:

  • cafe.babe → rejected (all hex digits + dot) — but this is a valid hostname
  • dead.beef → rejected
  • 0xfeed.example.com → allowed (contains non-hex alphabetics like l, p)
  • aabb.cc → rejected (.cc is a real TLD, aabb is all hex)

These are extremely unlikely as OTel collector hostnames, so this is a marginal over-restriction, not a security bypass. But it's worth noting for correctness. A tighter check: require that the string starts with a digit or 0x/0X prefix — real numeric IPs always start with a digit, while hostnames starting with a letter are never ambiguous:

fn looks_like_numeric_ip(host: &str) -> bool {
    // Every numeric IP form starts with a digit, including the leading 0 of a 0x/0X prefix.
    let first = host.bytes().next().unwrap_or(0);
    if !first.is_ascii_digit() { return false; }
    // Keep x/X in the allowed set so dotted-hex forms such as 0x7f.0x0.0x0.0x1 are still rejected.
    host.bytes().all(|b| b.is_ascii_hexdigit() || b == b'.' || b == b'x' || b == b'X')
}

This still catches 0177.0.0.1, 0x7f000001, 2130706433 but allows cafe.babe. Not a blocker — the current version is safe (over-rejects, never under-rejects).
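
The trade-off can be checked with a small side-by-side sketch. Both functions are illustrative variants written for this comparison, with hypothetical names: `broad_check` mirrors the shipped "all hex digits, dots, and x/X" rule, and `digit_first_check` adds the "must start with a digit" requirement while, as an assumption of this sketch, still accepting x/X in later labels so that dotted-hex forms stay rejected.

```rust
/// Broad rule: any host made only of hex digits, dots, and x/X is treated
/// as a numeric IP form (over-rejects names like cafe.babe).
fn broad_check(host: &str) -> bool {
    !host.is_empty()
        && host
            .bytes()
            .all(|b| b.is_ascii_hexdigit() || b == b'.' || b == b'x' || b == b'X')
}

/// Tighter rule: same character set, but only for hosts whose first
/// character is a digit, since numeric IP forms never start with a letter.
fn digit_first_check(host: &str) -> bool {
    host.bytes().next().map_or(false, |b| b.is_ascii_digit()) && broad_check(host)
}
```

Under these definitions, `cafe.babe` and `aabb.cc` are flagged only by the broad rule, `0177.0.0.1`, `0x7f000001`, `2130706433`, and `0x7f.0x0.0x0.0x1` are flagged by both, and `0xfeed.example.com` is flagged by neither.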

Verification

  • Confirmed is_teredo checks segments()[0] == 0x2001 && segments()[1] == 0x0000 — correct for 2001:0000::/32
  • Confirmed is_6to4 checks segments()[0] == 0x2002 — correct for 2002::/16
  • Confirmed looks_like_numeric_ip("0177.0.0.1") → true, looks_like_numeric_ip("example.com") → false
  • Confirmed authority sanitization fires before userinfo stripping (correct order — backslash in userinfo portion is caught)
  • Confirmed validate_export_endpoint return type changed from Result<&str, String> to Result<(), String> — call site doesn't use the returned URL, so this is a clean simplification
  • Test count: 38 url_guard tests (was 23), all converted to #[tokio::test] where async

Verdict

Approve. All blocking bypasses closed. The cafe.babe false-positive is over-restriction, not under-restriction — safe to ship and tighten later if anyone hits it. Good work on the turnaround.

@risjai risjai merged commit cc0a3f4 into master Apr 21, 2026
7 checks passed
@risjai risjai deleted the feat/ssrf-otel-export branch April 21, 2026 07:26