The purpose of this project is to explore an approach to handling relative URLs safely for redirects and forwards. Web security vulnerabilities that originate from unvalidated redirects and forwards are often remediated by restricting URLs, usually with an allow-list of known good absolute URLs. See OWASP Validating URLs or Google Open Redirect for examples of this. Unfortunately, not all applications can adopt an allow-listing approach because the absolute URL may not be known ahead of time. This causes friction, as the one-size-fits-all approach does not always work.
URL parsing is difficult. Many individual components make up a URL, and how they interact with one another can be confusing; authority delegation is one example. Orange Tsai presented A New Era of SSRF at Black Hat USA 2017, highlighting some of the problems that can arise.
Much like any untrusted user input, relative URLs should be normalised, sanitised, and then validated, in that order. Normalisation and sanitisation should be done through established URL parsing libraries that follow the WHATWG standard, such as the Node URL package. The output of these operations should then be validated against a strict pattern that allows only the required characters. Dangerous characters such as `@`, `#`, and multiple `/` characters should not be on the allow-list.
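The normalise, sanitise, validate order described above can be sketched as follows. This is a minimal illustration and not the project's actual implementation: the helper name `safeRelativePath`, the placeholder origin `BASE`, and the `SAFE_PATH` pattern are all assumptions, and the sketch deliberately rejects query strings and fragments to stay small.

```javascript
// Hypothetical sketch: returns a normalised relative path, or null if rejected.
// URL is a global in modern Node.js, so no require() is needed here.

// Placeholder origin (an assumption), used only so the parser can resolve input.
const BASE = 'http://localhost';

// Strict allow-list (an assumption): one leading slash, then segments made of
// letters, digits, '_', '.' and '-', separated by single slashes.
const SAFE_PATH = /^\/(?:[A-Za-z0-9_.-]+(?:\/[A-Za-z0-9_.-]+)*\/?)?$/;

function safeRelativePath(input) {
  if (typeof input !== 'string') return null;

  let parsed;
  try {
    // Normalise: the WHATWG parser resolves dot segments and backslashes.
    parsed = new URL(input, BASE);
  } catch {
    return null; // Unparseable input is rejected outright.
  }

  // Sanitise: anything that resolves to another origin is not relative.
  if (parsed.origin !== BASE) return null;
  // This sketch also rejects userinfo, query strings and fragments.
  if (parsed.username || parsed.password || parsed.search || parsed.hash) return null;

  // Validate: only the normalised path is checked against the strict pattern.
  return SAFE_PATH.test(parsed.pathname) ? parsed.pathname : null;
}

console.log(safeRelativePath('/dashboard'));   // '/dashboard'
console.log(safeRelativePath('//nikola.dev')); // null
```

Validating the parser's output rather than the raw input means the pattern only ever sees one canonical form, which is why the order of the three steps matters.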
Where possible, handle absolute URLs to avoid introducing unnecessary complexity; OWASP Validating URLs is a great resource on such solutions.
The syntax and semantics of a URI are intentionally broad to create an extensible means for identifying resources. This introduces ambiguity, as there are inconsistencies between URL parsers and the RFC 2396 / RFC 3986 specifications. WHATWG defined a contemporary standard based on these specifications. The following diagram shows the components of a URL string and the corresponding properties of a URL object.
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
At a code level, a URL can be parsed and accessed through a convenient object, as seen below:
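const { URL } = require('url');

// Parse an absolute URL string into a WHATWG URL object.
const url = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#has';
const newUrl = new URL(url);

// Print every parsed component of the URL.
console.log(newUrl);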
URL {
  href: 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#has',
  origin: 'https://sub.host.com:8080',
  protocol: 'https:',
  username: 'user',
  password: 'pass',
  host: 'sub.host.com:8080',
  hostname: 'sub.host.com',
  port: '8080',
  pathname: '/p/a/t/h',
  search: '?query=string',
  searchParams: URLSearchParams { 'query' => 'string' },
  hash: '#has'
}
3.2. Authority
Many URI schemes include a hierarchical element for a naming
authority so that governance of the name space defined by the
remainder of the URI is delegated to that authority (which may, in
turn, delegate it further). The generic syntax provides a common
means for distinguishing an authority based on a registered name or
server address, along with optional port and user information.
The authority component is preceded by a double slash ("//") and is
terminated by the next slash ("/"), question mark ("?"), or number
sign ("#") character, or by the end of the URI.
authority = [ userinfo "@" ] host [ ":" port ]
URI producers and normalizers should omit the ":" delimiter that
separates host from port if the port component is empty. Some
schemes do not allow the userinfo and/or port subcomponents.
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. Non-
validating parsers (those that merely separate a URI reference into
its major components) will often ignore the subcomponent structure of
authority, treating it as an opaque string from the double-slash to
the first terminating delimiter, until such time as the URI is
dereferenced.
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used. A
relative reference that begins with a single slash character is
termed an absolute-path reference. A relative reference that does
not begin with a slash character is termed a relative-path reference.
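The three kinds of relative reference named in the RFC text above can be told apart by their leading characters. The sketch below is only an illustration of that terminology; `classifyReference` is a hypothetical helper, and it looks at the raw string only, so it does not account for the backslash handling that browsers apply.

```javascript
// Hypothetical helper that names the kind of reference per RFC 3986 section 4.2.
function classifyReference(ref) {
  // A leading "scheme:" means this is not a relative reference at all.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(ref)) return 'absolute-uri';
  // Two leading slashes introduce an authority component.
  if (ref.startsWith('//')) return 'network-path';
  // One leading slash is resolved against the current authority.
  if (ref.startsWith('/')) return 'absolute-path';
  // Everything else is resolved against the current path.
  return 'relative-path';
}

console.log(classifyReference('//nikola.dev')); // 'network-path'
console.log(classifyReference('/nikola.dev'));  // 'absolute-path'
console.log(classifyReference('nikola.dev'));   // 'relative-path'
```

For redirects, the network-path case is the dangerous one, because the text after the two slashes is treated as a host.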
As defined in the WHATWG Goals, percent-encoded bytes in a URL are percent-decoded when a value such as a query parameter is read, so `%2F` and `/` yield the same parsed value.
An example of this can be seen below:
node app.js
Server running at http://127.0.0.1:3000/
URL Requested
Raw url: /?nextUrl=/nikola.dev
Parsed nextUrl parameter: /nikola.dev
URL Requested
Raw url: /?nextUrl=%2Fnikola.dev
Parsed nextUrl parameter: /nikola.dev
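The same behaviour can be reproduced without the server. The sketch below is not taken from `app.js`; `readNextUrl` is a hypothetical helper that reads the `nextUrl` parameter through `URLSearchParams`, which percent-decodes values when they are read.

```javascript
// Hypothetical helper: extract the nextUrl parameter from a raw request URL.
// The base is only needed because request URLs are relative to the server.
function readNextUrl(rawUrl) {
  const parsed = new URL(rawUrl, 'http://127.0.0.1:3000');
  // searchParams.get() returns the percent-decoded value.
  return parsed.searchParams.get('nextUrl');
}

console.log(readNextUrl('/?nextUrl=/nikola.dev'));         // '/nikola.dev'
console.log(readNextUrl('/?nextUrl=%2Fnikola.dev'));       // '/nikola.dev'
console.log(readNextUrl('/?nextUrl=%2F%2Fnikola.dev'));    // '//nikola.dev'
```

Because both spellings produce the same value, a filter that inspects the raw query string before decoding can be sidestepped by encoding the characters it looks for.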
Modern browsers automatically convert backslashes (`\`) into forward slashes (`/`), despite this being against RFC 3986 - URI Generic Syntax. In addition, the `@` character can be used to define a target host, redirecting the victim to a new domain; this type of attack is known as a semantic attack.
The dangerous characters and encoded versions can be seen below:
127.0.0.1:3000?nextUrl=//nikola.dev
127.0.0.1:3000?nextUrl=/%2Fnikola.dev
127.0.0.1:3000?nextUrl=%2F%2Fnikola.dev
127.0.0.1:3000?nextUrl=\\nikola.dev
127.0.0.1:3000?nextUrl=\%5Cnikola.dev
127.0.0.1:3000?nextUrl=%5C%5Cnikola.dev
Interestingly, the `\` and `/` characters (and their URL-encoded equivalents) can repeat and are interchangeable. The following is a valid payload:
http://127.0.0.1:3000/?nextUrl=/%5C/%5C/\%2F\/\%2F\/\%2F\/nikola.dev
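Node's WHATWG `URL` parser applies the same slash handling as browsers for `http` and `https` URLs, so the effect of these payloads can be observed directly. The sketch below is an illustration only; `resolvedHost` is a hypothetical helper and the base URL is the local address used in the examples above.

```javascript
// Hypothetical helper: resolve a redirect target the way a browser would,
// and report which host it would actually land on.
function resolvedHost(nextUrl) {
  return new URL(nextUrl, 'http://127.0.0.1:3000/').host;
}

console.log(resolvedHost('/nikola.dev'));     // '127.0.0.1:3000' (stays local)
console.log(resolvedHost('//nikola.dev'));    // 'nikola.dev'
console.log(resolvedHost('\\\\nikola.dev'));  // 'nikola.dev' (input is \\nikola.dev)
console.log(resolvedHost('/\\nikola.dev'));   // 'nikola.dev' (input is /\nikola.dev)

// The long payload above, after percent-decoding, is a run of mixed slashes.
const decoded = decodeURIComponent('/%5C/%5C/\\%2F\\/\\%2F\\/\\%2F\\/nikola.dev');
console.log(resolvedHost(decoded));           // 'nikola.dev'
```

Any check that only looks for a literal `//` prefix therefore misses every variant that mixes in a backslash.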
Depending on the underlying logic, attackers can also use this to bypass filters. For example, a check that the `nextUrl` parameter must contain `example.com` can be bypassed:
127.0.0.1:3000?nextUrl=//example.com%40nikola.dev
127.0.0.1:3000?nextUrl=//example.com@nikola.dev
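To make the bypass concrete, the sketch below compares a substring filter with what the parser actually resolves. Both functions are hypothetical and are not taken from the project: `naiveCheck` stands in for the kind of filter described above, and `actualTarget` reports where the redirect would really go.

```javascript
// Hypothetical filter of the kind described above: a substring check.
function naiveCheck(nextUrl) {
  return nextUrl.includes('example.com');
}

// Hypothetical helper: show how the parser splits the same string.
function actualTarget(nextUrl) {
  const parsed = new URL(nextUrl, 'http://127.0.0.1:3000/');
  return { username: parsed.username, hostname: parsed.hostname };
}

const payload = '//example.com@nikola.dev';
console.log(naiveCheck(payload));   // true, so the filter lets it through
console.log(actualTarget(payload)); // { username: 'example.com', hostname: 'nikola.dev' }
```

Everything before the `@` is treated as userinfo, so the trusted-looking name never reaches the host component. Comparing the parsed `hostname` instead of searching the raw string avoids this class of mistake.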
Run the application locally using the following:
node app.js
- Simple redirect
- Redirect to parameters, for example `?params=blah`