Align forbidden host and domain code points with spec and other browsers
https://bugs.webkit.org/show_bug.cgi?id=257213
rdar://109725285

Reviewed by Chris Dumez.

https://url.spec.whatwg.org/#forbidden-host-code-point and https://url.spec.whatwg.org/#forbidden-domain-code-point
are different sets of code points: the former is used when parsing hosts of non-special URL schemes and the latter when
parsing hosts of special URL schemes. The forbidden domain code points include every forbidden host code point plus the
remaining C0 controls, U+0025 (%), and U+007F. Chrome and Firefox pass the tests that this PR makes WebKit pass.
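For readers who want the two sets side by side, the following is a minimal standalone sketch of the definitions from the URL Standard. It is not the code in this patch, which uses bit flags in `characterClassTable`. The free-function form and the `char32_t` parameter are assumptions made for illustration.

```cpp
// Illustrative sketch only; WebKit's real implementation is table-driven.
constexpr bool isC0Control(char32_t c) { return c <= 0x1F; }

// Forbidden host code points: checked for hosts of non-special schemes.
constexpr bool isForbiddenHostCodePoint(char32_t c)
{
    switch (c) {
    case 0x00: case 0x09: case 0x0A: case 0x0D: case ' ':
    case '#': case '/': case ':': case '<': case '>': case '?':
    case '@': case '[': case '\\': case ']': case '^': case '|':
        return true;
    default:
        return false;
    }
}

// Forbidden domain code points: checked for hosts of special schemes.
// A superset of the set above.
constexpr bool isForbiddenDomainCodePoint(char32_t c)
{
    return isForbiddenHostCodePoint(c) || isC0Control(c) || c == '%' || c == 0x7F;
}

// U+001F, '%' and U+007F are only forbidden in domains.
static_assert(!isForbiddenHostCodePoint(0x1F) && isForbiddenDomainCodePoint(0x1F));
static_assert(!isForbiddenHostCodePoint('%') && isForbiddenDomainCodePoint('%'));
static_assert(!isForbiddenHostCodePoint(0x7F) && isForbiddenDomainCodePoint(0x7F));
```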

* LayoutTests/imported/w3c/web-platform-tests/url/a-element-xhtml_exclude=(file_javascript_mailto)-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/a-element_exclude=(file_javascript_mailto)-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-constructor.any.worker_exclude=(file_javascript_mailto)-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-constructor.any_exclude=(file_javascript_mailto)-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-origin.any-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-origin.any.worker-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-setters-stripping.any-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/url-setters-stripping.any.worker-expected.txt:
* Source/WTF/wtf/URLParser.cpp:
(WTF::isC0Control):
(WTF::URLParser::isForbiddenHostCodePoint):
(WTF::URLParser::isForbiddenDomainCodePoint):
(WTF::URLParser::hasForbiddenHostCodePoint):
(WTF::URLParser::parseHostAndPort):
(WTF::isForbiddenHostCodePoint): Deleted.
* Source/WTF/wtf/URLParser.h:

Canonical link: https://commits.webkit.org/264482@main
achristensen07 committed May 24, 2023
1 parent 56e5d7c commit 235db6f
Showing 10 changed files with 89 additions and 72 deletions.
@@ -433,7 +433,7 @@ PASS Parsing: <http://ho%5Dst/> against <about:blank>
PASS Parsing: <http://ho%7Cst/> against <about:blank>
PASS Parsing: <http://ho%7Fst/> against <about:blank>
PASS Parsing: <http://!"$&'()*+,-.;=_`{}~/> against <about:blank>
-FAIL Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> against <about:blank> assert_equals: href expected "sc://%01%02%03%04%05%06%07%08%0B%0C%0E%0F%10%11%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F%7F!\"$%&'()*+,-.;=_`{}~/" but got "sc://\x01\x02\x03\x04\x05\x06\x07\b\v\f\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!\"$%&'()*+,-.;=_`{}~/"
+PASS Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> against <about:blank>
PASS Parsing: <ftp://example.com%80/> against <about:blank>
PASS Parsing: <ftp://example.com%A0/> against <about:blank>
PASS Parsing: <https://example.com%80/> against <about:blank>
@@ -433,7 +433,7 @@ PASS Parsing: <http://ho%5Dst/> against <about:blank>
PASS Parsing: <http://ho%7Cst/> against <about:blank>
PASS Parsing: <http://ho%7Fst/> against <about:blank>
PASS Parsing: <http://!"$&'()*+,-.;=_`{}~/> against <about:blank>
-FAIL Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> against <about:blank> assert_equals: href expected "sc://%01%02%03%04%05%06%07%08%0B%0C%0E%0F%10%11%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F%7F!\"$%&'()*+,-.;=_`{}~/" but got "sc://\x01\x02\x03\x04\x05\x06\x07\b\v\f\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!\"$%&'()*+,-.;=_`{}~/"
+PASS Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> against <about:blank>
PASS Parsing: <ftp://example.com%80/> against <about:blank>
PASS Parsing: <ftp://example.com%A0/> against <about:blank>
PASS Parsing: <https://example.com%80/> against <about:blank>
@@ -432,7 +432,7 @@ PASS Parsing: <http://ho%5Dst/> without base
PASS Parsing: <http://ho%7Cst/> without base
PASS Parsing: <http://ho%7Fst/> without base
PASS Parsing: <http://!"$&'()*+,-.;=_`{}~/> without base
-FAIL Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base "sc:// !"$%&'()*+,-.;=_`{}~/" cannot be parsed as a URL.
+PASS Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base
PASS Parsing: <ftp://example.com%80/> without base
PASS Parsing: <ftp://example.com%A0/> without base
PASS Parsing: <https://example.com%80/> without base
@@ -432,7 +432,7 @@ PASS Parsing: <http://ho%5Dst/> without base
PASS Parsing: <http://ho%7Cst/> without base
PASS Parsing: <http://ho%7Fst/> without base
PASS Parsing: <http://!"$&'()*+,-.;=_`{}~/> without base
-FAIL Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base "sc:// !"$%&'()*+,-.;=_`{}~/" cannot be parsed as a URL.
+PASS Parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base
PASS Parsing: <ftp://example.com%80/> without base
PASS Parsing: <ftp://example.com%A0/> without base
PASS Parsing: <https://example.com%80/> without base
@@ -266,7 +266,7 @@ PASS Origin parsing: <wow:%1G> without base
PASS Origin parsing: <wow:￿> without base
PASS Origin parsing: <http://example.com/U+d800𐟾U+dfff﷐﷏﷯ﷰ￾￿?U+d800𐟾U+dfff﷐﷏﷯ﷰ￾￿> without base
PASS Origin parsing: <http://!"$&'()*+,-.;=_`{}~/> without base
-FAIL Origin parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base "sc:// !"$%&'()*+,-.;=_`{}~/" cannot be parsed as a URL.
+PASS Origin parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base
PASS Origin parsing: <ftp://%e2%98%83> without base
PASS Origin parsing: <https://%e2%98%83> without base
PASS Origin parsing: <http://127.0.0.1:10100/relative_import.html> without base
@@ -266,7 +266,7 @@ PASS Origin parsing: <wow:%1G> without base
PASS Origin parsing: <wow:￿> without base
PASS Origin parsing: <http://example.com/U+d800𐟾U+dfff﷐﷏﷯ﷰ￾￿?U+d800𐟾U+dfff﷐﷏﷯ﷰ￾￿> without base
PASS Origin parsing: <http://!"$&'()*+,-.;=_`{}~/> without base
-FAIL Origin parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base "sc:// !"$%&'()*+,-.;=_`{}~/" cannot be parsed as a URL.
+PASS Origin parsing: <sc:// !"$%&'()*+,-.;=_`{}~/> without base
PASS Origin parsing: <ftp://%e2%98%83> without base
PASS Origin parsing: <https://%e2%98%83> without base
PASS Origin parsing: <http://127.0.0.1:10100/relative_import.html> without base
@@ -241,12 +241,12 @@ PASS Setting username with trailing U+001F (wpt++:)
PASS Setting password with leading U+001F (wpt++:)
PASS Setting password with middle U+001F (wpt++:)
PASS Setting password with trailing U+001F (wpt++:)
-FAIL Setting host with leading U+001F (wpt++:) assert_equals: property expected "%1Ftest:8000" but got "host:8000"
-FAIL Setting hostname with leading U+001F (wpt++:) assert_equals: property expected "%1Ftest" but got "host"
-FAIL Setting host with middle U+001F (wpt++:) assert_equals: property expected "te%1Fst:8000" but got "host:8000"
-FAIL Setting hostname with middle U+001F (wpt++:) assert_equals: property expected "te%1Fst" but got "host"
-FAIL Setting host with trailing U+001F (wpt++:) assert_equals: property expected "test%1F:8000" but got "host:8000"
-FAIL Setting hostname with trailing U+001F (wpt++:) assert_equals: property expected "test%1F" but got "host"
+PASS Setting host with leading U+001F (wpt++:)
+PASS Setting hostname with leading U+001F (wpt++:)
+PASS Setting host with middle U+001F (wpt++:)
+PASS Setting hostname with middle U+001F (wpt++:)
+PASS Setting host with trailing U+001F (wpt++:)
+PASS Setting hostname with trailing U+001F (wpt++:)
PASS Setting port with leading U+001F (wpt++:)
PASS Setting port with middle U+001F (wpt++:)
PASS Setting port with trailing U+001F (wpt++:)
@@ -241,12 +241,12 @@ PASS Setting username with trailing U+001F (wpt++:)
PASS Setting password with leading U+001F (wpt++:)
PASS Setting password with middle U+001F (wpt++:)
PASS Setting password with trailing U+001F (wpt++:)
-FAIL Setting host with leading U+001F (wpt++:) assert_equals: property expected "%1Ftest:8000" but got "host:8000"
-FAIL Setting hostname with leading U+001F (wpt++:) assert_equals: property expected "%1Ftest" but got "host"
-FAIL Setting host with middle U+001F (wpt++:) assert_equals: property expected "te%1Fst:8000" but got "host:8000"
-FAIL Setting hostname with middle U+001F (wpt++:) assert_equals: property expected "te%1Fst" but got "host"
-FAIL Setting host with trailing U+001F (wpt++:) assert_equals: property expected "test%1F:8000" but got "host:8000"
-FAIL Setting hostname with trailing U+001F (wpt++:) assert_equals: property expected "test%1F" but got "host"
+PASS Setting host with leading U+001F (wpt++:)
+PASS Setting hostname with leading U+001F (wpt++:)
+PASS Setting host with middle U+001F (wpt++:)
+PASS Setting hostname with middle U+001F (wpt++:)
+PASS Setting host with trailing U+001F (wpt++:)
+PASS Setting hostname with trailing U+001F (wpt++:)
PASS Setting port with leading U+001F (wpt++:)
PASS Setting port with middle U+001F (wpt++:)
PASS Setting port with trailing U+001F (wpt++:)
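The expected values in these test results (for example `te%1Fst`) follow from how the URL Standard treats hosts of non-special schemes: the input is rejected only if it contains a forbidden host code point, and otherwise C0 controls and everything above U+007E are percent-encoded. The following is a rough sketch of that rule for byte input. `parseOpaqueHost` and `isForbiddenHostByte` are hypothetical names, not WebKit functions, and IPv6 literals in brackets are deliberately not handled.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: forbidden host code points from the URL Standard.
inline bool isForbiddenHostByte(unsigned char c)
{
    switch (c) {
    case 0x00: case 0x09: case 0x0A: case 0x0D: case ' ':
    case '#': case '/': case ':': case '<': case '>': case '?':
    case '@': case '[': case '\\': case ']': case '^': case '|':
        return true;
    default:
        return false;
    }
}

// Simplified opaque-host parsing: fail on forbidden host code points,
// otherwise percent-encode C0 controls and bytes above 0x7E.
inline std::optional<std::string> parseOpaqueHost(std::string_view input)
{
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : input) {
        if (isForbiddenHostByte(c))
            return std::nullopt;
        if (c <= 0x1F || c > 0x7E) {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0xF];
        } else
            out += static_cast<char>(c);
    }
    return out;
}
```

Under these assumptions, `parseOpaqueHost("te\x1Fst")` yields `te%1Fst`, while an input containing a space is rejected.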
121 changes: 68 additions & 53 deletions Source/WTF/wtf/URLParser.cpp
@@ -56,50 +56,51 @@ enum URLCharacterClass {
UserInfo = 0x1,
Default = 0x2,
ForbiddenHost = 0x4,
-QueryPercent = 0x8,
-SlashQuestionOrHash = 0x10,
-ValidScheme = 0x20,
+ForbiddenDomain = 0x8,
+QueryPercent = 0x10,
+SlashQuestionOrHash = 0x20,
+ValidScheme = 0x40,
};

static const uint8_t characterClassTable[256] = {
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x0
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x2
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x3
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x4
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x5
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x6
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x7
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x8
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x9
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xA
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xB
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xC
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xD
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xE
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0xF
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x10
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x11
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x12
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x13
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x14
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x15
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x16
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x17
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x18
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x19
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1A
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1B
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1C
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1D
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1E
-UserInfo | Default | QueryPercent | ForbiddenHost, // 0x1F
-UserInfo | Default | QueryPercent | ForbiddenHost, // ' '
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // 0x0
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x2
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x3
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x4
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x5
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x6
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x7
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x8
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // 0x9
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // 0xA
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0xB
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0xC
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // 0xD
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0xE
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0xF
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x10
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x11
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x12
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x13
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x14
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x15
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x16
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x17
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x18
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x19
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1A
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1B
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1C
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1D
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1E
+UserInfo | Default | QueryPercent | ForbiddenDomain, // 0x1F
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // ' '
0, // '!'
UserInfo | Default | QueryPercent, // '"'
-UserInfo | Default | QueryPercent | SlashQuestionOrHash | ForbiddenHost, // '#'
+UserInfo | Default | QueryPercent | SlashQuestionOrHash | ForbiddenHost | ForbiddenDomain, // '#'
0, // '$'
-ForbiddenHost, // '%'
+ForbiddenDomain, // '%'
0, // '&'
0, // '\''
0, // '('
@@ -109,7 +110,7 @@ static const uint8_t characterClassTable[256] = {
0, // ','
ValidScheme, // '-'
ValidScheme, // '.'
-UserInfo | SlashQuestionOrHash | ForbiddenHost, // '/'
+UserInfo | SlashQuestionOrHash | ForbiddenHost | ForbiddenDomain, // '/'
ValidScheme, // '0'
ValidScheme, // '1'
ValidScheme, // '2'
@@ -120,13 +121,13 @@ static const uint8_t characterClassTable[256] = {
ValidScheme, // '7'
ValidScheme, // '8'
ValidScheme, // '9'
-UserInfo | ForbiddenHost, // ':'
+UserInfo | ForbiddenHost | ForbiddenDomain, // ':'
UserInfo, // ';'
-UserInfo | Default | QueryPercent | ForbiddenHost, // '<'
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // '<'
UserInfo, // '='
-UserInfo | Default | QueryPercent | ForbiddenHost, // '>'
-UserInfo | Default | SlashQuestionOrHash | ForbiddenHost, // '?'
-UserInfo | ForbiddenHost, // '@'
+UserInfo | Default | QueryPercent | ForbiddenHost | ForbiddenDomain, // '>'
+UserInfo | Default | SlashQuestionOrHash | ForbiddenHost | ForbiddenDomain, // '?'
+UserInfo | ForbiddenHost | ForbiddenDomain, // '@'
ValidScheme, // 'A'
ValidScheme, // 'B'
ValidScheme, // 'C'
@@ -153,10 +154,10 @@ static const uint8_t characterClassTable[256] = {
ValidScheme, // 'X'
ValidScheme, // 'Y'
ValidScheme, // 'Z'
-UserInfo | ForbiddenHost, // '['
-UserInfo | SlashQuestionOrHash | ForbiddenHost, // '\\'
-UserInfo | ForbiddenHost, // ']'
-UserInfo | ForbiddenHost, // '^'
+UserInfo | ForbiddenHost | ForbiddenDomain, // '['
+UserInfo | SlashQuestionOrHash | ForbiddenHost | ForbiddenDomain, // '\\'
+UserInfo | ForbiddenHost | ForbiddenDomain, // ']'
+UserInfo | ForbiddenHost | ForbiddenDomain, // '^'
0, // '_'
UserInfo | Default, // '`'
ValidScheme, // 'a'
@@ -186,10 +187,10 @@ static const uint8_t characterClassTable[256] = {
ValidScheme, // 'y'
ValidScheme, // 'z'
UserInfo | Default, // '{'
-UserInfo | ForbiddenHost, // '|'
+UserInfo | ForbiddenHost | ForbiddenDomain, // '|'
UserInfo | Default, // '}'
0, // '~'
-QueryPercent | ForbiddenHost, // 0x7F
+QueryPercent | ForbiddenDomain, // 0x7F
QueryPercent, // 0x80
QueryPercent, // 0x81
QueryPercent, // 0x82
@@ -330,7 +331,21 @@ template<typename CharacterType> ALWAYS_INLINE static bool isInUserInfoEncodeSet
template<typename CharacterType> ALWAYS_INLINE static bool isPercentOrNonASCII(CharacterType character) { return !isASCII(character) || character == '%'; }
template<typename CharacterType> ALWAYS_INLINE static bool isSlashQuestionOrHash(CharacterType character) { return character <= '\\' && characterClassTable[character] & SlashQuestionOrHash; }
template<typename CharacterType> ALWAYS_INLINE static bool isValidSchemeCharacter(CharacterType character) { return character <= 'z' && characterClassTable[character] & ValidScheme; }
-template<typename CharacterType> ALWAYS_INLINE static bool isForbiddenHostCodePoint(CharacterType character) { return character <= 0x7F && characterClassTable[character] & ForbiddenHost; }
+
+template<typename CharacterType>
+ALWAYS_INLINE bool URLParser::isForbiddenHostCodePoint(CharacterType character)
+{
+ASSERT(!m_urlIsSpecial);
+return character <= 0x7F && characterClassTable[character] & ForbiddenHost;
+}
+
+template<typename CharacterType>
+ALWAYS_INLINE bool URLParser::isForbiddenDomainCodePoint(CharacterType character)
+{
+ASSERT(m_urlIsSpecial);
+return character <= 0x7F && characterClassTable[character] & ForbiddenDomain;
+}
+
ALWAYS_INLINE static bool shouldPercentEncodeQueryByte(uint8_t byte, const bool& urlIsSpecial)
{
if (characterClassTable[byte] & QueryPercent)
@@ -2568,7 +2583,7 @@ template<typename CharacterType> std::optional<URLParser::LCharBuffer> URLParser
bool URLParser::hasForbiddenHostCodePoint(const URLParser::LCharBuffer& asciiDomain)
{
for (size_t i = 0; i < asciiDomain.size(); ++i) {
-if (isForbiddenHostCodePoint(asciiDomain[i]))
+if (isForbiddenDomainCodePoint(asciiDomain[i]))
return true;
}
return false;
@@ -2779,7 +2794,7 @@ auto URLParser::parseHostAndPort(CodePointIterator<CharacterType> iterator) -> H
continue;
if (*iterator == ':')
break;
-if (isForbiddenHostCodePoint(*iterator))
+if (isForbiddenDomainCodePoint(*iterator))
return HostParsingResult::InvalidHost;
}
auto address = parseIPv4Host(hostIterator, CodePointIterator<CharacterType>(hostIterator, iterator));
4 changes: 3 additions & 1 deletion Source/WTF/wtf/URLParser.h
@@ -120,7 +120,7 @@ class URLParser {
template<typename CharacterType> std::optional<LCharBuffer> domainToASCII(StringImpl&, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition);
template<typename CharacterType> LCharBuffer percentDecode(const LChar*, size_t, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition);
static LCharBuffer percentDecode(const LChar*, size_t);
-static bool hasForbiddenHostCodePoint(const LCharBuffer&);
+bool hasForbiddenHostCodePoint(const LCharBuffer&);
void percentEncodeByte(uint8_t);
void appendToASCIIBuffer(UChar32);
void appendToASCIIBuffer(const char*, size_t);
@@ -151,6 +151,8 @@ class URLParser {

enum class URLPart;
template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&, const URLTextEncoding*&);
+template<typename CharacterType> bool isForbiddenHostCodePoint(CharacterType);
+template<typename CharacterType> bool isForbiddenDomainCodePoint(CharacterType);
static size_t urlLengthUntilPart(const URL&, URLPart);
void popPath();
bool shouldPopPath(unsigned);