Use unorm2_normalize instead of precomposedStringWithCanonicalMapping in userVisibleString

https://bugs.webkit.org/show_bug.cgi?id=192945

Reviewed by Alex Christensen.

Replace the nice NSString method precomposedStringWithCanonicalMapping with the ICU
API unorm2_normalize, to prepare the code for translation to cross-platform C++. Of
course this is much worse than the preexisting code, but it is a transitional
measure and not the final state of the code. It would not make sense to do this if the
code were to remain Objective-C++.

* wtf/cocoa/NSURLExtras.mm:
(WTF::toNormalizationFormC):
(WTF::userVisibleString):

Canonical link: https://commits.webkit.org/207939@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@239970 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Michael Catanzaro committed Jan 15, 2019
1 parent bb6da90 commit f8e2493a1845a181bca3d0ba5644086692114cbb
Showing 2 changed files with 46 additions and 1 deletion.
@@ -1,3 +1,20 @@
2019-01-14 Michael Catanzaro <mcatanzaro@igalia.com>

Use unorm2_normalize instead of precomposedStringWithCanonicalMapping in userVisibleString
https://bugs.webkit.org/show_bug.cgi?id=192945

Reviewed by Alex Christensen.

Replace use of the nice NSString function precomposedStringWithCanonicalMapping with the ICU
API unorm2_normalize. This is to prep the code for translation to cross-platform C++. Of
course this is much worse than the preexisting code, but this is just a transitional
measure and not the final state of the code. It wouldn't make sense to do this if the code
were to remain Objective C++.

* wtf/cocoa/NSURLExtras.mm:
(WTF::toNormalizationFormC):
(WTF::userVisibleString):

2019-01-14 Alex Christensen <achristensen@webkit.org>

Bulgarian TLD should not punycode-encode URLs with Bulgarian Cyrillic characters
@@ -32,6 +32,7 @@

#import <unicode/uchar.h>
#import <unicode/uidna.h>
#import <unicode/unorm.h>
#import <unicode/uscript.h>
#import <wtf/Function.h>
#import <wtf/HexNumber.h>
@@ -1105,6 +1106,31 @@ static CFStringRef createStringWithEscapedUnsafeCharacters(CFStringRef string)
return CFStringCreateWithCharacters(nullptr, outBuffer.data(), outBuffer.size());
}

static String toNormalizationFormC(const String& string)
{
    auto sourceBuffer = string.charactersWithNullTermination();
    ASSERT(sourceBuffer.last() == '\0');
    sourceBuffer.removeLast();

    String result;
    Vector<UChar, URL_BYTES_BUFFER_LENGTH> normalizedCharacters(sourceBuffer.size());
    UErrorCode uerror = U_ZERO_ERROR;
    int32_t normalizedLength = 0;
    const UNormalizer2 *normalizer = unorm2_getNFCInstance(&uerror);
    if (!U_FAILURE(uerror)) {
        normalizedLength = unorm2_normalize(normalizer, sourceBuffer.data(), sourceBuffer.size(), normalizedCharacters.data(), normalizedCharacters.size(), &uerror);
        if (uerror == U_BUFFER_OVERFLOW_ERROR) {
            uerror = U_ZERO_ERROR;
            normalizedCharacters.resize(normalizedLength);
            normalizedLength = unorm2_normalize(normalizer, sourceBuffer.data(), sourceBuffer.size(), normalizedCharacters.data(), normalizedLength, &uerror);
        }
        if (!U_FAILURE(uerror))
            result = String(normalizedCharacters.data(), normalizedLength);
    }

    return result;
}

NSString *userVisibleString(NSURL *URL)
{
NSData *data = originalURLData(URL);
@@ -1175,7 +1201,9 @@ static CFStringRef createStringWithEscapedUnsafeCharacters(CFStringRef string)
result = mappedResult;
}

- result = [result precomposedStringWithCanonicalMapping];
+ auto wtfString = String(result.get());
+ auto normalized = toNormalizationFormC(wtfString);
+ result = static_cast<NSString *>(normalized);
return CFBridgingRelease(createStringWithEscapedUnsafeCharacters((__bridge CFStringRef)result.get()));
}
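
The retry-on-overflow pattern in toNormalizationFormC can be sketched in portable C++ without ICU. This is only an illustration under stated assumptions: toyNormalize and ToyStatus are hypothetical stand-ins, not ICU API. They mimic the calling convention the patch relies on (write up to the given capacity, return the length the full result needs, report overflow when it does not fit). The one mapping the stand-in knows, U+0958 to U+0915 U+093C, is a case where the NFC form is longer than the input, which is why a buffer sized to the source length can overflow.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an ICU-style status code; not the real UErrorCode.
enum class ToyStatus { Ok, BufferOverflow };

// Hypothetical stand-in for a normalizer with an ICU-like contract: it writes
// at most `capacity` code units, always returns the length the complete result
// needs, and sets BufferOverflow when `capacity` was too small.
// It only expands U+0958 into U+0915 U+093C and copies everything else.
int32_t toyNormalize(const char16_t* src, int32_t srcLength, char16_t* dest, int32_t capacity, ToyStatus& status)
{
    int32_t needed = 0;
    auto emit = [&](char16_t c) {
        if (needed < capacity)
            dest[needed] = c; // Never write past the caller's buffer.
        ++needed;             // Keep counting so the caller learns the full size.
    };
    for (int32_t i = 0; i < srcLength; ++i) {
        if (src[i] == 0x0958) {
            emit(0x0915);
            emit(0x093C);
        } else
            emit(src[i]);
    }
    status = needed > capacity ? ToyStatus::BufferOverflow : ToyStatus::Ok;
    return needed;
}

// Same shape as the patch: try once with a buffer as long as the source,
// and on overflow resize to the reported length and call again.
std::u16string normalizeWithRetry(const std::u16string& input)
{
    std::vector<char16_t> buffer(input.size());
    ToyStatus status = ToyStatus::Ok;
    int32_t length = toyNormalize(input.data(), static_cast<int32_t>(input.size()),
        buffer.data(), static_cast<int32_t>(buffer.size()), status);
    if (status == ToyStatus::BufferOverflow) {
        buffer.resize(length);
        status = ToyStatus::Ok; // Reset before retrying, as the patch resets uerror.
        length = toyNormalize(input.data(), static_cast<int32_t>(input.size()),
            buffer.data(), length, status);
    }
    if (status != ToyStatus::Ok)
        return std::u16string(); // Mirror the patch: empty result on failure.
    return std::u16string(buffer.begin(), buffer.begin() + length);
}
```

As in the patch, a failure yields an empty string rather than the unnormalized input, so a caller that needs to tell "empty input" apart from "normalization failed" would have to carry the status out separately.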
