fix: Work around endianness problem in Util::read_text_file
The code in Util::read_text_file for converting UTF-16LE to UTF-8 only
works on little-endian machines. This makes the unit test fail on
big-endian machines.

Since the conversion is in practice only needed on Windows (for Visual
Studio, which creates UTF-16LE .rsp files), work around the problem by
doing the conversion only in Windows builds.

Fixes #1014.
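
To illustrate the problem described in the message: a minimal sketch, not taken from ccache, of why reinterpreting raw file bytes as 16-bit code units depends on the host byte order. The names `native_code_unit`, `le_code_unit` and `host_is_little_endian` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read two bytes in host byte order, which is what a
// cast of the byte buffer to char16_t effectively does.
std::uint16_t
native_code_unit(const unsigned char* p)
{
  std::uint16_t value;
  std::memcpy(&value, p, sizeof(value));
  return value;
}

// Hypothetical helper: read two bytes explicitly as little-endian. The result
// is the same on every host.
std::uint16_t
le_code_unit(const unsigned char* p)
{
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: detect the host byte order at run time by inspecting
// the first byte of a known 16-bit value.
bool
host_is_little_endian()
{
  const std::uint16_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

For the bytes `61 00` (UTF-16LE "a"), `le_code_unit` yields 0x0061 on any host, whereas `native_code_unit` yields 0x0061 only on a little-endian host and 0x6100 on a big-endian one.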
jrosdahl committed Mar 19, 2022
1 parent 05ac8dc commit dfb3111
Showing 2 changed files with 10 additions and 0 deletions.
8 changes: 8 additions & 0 deletions src/Util.cpp
@@ -209,13 +209,15 @@ rewrite_stderr_to_absolute_paths(string_view text)
  return result;
}

#ifdef _WIN32
bool
has_utf16_le_bom(string_view text)
{
  return text.size() > 1
         && ((static_cast<uint8_t>(text[0]) == 0xff
              && static_cast<uint8_t>(text[1]) == 0xfe));
}
#endif

} // namespace

@@ -1190,14 +1192,20 @@ std::string
read_text_file(const std::string& path, size_t size_hint)
{
  std::string result = read_file(path, size_hint);
#ifdef _WIN32
  // Convert to UTF-8 if the content starts with a UTF-16 little-endian BOM.
  //
  // Note that this code assumes a little-endian machine, which is why it's
  // #ifdef-ed to only run on Windows (which is always little-endian) where it's
  // actually needed.
  if (has_utf16_le_bom(result)) {
    result.erase(0, 2); // Remove BOM.
    std::u16string result_as_u16((result.size() / 2) + 1, '\0');
    result_as_u16 = reinterpret_cast<const char16_t*>(result.c_str());
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    result = converter.to_bytes(result_as_u16);
  }
#endif
  return result;
}
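
For comparison, a portable conversion could build each code unit from its two bytes explicitly instead of casting the buffer. The following is only a sketch under that assumption: `utf16le_to_utf8` and `append_utf8` are hypothetical helpers, not something this commit adds, and the input is assumed to have had its BOM removed already. Unpaired surrogates are replaced with U+FFFD and a trailing odd byte is ignored; both are choices made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to out as UTF-8.
void
append_utf8(std::string& out, std::uint32_t cp)
{
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Hypothetical helper: convert UTF-16LE bytes (BOM already removed) to UTF-8
// without depending on the byte order of the host.
std::string
utf16le_to_utf8(const std::string& bytes)
{
  const std::size_t unit_count = bytes.size() / 2; // Ignore a trailing odd byte.

  // Assemble code unit i from its low and high byte.
  auto unit_at = [&bytes](std::size_t i) -> std::uint32_t {
    const std::uint32_t lo = static_cast<std::uint8_t>(bytes[2 * i]);
    const std::uint32_t hi = static_cast<std::uint8_t>(bytes[2 * i + 1]);
    return lo | (hi << 8);
  };

  std::string out;
  for (std::size_t i = 0; i < unit_count; ++i) {
    std::uint32_t cp = unit_at(i);
    // Combine a high surrogate with a following low surrogate.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < unit_count) {
      const std::uint32_t low = unit_at(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        ++i;
      }
    }
    // Anything still in the surrogate range is unpaired.
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;
    }
    append_utf8(out, cp);
  }
  return out;
}
```

With a helper along these lines, the `#ifdef _WIN32` guard would not be needed for correctness, at the cost of carrying a hand-written decoder; the commit instead chooses the smaller change of restricting the existing code to Windows.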

2 changes: 2 additions & 0 deletions unittest/test_Util.cpp
@@ -689,6 +689,7 @@ TEST_CASE("Util::{read,write,copy}_file with binary files")
  CHECK(Util::read_file("copy") == data);
}

#ifdef _WIN32
TEST_CASE("Util::read_text_file with UTF-16 little endian encoding")
{
  TestContext test_context;
@@ -706,6 +707,7 @@ TEST_CASE("Util::read_text_file with UTF-16 little endian encoding")
  Util::write_file("test", data);
  CHECK(Util::read_text_file("test") == "abc");
}
#endif

TEST_CASE("Util::remove_extension")
{