Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VS2017 build warning C4819 on non-english code page. #85

Closed
gavxin opened this issue Jan 28, 2018 · 12 comments
Closed

VS2017 build warning C4819 on non-english code page. #85

gavxin opened this issue Jan 28, 2018 · 12 comments
Assignees

Comments

@gavxin
Copy link

gavxin commented Jan 28, 2018

Hello, I'm trying to build abseil-cpp on Windows with installed VS 2017. My default code page is 936 (Simplified Chinese) and when build abseil will occur following error.

error C2220: warning treated as error - no 'object' file generated
warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss

Here is one solution. Add /utf-8 option to MSVC_FLAGS in file absl/copts.bzl

index 20c9b61..0052db8 100644
--- a/absl/copts.bzl
+++ b/absl/copts.bzl
@@ -126,6 +126,7 @@ MSVC_FLAGS = [
     "/DNOMINMAX",  # Don't define min and max macros (windows.h)
     "/DWIN32_LEAN_AND_MEAN",  # Don't bloat namespace with incompatible winsock versions.
     "/D_CRT_SECURE_NO_WARNINGS",  # Don't warn about usage of insecure C functions
+    "/utf-8",   # Fix C4819 for non-english code page.
 ]
@JonathanDCohen JonathanDCohen self-assigned this Jan 29, 2018
@JonathanDCohen
Copy link
Contributor

Which character was the compiler complaining about?

@gavxin
Copy link
Author

gavxin commented Jan 30, 2018

I found two position.

unaligned_access.h the two double quotes are unicode chars.

// “ARMv7 or higher”, so we have to filter away all ARMv5 and ARMv6

absl/strings/str_split_test.cc the test case TEST(Split, UTF8) is testing utf-8 string splitting. It's just test case so maybe no need to change.

TEST(Split, UTF8) {
// Tests splitting utf8 strings and utf8 delimiters.
{
// A utf8 input std::string with an ascii delimiter.
std::vector<absl::string_view> v = absl::StrSplit("a,κόσμε", ',');
EXPECT_THAT(v, ElementsAre("a", "κόσμε"));
}
{
// A utf8 input std::string and a utf8 delimiter.
std::vector<absl::string_view> v = absl::StrSplit("a,κόσμε,b", ",κόσμε,");
EXPECT_THAT(v, ElementsAre("a", "b"));
}
{
// A utf8 input std::string and ByAnyChar with ascii chars.
std::vector<absl::string_view> v =
absl::StrSplit("Foo hällo th丞re", absl::ByAnyChar(" \t"));
EXPECT_THAT(v, ElementsAre("Foo", "hällo", "th丞re"));
}
}

After modify these two file, the build success.

@JonathanDCohen
Copy link
Contributor

Since Abseil is fairly low-level, instead of forcing utf-8 in msvc builds, I think the correct move is to make our source files all ascii. So I'll write up a patch to change that test to use unicode escape characters, which should make it agnostic to file encoding.

That being said, Abseil files are utf8 internally, so I'm not sure what other monsters there may be lurking if someone tries to use abseil source with other encodings.

@gavxin
Copy link
Author

gavxin commented Jan 31, 2018

It make sense, it's better make source file as ascii than forcing utf-8 for msvc.

Thanks

@manshreck
Copy link
Contributor

Note that the test is actually testing utf8 characters. I don't see how we can have a test for utf8 in ASCII ...

We should definitely prefer ASCII for source files, IMO, but I'm not sure requiring them to be ASCII is the best option.

@mbxx
Copy link
Member

mbxx commented Jan 31, 2018

Jon and I chatted about this yesterday, and I came to the conclusion that Abseil files should be ASCII. Abseil strives to be as broadly usable as possible. Usually this takes the form of writing code that works on a wide variety of platforms, dodging bugs in those platforms if necessary. Here it means avoiding UTF-8 because some toolchains we want to support can't handle it.

The alternative isn't as bad as I feared: escaping the unicode characters in the test strings lets them be UTF-8 test strings that are specified using only ASCII characters in the source code. I'm a little sad, because suddenly it's no longer clear at a glance that the escaped characters are well-formed UTF-8, but this feels like the right thing to do to me.

@jorgbrown
Copy link
Contributor

The MSVC error message says "Save the file in Unicode format to prevent data loss"

Just curious: What does it mean to save the file in Unicode format? Does it just add a UTF-8 BOM to the beginning of the file? Or is there additional metadata in the filesystem to tell MSVC that the file is Unicode?

@JonathanDCohen
Copy link
Contributor

"Unicode format" seems ambiguous. Anyways it looks like my internal patch to remove non-ascii characters has been upstreamed, and ASCII fits in all flavors of Unicode. Is this problem still persisting?

@gavxin
Copy link
Author

gavxin commented Feb 2, 2018

Try to build 0ec11ba , str_split_test.cc still cause following warnings.

ERROR: E:/xx/abseil-cpp/absl/strings/BUILD.bazel:214:1: C++ compilation of rule '//absl/strings:str_split_test' failed (Exit 2)
absl/strings/str_split_test.cc(624): error C2220: warning treated as error - no 'object' file generated
absl/strings/str_split_test.cc(624): warning C4566: character represented by universal-character-name '\u1F79' cannot be represented in the current code page (936)
absl/strings/str_split_test.cc(644): warning C4566: character represented by universal-character-name '\u00E4' cannot be represented in the current code page (936)
absl/strings/str_split_test.cc(645): warning C4566: character represented by universal-character-name '\u00E4' cannot be represented in the current code page (936)

It just test, maybe I should ignore it.

@gavxin gavxin closed this as completed Feb 11, 2018
@ahedberg
Copy link
Contributor

I'm looking into fixing the str_split_test.cc failures.

@ahedberg ahedberg reopened this Feb 14, 2018
@ahedberg ahedberg assigned ahedberg and unassigned JonathanDCohen Feb 14, 2018
@ahedberg
Copy link
Contributor

This has been fixed internally; just waiting for the change to be propagated to GitHub.

@mbxx
Copy link
Member

mbxx commented Feb 16, 2018

I just pushed the change to GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants