to_utf8 truncates characters instead of performing the conversion #395
Comments
I have looked into this further. This one works as expected:
#include <boost/spirit/home/support/utf8.hpp>
#include <string>
int main()
{
std::wstring s = L"привет";
return "\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82" != boost::spirit::to_utf8(s);
}
Can you try to run the following code on a Windows machine?
I think the problem is in your string literal. On Windows, try to run:
#include <boost/spirit/home/support/utf8.hpp>
#include <string>
#include <iostream>
int main()
{
std::wstring s = L"𠼭";
for (auto c : s) std::wcout << +c << '\n';
std::wcout << L"'" << s << L"'\n";
return "\xf0\xa0\xbc\xad" == boost::spirit::to_utf8(s);
}
I did some research and found the source of the problem.
The problem
The relevant wording is in [lex.ccon]/6. The following program shows how the platforms differ:
#include <iostream>
#include <string>
int main()
{
using namespace std::literals;
std::cout << "sizeof(wchar_t): " << sizeof(wchar_t) << '\n';
std::cout << "string literal size: " << L"𠼭"s.size() << '\n';
}
Linux (GCC 8.2/Clang 7):
Windows (MSVC 14.1/GCC 8.2/Clang 7):
On Windows, the content of a wchar_t string literal seems to be UTF-16, and this is where the problem comes from. Spirit does not do charset conversions; it simply passes the data along as is.
What you can do
If you want a UTF-8 encoded string from a string literal, you are better off using the C++11 Unicode string literals (the U and u8 prefixes):
#include <boost/core/lightweight_test.hpp>
#include <boost/spirit/home/support/utf8.hpp>
#include <iostream>
int main()
{
auto s = U"𠼭";
BOOST_TEST_EQ("\xf0\xa0\xbc\xad", boost::spirit::to_utf8(s));
BOOST_TEST_CSTR_EQ("\xf0\xa0\xbc\xad", u8"𠼭");
return boost::report_errors();
}
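To make the UTF-16 point concrete, here is a minimal sketch of what a charset conversion would have to do with a wide string on Windows: join each surrogate pair into one code point, then encode that code point as UTF-8. The helpers `append_utf8` and `utf16_to_utf8` are hypothetical names for illustration. They are not part of Boost.Spirit, and this is not a claim about what #413 changes. The sketch uses `char16_t` so that it behaves the same on every platform, and it does no validation of unpaired surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Spirit API): append one code point as UTF-8.
inline void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: transcode UTF-16 to UTF-8, joining surrogate pairs.
// Unpaired surrogates are encoded as they are (no validation in this sketch).
inline std::string utf16_to_utf8(std::u16string const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        bool const is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine the high and low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, `utf16_to_utf8(u"\U00020F2D")` receives the two code units 0xD843 and 0xDF2D and produces the four bytes "\xf0\xa0\xbc\xad". Feeding the two surrogates to a UTF-32 to UTF-8 encoder one at a time cannot give that result.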
@isnullxbh Can you please check if #413 solves the problem for you?
Hi, @Kojoley! Thanks a lot for publishing your research results! I can't check it on a Windows machine at the moment, but I saw your merge and I trust the tests you have written.
spirit/include/boost/spirit/home/support/utf8.hpp
Line 66 in 925f40a
Is it permissible to convert wchar_t to char here (when the input parameter is a std::wstring)?
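To illustrate the concern, here is a small sketch of what an element-wise narrowing from wchar_t to char does. The function `narrow_by_truncation` is a hypothetical name written for this example. It is not the code at the referenced line, only a demonstration of why such a narrowing would lose data instead of producing UTF-8.

```cpp
#include <string>

// Hypothetical helper for illustration only: narrow each wchar_t to one byte,
// dropping the high bits. This is truncation, not a conversion to UTF-8.
inline std::string narrow_by_truncation(std::wstring const& in)
{
    std::string out;
    for (wchar_t wc : in)
        out += static_cast<char>(static_cast<unsigned char>(wc));
    return out;
}
```

For the Cyrillic letter U+043F, this keeps only the low byte 0x3F (an ASCII question mark), whereas the UTF-8 encoding of that letter is the two bytes "\xd0\xbf".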