Unicode charset #504

cmazakas · 2019-05-04T13:59:37Z

This pull request gives X3 Unicode literal support for its char_set<boost::spirit::char_encoding::unicode>. As a side effect, boost::spirit::x3::unicode::char_ now accepts string literals such as U"hello, \u20ac!".

This PR also extends the X3 test utility to work with boost::basic_string_view.
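For readers unfamiliar with the U"..." form used in the description, here is a minimal sketch in standard C++ only (no Spirit involved) of what such a literal is: an array of char32_t with one element per code point. The names greeting and code_point_count are hypothetical helpers introduced for illustration and are not part of the PR.

```cpp
#include <cstddef>
#include <type_traits>

// The literal from the PR description. The U prefix yields an array of
// const char32_t: one element per code point plus a terminating null.
constexpr char32_t greeting[] = U"hello, \u20ac!";

// Hypothetical helper: counts the elements before the terminating null.
// Written as a single return statement so it stays a valid C++11 constexpr.
constexpr std::size_t code_point_count(char32_t const* s)
{
    return *s ? 1 + code_point_count(s + 1) : 0;
}

// A U"..." literal has element type const char32_t.
static_assert(
    std::is_same<
        std::remove_extent<std::remove_reference<decltype(U"hi")>::type>::type,
        char32_t const>::value,
    "U\"...\" literals are arrays of const char32_t");

// The euro sign occupies exactly one element, unlike in a UTF-8 narrow string.
static_assert(code_point_count(greeting) == 9, "nine code points");
static_assert(greeting[7] == U'\u20ac', "the euro sign is a single element");
```

Because the element type is char32_t rather than char or wchar_t, any trait that maps a string type to its character type has to know about char32_t, which is what the string_traits.hpp change reviewed below is about.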

djowel · 2019-05-04T18:06:22Z

Looks good. I'll study this as soon as I fix the (unrelated) failing tests on Visual Studio.

djowel · 2019-05-04T19:31:44Z

One comment ^^, otherwise it looks good. I just fixed the (unrelated) VS failing test.

djowel · 2019-05-04T19:28:06Z

test/x3/char_set.cpp

@@ -0,0 +1,89 @@
+#define BOOST_SPIRIT_X3_UNICODE


You'll need a copyright notice at the top with the Boost license. See other files.

Kojoley · 2019-05-05T14:18:44Z

include/boost/spirit/home/x3/support/traits/string_traits.hpp

@@ -93,42 +93,63 @@ namespace boost { namespace spirit { namespace x3 { namespace traits
    template <>
    struct char_type_of<wchar_t> : mpl::identity<wchar_t> {};

+    template <>
+    struct char_type_of<char32_t> : mpl::identity<char32_t> {};


I am inclined towards simplifying and generalizing char_type_of instead, because there are more char types and sooner or later the question will arise again.

Makes sense. I'd love to see some ideas in code.
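As a rough illustration of what "generalizing char_type_of" could mean, here is a minimal sketch using only the standard library. It is not the code from this PR or from any other Spirit change. The namespace sketch, the helper is_char, and char_type_of_impl are hypothetical names, and the set of supported string shapes (a character, a pointer, an array, std::basic_string) is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

namespace sketch { // hypothetical namespace, not part of Boost.Spirit

// One list of character types, so adding a new one is a one-line change.
template <typename T> struct is_char : std::false_type {};
template <> struct is_char<char> : std::true_type {};
template <> struct is_char<wchar_t> : std::true_type {};
template <> struct is_char<char16_t> : std::true_type {};
template <> struct is_char<char32_t> : std::true_type {};

// Primary template has no nested type, so unsupported inputs fail to compile.
template <typename T, typename Enable = void>
struct char_type_of_impl {};

// A character is its own character type.
template <typename T>
struct char_type_of_impl<T,
    typename std::enable_if<is_char<T>::value>::type>
{ using type = T; };

// Pointer to (possibly const) character.
template <typename T>
struct char_type_of_impl<T*,
    typename std::enable_if<
        is_char<typename std::remove_cv<T>::type>::value>::type>
{ using type = typename std::remove_cv<T>::type; };

// Array of (possibly const) characters, e.g. a string literal.
template <typename T, std::size_t N>
struct char_type_of_impl<T[N],
    typename std::enable_if<
        is_char<typename std::remove_cv<T>::type>::value>::type>
{ using type = typename std::remove_cv<T>::type; };

// Any std::basic_string instantiation.
template <typename C, typename Tr, typename A>
struct char_type_of_impl<std::basic_string<C, Tr, A>>
{ using type = C; };

// Strip references and cv-qualifiers once, then dispatch.
template <typename T>
struct char_type_of
    : char_type_of_impl<
          typename std::remove_cv<
              typename std::remove_reference<T>::type>::type>
{};

} // namespace sketch

static_assert(std::is_same<
    sketch::char_type_of<decltype(U"hi")>::type, char32_t>::value,
    "a U\"...\" literal maps to char32_t");
```

The point of this shape is that the per-type specializations (one for wchar_t, one for char32_t, and so on) collapse into a single list of character types, so a new character type does not need a new specialization for every string shape.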

Oh cool, that PR obsoletes mine then, no? I'm fine with that. Less code maintenance in general is better.

We can still keep the small tests that I added though as those still have some value.

Absolutely. I just merged #507. Could you please do a final tweak on this PR to deal with the changes? Thanks!

Alright, it took a bit of git magic to get there, but I managed to condense my PR down to one commit that just updates the test files.

Kojoley · 2019-05-05T14:57:28Z

test/x3/char_set.cpp

+    // ascii
+    {
+        using namespace boost::spirit::x3;
+        using char_set = char_set<boost::spirit::char_encoding::ascii>;


Don't we already have those tests?

char_set is just the underlying parser of char_("bla"), if you did not know it: https://wandbox.org/permlink/euLu4sbtPOFKnxBg

Good question! I was thinking about that actually. Was it in char1.cpp? Perhaps we can just augment those?

Ah, Kojoley is correct.

… if you did not know it

At the time, I didn't. You are correct, the ASCII tests are repetitious in this case.

Should I simply move the Unicode tests here to the char1.cpp test file?

Yes please :-)

cmazakas · 2019-05-07T15:30:10Z

For some reason, the Travis CI link is showing the wrong build job. This is the correct one that passes for the latest changes:
https://travis-ci.org/boostorg/spirit/builds/529120733

djowel · 2019-05-07T22:20:51Z

We should be good.

djowel · 2019-05-07T22:22:29Z

Merged. Many thanks for the splendid PR.

djowel self-assigned this May 4, 2019

djowel requested a review from Kojoley May 4, 2019 19:32

djowel requested changes May 4, 2019

Kojoley reviewed May 5, 2019

cmazakas force-pushed the unicode-charset branch from 4fab420 to 0002ce1 Compare May 6, 2019 15:54

cmazakas closed this May 7, 2019

cmazakas force-pushed the unicode-charset branch from 0002ce1 to 90d03d9 Compare May 7, 2019 03:55

Updated tests

017478b

cmazakas reopened this May 7, 2019

djowel merged commit 3d0dafe into boostorg:develop May 7, 2019
