[unicode-grant] Commit New version of GraphemeBreakTest.t #267

Merged: 3 commits merged on May 11, 2017

@samcv samcv commented May 10, 2017

The new script tests the contents of each grapheme individually, using
the GraphemeBreakTest.txt file from the Unicode 9.0 test suite.

Previously we only checked the total number of ‘.chars’ for the
string as a whole. Now we check the string length and also that
each grapheme contains exactly the expected codepoints, in the
correct order.

The new test uses a grammar to parse the file and is generally much more
robust than the previous script.

Running the parse class generates an array of arrays, where the index
of the outer array indicates which grapheme is being described and the
inner array holds that grapheme's codepoints.

For example, [[10084, 776], [9757]] indicates that the 0th grapheme is
made up of codepoints 10084 and 776 and the 1st grapheme is made up of
codepoint 9757.
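
To make the array-of-arrays shape concrete, here is a minimal Python sketch of the idea. It is not the Raku grammar from this PR: `parse_test_line` is a hypothetical helper, and the sample line is written in the GraphemeBreakTest.txt format (÷ marks a break, × marks no break, `#` starts a comment) for illustration only. It also shows why comparing contents is stricter than comparing only the grapheme count.

```python
BREAK = "\u00f7"     # ÷ marks a grapheme cluster boundary
NO_BREAK = "\u00d7"  # × marks "no boundary between these codepoints"


def parse_test_line(line):
    """Parse one test line into a list of graphemes (lists of codepoints).

    Returns None for blank lines and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    graphemes = []
    current = []
    for token in data.split():
        if token == BREAK:
            if current:  # close the grapheme collected so far
                graphemes.append(current)
                current = []
        elif token == NO_BREAK:
            continue  # next codepoint stays in the same grapheme
        else:
            current.append(int(token, 16))  # hex codepoint such as 2764
    if current:
        graphemes.append(current)
    return graphemes


# Illustrative line in the test file's format (assumed, not quoted from the file).
sample = "\u00f7 2764 \u00d7 0308 \u00f7 261D \u00f7\t# illustrative"
expected = parse_test_line(sample)
print(expected)  # [[10084, 776], [9757]]

# A hypothetical faulty segmentation with the right count but wrong contents.
wrong = [[10084], [776, 9757]]
print(len(wrong) == len(expected))  # True: a count-only check would pass
print(wrong == expected)            # False: the per-grapheme check fails
```

A count-only check accepts the faulty segmentation because both results contain two graphemes; comparing the nested lists catches the misplaced boundary.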

@samcv samcv commented May 10, 2017

It looks like GitHub is showing a diff between the new test file and the file I deleted and replaced. To view the new test file on its own, rather than as a diff against a very long deleted file, go here: https://github.com/samcv/roast/blob/ce6eb28b17f3722cf93724fa768f42614d9b4d2e/S15-nfg/GraphemeBreakTest.t

I have reworked it into multiple commits, which can be viewed here: be6b376

@samcv samcv force-pushed the samcv:gcb-- branch from ce6eb28 to be6b376 May 10, 2017
samcv added 2 commits May 10, 2017

* Add in UCD 9.0's GraphemeBreakTest.txt to
  3rdparty/Unicode/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt

* Add Unicode license to 3rdparty/Unicode/LICENSE
@samcv samcv force-pushed the samcv:gcb-- branch from 829f88e to b4b72e8 May 11, 2017
@samcv samcv merged commit ad4ee6c into Raku:master May 11, 2017