This fixes #19, and also makes it much easier to implement grapheme iterators in Julia (JuliaLang/julia#9261) by adding a `bool utf8proc_grapheme_break(int32_t c1, int32_t c2)` function to check for a grapheme break between two codepoints. This allows us to iterate over graphemes in-place, without mapping to a separate string with `0xFF` grapheme separators.

Unfortunately, I had to break backwards compatibility by changing the `utf8proc_property_t` struct to replace the `extend:1` field with a `boundclass:4` field, where the latter is now read from Unicode's GraphemeBreakProperty.txt file by the updated generator script. I took this opportunity to rearrange the struct to put the bitfields at the end, so that C will not insert alignment padding into the struct; as a consequence, the struct actually got smaller by several bytes.

Once this is merged, I will submit the corresponding patch to the utf8proc folks.
@jiahao, does it look okay to you? cc @StefanKarpinski