Updates:

- The `data_generator.rb` script, which now runs on a modern version of `ruby` (>1.8) and reads its data tables from the appropriate Unicode data (UNIDATA) files instead of hard-coding them.
- A `Makefile` target, `update`, which automatically downloads the relevant UNIDATA files and runs `data_generator.rb` to produce the file `utf8proc_data.c.new`.
- `utf8proc_data.c`, replaced with the output generated by running `make update` against UNIDATA v7.0.0.

Observations:
Several `#define`d constants in `utf8proc.c` may in principle have changed from v5.0 to v7.0, such as those marking the location of Hangul, Unihan, etc. I haven't checked them, and it's probably not worth recomputing them for each new Unicode version.
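The Hangul constants, at least, are defined by the Hangul syllable algorithm in the Conjoining Jamo Behavior section of the Unicode Standard (chapter 3) rather than read from the UNIDATA tables, so one way to sanity-check them is to decompose a few syllables by hand. The sketch below uses illustrative names (`S_BASE`, `hangul_decompose`, etc.); they are assumptions, not the identifiers actually used in `utf8proc.c`.

```c
#include <stdint.h>

/* Hangul syllable constants from the Unicode Standard's Conjoining Jamo
 * Behavior section. Names are illustrative, not utf8proc's own. */
enum {
    S_BASE = 0xAC00, /* first precomposed syllable */
    L_BASE = 0x1100, /* first leading consonant jamo */
    V_BASE = 0x1161, /* first vowel jamo */
    T_BASE = 0x11A7, /* one before the first trailing consonant jamo */
    L_COUNT = 19,
    V_COUNT = 21,
    T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT, /* 588 syllables per leading consonant */
    S_COUNT = L_COUNT * N_COUNT  /* 11172 syllables in total */
};

/* Decompose a precomposed Hangul syllable into its jamo.
 * Writes 2 or 3 code points to out and returns how many were written,
 * or returns 0 if cp is not a precomposed Hangul syllable. */
static int hangul_decompose(int32_t cp, int32_t out[3]) {
    int32_t s_index = cp - S_BASE;
    if (s_index < 0 || s_index >= S_COUNT)
        return 0;
    out[0] = L_BASE + s_index / N_COUNT;
    out[1] = V_BASE + (s_index % N_COUNT) / T_COUNT;
    int32_t t = T_BASE + s_index % T_COUNT;
    if (t == T_BASE)
        return 2; /* LV syllable: no trailing consonant */
    out[2] = t;
    return 3;
}
```

For example, U+D55C decomposes to U+1112 U+1161 U+11AB, and the last syllable in the range is `S_BASE + S_COUNT - 1` = U+D7A3. If the `#define`d values in `utf8proc.c` reproduce these results, they agree with the standard's algorithm.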
`utf8proc` implements an internal processing mode called `LUMP`, which is briefly described in `lump.txt`. As far as I can tell, this is a custom normalization mode that is separate from the Unicode standard, but I think we'll want to use it.

Ref: #1
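To illustrate what a `LUMP`-style mode does, here is a minimal sketch that folds a few look-alike punctuation and space characters onto their ASCII counterparts. It operates on already-decoded code points, and the mapping is a hand-picked subset assumed from `lump.txt`; the category-wide rules (all space separators to U+0020, all dash punctuation to U+002D) are approximated by a few explicit code points. `lump_char` and `lump_string` are hypothetical names, not `utf8proc` functions; consult `lump.txt` for the authoritative list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LUMP-style folding for a small, illustrative subset of
 * code points. Not utf8proc's implementation. */
static int32_t lump_char(int32_t cp) {
    switch (cp) {
    case 0x00A0: /* NO-BREAK SPACE (category Zs) */
    case 0x2003: /* EM SPACE (category Zs) */
        return 0x0020;
    case 0x2018: /* LEFT SINGLE QUOTATION MARK */
    case 0x2019: /* RIGHT SINGLE QUOTATION MARK */
    case 0x02BC: /* MODIFIER LETTER APOSTROPHE */
        return 0x0027;
    case 0x2013: /* EN DASH (category Pd) */
    case 0x2212: /* MINUS SIGN */
        return 0x002D;
    case 0x2044: /* FRACTION SLASH */
    case 0x2215: /* DIVISION SLASH */
        return 0x002F;
    case 0x2236: /* RATIO */
        return 0x003A;
    case 0x2223: /* DIVIDES */
        return 0x007C;
    case 0x223C: /* TILDE OPERATOR */
        return 0x007E;
    default:
        return cp; /* everything else passes through unchanged */
    }
}

/* Apply lump_char to each code point of a buffer, in place. */
static void lump_string(int32_t *s, size_t n) {
    for (size_t i = 0; i < n; i++)
        s[i] = lump_char(s[i]);
}
```

With this sketch, the sequence U+2018 `A` U+2019 becomes `'A'`. Because these foldings discard distinctions that the Unicode normalization forms preserve, they belong in an opt-in mode rather than in NFC/NFKC themselves.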