Update data tables to Unicode 7.0.0 #6

Closed
wants to merge 5 commits into from

jiahao commented Jul 17, 2014

Updates:

  1. Updates the data_generator.rb script. It now runs on modern versions of Ruby (newer than 1.8), and the hard-coded data tables are replaced with reads from the appropriate Unicode data (UNIDATA) files.
  2. Provides a new Makefile target, update, which automatically downloads the relevant UNIDATA files and runs data_generator.rb to produce the file utf8proc_data.c.new.
  3. Updates utf8proc_data.c to the output generated by running make update against UNIDATA v7.0.0.
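
To illustrate what reading from the UNIDATA files instead of hard-coded tables can look like, here is a minimal sketch that parses one line of UnicodeData.txt, which has 15 semicolon-separated fields per entry. It is not the code from data_generator.rb: the `UnicodeChar` struct and the `parse_unicode_data_line` helper are hypothetical names chosen for this example, and only a subset of the fields is kept.

```ruby
# Hypothetical sketch: not the actual data_generator.rb code.
# Parses one line of UnicodeData.txt (15 semicolon-separated fields).
UnicodeChar = Struct.new(:code, :name, :category, :combining_class,
                         :decomp_type, :decomp_mapping,
                         :uppercase, :lowercase, :titlecase)

def parse_unicode_data_line(line)
  fields = line.chomp.split(';', -1) # -1 keeps trailing empty fields
  unless fields.length == 15
    raise ArgumentError, "expected 15 fields, got #{fields.length}"
  end

  decomp_type = nil
  decomp_mapping = nil
  unless fields[5].empty?
    parts = fields[5].split(' ')
    # A leading <tag> marks a compatibility decomposition.
    decomp_type = parts.shift[1..-2] if parts.first.start_with?('<')
    decomp_mapping = parts.map(&:hex)
  end

  # Case mappings are hex code points, or empty when there is none.
  hex_or_nil = ->(s) { s.empty? ? nil : s.hex }
  UnicodeChar.new(fields[0].hex, fields[1], fields[2], fields[3].to_i,
                  decomp_type, decomp_mapping,
                  hex_or_nil.call(fields[12]),
                  hex_or_nil.call(fields[13]),
                  hex_or_nil.call(fields[14]))
end

sample = '00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;' \
         'LATIN CAPITAL LETTER A GRAVE;;;00E0;'
ch = parse_unicode_data_line(sample)
puts format('U+%04X %s -> %s', ch.code, ch.category,
            ch.decomp_mapping.map { |c| format('U+%04X', c) }.join(' '))
```

Against a downloaded copy of the file, something like `File.foreach('UnicodeData.txt') { |l| parse_unicode_data_line(l) }` would walk every entry; where make update actually stores the file is not stated here, so that path is an assumption.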

Observations:

  1. There are #defined constants in utf8proc.c which may in principle have changed from v5.0 to v7.0, such as the constants marking the location of Hangul, Unihan, etc. I haven't checked them, and it's probably not worth recomputing them for each new Unicode version.
  2. It looks like utf8proc implements an internal processing mode called LUMP, which is briefly described in lump.txt. As far as I can tell, this is a custom normalization mode that is not part of the Unicode standard, but I think we'll want to use it.
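
One possible way to check block-location constants such as the Hangul range against a given UNIDATA release, sketched under assumptions: UnicodeData.txt marks large ranges with paired entries whose names end in `, First>` and `, Last>`, so those boundaries can be read from the file and compared by hand with the #defines. The `find_ranges` helper is a hypothetical name for this example, and the sketch does not read utf8proc.c or know the names of its constants.

```ruby
# Hypothetical sketch: collects the "<Name, First>" / "<Name, Last>" range
# markers from UnicodeData.txt lines so they can be compared by hand with
# the #defined constants in utf8proc.c.
def find_ranges(lines)
  ranges = {}
  lines.each do |line|
    code, name = line.split(';', 3)
    m = /\A<(.+), (First|Last)>\z/.match(name.to_s)
    next unless m

    entry = (ranges[m[1]] ||= {})
    entry[m[2] == 'First' ? :first : :last] = code.hex
  end
  ranges
end

# Two range-marker lines as they appear in UnicodeData.txt.
sample = [
  'AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;',
  'D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;'
]
hangul = find_ranges(sample)['Hangul Syllable']
puts format('U+%04X..U+%04X', hangul[:first], hangul[:last])
```

Running the same helper over the full file would list every marked range, which keeps the comparison a quick manual step instead of something recomputed on each update.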

Ref: #1

jiahao changed the title from "XXX Marking data locations" to "Update data tables to Unicode 7.0.0" on Jul 18, 2014

jiahao commented Jul 18, 2014

I managed to bork this PR.

jiahao closed this Jul 18, 2014

jiahao commented Jul 18, 2014

Replaced by #9.
