New data file generator with support for UCD 13 & 14 #227
Comments
With the release of Unicode 14.0 I have now also updated the make files. I also updated To build & test:
This will download the UCD data files, generate a Alternatively, |
This sounds great, I'll try to take a look at it later. |
Can you convert this into a pull request? A PR is much easier to review than a tarball of changes. |
I’m sorry, I’m not a git user. I don’t know how to do that. |
There are hundreds of tutorials online — it's pretty indispensable for participating in any free/open-source software projects these days, not to mention a lot of commercial projects. (If you can write Python code with all of the features listed above, I'm sure you can learn git!) In a pinch, I can take the .tar.gz file you posted and make a pull request for you, though. |
I could, probably, learn git. I really don’t want to 🤡.
Did the above commits give you what you wanted? Also, I appear to have missed the changes in 610730f. |
You have to create a new pull request based on the commits: https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request |
I read that and thought that “To open a pull request in a public repository, you must have write access …” meant it wasn’t what I wanted. So I followed this https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue. But apparently you have to push a pull request! [All done without git 🤓.] |
Closed in favor of #258 |
Attached is data_make.py, a python3 script designed to combine & replace data/data_generator.rb & data/charwidths.jl and support both UCD 13 & 14. Also attached is utf8proc.c.patch, a small change to utf8proc.c needed to support UCD 14.
Here are some of its features:
[Due to the increased size of UCD 14 data I have had to split utf8proc_sequences & added utf8proc_casemap to prevent index overflow. This requires a small patch to utf8proc.c.]
Building with UCD 14 (still in development) requires a new Makefile. I haven’t supplied one here as UCD 14 is still in a state of flux & the URLs are changing. (I can supply one if requested.)
UCD 14 has increased the size of the generated data. I have had to split utf8proc_sequences & added utf8proc_casemap to prevent index overflow. This requires the small patch to utf8proc.c contained in utf8proc.c.patch. With the patch applied utf8proc.c still works with the original utf8proc_data.c, and the new format UCD 13 & 14 data.
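To illustrate why a larger table can overflow an index, here is a minimal Python sketch. It assumes that table positions are stored in 16-bit fields with the top value reserved as a "no entry" marker, and that case-mapping entries are the ones moved out of the combined table. The names INDEX_LIMIT, fits_in_index and split_table are invented for this example and are not taken from data_make.py or utf8proc.c.

```python
# Illustrative only: assumes 16-bit index fields with 0xFFFF reserved as "none".
INDEX_LIMIT = 0xFFFF  # hypothetical: real indices must stay below this value


def fits_in_index(table):
    """Return True if every position in `table` can be stored below INDEX_LIMIT."""
    return len(table) <= INDEX_LIMIT


def split_table(entries, is_casemap):
    """Split one combined table into two smaller ones.

    `entries` is a list of (kind, codepoints) pairs; `is_casemap` decides
    which entries move to the separate case-mapping table.
    Returns (sequences, casemap, index), where `index` maps each original
    position to (table_name, new_position).
    """
    sequences, casemap, index = [], [], {}
    for pos, entry in enumerate(entries):
        if is_casemap(entry):
            index[pos] = ("casemap", len(casemap))
            casemap.append(entry)
        else:
            index[pos] = ("sequences", len(sequences))
            sequences.append(entry)
    return sequences, casemap, index
```

Each of the two resulting tables is shorter than the combined one, so positions in either table are more likely to stay under the limit. The real generator may partition the data differently.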
To use:
1. Download and unpack utf8proc-2.6.1.tar.gz.
2. Copy data_make.py & utf8proc.c.patch into the utf8proc-2.6.1 dir.
3. Run make -kC data to download the UCD 13 data files. [It’s OK if CharWidths.txt is not made.]
4. Run patch < utf8proc.c.patch
5. Run ../data_make.py --verbose --format=1 --output=utf8proc_data.c
6. Run make check.
Usage is:
If unspecified the output file is utf8proc_data.out.c.
If unspecified the input data-dir file is ./data.
If --format=0 alone is used (the default) then the output file should be identical to the original utf8proc_data.c file.
If --fix26 is used then the fixes described in issue #226 are applied to the tables.
If --cmap is used then the utf8proc_sequences table is split & the utf8proc_casemap table added. This requires the utf8proc.c.patch to be applied.
If --format=1 is used then --fix26 & --cmap are implied and the output file uses the new compact source form.
Using UCD 14 automatically forces --format=1 (thus --fix26 & --cmap too).
Using --verbose reports the options in effect & successful generation of the output file.
data_make.zip
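The way the options interact can be sketched in Python with argparse. This is a hypothetical reconstruction from the description above, not the actual code of data_make.py: the flag names and defaults come from the text, while the positional data_dir argument, the ucd_major parameter, and the function names build_parser and resolve_options are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Flag names and defaults follow the description; types are assumptions.
    p = argparse.ArgumentParser(prog="data_make.py")
    p.add_argument("--format", type=int, choices=(0, 1), default=0)
    p.add_argument("--fix26", action="store_true")
    p.add_argument("--cmap", action="store_true")
    p.add_argument("--verbose", action="store_true")
    p.add_argument("--output", default="utf8proc_data.out.c")
    # Hypothetical: the real script may accept the data dir differently.
    p.add_argument("data_dir", nargs="?", default="./data")
    return p


def resolve_options(args, ucd_major):
    """Apply the implication rules from the description to parsed arguments."""
    if ucd_major == 14:
        args.format = 1    # UCD 14 forces the new format
    if args.format == 1:
        args.fix26 = True  # --format=1 implies --fix26
        args.cmap = True   # and --cmap
    return args


if __name__ == "__main__":
    opts = resolve_options(build_parser().parse_args(), ucd_major=13)
    if opts.verbose:
        print(vars(opts))
```

How the real script detects the UCD version is not stated in the issue, so here it is passed in explicitly.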