A couple of potential issues in data_generator.rb #226

Closed
chris0e3 opened this issue Jul 19, 2021 · 3 comments

@chris0e3

Hello,

Thanks for all your work on utf8proc. It’s much appreciated.

While rewriting data_generator.rb & charwidths.jl (v2.6.1) in Python I spotted a couple of potential issues.

  1. The following lines in data_generator.rb produce spurious 0s, which are added to $exclusions and $excl_version. (This occurs because the input contains comment lines, for which String#hex returns 0.)
    134: $exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex }
    ...
    137: $excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }

This results in utf8proc_property_struct.comp_exclusion = true for U+0000. Without the spurious 0s it is false.
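
The spurious zeros in the first item can be reproduced in isolation. The sketch below is only an illustration: `sample` is a made-up excerpt in the style of the exclusions input, not the real file contents, and filtering out comment lines is just one possible fix, not necessarily the one the project adopted.

```ruby
# Hypothetical excerpt in the style of the exclusions input (an assumption,
# not the actual data that data_generator.rb reads).
sample = <<~DATA
  # (1) Script Specifics
  0958    #  DEVANAGARI LETTER QA
  0959    #  DEVANAGARI LETTER KHHA
  # Total code points: 2
DATA

# Same shape as lines 134 and 137: String#hex returns 0 for any line that
# does not start with hexadecimal digits, so each comment line becomes 0.
naive = sample.chomp.split("\n").collect { |e| e.hex }

# One possible fix (an assumption): skip blank and comment lines first.
filtered = sample.chomp.split("\n")
                 .reject { |e| e.strip.empty? || e.start_with?("#") }
                 .collect { |e| e.hex }

p naive     # contains 0 at the positions of the two comment lines
p filtered  # contains only 0x958 and 0x959
```

With the naive version, 0 ends up in the list and U+0000 is treated as a composition exclusion; the filtered version contains only the code points that are actually listed.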

  2. The following line in data_generator.rb looks wrong:
    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(category)}, " <<
                                                                                    ^^^^^^^^

should (probably) be:

    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(code)}, " <<
                                                                                    ^^^^

This results in utf8proc_property_struct.control_boundary = true for U+200C and U+200D. With the change it is false.
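
To see why the original expression never exempts U+200C and U+200D, the two variants can be compared side by side. This is a minimal sketch: the method names are hypothetical, and it assumes `category` is a general-category string such as "Cf" and `code` is an integer code point, as the surrounding line suggests.

```ruby
# Hypothetical wrappers around the expression on line 250 (names are
# illustrative only and do not appear in data_generator.rb).

# Original: compares the category string against integers, so the
# exemption list can never match and the second clause is always true.
def control_boundary_buggy(code, category)
  %W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(category)
end

# Proposed: compares the code point itself against the exemption list.
def control_boundary_fixed(code, category)
  %W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(code)
end

p control_boundary_buggy(0x200D, "Cf")  # true: ZWJ is not exempted
p control_boundary_fixed(0x200D, "Cf")  # false: ZWJ is exempted
p control_boundary_fixed(0x00AD, "Cf")  # true: other Cf code points are unaffected
```

Only the two listed code points change; every other Zl, Zp, Cc or Cf character keeps control_boundary = true.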

Can anyone definitively state if these property changes are correct?
I hope this info is helpful.

If there’s interest I will open a separate issue for my new Python data generator.

Regards,

CHRIS

@stevengj
Member

stevengj commented Jan 4, 2024

cc @c42f — I think both of these issues were noted in #258 and will hopefully be fixed soon.

@c42f
Contributor

c42f commented Jan 4, 2024

Huh. Completely correct IMO. I discovered both these fixes independently.

Luckily the consequences of these bugs seem quite minor, as documented in #259.

@c42f
Contributor

c42f commented Jan 5, 2024

I think this can be closed now :)

@stevengj stevengj closed this as completed Jan 5, 2024