[feature request] Basic utf-8 support in rule language #4

Open
jbulow opened this issue Feb 13, 2020 · 1 comment

jbulow commented Feb 13, 2020

When writing rules for a W3C standard such as SPARQL, using the notation specified in the XML standard, it would be quite convenient to have some support for converting Unicode code points into state machines that match the UTF-8 encoding of those code points.

E.g. (from the SPARQL specification):

#x00C0 converts to the state machine 0xC3 0x80
[#x00C0-#x00D6] converts to the state machine 0xC3 0x80..0x96
[#x037F-#x1FFF] converts to 0xCD 0xBF | 0xCE..0xDF 0x80..0xBF | 0xE0 0xA0..0xBF 0x80..0xBF | 0xE1 0x80..0xBF 0x80..0xBF

(I have not yet verified the examples above, but I hope the idea is clear)
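
For reference, the conversion could be prototyped roughly as follows. This is only a sketch in Python, not a proposal for how the rule language should implement it. The names `utf8_ranges` and `format_ranges` are made up for illustration. The idea is to split a code point range until every piece has a fixed UTF-8 length and each byte position forms a contiguous range, then encode the two endpoints and pair up their bytes. Surrogates (U+D800..U+DFFF) are skipped because they have no valid UTF-8 encoding.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]   # inclusive (low, high) byte values
ByteSeq = List[ByteRange]     # one alternative: a concatenation of byte ranges

# Largest code point that fits in 1, 2 and 3 UTF-8 bytes.
_LENGTH_LIMITS = (0x7F, 0x7FF, 0xFFFF)


def utf8_ranges(lo: int, hi: int) -> List[ByteSeq]:
    """Split the code point range [lo, hi] into alternatives of byte ranges."""
    if lo > hi:
        return []
    # Surrogates cannot be encoded in UTF-8, so cut them out of the range.
    if lo <= 0xDFFF and hi >= 0xD800:
        return utf8_ranges(lo, 0xD7FF) + utf8_ranges(0xE000, hi)
    # Make sure both ends encode to the same number of bytes.
    for limit in _LENGTH_LIMITS:
        if lo <= limit < hi:
            return utf8_ranges(lo, limit) + utf8_ranges(limit + 1, hi)
    if hi <= 0x7F:
        return [[(lo, hi)]]
    # Split until every continuation byte is either fixed or spans 0x80..0xBF.
    for i in range(1, 4):
        mask = (1 << (6 * i)) - 1
        if (lo & ~mask) != (hi & ~mask):
            if lo & mask != 0:
                return utf8_ranges(lo, lo | mask) + utf8_ranges((lo | mask) + 1, hi)
            if hi & mask != mask:
                return utf8_ranges(lo, (hi & ~mask) - 1) + utf8_ranges(hi & ~mask, hi)
    # Now the endpoints can simply be encoded and zipped byte by byte.
    first = chr(lo).encode("utf-8")
    last = chr(hi).encode("utf-8")
    return [list(zip(first, last))]


def format_ranges(seqs: List[ByteSeq]) -> str:
    """Render alternatives in the 0xNN..0xMM notation used in this issue."""
    def part(lo: int, hi: int) -> str:
        return f"0x{lo:02X}" if lo == hi else f"0x{lo:02X}..0x{hi:02X}"
    return " | ".join(" ".join(part(lo, hi) for lo, hi in seq) for seq in seqs)


print(format_ranges(utf8_ranges(0x00C0, 0x00C0)))  # prints: 0xC3 0x80
```

Calling `format_ranges(utf8_ranges(0x037F, 0x1FFF))` is a quick way to check examples like the ones above.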

The syntax \u could be used as an alternative to the syntax #x for code points.

jbulow commented Feb 18, 2020

Some related information: https://www.w3.org/2005/03/23-lex-U

@adrian-thurston adrian-thurston transferred this issue from adrian-thurston/colm Mar 7, 2020