[feature request] Basic utf-8 support in rule language #4

jbulow · 2020-02-13T12:52:06Z

When writing rules for a W3C standard such as SPARQL, using the notation defined in the XML specification, it would be convenient to have support for converting Unicode code points into state machines that match the corresponding UTF-8 encoded bytes.

E.g. (from the SPARQL specification):

#x00C0 converts to the state machine 0xC3 0x80
[#x00C0-#x00D6] converts to the state machine 0xC3 0x80..0x96
[#x037F-#x1FFF] converts to 0xCD 0xBF | 0xCE..0xDF 0x80..0xBF | 0xE0 0xA0..0xBF 0x80..0xBF | 0xE1 0x80..0xBF 0x80..0xBF

(I have not yet verified the examples above, but I hope the idea is clear)

The syntax \u could be used as an alternative to the syntax #x for code points.
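
A minimal sketch of how such a conversion could be computed, written in Python for illustration only. It is not taken from the tool's source; the names `utf8_ranges` and `format_ranges` are hypothetical. The approach is the usual one for this problem: split the code point range at UTF-8 length boundaries and at continuation-byte boundaries until each piece maps to a simple sequence of byte ranges. Surrogates (U+D800..U+DFFF) are skipped, on the assumption that only well-formed UTF-8 should match.

```python
def utf8_ranges(lo, hi):
    """Split the code point range [lo, hi] into UTF-8 byte-range sequences.

    Each result is a list of (byte_lo, byte_hi) pairs, one pair per byte.
    """
    out = []
    _split(lo, hi, out)
    return out


def _split(lo, hi, out):
    if lo > hi:
        return
    # Surrogates have no valid UTF-8 encoding: cut them out of the range.
    if lo <= 0xDFFF and hi >= 0xD800:
        _split(lo, 0xD7FF, out)
        _split(0xE000, hi, out)
        return
    # Split where the encoded length changes (1, 2, 3, 4 bytes).
    for limit in (0x7F, 0x7FF, 0xFFFF):
        if lo <= limit < hi:
            _split(lo, limit, out)
            _split(limit + 1, hi, out)
            return
    # Single-byte (ASCII) ranges need no further splitting.
    if hi <= 0x7F:
        out.append([(lo, hi)])
        return
    # Split until every trailing byte that varies spans a full 6-bit block.
    for i in range(1, 4):
        mask = (1 << (6 * i)) - 1
        if (lo & ~mask) != (hi & ~mask):
            if (lo & mask) != 0:
                _split(lo, lo | mask, out)
                _split((lo | mask) + 1, hi, out)
                return
            if (hi & mask) != mask:
                _split(lo, (hi & ~mask) - 1, out)
                _split(hi & ~mask, hi, out)
                return
    # Now lo and hi encode to the same length and pair up byte by byte.
    first = chr(lo).encode("utf-8")
    last = chr(hi).encode("utf-8")
    out.append(list(zip(first, last)))


def format_ranges(seqs):
    """Render sequences in the notation used in the examples above."""
    def byte(lo, hi):
        return f"0x{lo:02X}" if lo == hi else f"0x{lo:02X}..0x{hi:02X}"
    return " | ".join(" ".join(byte(l, h) for l, h in seq) for seq in seqs)


print(format_ranges(utf8_ranges(0x00C0, 0x00C0)))
# 0xC3 0x80
print(format_ranges(utf8_ranges(0x00C0, 0x00D6)))
# 0xC3 0x80..0x96
print(format_ranges(utf8_ranges(0x037F, 0x1FFF)))
# 0xCD 0xBF | 0xCE..0xDF 0x80..0xBF | 0xE0 0xA0..0xBF 0x80..0xBF | 0xE1 0x80..0xBF 0x80..0xBF
```

Note that the three-byte part of the last range has to be split at 0xE0, because 0xE0 followed by 0x80..0x9F would be an overlong encoding.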

jbulow · 2020-02-18T07:47:57Z

Some related information: https://www.w3.org/2005/03/23-lex-U

adrian-thurston transferred this issue from adrian-thurston/colm Mar 7, 2020
