Support U+1234 notation for Unicode symbols? #24

kindaro · 2020-01-07T20:09:48Z

It is customary to denote Unicode characters by U+[character code], but it seems klfc does not support this notation:

% klfc --from-json x --xkb y
klfc: parse fail in x: Error in $.keys[0].letters[0]: ‘U+2002’ is not a valid letter

Note that the character in this example is a kind of space, so I would rather not insert it verbatim: it would hardly be clear to the reader what kind of space it is.

I also propose a flag that allows writing JSON files in the U+... format. Most fonts only support a narrow range of characters, so in many cases the more unusual characters would not be displayed in any meaningful way.


39aldo39 · 2020-01-14T20:29:15Z

KLFC uses a normal JSON file, so you can use the syntax "\u2002" already.
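
As an illustration of the point above, here is a minimal sketch using Python's standard `json` module as a stand-in for any conforming JSON parser (it is not KLFC's own parser). It shows that the escape `"\u2002"` decodes to the single character U+2002 before the application ever sees it.

```python
import json
import unicodedata

# A JSON document consisting of one string literal with the escape \u2002.
decoded = json.loads('"\\u2002"')

print(len(decoded))               # the escape decodes to a single character
print(unicodedata.name(decoded))  # look up the Unicode name of that character
```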

kindaro · 2020-01-15T09:19:18Z

I find it to be somewhat «wrong» to let the format of serialization define the ways in which I may or may not define a symbol, for the following reasons:

Formats come and go. Today it is JSON, tomorrow YAML, then Dhall.
Escape sequences are transient. For instance, if I convert a JSON file with both "—" and "\u2014" to YAML and back, I will either get two dashes or two escape sequences. The distinction will be lost.
When I wish to write a Unicode number of a symbol instead of the symbol itself, it is because I have an intention. A program should recognize and intrinsically acknowledge that intention, rather than making it «accidentally supported».
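
The second reason can be sketched with a plain JSON round trip in Python, used here as a stand-in for the JSON to YAML conversion described above (an assumption made for brevity; no YAML library is involved). After parsing, the literal dash and the escaped dash are the same string, so the serializer can only emit one form for both.

```python
import json

# One literal dash and one escaped dash in the same JSON array.
source = '["—", "\\u2014"]'

values = json.loads(source)
same_after_parsing = values[0] == values[1]  # the parser keeps no record of the escape

escaped = json.dumps(values)                      # ASCII-only output: two escape sequences
literal = json.dumps(values, ensure_ascii=False)  # raw output: two literal dashes

print(same_after_parsing)
print(escaped)
print(literal)
```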

I am sure this change is technically feasible. If someone were to make it, would you merge?

39aldo39 · 2020-01-27T19:15:37Z

I don't think it is necessarily wrong to let the format decide it, but I understand that the intention may be lost, as most parsers throw away that information. However, it is also not very elegant to basically make your own escape sequences. For example, ligatures can currently be written as lig:U+2002, which outputs the literal string "U+2002". If other notations were also allowed, this would become ambiguous. I don't know a nice solution for that.

39aldo39 · 2022-05-25T15:51:07Z

I have now added explicit support for Unicode characters!

kindaro · 2022-05-25T19:32:48Z

I do observe that this feature works.

If in the future we want to make sure ligatures can include strings that resemble the notation for Unicode code points, we can allow for the specification of a key to include not only strings, but objects like {"type": "ligature", "contents": "U+2002"} which would output the literal string "U+2002", and even {"type": "ligature", "contents": ["U+2245", " is for isomorphose"]} which would output the literal string "≅ is for isomorphose".
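
A rough sketch of how such tagged objects might be interpreted, written in Python for illustration only. This is not KLFC's implementation: the function name `render_ligature`, the `CODE_POINT` pattern, and the rule that only list elements are read as code points are all assumptions inferred from the two examples above.

```python
import re

# Hypothetical pattern for the U+XXXX notation (4 to 6 hexadecimal digits).
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def render_ligature(spec):
    """Render a hypothetical {"type": "ligature", "contents": ...} object."""
    if spec.get("type") != "ligature":
        raise ValueError("not a ligature object")
    contents = spec["contents"]
    if isinstance(contents, str):
        # A bare string is taken literally, even if it looks like U+XXXX.
        return contents
    parts = []
    for item in contents:
        match = CODE_POINT.fullmatch(item)
        # Inside a list, an element that is exactly U+XXXX becomes that character;
        # anything else is kept as literal text. Values above U+10FFFF are not
        # checked here and would make chr() raise ValueError.
        parts.append(chr(int(match.group(1), 16)) if match else item)
    return "".join(parts)


print(render_ligature({"type": "ligature", "contents": "U+2002"}))
print(render_ligature({"type": "ligature", "contents": ["U+2245", " is for isomorphose"]}))
```

Under these assumptions the explicit object form removes the ambiguity: whether a string is literal text or a code point reference is decided by where it sits in the object, not by what the string looks like.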

kindaro mentioned this issue May 25, 2022

Cannot bind ≅ when using X Keyboard Extension format, instead ≈ is bound to the desired key. #49

Closed

39aldo39 added a commit that referenced this issue May 25, 2022

Add support for explicit unicode characters like U+0061 (#24, #49)

e07375b

39aldo39 closed this as completed May 25, 2022
