Support U+1234 notation for Unicode symbols? #24

Closed
kindaro opened this issue Jan 7, 2020 · 5 comments

@kindaro

kindaro commented Jan 7, 2020

It is customary to denote Unicode characters by U+[character code], but it seems klfc does not support this notation:

% klfc --from-json x --xkb y
klfc: parse fail in x: Error in $.keys[0].letters[0]: ‘U+2002’ is not a valid letter

Note that the character in this example is a kind of space, so I would rather not insert it verbatim: it would hardly be clear to the reader what kind of space it is.

I also propose a flag that makes klfc write JSON files in the U+... format. Most fonts only support a narrow range of characters, so in many cases the more unusual characters would not be displayed in any meaningful way.
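
To make the request concrete, here is a minimal sketch in Python of reading and writing the U+ notation. The helper names `parse_code_point` and `format_code_point` are hypothetical and are not part of klfc; the sketch assumes four to six hexadecimal digits after `U+`.

```python
import re

# Hypothetical helpers for illustration only; klfc does not expose these names.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_code_point(text):
    """Return the character named by 'U+XXXX', or None if text is not in that form."""
    match = _CODE_POINT.fullmatch(text)
    if match is None:
        return None
    value = int(match.group(1), 16)
    if value > 0x10FFFF:  # beyond the Unicode code space
        return None
    return chr(value)


def format_code_point(char):
    """Write a single character as 'U+XXXX', padded to at least four hex digits."""
    return f"U+{ord(char):04X}"
```

With these, `parse_code_point("U+2002")` gives the en space character and `format_code_point` turns it back into the string `U+2002`, which is the round trip a write-side flag would need.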

@39aldo39
Owner

KLFC uses a normal JSON file, so you can use the syntax "\u2002" already.

@kindaro
Author

kindaro commented Jan 15, 2020

I find it to be somewhat «wrong» to let the format of serialization define the ways in which I may or may not define a symbol, for the following reasons:

  • Formats come and go. Today it is JSON, tomorrow YAML, then Dhall.
  • Escape sequences are transient. For instance, if I convert a JSON file with both "—" and "\u2014" to YAML and back, I will either get two dashes or two escape sequences. The distinction will be lost.
  • When I wish to write a Unicode number of a symbol instead of the symbol itself, it is because I have an intention. A program should recognize and intrinsically acknowledge that intention, rather than making it «accidentally supported».
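
The second point can be checked with any ordinary JSON parser. A small sketch using Python's standard `json` module (chosen here only for illustration; klfc itself is not involved):

```python
import json

EM_DASH = chr(0x2014)

# JSON text holding the same character twice: once literally, once as an escape.
source = '["' + EM_DASH + '", "\\u2014"]'

decoded = json.loads(source)

# After parsing, the two entries are indistinguishable.
entries_equal = decoded[0] == decoded[1]

# Serializing again picks one spelling for both entries.
all_escaped = json.dumps(decoded)                      # both as \u2014
all_literal = json.dumps(decoded, ensure_ascii=False)  # both as the literal dash
```

Whichever output setting is chosen, both entries come out spelled the same way, so the author's original choice between the literal and the escape is not preserved.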

I am sure this change is technically feasible. If someone were to make it, would you merge?

@39aldo39
Owner

I don't think it is necessarily wrong to let the format decide it, but I understand that the intention may be lost, as most parsers throw away that information. However, it is also not very elegant to basically make your own escape sequences. For example, ligatures can currently be written as lig:U+2002, which outputs the literal string "U+2002". If other notations were also allowed, this would become ambiguous. I don't know a nice solution for that.

@39aldo39
Owner

I have now added explicit support for Unicode characters!

@kindaro
Author

kindaro commented May 25, 2022

I do observe that this feature works.

If in the future we want to make sure ligatures can include strings that resemble the notation for Unicode code points, we can allow the specification of a key to include not only strings but also objects like {"type": "ligature", "contents": "U+2002"}, which would output the literal string "U+2002", and even {"type": "ligature", "contents": ["U+2245", " is for isomorphose"]}, which would output the literal string "≅ is for isomorphose".
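
A rough sketch of how such specifications might be resolved, written in Python for brevity. The function names and the exact rules are assumptions drawn only from the two examples above: a bare string in U+ notation names a character, a ligature with string contents is taken literally, and a ligature with list contents resolves each piece separately. None of this reflects how klfc is actually implemented.

```python
import re

# Hypothetical resolver for the proposed object form; not klfc's real behaviour.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def resolve_piece(piece):
    """Turn 'U+XXXX' into its character; return any other string unchanged."""
    match = _CODE_POINT.fullmatch(piece)
    return chr(int(match.group(1), 16)) if match else piece


def resolve_letter(spec):
    """Resolve a key specification given as a string or a ligature object."""
    if isinstance(spec, str):
        return resolve_piece(spec)
    if isinstance(spec, dict) and spec.get("type") == "ligature":
        contents = spec["contents"]
        if isinstance(contents, str):
            return contents  # string contents are literal, even if they look like U+XXXX
        return "".join(resolve_piece(piece) for piece in contents)
    raise ValueError(f"unsupported key specification: {spec!r}")
```

Under these assumed rules, `resolve_letter({"type": "ligature", "contents": "U+2002"})` yields the six-character string `U+2002`, while the list form resolves `U+2245` to the character it names before joining the pieces.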

2 participants