Please consider adding the generated markdown directly in this repo #2

Open
wez opened this issue Sep 20, 2023 · 2 comments

Comments

@wez

wez commented Sep 20, 2023

I don't read TeX natively and it's super inconvenient to download and read the markdown outside of just allowing GitHub to render it here. The PDF format doesn't respect my dark mode preference either!

I'm going to cheekily paste in the current version of the markdown here so that I can read the spec in the meantime!


author: Christian Parpart
date: 2021-09-04 (draft, revision 1)
title: Unicode in Terminals: a proposal for standardizing basic Unicode features

History and current state

Historically, terminals supported only 7-bit characters with C0 control
codes, and different languages were supported by selecting their
respective code pages.

Later on, this was extended to 8-bit (extended ASCII) character sets,
along with C1 control codes.

With the introduction of Unicode, there was no longer any need for code
pages, but the Unicode standard was not explicitly designed to cover
terminals as well, except that the C0 and C1 codepoints were preserved.

With UTF-8 it became possible to at least pass Unicode characters to the
terminal, but the rendering of some characters, as well as their
respective cursor placement, is not defined by the Unicode standard.

Unicode also introduced codepoint sequences that map to a single
user-perceived character, so-called grapheme clusters. Terminals have
never attempted to formalize how to deal with grapheme clusters,
variation selectors, their East Asian Width, or emoji and emoji
presentation.

This specification tries to address some of the problems terminals have
with Unicode today.

Backwards Compatibility

The basic point is that everything is disabled by default, so legacy
applications break no more than they already did.

Backwards compatibility is retained by leaving all behavior exactly as
undefined as it is without this specification.

An application can test for the availability of this feature and has to
explicitly enable it in order to have the properties defined in this
document guaranteed.

Future Compatibility and Stability

Unicode itself had a major breakage between versions 8 and 9, when some
codepoints had their East Asian Width changed.

While this may happen again at any time, we do not expect it to happen
soon or frequently, so this specification does not address future
incompatibilities and leaves them for a later point.

Feature and Mode State Detection

`CSI ? 2027 $ p` ([DECRQM]) can be used to test for the availability of
this feature as well as for the mode the terminal is currently in with
regard to this specification. The reply to `CSI ? 2027 $ p` indicates
each state accurately enough that no new VT sequence needs to be
introduced.
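
A minimal sketch of the detection step follows, assuming the usual DECRPM reply format for private modes, `CSI ? 2027 ; Ps $ y`, where Ps is 0 (not recognized), 1 (set), 2 (reset), 3 (permanently set) or 4 (permanently reset). Those values come from the DEC mode report, not from this specification, and the names `QUERY_2027` and `parse_decrpm` are hypothetical; writing the query to the terminal and reading the reply back from the tty are left out.

```python
import re
from typing import Optional

CSI = "\x1b["

# DECRQM query for the private mode defined by this specification
# (the spaces in `CSI ? 2027 $ p` are only there for readability).
QUERY_2027 = CSI + "?2027$p"

# DECRPM reply for a private mode: CSI ? <mode> ; <Ps> $ y
_DECRPM = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Standard DECRPM state values (assumption: not defined by this specification).
STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}


def parse_decrpm(reply: str, mode: int = 2027) -> Optional[str]:
    """Return the state reported for `mode`, or None if `reply` is not a
    DECRPM report for that mode."""
    match = _DECRPM.search(reply)
    if match is None or int(match.group(1)) != mode:
        return None
    return STATES.get(int(match.group(2)))
```

For example, a terminal that keeps the mode permanently enabled answers `CSI ? 2027 ; 3 $ y`, which this sketch reports as `permanently set`.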

Mode Switching

  • `CSI ? 2027 h` ([DECSM]) to ensure conformance to all rules
    defined by this specification

  • `CSI ? 2027 l` ([DECRM]) to return to undefined behavior
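
Combining detection with mode switching, an application could enable the mode only when the terminal reports it as supported but currently reset, and undo its change on exit. This is a hypothetical sketch: `plan_mode_2027` is an illustrative name, and the state strings are assumed to be the standard DECRPM state names (set, reset, permanently set, permanently reset, not recognized).

```python
from typing import Optional, Tuple

CSI = "\x1b["
ENABLE_2027 = CSI + "?2027h"   # set mode 2027: opt in to this specification
DISABLE_2027 = CSI + "?2027l"  # reset mode 2027: back to undefined behavior


def plan_mode_2027(state: Optional[str]) -> Tuple[str, str]:
    """Given the reported state of mode 2027, return the sequences to send
    on startup and on exit ("" means: send nothing)."""
    if state == "reset":
        # Supported but off: switch it on now and off again when done.
        return ENABLE_2027, DISABLE_2027
    # "set" / "permanently set": already on, leave it as found.
    # "not recognized" / "permanently reset" / no reply: the guarantees of
    # this specification are not available, so there is nothing to toggle.
    return "", ""
```

Restoring the previous state on exit keeps the change invisible to whatever runs in the terminal afterwards.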

Semantics

The following semantics MUST be adhered to if VT mode `2027` is
enabled. If VT mode `2027` is not set, the behavior is as undefined as
if this specification were not implemented at all, in order to retain
the behavior of current terminals and their legacy applications.

Grapheme Cluster

With this mode enabled, the terminal MUST support grapheme clusters
in conformance with the algorithm described in UTS 29 [UTS-29].

This implies that consecutively written codepoints on the terminal
stream that are not separated by a grapheme cluster boundary as per
UTS 29 [UTS-29] always end up in the same grid cell of the terminal.

Therefore, extending a grapheme cluster with consecutively added
codepoints does not move the cursor, except for variation selector 16
(VS16), which may change the width of the grapheme cluster to wide
(2 grid cells).

When the cursor moves to a grid cell that contains a complete or
incomplete grapheme cluster, the contents of this grid cell are erased
and overwritten rather than textually concatenated.

Therefore cursor movement semantics of the terminal remain unchanged.
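
The grapheme cluster rules above can be illustrated with a toy, single-line model. It is a sketch under loud assumptions: the break test (combining marks, ZWJ, VS15/VS16, and whatever directly follows a ZWJ extend the current cluster) is a deliberately simplified stand-in for UTS 29, not a conforming implementation; there is no wrapping or bounds handling; the second cell of a wide cluster is not marked; and `Line`, `write` and `move_to` are hypothetical names.

```python
import unicodedata
from typing import List, Optional

ZWJ, VS15, VS16 = "\u200d", "\ufe0e", "\ufe0f"


class Line:
    """Toy grid line; the break rule is a simplification, NOT UTS 29."""

    def __init__(self, columns: int) -> None:
        self.cells: List[str] = [""] * columns
        self.cursor = 0
        self.current: Optional[int] = None  # column of the cluster still open

    def _extends_current(self, ch: str) -> bool:
        if self.current is None:
            return False
        previous = self.cells[self.current]
        return (
            unicodedata.combining(ch) != 0   # e.g. U+0301 COMBINING ACUTE ACCENT
            or ch in (ZWJ, VS15, VS16)
            or previous.endswith(ZWJ)        # codepoint joined by a preceding ZWJ
        )

    def write(self, text: str) -> None:
        for ch in text:
            if self._extends_current(ch):
                # Same cluster: append to its cell, the cursor stays put...
                self.cells[self.current] += ch
                # ...unless VS16 widens a 1-cell cluster to 2 cells.
                if ch == VS16 and self.cursor == self.current + 1:
                    self.cursor += 1
            else:
                # New cluster: overwrite whatever the cell under the cursor held.
                self.cells[self.cursor] = ch
                self.current = self.cursor
                wide = unicodedata.east_asian_width(ch) in ("W", "F")
                self.cursor += 2 if wide else 1

    def move_to(self, column: int) -> None:
        # Cursor movement closes the open cluster; nothing is concatenated later.
        self.cursor = column
        self.current = None


line = Line(10)
line.write("e\u0301")       # one cell holds both codepoints; cursor now at column 1
line.write("\u2764\ufe0f")  # VS16 widens the heart to 2 cells; cursor now at column 3
line.move_to(0)
line.write("\u0301")        # replaces cell 0 instead of extending the cluster there
```

After the last call, cell 0 holds only the lone combining mark, which is the overwrite-instead-of-concatenate rule in action.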

Emoji

Emoji symbols are always rendered with a square aspect ratio (as
proposed by UTS 51 [UTS-51]), implying an East Asian Width of Wide,
i.e. 2 grid cells.

ZWJ emoji are required to be displayed as a single image with a width of
2 grid cells.

The alternative display of a ZWJ emoji as a decomposed sequence of
sub-images must not be used as a fallback, as it would break cursor
movement guarantees.

If a ZWJ emoji cannot be rendered, the display behavior is undefined;
for example, a Unicode replacement character (`U+FFFD`) could be
displayed instead.
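
A minimal sketch of why the decomposed fallback is ruled out: a renderer that cannot draw a ZWJ sequence as one image still has to keep it at 2 cells, for example by drawing U+FFFD, whereas drawing the members side by side would occupy 2 cells per emoji member and desynchronize the cursor. The `can_render` callback and both function names are hypothetical, and treating every member as 2 cells wide is a simplifying assumption.

```python
from typing import Callable, Tuple

ZWJ = "\u200d"
REPLACEMENT = "\ufffd"


def zwj_display(cluster: str, can_render: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (what to draw, cells occupied) for a ZWJ emoji sequence."""
    if can_render(cluster):
        return cluster, 2      # one image, 2 grid cells
    return REPLACEMENT, 2      # one possible fallback; still 2 grid cells


def decomposed_width(cluster: str) -> int:
    """Cells the forbidden decomposed fallback would use (2 per member, assumed)."""
    members = [m for m in cluster.split(ZWJ) if m]
    return 2 * len(members)
```

For a three-member family sequence, the decomposed form would take 6 cells while the cursor only advances by 2, which is exactly the mismatch the rule avoids.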

In emoji presentation, the cursor always moves by 2 grid cells.
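
The width rules above can be condensed into a small function over an already-segmented grapheme cluster. This is a sketch under stated assumptions: Python's `unicodedata.east_asian_width` of the first codepoint stands in for a full emoji property lookup (its Unicode version depends on the Python version), and `cluster_width` is a hypothetical name.

```python
import unicodedata

VS16 = "\ufe0f"


def cluster_width(cluster: str) -> int:
    """Grid cells occupied by one grapheme cluster (simplified)."""
    if VS16 in cluster:
        return 2  # explicit emoji presentation
    if unicodedata.east_asian_width(cluster[0]) in ("W", "F"):
        return 2  # wide base, e.g. default emoji presentation; ZWJ members add nothing
    return 1
```

Note that a ZWJ sequence is measured by its first member only, so it stays at 2 cells no matter how many emoji it joins.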

SGR attributes applied to a grid cell containing an emoji symbol are not
strictly defined; it is left to the terminal emulator to give them
sensible, meaningful semantics with regard to emoji symbols.

Variation Selector 16

VS16 promotes the grapheme cluster to emoji presentation, forcing the
grapheme cluster's width to 2. If the cluster sits at the right margin
and AutoWrap mode is set, this may cause the symbol to be reflowed to
the next line.
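
The right-margin case can be sketched as a placement decision made when VS16 turns a 1-cell cluster into a 2-cell one. This is an assumption-laden sketch: columns are 0-based, the terminal's pending-wrap state is ignored, and `place_after_vs16` is a hypothetical name.

```python
from typing import Tuple


def place_after_vs16(column: int, columns: int, autowrap: bool) -> Tuple[int, int]:
    """A 1-cell cluster at `column` just became 2 cells wide because of VS16.
    Return (line offset, start column) of the widened cluster."""
    if column + 2 <= columns:
        return 0, column   # still fits on the current line
    if autowrap:
        return 1, 0        # reflowed to the start of the next line
    # AutoWrap off at the right margin: undefined by the specification;
    # keeping the cluster in place (possibly drawn in half) is one possible choice.
    return 0, column
```

On an 80-column line, a heart at column 79 that receives VS16 moves to column 0 of the next line when AutoWrap is set.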

Variation Selector 15

VS15 forces the grapheme cluster to text presentation. This does NOT
change the underlying width; it only changes the display to prefer a
textual, non-colored presentation.

This matches the behavior of today's web browsers and should thus feel
most intuitive to users.

The cursor will therefore move by 2 columns if the symbol's default
presentation is emoji.
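
A sketch of the VS15 rule, using U+231A WATCH, which is wide and defaults to emoji presentation: VS15 switches it to text presentation while its width, and therefore the cursor advance, stays at 2 columns. `vs15_effect` is a hypothetical name, and Python's `unicodedata.east_asian_width` of the base codepoint stands in for a full width lookup.

```python
import unicodedata
from typing import Tuple

VS15 = "\ufe0e"


def vs15_effect(cluster: str) -> Tuple[str, int]:
    """Return (presentation, width) for a grapheme cluster.
    VS15 only switches the presentation; the width stays that of the base."""
    wide = unicodedata.east_asian_width(cluster[0]) in ("W", "F")
    width = 2 if wide else 1
    presentation = "text" if VS15 in cluster else "default"
    return presentation, width
```

A narrow symbol such as U+2764 followed by VS15 likewise keeps its width of 1.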

Margins and AutoWrap with Emoji

Emoji written at the right margin with AutoWrap mode disabled may be
rendered cut in half or not be displayed at all. This behavior is left
undefined to ease implementation and adoption of this specification.

References

@christianparpart
Member

Hey @wez,

all good. I now just don't know what to do with this ticket. For certain tasks I think LaTeX is just better suited, especially since it can render to Markdown as well.

I am always open to improving the publishing format, and therefore I am absolutely open to suggestions.

But apart from that, also feedback to the spec is very welcome. There are (in the end) not yet many terminals that do in fact try to move forward on the grapheme cluster end, and I think TUI/CLI apps need this kind of discoverability to gain trust and also start relying on the modern way of laying out complex graphemes in the terminal.

I think we can have the markdown uploaded to a github.io page upon push/merge to the master branch, such that it is easy to read from there as well (should suit you for sure).

@wez wez changed the title Please considering adding the generated markdown directly in this repo Please consider adding the generated markdown directly in this repo Sep 21, 2023
@wez
Author

wez commented Sep 21, 2023

Hi @christianparpart!

re: the spec, I think it sounds fine. FWIW, wezterm reports permanently-enabled for this setting and doesn't allow disabling it.

wez/wezterm#4223 was a request to offer application level control, but as part of looking into it, I decided that it was a lot of effort to undo what I was already doing :-p

re: the markdown and this issue, I don't have a strong preference on the implementation details, but I think the goal should be to make it as quick and easy as possible for someone to view it, without having extra steps to download or open a helper application. Personally, I would probably just check it in directly, but deploying it to GH pages is also OK.
