Implement colon separated CSI parameters #22

VojtechStep · 2019-08-04T19:51:27Z

I was looking into implementing styled underlines into alacritty, like what you might see in kitty or gnome vte based terminals - that is normal underline, double underline, curly underline (undercurl), and have it have a different color than the foreground.

Looking at the kitty implementation (spec), the terminal emulator has to handle not only semicolon separated parameters (\e[0;4m), but also colon separated parameters (\e[4:3m).

I'm not an expert on the standard defining how those sequences should be handled, so I was going by kitty's source code. What's being done in their codebase is not applicable to this repo, because they have the entire sequence buffered up until the command byte ('m' in this case) and then parse it, dispatching every semicolon separated parameter as it's own sequence, with special cases for color parameters. In that way, a command like \e[0;4;58:2::186:93:0m would be split into three function calls: handle 0m, handle 4m, handle 58:2::186:93:0m.

This could be implemented in the vte crate (I think) by adding a dimension to the params array, and multiple calls to the Performer implementor. Not sure what the performance cost would be though.

The text was updated successfully, but these errors were encountered:

jwilm · 2019-08-13T01:34:25Z

Because control sequences sometimes need to look at multiple parameters together, we can't simply split into multiple function calls.

It looks like xterm implements subparams by basically sticking them into the params list but then also keeping a separate array which indicates which are subparams and which are top levels params. Since this naturally makes it possible to exceed the 16 param limit for normal use, xterm caps the params list at 30 instead.

A similar approach here would have comparable performance to the existing implementation. It would allow the Perform impl to know which are params and which are subparams (albeit not in the most accessible way), and do whatever it pleases. That is to say, it keeps things nicely generalized while exposing pertinent info.

VojtechStep · 2019-08-13T10:58:31Z

AFAIK the only sequences with subparameters in the standard are colors, which could be handled as a special case by the parser. From what I've seen, this is what other parsers do.

What you suggest is something along the lines of adding a parameter to the csi callback of type &[std::slice::SliceIndex], which could be used to index into the params array to get the ranges of the individual subsequences, right?

The first approach might allow us to keep backward compatibility, but the latter is certainly more generic.

jwilm · 2019-08-13T18:08:17Z

The parameter format is specified generically such that any control character could be associated with 1 or more params. Here's the relevant section from ECMA-48

My suggestion is that we would allow the params list to be larger and also have something like a &[bool] of the same length which indicates which params are actually subparams. I wouldn't actually call it a suggestion though -- this is just what xterm does. We could probably have a nicer data structure in Rust.

VojtechStep · 2019-08-13T19:36:30Z

A simple example of what I was thinking: Playground link.
The parameters are still passed as a linear slice, but we can index into it and get the subparameter groups by iterating over the range slice.

jwilm · 2019-08-13T20:16:21Z

Thanks for clarifying with an example. The slice ranges work for me as long as we can avoid dynamic allocations.

jerch · 2020-04-02T14:09:16Z

Stumbled over your issue here through GH suggestions lol.

Maybe I can share a few insights from our side (we implemented this a few months ago in xterm.js).

Imho the crucial part here is not to limit this to certain final functions or certain param values, as the spec clearly states, that any param can have those substring param extensions (as cited by @jwilm above). I think some emulators get this wrong. Following the spec, this a perfect legal example:

CSI 1:2:999999999 ; 3:4 ; 5: ; :6 <final byte>

Whether those extensive subparams make sense at all is up to the final sequence function to decide. Thus a GP sequence parser should pass subparams along as they occur without losing the associated param they are meant to extend.

Took me a while to find a suitable structure for that, in the end we went with a single parser wide data structure with certain limits (per sequence: up to 32 params, up to 32 subparams in total, a single value clamped to -1 .. 2³¹ - 1), that gets borrowed by the sequence functions to avoid re-allocations.

Since our parser is based on the VT500 parser described on VT100.net, this subparam handling also applies to DCS params (we did not introduce another action to distinguish between CSI_PARAM and DCS_PARAM). Well, the DCS parsing of the VT500 parser is not ECMA compatible in the first place, thus I think the subparam extension does not hurt anyone in the DCS field. Also all well-defined DCS sequences keep working.

Maybe this helps somewhat to get things sorted out for your parser.

VojtechStep · 2020-04-03T18:06:54Z

Thanks for the input!

I 100% agree that the parser should not in any way try to interpret the sequence, only provide the library user with information about how the parameters were read.

I guess the main issue I have with the implementation for now is what should the interface look like (I'm also sure this will become less of a problem once I actually formulate the requirements I expect from the interface in this comment, stop sitting on the issue for eight months and actually put together a PR 😄)

What I would like to achieve is to support these use cases:

Backwards compatibility - If a user doesn't want to support colon separated parameters or they already have some other heuristics that work with the raw byte slice, it should still be available for use without performance penalties or refactoring. (Note that this would probably still be a breaking change, because the signature of csi_dispatch will likely change)
Ease of use for the new approach - I'll probably start experimenting with the slice ranges and write some tests to get a feeling of how ergonomic it would be to use (see the playground link I provided earlier, except I later realized there is no value in the handler being generic). As for the allocation, I don't believe there is currently a way to make a fixed-length array of non-Copy types elegantly (Range is not Copy, because it implements Iterator).
Ease of use for interpreting inputs that use both colon separated params and semicolon separated params - There shouldn't be much of a cognitive overhead if the user has to accept for example colors in the semicolon separated format. What I mean by this is that if we were to change the API to expose a slice of slices, it might not be very ergonomic, because there wouldn't be a 1:1 correspondence between a slice and a command and the user would have to do some sort of lookahead over slices with one element each, or at least I can't imagine .

This leads me to believe that we have to pass the slice as we do now, and then some additional metadata, which in the end might not be the slice Range, but a custom struct that is Copy, Index and maybe something else.

I will experiment for a bit and open a PR once I feel I have a hint of a solution.

jerch · 2020-04-03T21:13:14Z

Some more remarks from my side:

Technically the params in the example above would translate into something like this (as "int arrays" in C):

{ { 1, 2, 2147483647 }, { 3, 4 }, { 5, -1 }, { -1, 6 } }

(we use -1 as placeholder for omitted values). While the old impl was just a simple array with index access to the values, this one needs to store the slice offsets somehow. After puzzling around and doing some benchmarks I ended up with these containers instead:

int params[32] = { 1, 3, 5, -1, ... };
int subparams[32] = { { 2, 2147483647 }, { 4 }, { -1 }, { 6 }, ... };

I basically pulled the first value of those subparam groups into the old params array we had before, and put excessive subparams in a new container (+ the slice offsets into a third one). I did not do this slightly more complicated model for backwards compatibility, still it would serve any sequence w'o subparams the old way. Main reason to do this was perf - reshaping everything to use slices was quite a dent in the perf, thus I optimized it for the usual case with no subparams.

I have not looked at your current impl, maybe you can restore parts of that idea as well. Note that the type exhibits the same params interface to functions as before, subparams are only "on top" (if the function cares at all for those).

VojtechStep · 2020-04-04T10:29:50Z

That seems like a good idea, I will try to take inspiration from it.
However I see that the subparams are a 2d array (of course they are, that's the main point) and in that case I don't think we can avoid constructing temporary arrays when calling the user provided callbacks, as opposed to only passing slices of already constructed memory.

I'll see what I can come up with.

jerch · 2020-04-04T14:18:26Z

However I see that the subparams are a 2d array (of course they are, that's the main point)...

Yepp, well C plays "nicely" here and would just store those values in linear memory as { 2, 2147483647, 4, -1, 6, ... }, which is hard to grasp (I added the superfluous braces for readability).

The slice indices are stored separately like this (yes in C everything has to be done by hand 😸):

unsigned short subparam_idx[32] = { 0<<8 | 2, 2<<8 | 3, 3<<8 | 4, 4<<8 | 5, ... };
// where a slice is formed by (subparams_indices at param_idx):
// [ subparam_idx[param_idx] >> 8 .. subparam_idx[param_idx] & 0xFF [   (right exclusive)

which is a quite compact memory layout (with a hard coded limit of 256 subparams due to the short width). This bitpacking makes it slightly worse than without, but gets more than compensated due to fitting into one cache line. The whole subparams accounting can be skipped for usual sequences w'o any subparams, and since they tend to be less than 8 params per sequence, again fit into one cache line, which makes it pretty fast in the end.

Ofc this is made with C in mind, not sure if you can do it similar in rust, which comes with much nicer default types out of the box. In C at least always doing the slicing descent is much worse in runtime than treating the first value special by pulling it "in front".

chrisduerr · 2020-04-04T16:32:28Z

I don't think we can avoid constructing temporary arrays when calling the user provided callbacks

Please note that VTE should work without std and this should have zero negative performance impact on existing Alacritty benchmarks.

VojtechStep · 2020-04-04T18:00:16Z

I know, that comment was a supposed to indicate that the implementation will probably go a different way, but apparently I wasn't clear enough 😄

This adds support for CSI subparameters like `\x1b[38:2:255:0:255m`, which allows the combination of truecolor SGR commands together with other SGR parameters like bold text, without any ambiguity. This implements subparameters by storing them in a list together with all other parameters and having a separate slice to indicate which parameter is a subparameter and how long the subparameter list is. This allows for static memory allocation and good performance while still having the option for dynamic sizing of the parameters. Since the subparameters are now also counted as parameters, the number of allowed parameters has been increased from `16` to `32`. Since the existing structures combine the handling of parameters for CSI and DCS escape sequences, it is now also possible for DCS parameters to have subparameters, even though that is currently never used. Considering that DCS is rarely supported by terminal emulators, handling these separately would likely just cause unnecessary issues. The performance should also be better by using this existing subparam structure rather than having two separate structures for DCS and CSI parameters. The only API provided for accessing the list of parameters is using an iterator, this is intentional to make the internal structure clear and allow for easy optimizations downstream. Since it makes little sense to access parameters out of order, this limitation should not have any negative effects on performance. The main drawback is that direct access to the first parameter while ignoring all other subparameters is less efficient, since it requires indexing a slice after iterating to the element. However while this is often useful, it's mostly done for the first few parameters which significantly reduces the overhead to a negligible amount. At the same time this forces people to support subparameters or at least consider their existence, which should make it more difficult to implement things improperly downstream. Fixes alacritty#22.

This adds support for CSI subparameters like `\x1b[38:2:255:0:255m`, which allows the combination of truecolor SGR commands together with other SGR parameters like bold text, without any ambiguity. This implements subparameters by storing them in a list together with all other parameters and having a separate slice to indicate which parameter is a subparameter and how long the subparameter list is. This allows for static memory allocation and good performance while still having the option for dynamic sizing of the parameters. Since the subparameters are now also counted as parameters, the number of allowed parameters has been increased from `16` to `32`. Since the existing structures combine the handling of parameters for CSI and DCS escape sequences, it is now also possible for DCS parameters to have subparameters, even though that is currently never used. Considering that DCS is rarely supported by terminal emulators, handling these separately would likely just cause unnecessary issues. The performance should also be better by using this existing subparam structure rather than having two separate structures for DCS and CSI parameters. The only API provided for accessing the list of parameters is using an iterator, this is intentional to make the internal structure clear and allow for easy optimizations downstream. Since it makes little sense to access parameters out of order, this limitation should not have any negative effects on performance. The main drawback is that direct access to the first parameter while ignoring all other subparameters is less efficient, since it requires indexing a slice after iterating to the element. However while this is often useful, it's mostly done for the first few parameters which significantly reduces the overhead to a negligible amount. At the same time this forces people to support subparameters or at least consider their existence, which should make it more difficult to implement things improperly downstream. Fixes #22.

chrisduerr added the enhancement label Aug 4, 2019

VojtechStep mentioned this issue Aug 5, 2019

Interpret a trailing semicolon as an implicit '0' parameter #23

Closed

chrisduerr self-assigned this Jul 28, 2020

chrisduerr mentioned this issue Aug 2, 2020

Add CSI subparameter support #65

Merged

chrisduerr closed this as completed in #65 Aug 5, 2020

imsnif mentioned this issue May 7, 2021

fix(compatibility): upgrade vte to support csi subparameters zellij-org/zellij#469

Merged

OnurKader mentioned this issue Jul 17, 2022

unicode grapheme segmentation zellij-org/zellij#1538

Open

fornwall mentioned this issue Sep 17, 2024

Fixed: Implement colon separated CSI parameters termux/termux-app#4154

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement colon separated CSI parameters #22

Implement colon separated CSI parameters #22

VojtechStep commented Aug 4, 2019

jwilm commented Aug 13, 2019

VojtechStep commented Aug 13, 2019

jwilm commented Aug 13, 2019

VojtechStep commented Aug 13, 2019

jwilm commented Aug 13, 2019

jerch commented Apr 2, 2020 •

edited

Loading

VojtechStep commented Apr 3, 2020

jerch commented Apr 3, 2020

VojtechStep commented Apr 4, 2020

jerch commented Apr 4, 2020 •

edited

Loading

chrisduerr commented Apr 4, 2020

VojtechStep commented Apr 4, 2020

Implement colon separated CSI parameters #22

Implement colon separated CSI parameters #22

Comments

VojtechStep commented Aug 4, 2019

jwilm commented Aug 13, 2019

VojtechStep commented Aug 13, 2019

jwilm commented Aug 13, 2019

VojtechStep commented Aug 13, 2019

jwilm commented Aug 13, 2019

jerch commented Apr 2, 2020 • edited Loading

VojtechStep commented Apr 3, 2020

jerch commented Apr 3, 2020

VojtechStep commented Apr 4, 2020

jerch commented Apr 4, 2020 • edited Loading

chrisduerr commented Apr 4, 2020

VojtechStep commented Apr 4, 2020

jerch commented Apr 2, 2020 •

edited

Loading

jerch commented Apr 4, 2020 •

edited

Loading