This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

URL code

FauxFaux edited this page Aug 2, 2012 · 2 revisions

Now

A terminal in PuTTY appears to be a two-dimensional "array" of UTF-16 code units, each of which can carry attributes and so on.

When the terminal contains only characters that fit in one UTF-16 "unit" (an octet pair), i.e. code points below U+10000, there is a one-to-one mapping from "unit position" to "screen position". The current URL code depends on this.
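The unit-counting claim above can be checked with a small sketch. It is written in Python for brevity and is not PuTTY's C code; the helper name `utf16_units` and the sample strings are made up for illustration.

```python
import struct


def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) that encode text."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


bmp_row = "abc"
astral_row = "\U0001D11Ec"  # U+1D11E (MUSICAL SYMBOL G CLEF), then 'c'

# Only code points below U+10000: one unit per character,
# so the unit index equals the character index.
print(len(utf16_units(bmp_row)) == len(bmp_row))  # True

# One code point above U+FFFF: it takes two units (a surrogate pair),
# so 'c' ends up at unit index 2 rather than 1.
print([hex(u) for u in utf16_units(astral_row)])  # ['0xd834', '0xdd1e', '0x63']
```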

PuTTY does not seem to store the state of this "screen" array anywhere, so the URL code makes its own. There are various places that look like they have this data in them (the screen arrays on the term object, the font code (tm)), but I haven't been able to extract anything meaningful from them.

This breaks horribly when:

  1. There are code points at or above U+10000, which need two UTF-16 units (a surrogate pair).
  2. There are code points that are "double width", i.e. require two grid squares to render, like most CJK characters, e.g. "燷晼晋".
  3. There are code points that are "zero width", i.e. require no grid squares to render, like combining code points, e.g. "à", represented as 'a' (U+0061) followed by a combining grave accent (U+0300).
  4. When the terminal is redrawing (I have no idea why this happens).
  5. Probably loads of other cases.
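To make the first three cases concrete, here is a rough Python sketch that maps each UTF-16 unit of a line to the screen column it would be drawn in. It is not how PuTTY computes widths: `cell_width` and `unit_columns` are hypothetical names, and the width rule (combining class for zero width, East Asian Width W/F for double width) is only an approximation that ignores things like control characters.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of grid squares one code point occupies."""
    if unicodedata.combining(ch):
        return 0  # combining mark: drawn on top of the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide or fullwidth: takes two grid squares
    return 1


def unit_columns(text: str) -> list[int]:
    """For each UTF-16 unit of text, the screen column it is drawn in."""
    columns = []
    col = 0          # next free screen column
    prev_start = 0   # column where the previous character started
    for ch in text:
        width = cell_width(ch)
        units = 2 if ord(ch) > 0xFFFF else 1  # surrogate pair needs two units
        start = prev_start if width == 0 else col
        columns.extend([start] * units)
        col += width
        prev_start = start
    return columns


print(unit_columns("abc"))           # [0, 1, 2]
print(unit_columns("\U0001D11Ec"))   # [0, 0, 1]
print(unit_columns("燷晼晋"))         # [0, 2, 4]
print(unit_columns("a\u0300b"))      # [0, 0, 1]
```

Only the first line gives unit index equal to screen column; in the other three the two drift apart, which is the assumption the URL code relies on.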

Working case:

| 0 | 1 | 2 |
-------------
| a | b | c |
| e |   | f |

All low ASCII, no double-width characters.

  • term[0][0] == 'a'
  • term[0][2] == 'c'
  • screen[0][0] == 'a'
  • screen[0][2] == 'c', etc.

#1

term:
|                       0                       |     1     | 2 |
-----------------------------------------------------------------
|   very high numbered, but single width glyph  | surrogate | c |
|                       e                       |     f     |   |

screen:
|                       0                       |     1     |
-------------------------------------------------------------
| the very high numbered glyph, in its entirety |     c     |
|                       e                       |     f     |

Here, the screen is only two characters wide, but the first terminal line is three units wide, because the high-numbered glyph takes two units (a surrogate pair).

  • term[0][2] == 'c', but screen[0][1] == 'c', etc.

This causes BOOOM.
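One possible way to get from the "term" view to the "screen" view in case #1 is to pair up surrogates while walking the units. The following is only a Python sketch of that idea, not existing PuTTY code; `units_to_cells` is a made-up name, and it deliberately ignores double-width and zero-width code points.

```python
def units_to_cells(units: list[int]) -> list[str]:
    """Collapse a row of UTF-16 units into one entry per code point."""
    cells = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the high and low surrogate into a single code point.
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            cells.append(chr(code_point))
            i += 2
        else:
            # A BMP unit (or an unpaired surrogate) is passed through as-is.
            cells.append(chr(unit))
            i += 1
    return cells


term_row = [0xD834, 0xDD1E, ord("c")]  # U+1D11E as a surrogate pair, then 'c'
screen_row = units_to_cells(term_row)

print(chr(term_row[2]))  # c
print(screen_row[1])     # c
print(len(term_row), len(screen_row))  # 3 2
```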