Terminal keyboard input support #163

Closed
sunfishcode opened this issue Nov 28, 2019 · 13 comments
Labels
feature-request Requests for new WASI APIs

Comments

@sunfishcode
Member

Along with ANSI escape sequences for display, we should also consider escape sequences for input, so that arrow keys, function keys, page-up/page-down/home/end/etc. can be used.

Assuming we take the approach in #162 of avoiding exposing the termcap/terminfo/TERM information to applications, it seems logical to do the same thing for inputs: define the escape sequences used by WASI, and have implementations translate into those sequences.

At the risk of being too cute, in the Unicode era, we could have quite descriptive sequences, something like this:

| Escape sequence | Meaning |
|---|---|
| ␛↑ | up arrow |
| ␛← | left arrow |
| ␛↓ | down arrow |
| ␛→ | right arrow |
| ␛⁽¹⁾ | F1 |
| ␛⁽¹²⁾ | F12 |
| ␛⌦ | Delete |
| ␛⎀ | Insert |

and so on, with "␛" here representing an actual ESC control character, so we can distinguish between the user entering a literal unicode symbol and pressing one of these special keys.
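To make the proposal concrete, here is a minimal sketch of how a program might decode these sequences. It assumes exactly the symbols in the table above and that function keys are written as superscript digits between ⁽ and ⁾; the `Key` enum and `decode_key` function are hypothetical names for illustration, not part of any WASI API.

```rust
/// Keys from the proposed table. Illustrative only, not a WASI type.
#[derive(Debug, PartialEq, Eq)]
pub enum Key {
    Up,
    Down,
    Left,
    Right,
    Delete,
    Insert,
    Function(u32),
    /// A literal character typed by the user (no ESC prefix).
    Char(char),
}

/// Map a Unicode superscript digit to its numeric value.
fn superscript_digit(ch: char) -> Option<u32> {
    match ch {
        '⁰' => Some(0),
        '¹' => Some(1),
        '²' => Some(2),
        '³' => Some(3),
        // U+2074..=U+2079 are contiguous, offset from U+2070.
        '⁴'..='⁹' => Some(ch as u32 - '⁰' as u32),
        _ => None,
    }
}

/// Decode one key from the front of `input`, returning the key and the
/// remaining text. Returns `None` on empty input or a malformed sequence.
pub fn decode_key(input: &str) -> Option<(Key, &str)> {
    let mut chars = input.chars();
    let first = chars.next()?;
    if first != '\u{1b}' {
        // No ESC prefix: the symbol was entered literally.
        return Some((Key::Char(first), chars.as_str()));
    }
    let key = match chars.next()? {
        '↑' => Key::Up,
        '↓' => Key::Down,
        '←' => Key::Left,
        '→' => Key::Right,
        '⌦' => Key::Delete,
        '⎀' => Key::Insert,
        '⁽' => {
            // Function key: one or more superscript digits, then '⁾'.
            let mut n: u32 = 0;
            let mut digits = 0;
            loop {
                let ch = chars.next()?;
                if ch == '⁾' {
                    break;
                }
                n = n.checked_mul(10)?.checked_add(superscript_digit(ch)?)?;
                digits += 1;
            }
            if digits == 0 {
                return None;
            }
            Key::Function(n)
        }
        _ => return None,
    };
    Some((key, chars.as_str()))
}
```

With this sketch, `decode_key("\u{1b}↑")` yields `Key::Up`, while `decode_key("↑")` yields `Key::Char('↑')`, which is the distinction the ESC prefix is meant to provide.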

@sunfishcode
Member Author

Feel free to suggest better Unicode art for any of these, or for keys not covered yet!

FWIW, I looked at the circled numbers for function keys, but they only go up to ㊿, while terminfo function keys go up to 63. Superscript digits give us as many function keys as we could want, though of course that's not the only way to go.

@programmerjake
Contributor

Why not just go all-out and do something like:
<JSON object>
(where ␛ and ␤ are the ESC and LF control characters)

and the JSON follows this template (derived from libSDL):

{
    // a SDL_Scancode value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_scancode.h
    "scancode": 1234,

    // a SDL_Keycode value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_keycode.h#l34
    "keycode": 5678,

    // bitmask of modifiers -- SDL_Keymod value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_keycode.h#l322
    "modifiers": 1024,
}

Even if we don't use JSON, please pick a representation that can handle all modifier combinations.

@mash-graz

all these terminal-related questions are in fact much more complex than expected.
sure, we usually imagine only the very simple concept of direct key input resp. key code handling, as it perhaps had some relevance in the good old DOS days, but on most modern systems it's a much more complex task, which is usually mediated/preprocessed/determined by a whole system of different infrastructure components. that's why it also needs more careful kinds of system control and configuration instead of just key code or escape sequence filtering.

just as an inspiration, you should perhaps take a look into this microsoft article series about the history and recent improvements of the windows console infrastructure: https://devblogs.microsoft.com/commandline/windows-command-line-backgrounder/

and there is also a nice introductory article available about the POSIX/linux side and all its mysteries: http://www.linusakesson.net/programming/tty/

if you study these writings, you'll immediately grasp why the console handling interfaces look so different on the two sides: one family has its roots in small compact PCs for local operation, while the other comes from a tradition where remote access was more or less a requirement from day one. the benefits of the latter approach in particular can hardly be preserved if we only provide an inadequately simple solution.

i wouldn't underestimate the complexity of this field. IMHO it makes much more sense to just reuse the available infrastructure on the given systems and already available mature software components in a responsible manner (as also suggested in #161), instead of wasting too much energy reinventing everything again (...and very likely falling into the same pitfalls as our predecessors ;)). sure, to some degree it's really necessary to make clear security-related decisions to avoid obvious risks, but in general WASI shouldn't restrict or overly complicate the freedom and flexibility of practical utilization more than necessary.

@sunfishcode
Member Author

My sense in this issue is to consider just low-level terminal input, and not try to design a general-purpose input-event system. A general-purpose input system would be a valuable thing to have, but I think terminal input is a sufficiently distinct domain that we don't need to unify them. Assuming that's reasonable, we can go with something much simpler than JSON for terminal input.

An SDL-style scancode vs keycode distinction is a good idea, but for terminal input, I don't know of any situations where we have SDL-style scancode information, so my sense is that we don't need to include it here.

Using SDL code numbers, which are based on the USB keyboard spec, is also a good idea, though terminal input doesn't usually use SDL, and it doesn't receive hardware or raw OS input values, so it doesn't have a strong affinity here.

Modifiers: If we did go with Unicode symbols, we could represent modifiers with symbols too -- ⎈, ⎇, ⇧, prepended to the main character. However, as I research this domain more, I'm less excited about using Unicode here. It is nice if up-arrow on input can send the same sequence as move-the-cursor-up on output, and for output, we probably want to follow the established ANSI sequences for basic cursor movement and such.
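As an illustration of the input/output symmetry mentioned above: in the ANSI/ECMA-48 family, the cursor-movement controls are ESC [ A/B/C/D, and terminals in normal cursor-key mode send the same bytes for the arrow keys. The sketch below assumes that mode only (application cursor-key mode sends different sequences); `Direction`, `cursor_move` and `arrow_key` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Right,
    Left,
}

/// Output side: the sequence that moves the cursor one cell
/// (CUU, CUD, CUF and CUB in ECMA-48 terms).
pub fn cursor_move(dir: Direction) -> &'static str {
    match dir {
        Direction::Up => "\x1b[A",
        Direction::Down => "\x1b[B",
        Direction::Right => "\x1b[C",
        Direction::Left => "\x1b[D",
    }
}

/// Input side: recognize an arrow key by reusing the output table,
/// so both directions share a single definition.
pub fn arrow_key(seq: &str) -> Option<Direction> {
    [
        Direction::Up,
        Direction::Down,
        Direction::Right,
        Direction::Left,
    ]
    .iter()
    .copied()
    .find(|&dir| cursor_move(dir) == seq)
}
```

Because `arrow_key` is defined in terms of `cursor_move`, an application could echo an arrow-key sequence back to the display unchanged to move the cursor.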

@programmerjake
Contributor

programmerjake commented Dec 3, 2019

One of the benefits of JSON is easy extensibility and wide programming language support. If we restrict ourselves to a single object where all members have C identifier names and integer values, it should be quite easy to parse for those who don't want to use a general JSON parser. We should specify that all unknown names should be ignored, except for "version" which if it is present indicates a future incompatible revision (with a version number as the value) and the whole keystroke should be ignored.

Example parser:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a528ef1e98dc0763c31b1618d5cfd997

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct Keystroke {
    scancode: u32,
    keycode: u32,
    modifiers: u32,
}

#[derive(Clone, Debug)]
pub struct ParseError;

struct Parser<'a> {
    iter: std::str::Chars<'a>,
}

impl<'a> Parser<'a> {
    pub fn new(text: &'a str) -> Self {
        Self { iter: text.chars() }
    }
    fn peek(&self) -> Option<char> {
        self.iter.clone().next()
    }
    fn peek_some(&self) -> Result<char, ParseError> {
        self.peek().ok_or(ParseError)
    }
    fn expect(&mut self, ch: char) -> Result<(), ParseError> {
        if self.peek() == Some(ch) {
            self.iter.next();
            Ok(())
        } else {
            Err(ParseError)
        }
    }
    fn peek_digit(&self) -> Option<u32> {
        self.peek().and_then(|ch| ch.to_digit(10))
    }
    fn parse_int(&mut self) -> Result<u32, ParseError> {
        let mut retval = self.peek_digit().ok_or(ParseError)?;
        self.iter.next();
        while let Some(digit) = self.peek_digit() {
            if retval == 0 {
                // must be zero or start with non-zero digit
                return Err(ParseError);
            }
            retval = retval.checked_mul(10).ok_or(ParseError)?;
            retval = retval.checked_add(digit).ok_or(ParseError)?;
            self.iter.next();
        }
        Ok(retval)
    }
    fn parse_name(&mut self) -> Result<&'a str, ParseError> {
        self.expect('"')?;
        let initial_str = self.iter.as_str();
        match self.peek() {
            Some('_') => {}
            Some(ch) if ch.is_ascii_alphabetic() => {}
            _ => return Err(ParseError),
        }
        self.iter.next();
        while self.peek_some()? != '"' {
            match self.peek() {
                Some('_') => {}
                Some(ch) if ch.is_ascii_alphanumeric() => {}
                _ => return Err(ParseError),
            }
            self.iter.next();
        }
        let len = initial_str.len() - self.iter.as_str().len();
        let retval = &initial_str[..len];
        self.expect('"')?;
        Ok(retval)
    }
    fn skip_whitespace(&mut self) {
        while self.peek().map(char::is_whitespace) == Some(true) {
            self.iter.next();
        }
    }
    fn parse(mut self) -> Result<Option<Keystroke>, ParseError> {
        self.skip_whitespace();
        self.expect('{')?;
        self.skip_whitespace();
        let mut scancode = None;
        let mut keycode = None;
        let mut modifiers = None;
        let mut version = None;
        while self.peek() == Some('"') {
            let name = self.parse_name()?;
            self.skip_whitespace();
            self.expect(':')?;
            self.skip_whitespace();
            let value = self.parse_int()?;
            self.skip_whitespace();
            match name {
                "scancode" => scancode = Some(value),
                "keycode" => keycode = Some(value),
                "modifiers" => modifiers = Some(value),
                "version" => version = Some(value),
                _ => {}
            }
            if self.peek() == Some(',') {
                self.iter.next();
                self.skip_whitespace();
            } else {
                break;
            }
        }
        self.expect('}')?;
        self.skip_whitespace();
        if self.peek().is_some() {
            return Err(ParseError);
        }
        if version.is_some() {
            return Ok(None);
        }
        Ok(Some(Keystroke {
            scancode: scancode.ok_or(ParseError)?,
            keycode: keycode.ok_or(ParseError)?,
            modifiers: modifiers.ok_or(ParseError)?,
        }))
    }
}

impl Keystroke {
    pub fn parse(text: &str) -> Result<Option<Self>, ParseError> {
        Parser::new(text).parse()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test() {
        fn test_case(text: &str, expected: Result<Option<Keystroke>, ()>) {
            assert_eq!(
                Keystroke::parse(text).map_err(|_| ()),
                expected,
                "text = {:?}",
                text
            );
        }
        test_case(r#"{"too_big": 12345674738384848}"#, Err(()));
        test_case(r#"{"version": 1}"#, Ok(None));
        test_case(
            r#"{"scancode": 123, "keycode": 456, "modifiers": 789, "ignored": 0}"#,
            Ok(Some(Keystroke {
                scancode: 123,
                keycode: 456,
                modifiers: 789,
            })),
        );
    }
}

@bjorn3

bjorn3 commented Dec 3, 2019

Using JSON would cause excessive memory usage. For example, if you want to paste 1 MB of data into the terminal, that would send ~200 MB of data to the program, all of which needs to be formatted and parsed again. It is also a lot of extra mandatory code. Using something binary like bincode would be nicer, as it is easier to parse: every field is at a fixed location, so simply transmuting the raw bytes to a struct is possible. It is also much more compact.

An example parser would be:

#[repr(C)]
struct Keystroke {
    version: u32,
    scancode: u32,
    keycode: u32,
    modifiers: u32,
}

fn parse(b: &[u32; 4]) -> &Keystroke {
    unsafe { std::mem::transmute(b) }
}
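For comparison, the same fixed-layout record can be decoded without `unsafe`. This is only a sketch: it assumes the four fields arrive as 16 bytes in little-endian order (WebAssembly's byte order), and `parse_bytes` is a hypothetical name.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Keystroke {
    pub version: u32,
    pub scancode: u32,
    pub keycode: u32,
    pub modifiers: u32,
}

/// Decode a 16-byte record: four consecutive little-endian u32 fields.
pub fn parse_bytes(bytes: &[u8; 16]) -> Keystroke {
    // Read the i-th 4-byte word of the record.
    let field = |i: usize| {
        let mut word = [0u8; 4];
        word.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
        u32::from_le_bytes(word)
    };
    Keystroke {
        version: field(0),
        scancode: field(1),
        keycode: field(2),
        modifiers: field(3),
    }
}
```

Unlike the transmute, this version does not depend on the alignment of the input buffer or on the host's byte order.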

One problem with your current choice of fields is that it makes it impossible to paste characters without a keyboard equivalent, like emoji and control codes.

@sbc100
Member

sbc100 commented Dec 3, 2019

@bjorn3 I agree JSON seems rather excessive for this use case.

(As does bincode, BTW, which looks like it's designed for use cases where work is done to serialize/deserialize.)

@programmerjake
Contributor

@bjorn3 JSON is only used for the things that don't have a normal UTF-8 representation (and for ESC itself), if the text you're pasting is UTF-8 without any ESC characters, then it can be sent to the input unmodified without increasing the input size. The JSON is only used instead of UTF-8 for keys like Ctrl+Shift+1 -- because that key combination doesn't have a defined UTF-8 representation.
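A rough sketch of that framing: plain UTF-8 passes through as text, and only spans between ESC and LF are treated as key-event payloads. It assumes the payload is a single line with no raw ESC or LF inside it; `Token` and `tokenize` are hypothetical names, and the payload is left unparsed.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Token<'a> {
    /// Ordinary UTF-8 input, passed through without any expansion.
    Text(&'a str),
    /// The payload found between an ESC and the following LF.
    KeyEvent(&'a str),
}

/// Split an input stream into plain text and ESC ... LF framed key events.
pub fn tokenize(input: &str) -> Result<Vec<Token<'_>>, &'static str> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match rest.find('\u{1b}') {
            Some(0) => {
                // ESC and LF are both single bytes in UTF-8, so slicing
                // one byte past them stays on a character boundary.
                let body = &rest[1..];
                let end = body.find('\n').ok_or("unterminated key event")?;
                tokens.push(Token::KeyEvent(&body[..end]));
                rest = &body[end + 1..];
            }
            Some(pos) => {
                tokens.push(Token::Text(&rest[..pos]));
                rest = &rest[pos..];
            }
            None => {
                tokens.push(Token::Text(rest));
                rest = "";
            }
        }
    }
    Ok(tokens)
}
```

Pasted text without ESC characters comes back as a single `Token::Text` borrowing the original buffer, so it is neither copied nor enlarged.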

@mash-graz

mash-graz commented Dec 5, 2019

i really share @sunfishcode's POV concerning this topic:

An SDL-style scancode vs keycode distinction is a good idea, but for terminal input, I don't know of any situations where we have SDL-style scancode information, so my sense is that we don't need to include it here.

we simply have to differentiate between:

  • old fashioned POSIX-style terminal resp. stdio based input
  • local machine and GUI system related keyboard handling

SDL and its data representation may be seen as a nice cross-platform solution for the latter case, but it isn't a suitable example of how to handle the requirements and capabilities of the first category.

i don't think we should waste too much effort on inventing another translation/representation of control commands. it's more important to understand their original function in controlling editing operations on the local terminal and generating out-of-band signals. but most of these terminal features became highly configurable over time and provide processing on the other side of the communication line as well.

it's mainly this kind of terminal IO configuration, which i would see as the most important requirement, if we want to realize more satisfying input capabilities -- e.g. switch between canonical mode and non-canonical mode input processing.

some of you may argue that i'm too focused on nothing but this very old-fashioned POSIX terminal interface, but that's done on purpose, because it's in fact this group of systems where terminal io and CLI-based work still has the most practical relevance in the real world. and those few exceptions which do not fall into this category (e.g. windows and web-based solutions like xterm.js) usually provide very similar terminal control capabilities. that's why i still see a straightforward WASI POSIX termio configuration extension module as a more desirable, more compatible and much easier to realize solution than any more ambitious cross-platform compromise.

@sunfishcode
Member Author

I also think we can get by without a "version" field. The key events we're talking about here are described by terminfo and are very stable.

And more broadly, while the kind of extensibility that JSON would bring would have advantages, I also think the main value in adding functionality like this to WASI is in compatibility with existing terminals. Existing terminals don't have extensibility at this layer, so we wouldn't gain much by making WASI extensible at this layer either. And we'd risk introducing complexity and new error conditions. So while I appreciate the ideas, I don't think JSON turns out to be a good fit here.

FWIW, I'm also leaning away from the Unicode approach I suggested at the top of this issue. While this is a space where vt100-family terminals in use today differ significantly, leaving room for WASI to potentially also do something different, using more traditional-style escape codes seems good enough, and will simplify some implementations.

@abitrolly

abitrolly commented Nov 11, 2021

Linux terminal is absolutely horrible for all non-English users out there, because it doesn't transmit keycodes, so if you switch the layout, all shortcuts with letter keys stop working. As much as I like vim, I just hate it when I need to edit anything in it in Russian.

Linux terminal is absolutely horrible for users of desktop software, where you press Ctrl+S to save your work. Imagine what the Linux terminal does when you want to save your work with Ctrl+S? It freezes, because Ctrl+S is the XOFF flow-control character. And what people do when their app freezes is wait and close, losing the work they tried to save.

Linux terminal has absolutely horrible latency, because for the parser to distinguish between the Esc key and any other key encoded with an "escape sequence", it needs to wait.
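The ambiguity can be shown with a small sketch. It assumes the common ANSI encoding in which the up arrow arrives as ESC [ A, and it models the reader's timer as a `timed_out` flag; `Decision` and `classify` are hypothetical names, and real input libraries differ in how long they wait.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// A lone ESC byte that was not followed by anything in time.
    EscapeKey,
    /// The complete sequence ESC [ A.
    ArrowUp,
    /// Anything this sketch does not model.
    Other,
    /// The bytes so far could still grow into a longer sequence.
    NeedMoreInput,
}

/// Classify the bytes buffered so far. `timed_out` says whether the
/// reader has already waited its full delay for further bytes.
pub fn classify(buf: &[u8], timed_out: bool) -> Decision {
    match buf {
        [] => Decision::NeedMoreInput,
        [0x1b, b'[', b'A', ..] => Decision::ArrowUp,
        // A bare ESC, or ESC [, is a prefix of ESC [ A. Until the timer
        // expires there is no way to tell it apart from the Esc key.
        [0x1b] | [0x1b, b'['] if !timed_out => Decision::NeedMoreInput,
        [0x1b] => Decision::EscapeKey,
        _ => Decision::Other,
    }
}
```

The Esc key can only be reported once `timed_out` is true, which is the added latency described above; an encoding in which no key is a prefix of another would not need the timer.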

Linux terminal is a horrible stateful mess. In other OSes, having a buffered queue of input events was the norm: in deterministic TUIs I could play my keyboard like a piano and go make some tea while the app processed shortcut after shortcut. In Linux I have to wait for each key combination to finish before I start the next one, because the Linux terminal needs a pause to know whether I pressed the Esc key or F1. Linux terminal can also mess up its own state, so there are reset commands for users to clean it up: https://superuser.com/questions/122911/what-commands-can-i-use-to-reset-and-clear-my-terminal

The rhetorical question - do you really want to bring this ancient mess (which I admit was an extremely useful thing back in the 1960s) into the post-COVID era to make people feel the pain I felt?

The bottom line - there is no other opportunity out there except in WASI to create an alternative 21st century protocol for (terminal) keyboard input, convenient for developers to debug and implement, with low latency and standardized physical key IDs (because the USB HID standard could not figure out how to add pictures to its PDF, you need to guess key names on your keyboard). The use cases should include all applications that now run in a GUI, such as games, so that hot seat Mortal Kombat in a terminal could become possible.

If WASI implements a good keyboard interface/abstraction (I don't know, libinput?), then implementing all the POSIX cruft as a portability lib on top of that will not be a big problem. In the end, the right name for the Linux Terminal is already Linux Terminal Emulator, so there is no need to emulate an emulator once more.

@sunfishcode
Member Author

The most important thing needed for anything to happen in this space is for someone to volunteer to make it happen.

The main reason for considering a design within the "ANSI" family of terminal emulators is compatibility, both with existing widely-popular terminals, and with large amounts of existing application code which knows how to talk to these kinds of terminals. This family is so popular that even Windows has chosen to join it.

Also, I expect that we could scope a terminal input API such that it wouldn't be expected to be WASI's exclusive input method for all use cases. In particular, if WASI ever gains a GUI API, I wouldn't expect it to use a terminal input API for input. Programs that want to have both terminal and GUI UIs would usually need separate UI code for both in any case.

@sunfishcode
Member Author

I'm going to close this discussion here and post some updates to #161. As I mentioned above, my original idea here of using Unicode symbols wasn't practical, and this issue doesn't contain a more specific plan, so we can close it and continue in #161.
