Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add keybind socket #252

Closed

Conversation

epetousis
Copy link

@epetousis epetousis commented Sep 10, 2022

Because Wayland forbids non-root processes from listening to system-wide keypresses, Discord and WebCord are unable to listen for keybinds when the window doesn't have focus. This pull request adds a socket server that allows for sending a payload to forward keybinds to WebCord, so that you may use a privileged process to handle shortcuts like sway or swhkd. An example of activating the push-to-talk key is:

echo -e 'PUSH_TO_TALK\x00\x01\x99\x00' | nc -U $XDG_RUNTIME_DIR/webcord-keybinds.sock

The payload begins with one of the keybind actions that Discord supports, which is terminated with a null (\x00) byte. The byte that follows is a boolean value - a true value (\x01) indicates a keyDown event, and a false value (\x00) indicates a keyUp event. Note that keybinds that aren't toggle-based, like PUSH_TO_TALK, generally only respond to keyUp events.

This feature works by pulling the user's keybinds, finding the key set for a particular command, and then simulating a keypress through setInputEvent on the main Electron window. While the implementation sounds weird, it's the best way to go about it without injecting code into the webpage to interface directly with the keybind handlers - something that I understand you want to avoid with WebCord for user safety reasons.

An extra package was added called keycode to translate the JavaScript keycodes into accelerators, that are then passed to setInputEvent. I've pinned this package to the latest version just in case - if you'd like this unpinned or removed entirely then let me know.

While the changes in this pull request allow for sending key events for any keybindings that have been set, of course there's no actual mechanism to set any of these keybinds apart from PUSH_TO_TALK, because Discord disables the keybinding interface on the web version. I was thinking of potentially adding an option to configure keybindings via the socket interface, but opted against it for this pull request. At least now, any future global keybinds interface can be made to work for both Wayland and X11.

If there's any issues or concerns with the socket payload format, or I missed something else in the implementation, feel free to suggest changes and I can sort.

@SpacingBat3
Copy link
Owner

SpacingBat3 commented Sep 10, 2022

To be honest, the current implementation of this is too complex to be pushed undocumented into master, I may work on providing documentation for it and maybe the documentation for Wayland in general. I also dislike announcing yet another dependency and that privileged process sends key code to the WebCord rather than the opposite. Personally, I would also change a packet structure (feel free to discuss about it through) for this reason and the way WebCord communicates with the clients.

Below are my suggestions, they're meant to regularly change – I'll add more details to it based on what I'll decide.


A currently proposed (generic, maybe incomplete) structure of the binary packets:

Offset (byte) Type Description
0 UTF-8 encoded string: WKB. Packet header
3 Raw byte Message type
4 Raw byte Command
5 Multiple / other Data (undecided yet how to encode; might only exist in server replies).

Message types:

  • 0x00reserved, possibly for some kind of the authentication (like verifying if the process is privileged?); it doesn't have to be implemented right now, but it is better to keep a space for it.
  • 0x01PUSH_TO_TALK

Commands:

  • 0x00 – query, for push-to-talk messages: it sends back to client the keybind (either as a string or key code, depending how we would decide the returned data should be encoded).
  • 0x01 – post, for push-to-talk messages: it sends an empty message to server that will communicate back and reply with the result (0x00 means success, other codes mean error).

A proposed human-friendly communication diagram (I hope it clearly shows what I want to achieve):

sequenceDiagram
    participant C as Socket client;
    participant S as Socket server;
    participant W as Website (IPC);
    activate C;
    activate S;
    Note over C,S: Authentication (maybe?);
    deactivate S;
    C->>S: What is your keybind for N?;
    activate S;
    S->>W: What is registered keybind for N?
    activate W;
    Note over W: Query localStorage
    W->>S: The registered keybind is XYZ.;
    deactivate W;
    S->>C: The registered keybind is XYZ.;
    deactivate S;
    Note over C: Listen to keybind
    C->>S: Emit keybind for N.;
    activate S;
    Note over S: Emit key event
    S->>C: Done! Result code: X.;
    deactivate S;
    deactivate C;
ℹ️ The communication between socket server (main process) and website (renderer process) is transmitted via Electron's IPC and it won't be encoded in the same way as packets sent between socket server and client.

Copy link
Owner

@SpacingBat3 SpacingBat3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're also some issues with the checks whenever socket should be opened. I think WebCord should not approach starting it on non Freedesktop spec compilant platforms and non-Wayland servers.

console.error("[Keybinds] $XDG_RUNTIME_DIR is not defined. Will not bind socket.");
return;
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a platform check, obviously we don't want to open UNIX socket on Windows and maybe even on the macOS. Maybe a check for Wayland flags should also be done (i.e. --ozone-platform) – I don't think X11 users will benefit from the socket if WebCord could just register the keybind on its own.

Copy link
Author

@epetousis epetousis Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversight on my part. Have added a check for --ozone-platform=wayland and a check for Win32/Darwin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to add a check for --ozone-platform-hint=auto.

sources/code/main/modules/keybind-socket.ts Outdated Show resolved Hide resolved
@epetousis
Copy link
Author

epetousis commented Sep 11, 2022

I'm a fan of the proposed packet structure. In regards to the communication spec, to clarify - when actually triggering a keybind, is this still done by simply sending a keybind type? If that's the case, it would make sense to use the data bytes to specify "keyUp" or "keyDown" - while it's probably good to have a way to simply toggle a keybind with one packet as well, setting keybind in programs like sway or swhkd support a key release event, and I'd want to make sure that the key press and key release events correspond to the same events in WebCord.

If your proposal is that Discord's keybinds are triggered by sending the corresponding keycodes (rather than the name of a keybind e.g PUSH_TO_TALK) to WebCord, I'm not sure if I feel great about the user being able to forward arbitrary keys to WebCord. But I'm not totally sure if this is what you meant, so forgive me if I'm off track.

Apart from that, this makes much more sense, and creating an enum based off of keycode types from Discord would be smart as well, something like:

enum DiscordKeybinds {
  TOGGLE_PRIORITY_SPEAKER = 1,
  PUSH_TO_TALK,
  PUSH_TO_TALK_PRIORITY,
  PUSH_TO_MUTE,
  TOGGLE_MUTE,
  TOGGLE_DEAFEN,
  TOGGLE_VOICE_MODE,
  TOGGLE_OVERLAY,
  TOGGLE_OVERLAY_INPUT_LOCK,
  OVERLAY_ACTIVATE_REGION_TEXT_WIDGET,
  TOGGLE_GO_LIVE_STREAMING,
  TOGGLE_STREAMER_MODE,
  NAVIGATE_BACK,
  NAVIGATE_FORWARD,
  SOUNDBOARD,
  SOUNDBOARD_HOLD
}

I also dislike announcing yet another dependency and that privileged process sends key code to the WebCord rather than the opposite.

I understand the dependency being an issue, I'm not sure if there's a better way to take JavaScript keycodes and convert them into the accelerators that sendInputEvent requires, but if we can do it library free then I'll happily go for that. In terms of the privileged process sending keycodes to WebCord, I'm not quite sure what you mean here, could you clarify? A process sends a keybind "type" (e.g PUSH_TO_TALK) to WebCord, then WebCord pulls the specific keybind and its corresponding bound keycode(s) from the user's localStorage, and then uses sendInputEvent. The external process currently doesn't even know what the corresponding keycode(s) are (a limitation you've addressed, of course) - I'm not sure if this is the privileged process you were referring to.

Sorry for the massive response, and I'll mark this pull request as a draft for the time being until we're both happy with the way we want this socket implementation to look.

@epetousis epetousis marked this pull request as draft September 11, 2022 02:43
@SpacingBat3
Copy link
Owner

I'm a fan of the proposed packet structure. In regards to the communication spec, to clarify - when actually triggering a keybind, is this still done by simply sending a keybind type?

When "registering", WebCord knows the keybind that user specified in localStorage. When client wants to register the same keybind as in Discord, it would query WebCord for that keybind to be actually capable of doing that. However, I actually forgot to show there how socket should behave when the keybing changes. That adds yet another command, sync: WebCord would send a message back to connected client each time when the user changes the keybind in Discord.

But going back to question: I want from sent packets to the server/socket to represent information: I want you to emit a keybind for PUSH_TO_TALK, whatever it is rather than Emit CTRL+V in the renderer. The format of the keybind sent back to client is undecided, yet I think it might be dependant on the arrays I've found in localStorage.

I understand the dependency being an issue, I'm not sure if there's a better way to take JavaScript keycodes and convert them into the accelerators that sendInputEvent requires, but if we can do it library free then I'll happily go for that.

With my communication structure, that should be possible, as Discord saves keybinds as array of tuples of three values, where the second value actually seems to be a keycode (not sure about two others through, which stands for a question if we really need to care about them).

Because Wayland forbids processes from listening to system-wide
keypresses, Discord and WebCord are unable to listen for keybinds
when the window doesn't have focus. This commit adds a socket server
that allows for sending a payload to forward keybinds to WebCord. An
example of activating the push-to-talk key is:

`echo -e 'PUSH_TO_TALK\x00\x01\x99\x00' | nc -U $XDG_RUNTIME_DIR/webcord-keybinds.sock`

The payload begins with one of the keybind actions that Discord
supports, which is terminated with a null (\x00) byte. The byte that
follows is a boolean value - a true value (\x01) indicates a keyDown
event, and a false value (\x00) indicates a keyUp event. Note that
keybinds that aren't toggle-based, like PUSH_TO_TALK, generally only
respond to keyUp events.
@bgkillas
Copy link

would it not be better to just use flatpak/xdg-desktop-portal#711 ? like we gotta build of master branch but better this is quite alot

@epetousis
Copy link
Author

@bgkillas last time I took a look at the initial proposal for this, it seemed that the Flatpak maintainers weren't interested in adding separate shortcuts for key releasing - but it looks like this may have changed at some point? If so, then it definitely renders this pull request obsolete.

@epetousis
Copy link
Author

epetousis commented Sep 24, 2022

Took another look at the pull request, the author added "activated" and "deactivated" signals for shortcuts which allow us to determine when the PTT key is pressed and released. @SpacingBat3 I'll be closing this pull request, I see no reason to work on this any further since a much cleaner solution can be implemented with xdg-desktop-portal (not by us, to be clear - this is something that should be done in Electron/Chromium). Let me know if you disagree and would prefer to pursue this option.

@epetousis epetousis closed this Sep 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants