Add voxtype modifier suppression during transcription #4178

Closed
konnsim wants to merge 6 commits into basecamp:dev from konnsim:fix/voxtype-modifier-suppression

@konnsim konnsim commented Jan 9, 2026

Summary

Adds a Hyprland submap that blocks modifier keys (SUPER, CTRL, ALT) while voxtype is typing transcribed text. This prevents held modifier keys from triggering window manager shortcuts during output.

Problem

When using voxtype with push-to-talk (SUPER+CTRL+X), if the user releases keys slowly or in the wrong order, modifiers may still be held while voxtype types the transcription. This causes typed characters to trigger shortcuts instead of inserting text.

Solution

  • Add a voxtype_suppress submap to Hyprland bindings that blocks modifier keys
  • Configure voxtype to activate/deactivate the submap via pre_output_command/post_output_command hooks
  • Include F12 as emergency escape if voxtype fails to reset the submap
  • Migration for existing voxtype installations
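
A rough sketch of the submap described above, reconstructed from the binding lines quoted later in the review thread rather than copied from the PR diff. The `Super_L`, `Super_R`, and `Control_L` lines and the hook commands shown in the comments are assumptions; the exact key placement in voxtype's config.toml is not shown in this conversation.

```conf
# Sketch only: voxtype_suppress submap for default/hypr/bindings/utilities.conf.
# Assumed hook commands on the voxtype side (config.toml):
#   pre_output_command  -> hyprctl dispatch submap voxtype_suppress
#   post_output_command -> hyprctl dispatch submap reset
submap = voxtype_suppress
bind = , Super_L, exec, true
bind = , Super_R, exec, true
bind = , Control_L, exec, true
bind = , Control_R, exec, true
bind = , Alt_L, exec, true
bind = , Alt_R, exec, true
bind = , F12, submap, reset # Emergency escape if voxtype fails to reset the submap
submap = reset
```

While the submap is active, only the bindings declared inside it apply, so held modifiers should no longer combine with typed characters to trigger global shortcuts.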

Dependencies

⚠️ This PR depends on voxtype 0.4.10+ which includes the pre_output_command/post_output_command hooks.

Changes

  • default/hypr/bindings/utilities.conf - Add voxtype_suppress submap
  • default/voxtype/config.toml - Add hook configuration
  • migrations/1767939322.sh - Add hooks to existing voxtype configs

Test plan

  • Voxtype 0.4.10 released with hooks feature
  • Test with SUPER+CTRL+X hold, release in various orders
  • Verify modifiers don't trigger shortcuts during transcription
  • Verify F12 escapes the submap if stuck

Related: #4159

Adds a Hyprland submap that blocks modifier keys (SUPER, CTRL, ALT)
while voxtype is typing transcribed text. This prevents held modifier
keys from triggering window manager shortcuts during output.

The fix uses voxtype's pre_output_command/post_output_command hooks
to activate/deactivate the submap automatically.

Includes F12 as emergency escape if voxtype fails to reset the submap.

Fixes: basecamp#4159
@peteonrails
Contributor

I am about to tag 0.4.10, which will have the upstream fix. I added a setup command that might be useful, although it writes to hypr/conf.d instead of hypr/bindings. It also takes a --show switch that just prints the submap out.

@konnsim
Author

konnsim commented Jan 9, 2026

Thanks for the heads up on 0.4.10! I'll keep the submap in utilities.conf rather than using voxtype setup hyprland since omarchy uses explicit source statements for each config file instead of the conf.d pattern. Happy to change the approach if the maintainers prefer something different though.

@konnsim konnsim marked this pull request as ready for review January 9, 2026 06:47
@peteonrails
Contributor

> Thanks for the heads up on 0.4.10! I'll keep the submap in utilities.conf rather than using voxtype setup hyprland since omarchy uses explicit source statements for each config file instead of the conf.d pattern. Happy to change the approach if the maintainers prefer something different though.

Yep that sounds good! I would consider adding a voxtype setup hyprland --omarchy flag that did the right thing, but it seems like just getting the packaging right here on the distro is the better answer.

@peteonrails
Contributor

@konnsim 0.4.10 is in the AUR, should reach all of the mirrors soon. Thanks for your collaboration!

@dhh
Member

dhh commented Jan 9, 2026

@peteonrails Do we have to change anything on the Omarchy side? Or do the same bindd + binddr combos work?

@peteonrails
Contributor

peteonrails commented Jan 9, 2026

> @peteonrails Do we have to change anything on the Omarchy side? Or do the same bindd + binddr combos work?

@dhh The same bindd + binddr combos will work. Long term, I won't break backward compatibility without a discussion and coordination with distro owners.

As far as the "typing has started but I am holding the SUPER key" problem goes: the submap introduced in this PR does the trick. Since @konnsim and I coordinated on this response, there's nothing else that needs to change on Omarchy beyond this PR.

However, there is a related issue that is not addressed by this PR: #4159

The workaround suggests that using bindrn helps with the case where the user releases SUPER+CTRL and is still holding X. I have not fully tested it yet, though, so I would consider that an early signal.

@konnsim
Author

konnsim commented Jan 10, 2026

I had a good crack at fixing the issue where releasing CTRL/SUPER before X doesn't trigger the stop binding. I could get it to work if CTRL was released first, but not SUPER, and it was flaky even then.

In the end I decided that it's actually kind of a hidden feature that allows you to do "hold to dictate" if you hold all 3 then release X, and "toggle dictation" if you hold all 3 and release one of the modifiers first then toggle off with SUPER + CTRL + X again, all with the same bind.
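
For context, the push-to-talk pair under discussion can be sketched roughly as below. This is an illustration rather than the exact lines in utilities.conf: the description strings are hypothetical, and the `voxtype record start` / `voxtype record stop` commands are taken from later comments in this thread.

```conf
# Sketch only: push-to-talk on SUPER+CTRL+X.
# bindd  = bind with a description
# binddr = same, but the r flag makes it fire on key release
bindd = SUPER CTRL, X, Start dictation, exec, voxtype record start
binddr = SUPER CTRL, X, Stop dictation, exec, voxtype record stop
```

As described in this thread, the release binding only fires reliably when X is released while SUPER and CTRL are still held. Releasing a modifier first leaves recording running, which is what makes the "toggle" behavior possible with the same bind.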

Comment thread default/hypr/bindings/utilities.conf Outdated
bind = , Control_R, exec, true
bind = , Alt_L, exec, true
bind = , Alt_R, exec, true
bind = , ESCAPE, submap, reset # Emergency escape if voxtype fails to call post_output_command

Contributor

@peteonrails peteonrails Jan 10, 2026

As a heads up - Voxtype 0.4.11 will include a voxtype record cancel command that aborts a recording or transcription in progress without injecting any text.

In my test build environment I have it bound to ESC, and it also resets the submap. Food for thought.

Author

@konnsim konnsim Jan 10, 2026

does pre_output_command fire right before wtype/ydotool starts outputting?

I could update the submap binding to bind = , ESCAPE, exec, voxtype record cancel; hyprctl dispatch submap reset

That way ESCAPE works for intentional cancels and also as an emergency escape in case of mid-output crash.

Contributor

> does pre_output_command fire right before wtype/ydotool starts outputting?
>
> I could update the submap binding to bind = , ESCAPE, exec, voxtype record cancel; hyprctl dispatch submap reset
>
> That way ESCAPE works for intentional cancels and also as an emergency escape in case of mid-output crash.

It does, you can see the chain here if you are curious: https://github.com/peteonrails/voxtype/blob/9dd3a3c635b0d0585280c8ed2973698888efa987/src/output/mod.rs#L115

Author

@konnsim konnsim Jan 10, 2026

We could even add another hook for on_recording_start that could be used to drop into a voxtype_recording submap that binds only ESCAPE so it can be used to cancel the dictation anywhere in the pipeline from recording -> transcription -> output without affecting modifiers until output begins.

Contributor

Yeah that could be handy. I'm working on testing 0.4.11 but a couple extra hooks are a pretty light lift.

Author

No rush, of course. I'm happy to open it as an enhancement in another PR if it doesn't make it into this one.

Author

@konnsim konnsim Jan 10, 2026

I thought about this some more: adding the voxtype record cancel command to my submap in this PR was pointless, as the submap is only entered immediately before output begins (which that new command doesn't cancel).

The idea of a pre_recording_command hook (or whatever you want to call it) with its own voxtype_recording submap, with an ESCAPE binding calling voxtype record cancel, is still valid though.

We would just need to move from the new voxtype_recording submap to the existing voxtype_suppress submap on the pre_output_command hook. Otherwise the changes in this PR would all stand as-is.

See issue peteonrails/voxtype#59 for details.
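
A minimal sketch of the proposed recording submap, assuming a `pre_recording_command` hook that enters it. That hook was only a proposal at this point in the thread and is hypothetical here. The ESCAPE binding reuses the command suggested earlier in this review thread.

```conf
# Sketch only: hypothetical voxtype_recording submap.
# Assumed to be entered by a pre_recording_command hook running:
#   hyprctl dispatch submap voxtype_recording
# and left when pre_output_command switches to voxtype_suppress.
submap = voxtype_recording
bind = , ESCAPE, exec, voxtype record cancel; hyprctl dispatch submap reset
submap = reset
```

Because only bindings declared inside a submap are active while it is in effect, any stop binding would also have to be declared inside this submap.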

Contributor

I like this idea quite a bit - I wasn't able to get it into 0.4.11, but I'm going to work on it this weekend for a possible 0.4.12.

@konnsim
Author

konnsim commented Jan 10, 2026

tested changes with AUR voxtype 0.4.10, all working as expected.

@konnsim konnsim force-pushed the fix/voxtype-modifier-suppression branch from 372be24 to e54022a Compare January 10, 2026 09:12
@peteonrails
Contributor

peteonrails commented Jan 10, 2026

> tested changes with AUR voxtype 0.4.10, all working as expected.

@konnsim I'm tracking a related issue here, peteonrails/voxtype#61, reported by a user with the submap. They report that during text injection the first character is dropped when using the voxtype_suppress submap. I also ran into this last night while testing 0.4.11, and I was able to reproduce it in 0.4.10, so I'm concerned that we might have a race condition.

I'm going to spend a little bit more time getting good data on the problem this weekend.

@peteonrails
Contributor

> tested changes with AUR voxtype 0.4.10, all working as expected.
>
> @konnsim I'm tracking a related issue here peteonrails/voxtype#61

@konnsim

Kurtis,

Heads up: the issue turned out to be with binding the ESC key into the sub map. So I led you a little bit astray there. Sorry about that! I'm updating the submap output in a bug fix release and I think we might want to coordinate changing it on this PR so that it doesn't present to Omarchy users across the board.

--Pete

Binding ESCAPE in the submap causes wtype to drop the first character
of transcribed text. ESCAPE appears to clear compositor state during
submap transitions.

See: peteonrails/voxtype#61
@konnsim
Author

konnsim commented Jan 11, 2026

> tested changes with AUR voxtype 0.4.10, all working as expected.
>
> @konnsim I'm tracking a related issue here peteonrails/voxtype#61
>
> @konnsim
>
> Kurtis,
>
> Heads up: the issue turned out to be with binding the ESC key into the sub map. So I led you a little bit astray there. Sorry about that! I'm updating the submap output in a bug fix release and I think we might want to coordinate changing it on this PR so that it doesn't present to Omarchy users across the board.
>
> --Pete

All good. I just took a look and yes, I could reproduce the first character being dropped. Interestingly, it did not happen when I continued to hold SUPER or CTRL as output started, only when I quickly released all three keys.

I've just adjusted back to F12 to match your PR and we can look to go back to ESCAPE if/when a fix lands at the hyprland level.

…order quirk

- Add pre_recording_command hook to enter voxtype_recording submap
- Move dictation bindings to user's bindings.conf as adjusting binds will require adjusting new submap (install + migration)
- Document hold-to-record (release X to stop) and toggle modes through key release ordering
- ESCAPE cancels recording while in submap
@konnsim konnsim force-pushed the fix/voxtype-modifier-suppression branch from af69575 to 908dc9d Compare January 12, 2026 10:37
@konnsim
Author

konnsim commented Jan 12, 2026

I've added support for the voxtype record cancel command during the recording and transcription phases, using the pre_recording_command hook to enter a voxtype_recording submap.

Unfortunately, because we need to drop into a submap to bind ESCAPE to cancel, voxtype record stop now needs to be bound inside that submap. To account for users who want to adjust the default binds, I've lifted the voxtype record start binding and the new submap up to the .config/hypr/bindings.conf file, with a migration and an adjustment to the voxtype install script.
This might not be a tradeoff you're willing to make, @dhh, as it does clutter the user's bindings file. Let me know your thoughts and I can revert to just the binding suppression fix on output if you don't like it.

I also added a comment documenting how the release order of SUPER, CTRL, and X affects stopping dictation, as outlined in #4159, and how it can be leveraged for a "toggle mode".

Comment thread bin/omarchy-voxtype-install Outdated
voxtype setup systemd

# Add voxtype bindings to hyprland config if not present
if [[ -f ~/.config/hypr/bindings.conf ]] && ! grep -q "voxtype_recording" ~/.config/hypr/bindings.conf; then

Member

We don't need this. The default utilities.conf is included for everyone.

Author

On lines 54-56 of utilities.conf I remove the voxtype bindings.

I've lifted them up to bindings.conf so they can be changed without modifying Omarchy files. The binddr needs to be in the voxtype_recording submap, and that can't be changed just by a top-level unbind/binddr; it needs to be changed inside the submap.

@dhh
Member

dhh commented Jan 12, 2026

@konnsim Yeah, I don't like the idea of exposing this big lump of bindings directly. Folks can always map whatever they want in there, but we have to find a setup that works great out of the box and that can live inside the default bindings.

Remove voxtype_recording submap and associated bindings - the added
complexity doesn't fit well in defaults and users wanting different
keybindings would need to modify the submap too.

Keep only the voxtype_suppress submap for modifier key suppression
during text output, which is the core fix for the original issue.
@konnsim
Author

konnsim commented Jan 12, 2026

@peteonrails I can't think of another way to get the voxtype record cancel command usable aside from a submap. If you have any thoughts I'm happy to try something different in a new PR.

I'll leave this at just the modifier suppression fix now so it can get merged in to solve the immediate issue.

@peteonrails
Contributor

peteonrails commented Jan 12, 2026

> @peteonrails I can't think of another way to get the voxtype record cancel command usable aside from a submap. If you have any thoughts I'm happy to try something different in a new PR.
>
> I'll leave this at just the modifier suppression fix now so it can get merged in to solve the immediate issue.

Fair enough -- the cancel button may not be something a broad swath of users need or want. I'll leave the feature in voxtype, but am happy to set this detail down until we figure out whether Omarchy users will or will not need it.

I'm going to focus on improving the documentation for 'troubleshooting potential keybinding issues' so that when/if an Omarchy user runs into one of the issues we've seen, they have a playbook to follow.

If you need any other support in getting this PR approved by the maintainers, I'm ready to help.

@konnsim
Author

konnsim commented Jan 12, 2026

Turns out there's an n flag for bind that makes the keypress non-consuming so we can just do binddn = , ESCAPE, Cancel dictation, exec, voxtype record cancel in utilities.conf alongside the other two voxtype binds while maintaining all the other functionality of ESCAPE in nvim etc.

Just one catch though (and this is likely related to the ESCAPE in submaps issue), a non-consumed ESCAPE passed through to a terminal window starts an escape sequence, so the next character typed is consumed as part of that sequence. So if you go dictate -> cancel -> dictate the output of the second dictation has its first char cut off (which is instead interpreted as an escape sequence so can have strange side effects). In non-terminal windows this doesn't happen and it works flawlessly (OpenCode/Claude Code work without issues too).

I think that's the best we're going to get with ESCAPE as the cancel bind. Of course, we could always use a non-conflicting, dedicated cancel bind and avoid any potential pass-through weirdness. What are your thoughts, @peteonrails?

@peteonrails
Contributor

> Just one catch though (and this is likely related to the ESCAPE in submaps issue), a non-consumed ESCAPE passed through to a terminal window starts an escape sequence, so the next character typed is consumed as part of that sequence. So if you go dictate -> cancel -> dictate the output of the second dictation has its first char cut off (which is instead interpreted as an escape sequence so can have strange side effects).

I think we can't do this to users. I think it's better to just leave cancel out of the mapping than it is to ship a "cancel" key that is guaranteed to screw up the next transcription. We should solve this with good documentation instead - if people need a CANCEL mapping, I think we let them map their own preferred key and warn them against using ESC.
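
For users who do want a cancel key, a user-level binding might look like the sketch below. The key chord and description string are hypothetical choices for illustration only; the point is simply that the key is something other than ESCAPE.

```conf
# Sketch only: user-defined cancel binding in ~/.config/hypr/bindings.conf.
# SUPER CTRL + C is a hypothetical chord; pick any key that does not clash
# with existing bindings, and avoid ESCAPE for the reasons discussed above.
bindd = SUPER CTRL, C, Cancel dictation, exec, voxtype record cancel
```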

@konnsim
Author

konnsim commented Jan 16, 2026

I agree, this is good to go then. Did you want me to squash my commits, or are you happy to do it on merge?

@dhh dhh closed this in 91470cb Feb 20, 2026
@konnsim
Author

konnsim commented Feb 26, 2026

@dhh Just a heads up that changing to toggle keybinds doesn't resolve the issue this PR is solving.
There was a lot of back and forth here, so just to make it really clear: this PR prevents the output of voice dictation (wtype) from being affected by held modifier keys, something that still occurs when using toggle dictation.
