iOS Platform Support - Keyboard Extension #967

that-lucas · 2026-03-05T10:37:50Z

Hi! I'm interested in contributing iOS support for Handy as a custom keyboard extension.

The idea is essentially what Wispr Flow does on iOS, but fully offline and open-source: a system-wide keyboard that you install once, enable in Settings, and then use from any app. You tap the mic button on the keyboard, speak, and text appears directly in the input field. No app-switching, no clipboard gymnastics.

This means building an iOS keyboard extension (UIInputViewController) with Full Access enabled for microphone access, powered by transcribe-rs for on-device transcription.

Prior mobile discussion for context:

[ Question ] Any plans to make android app? #171 -- original mobile app question
Android app? #195 -- Android discussion (open)
Android Platform Support - Implementation Tracker #496 -- Android implementation tracker (closed, never implemented)
@notune's standalone Android app using transcribe-rs

The codebase already has some Tauri 2.x mobile markers (mobile_entry_point in lib.rs, platform-conditional deps in Cargo.toml), but a keyboard extension is a fundamentally different architecture from the desktop app.

Repo structure:

Given how different this is from the desktop app (native Swift/SwiftUI, keyboard extension lifecycle, no Tauri webview), would you prefer:

A separate repo depending on transcribe-rs as a library
An iOS target within this repo

Any thoughts on feasibility or interest?

Thanks!

cjpais · 2026-03-05T11:50:26Z

Maintainer

It will not be part of this project. However, if you want some source code to start from, I have a full Swift project built out that is already partially working.

I didn't use transcribe-rs; I ended up building native Swift bindings to the models I wanted.

that-lucas · 2026-03-05T12:19:13Z

Author

Thanks for the quick reply! Good to know on the repo structure.

I'd love to take a look at your Swift project if you're open to sharing it, even partially. The architecture decisions alone would be really valuable, specifically:

How did you handle the keyboard extension memory limits (~30-40 MB dirty)? Two-process design with the containing app running inference?
Were you able to get microphone access working inside the extension process itself (with Full Access), or does audio capture happen in the containing app too?
Which models / bindings did you land on? (WhisperKit, direct CoreML, whisper.cpp?)

Happy to collaborate or continue independently, just want to avoid re-discovering the same dead ends.

cjpais · 2026-03-05T12:29:16Z

Maintainer

Memory limits, I believe, are partially solved by the answer to the next question, i.e. a two-process design (this is also what Flow does).
Nope. It has to be done in the containing app, leaving an audio session running, as far as I can tell.
I used Moonshine with the ONNX bindings, as you can't use the GPU in a background thread (or at least when I tried, I got a crash pointing at the GPU).
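
To make the two-process handoff concrete, here is a minimal Swift sketch of how the flow could be modeled as a state machine: the keyboard extension only sends events and inserts the final text, while the containing app records and transcribes. All names (`DictationState`, `DictationEvent`, `nextState`) are hypothetical and not taken from any actual repo. How the two processes exchange these events (for example through a shared App Group container) is deliberately left out.

```swift
// Hypothetical states for a dictation handoff between a keyboard extension
// and its containing app. Illustrative only.
enum DictationState: Equatable {
    case idle
    case requested          // extension asked the app to start recording
    case recording          // containing app is capturing audio
    case transcribing       // containing app is running the model
    case delivered(String)  // text is ready for the extension to insert
    case failed(String)     // something went wrong; message for display
}

// Hypothetical events that either process could raise.
enum DictationEvent {
    case micTapped
    case appStartedRecording
    case stopTapped
    case transcriptReady(String)
    case textInserted
    case error(String)
}

// Pure transition function: returns the next state, or nil when the event
// is not valid in the current state.
func nextState(from state: DictationState, on event: DictationEvent) -> DictationState? {
    switch (state, event) {
    case (.idle, .micTapped):
        return .requested
    case (.requested, .appStartedRecording):
        return .recording
    case (.recording, .stopTapped):
        return .transcribing
    case (.transcribing, .transcriptReady(let text)):
        return .delivered(text)
    case (.delivered, .textInserted):
        return .idle
    case (.failed, .micTapped):
        return .requested
    case (_, .error(let message)):
        return .failed(message)
    default:
        return nil
    }
}

// Walk through one successful dictation, ignoring invalid events.
let events: [DictationEvent] = [
    .micTapped, .appStartedRecording, .stopTapped, .transcriptReady("hello"),
]
var state = DictationState.idle
for event in events {
    if let next = nextState(from: state, on: event) {
        state = next
    }
}
```

Keeping the transition logic in a pure function means both processes can share it and it can be unit-tested without any audio or UI code.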

Let me put up the repo and I'll edit this reply.

1 reply

cjpais Mar 5, 2026
Maintainer

OK, instead of editing, here it is: https://github.com/cjpais/slop-handy-keyboard-ios

I basically didn't write any of this code myself. I barely even reviewed it. It's messy and the project barely works, but it does demonstrate the possibility. Also, in hindsight, since you can't run on the GPU, I probably would have picked different models. I was trying to stay within that memory limit, but the memory limit doesn't actually apply. So slightly bigger models probably make sense as long as transcription is still relatively fast. On faster Apple devices it seems to be okay, but maybe not great. It would be much more beneficial if Apple just let us use the GPU.

that-lucas · 2026-03-05T13:56:18Z

Author

Thanks a lot for sharing this, really generous of you, especially given how detailed your answers were on the architecture constraints. The two-process design, the GPU limitation in background threads, and the Moonshine/ONNX choice all saved me from going down dead ends.

I've cloned the repo and read through the full codebase. The architecture is solid: the DictationBridge state machine, the session keepalive, and the audio pipeline with sample rate conversion. It's a great reference even if you call it slop.
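
For anyone following along, the sample rate conversion step can be illustrated with a small linear-interpolation resampler. This is only a sketch under assumptions, not the code from the repo: the function name `resampleLinear` is made up, and the 48 kHz to 16 kHz rates are just an example (check what your capture hardware delivers and what your model expects). Linear interpolation does no low-pass filtering, so a real pipeline would more likely rely on something like `AVAudioConverter`.

```swift
// Minimal linear-interpolation resampler for mono Float samples.
// A stand-in for a proper converter; it does not filter out aliasing.
func resampleLinear(_ input: [Float], from sourceRate: Double, to targetRate: Double) -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }

    // How many input samples correspond to one output sample.
    let ratio = sourceRate / targetRate
    let outputCount = Int((Double(input.count) / ratio).rounded(.down))

    var output = [Float]()
    output.reserveCapacity(outputCount)

    for i in 0..<outputCount {
        // Fractional position of this output sample in the input buffer.
        let position = Double(i) * ratio
        let lower = min(Int(position), input.count - 1)
        let upper = min(lower + 1, input.count - 1)
        let fraction = Float(position - Double(lower))
        // Blend the two neighbouring input samples.
        output.append(input[lower] * (1 - fraction) + input[upper] * fraction)
    }
    return output
}

// Example: six samples at 48 kHz become two samples at 16 kHz.
let downsampled = resampleLinear([0, 1, 2, 3, 4, 5], from: 48_000, to: 16_000)
// downsampled == [0.0, 3.0]
```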

Quick question on licensing: the repo doesn't have a license file. Would you be open to adding one (MIT or similar)? I'd like to either fork this as a starting point or rebuild from scratch using the same architecture, and I want to make sure I'm doing it right. Either way I'd credit this repo as the original reference.

Also curious, you mentioned you'd pick different (bigger) models in hindsight since the memory limit doesn't apply to the containing app. Have you looked at Parakeet TDT via ONNX, or are you sticking with Moonshine's newer streaming models (medium-streaming looks interesting at 245M params, lower WER than Whisper Large v3)?

5 replies

cjpais Mar 5, 2026
Maintainer

No problem. The dead ends were annoying for sure. It's mostly called slop because I didn't write or review any of the code; it was all Opus 4.6. I lightly guided it and stopped it from getting sucked into some loops.

I would probably look at using Parakeet, since it's what I primarily use on the desktop. However, there are probably smaller models that might be better at this point. I haven't used Moonshine's models much, but their speed sometimes leaves a bit to be desired relative to their size. Parakeet, despite its size, seems to run very efficiently on the whole. Otherwise I'd look towards sherpa-onnx and the whole collection of models supported by that team.

that-lucas Mar 5, 2026
Author

Thanks again for sharing the repo, and for adding the MIT license.

Your notes were super helpful, especially the architecture constraints and model tradeoffs. Also appreciated the honesty around authorship, basically AI-generated end-to-end, but well, it won't be any different here.

My plan from here:

If I fork your repo directly, I’ll keep the license intact.
If I build a new repo and reuse parts of this code, I’ll keep proper attribution and include the original MIT notice where applicable.
I should also add a clear reference to you and the original repo in my README.

If you’re good with that, I’ll proceed and share results back in this thread. I'm going offline for the next 4 weeks, though, so this might get some traction around mid-April.

cjpais Mar 7, 2026
Maintainer

@ankitchouhan1020 please do not use the handy brand.

ankitchouhan1020 Mar 8, 2026

Sure, deleted the repo for now. I'll try to rebuild without handy branding.

that-lucas Apr 7, 2026
Author

> Sure, deleted the repo for now. I'll try to rebuild without handy branding.

Hi @ankitchouhan1020,

I just came back from a long holiday, so I have lots to catch up on before dedicating some time to this, but I'd be interested to know more when you put this up again. Please let me know!
