墨声 MoSheng

Voice, into ink. — 声音，化为笔墨。

What is MoSheng?

MoSheng (墨声) is a local voice input tool. Hold a hotkey, speak, release — your words are instantly transcribed and pasted into any application.

Powered by Qwen3-ASR. After the one-time download of dependencies and the model on first launch, everything runs offline: no cloud, and no data leaves your machine.

Features

🎤 Dual Hotkeys — Push-to-talk / toggle mode with customizable keys
⚡ Progressive Input — auto-injects text on speech pauses, no need to wait until you finish
✂️ Text Processing — removes filler words (嗯/呃/um/uh…) and converts pauses to commas in progressive mode
🔒 Speaker Verification — SpeechBrain ECAPA-TDNN two-stage verification, responds only to your voice
🔄 Multiple Models — Qwen3-ASR-1.7B (accurate) / 0.6B (lightweight), switch in settings
📖 Custom Vocabulary — import CSV/TXT word lists to boost recognition of domain terms
🎨 GPU Shader Overlay — real-time audio visualization with QML + GLSL fragment shader
🪟 Native UI — dark theme with platform-native effects
🌐 Bilingual — Chinese / English interface, auto-detected
🚀 Autostart — one-click startup toggle
📦 One-click Install — download, run, done

Platform Support

	Windows	macOS
OS	Windows 10 / 11	macOS 13+ (Apple Silicon)
GPU	NVIDIA CUDA 12.8	Apple MPS (Metal)
Default Hotkey (PTT)	`CapsLock`	`Right ⌘`
Default Hotkey (Toggle)	`Right Ctrl`	`Fn + F5`
Autostart	Registry	launchd
UI Effect	DWM Acrylic	Native Qt

Requirements

Windows

Item	Requirement
GPU	NVIDIA GPU with CUDA 12.8 (RTX 30 series or above recommended)
VRAM	~4 GB for 1.7B model / ~2 GB for 0.6B model
Disk	~5 GB (model + dependencies)

macOS

Item	Requirement
Chip	Apple Silicon (M1 / M2 / M3 / M4)
RAM	16 GB recommended
Disk	~5 GB (model + dependencies)
Permissions	Accessibility + Microphone

⚠️ macOS users: Use the 1.7B model. In our benchmarks on Apple Silicon (MPS), 1.7B was roughly 3× to 11× faster than 0.6B, depending on the test.

Installation

Windows

Option A: Distribution Package (Recommended)

Download MoSheng-vX.X.X-win64.zip from Releases
Extract to any directory
Run MoSheng.exe
First launch auto-installs the Python environment and dependencies (requires internet, ~5 min)
First launch downloads the ASR model (~3.4 GB)

Option B: From Source

git clone https://github.com/bensenx/MoSheng.git
cd MoSheng
uv run python main.py

macOS

Option A: DMG Install (Recommended)

Download MoSheng-v1.1.0-macOS.dmg from Releases
Open the DMG and drag MoSheng to Applications
First launch: Right-click MoSheng.app → Open → click Open (Gatekeeper one-time prompt)
Grant Accessibility permission when prompted (System Settings → Privacy & Security → Accessibility)
Grant Microphone permission when prompted
First launch auto-installs Python dependencies via uv (~3 min)
First launch downloads the ASR model (~3.4 GB)

Gatekeeper note: MoSheng is not notarized by Apple, so macOS shows a security warning on first launch. Right-click → Open bypasses it. Alternatively, clear the app's extended attributes (including the quarantine flag) in Terminal:
xattr -cr /Applications/MoSheng.app

Option B: From Source

git clone https://github.com/bensenx/MoSheng.git
cd MoSheng
git checkout macos
uv run python main.py

Option C: Install Script

curl -fsSL https://raw.githubusercontent.com/bensenx/MoSheng/macos/scripts/install_macos.sh | bash

Configuration

Right-click the system tray icon → Settings to open the settings window.

Hotkeys

Mode	Windows Default	macOS Default	Description
Push-to-talk (PTT)	`CapsLock`	`Right ⌘`	Hold to record, release to transcribe
Toggle	`Right Ctrl`	`Fn + F5`	Press to start, press again to stop

Hotkeys are fully customizable. PTT mode has a 300 ms long-press threshold to prevent accidental triggers.
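
The long-press threshold can be illustrated with a small sketch. This is not MoSheng's actual hotkey code; the class name `PushToTalkGate` and its methods are hypothetical, and only the 300 ms value comes from the text above. The idea is simply that a press shorter than the threshold is treated as an accidental tap.

```python
from dataclasses import dataclass
from typing import Optional

LONG_PRESS_THRESHOLD_S = 0.300  # 300 ms, the value stated in this README


@dataclass
class PushToTalkGate:
    """Hypothetical helper: decides whether a key hold counts as push-to-talk."""

    threshold_s: float = LONG_PRESS_THRESHOLD_S
    _pressed_at: Optional[float] = None

    def on_press(self, now: float) -> None:
        # Keyboards send auto-repeat press events while a key is held,
        # so only the first press of a hold records the start time.
        if self._pressed_at is None:
            self._pressed_at = now

    def on_release(self, now: float) -> bool:
        """Return True if the key was held at least `threshold_s` seconds."""
        if self._pressed_at is None:
            return False
        held = now - self._pressed_at
        self._pressed_at = None
        return held >= self.threshold_s
```

Timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic can be exercised without a real keyboard hook.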

Progressive Input

When enabled, text is automatically injected after a silence of 0.8 seconds — no need to wait until you finish speaking. Great for long paragraphs.
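
As a rough illustration of pause-triggered injection, the sketch below scans per-frame audio levels and reports where a 0.8-second silence follows speech. It is an assumption-laden sketch, not the project's implementation: the function name `find_flush_points`, the 0.1 s frame length, and the RMS threshold are all made up for the example; only the 0.8 s pause comes from the text above.

```python
from typing import Iterable, List

SILENCE_FLUSH_S = 0.8   # pause length stated in this README
FRAME_S = 0.1           # assumed frame duration
RMS_THRESHOLD = 0.01    # assumed level separating speech from silence


def find_flush_points(frame_rms: Iterable[float],
                      frame_s: float = FRAME_S,
                      silence_s: float = SILENCE_FLUSH_S,
                      threshold: float = RMS_THRESHOLD) -> List[int]:
    """Return frame indices at which pending speech would be flushed."""
    needed = max(1, round(silence_s / frame_s))  # silent frames required
    flush_points: List[int] = []
    silent_run = 0
    has_pending_speech = False
    for i, rms in enumerate(frame_rms):
        if rms >= threshold:
            # Speech frame: something is waiting to be transcribed.
            has_pending_speech = True
            silent_run = 0
        else:
            silent_run += 1
            # Flush once per pause, and only if speech preceded it.
            if has_pending_speech and silent_run >= needed:
                flush_points.append(i)
                has_pending_speech = False
    return flush_points
```

Requiring preceding speech keeps long stretches of silence from triggering repeated empty injections.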

Text Processing

Two optional post-processing steps applied after transcription:

Remove filler words — strips common hesitation words: Chinese (嗯、呃、哦、那个、然后…) and English (um, uh, hmm, er…)
Smart punctuation — in progressive mode, intermediate pauses become commas instead of periods; the final period is injected only at the end, producing flowing prose rather than choppy sentences

Both can be toggled independently in Settings → Text Processing.
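
The two steps can be sketched as follows. This is a simplified illustration rather than the project's code: the function names are hypothetical, and the filler lists are deliberately a small subset of the ones named above (words such as 那个 or 然后 also have legitimate uses, so removing them safely needs more context than a regular expression provides).

```python
import re
from typing import List

# Illustrative subsets of the filler lists mentioned above.
ZH_FILLERS = ["嗯", "呃"]
EN_FILLERS = ["um", "uh", "hmm", "er"]

# English fillers are matched as whole words, plus an optional trailing comma.
_EN_RE = re.compile(r"\b(?:%s)\b[,，]?\s*" % "|".join(EN_FILLERS), re.IGNORECASE)
# Chinese fillers may repeat (嗯嗯) and may be followed by a pause mark.
_ZH_RE = re.compile(r"(?:%s)+[，,、]?" % "|".join(ZH_FILLERS))


def remove_fillers(text: str) -> str:
    """Strip the hesitation words listed above from a transcript."""
    text = _EN_RE.sub("", text)
    text = _ZH_RE.sub("", text)
    return text.strip()


def join_progressive(segments: List[str],
                     comma: str = "，",
                     period: str = "。") -> str:
    """Join pause-delimited segments: commas between them, one final period."""
    cleaned = [s.strip().rstrip("。.，,") for s in segments]
    cleaned = [s for s in cleaned if s]
    if not cleaned:
        return ""
    return comma.join(cleaned) + period
```

For example, `join_progressive(["Open the file.", "then run the tests."], comma=", ", period=".")` yields a single sentence with one comma and one closing period.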

Speaker Verification

When enabled, only your registered voice is recognized. Go to Settings → Enroll Voice and record 3 audio segments.

Note: Very short utterances (under 1 second) may be incorrectly filtered out when speaker verification is enabled, because they contain too little audio for confident speaker identification.
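
A common way to compare speaker embeddings is cosine similarity against an averaged enrollment vector. The sketch below shows that general technique only; it does not reproduce MoSheng's two-stage logic, and the function names and the 0.5 threshold are assumptions. Real ECAPA-TDNN embeddings are 192-dimensional; the usage example uses short vectors for readability.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(samples: List[Vector]) -> List[float]:
    """Average the enrollment embeddings into one reference vector."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def is_enrolled_speaker(reference: Vector, probe: Vector,
                        threshold: float = 0.5) -> bool:
    """Accept the probe if it is close enough to the reference (assumed threshold)."""
    return cosine_similarity(reference, probe) >= threshold


# Three enrollment recordings, as in the enrollment step above (toy vectors).
reference = enroll([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.1]])
```

A short clip yields a noisier embedding, which is one plausible reason very brief utterances fall below a similarity threshold.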

Custom Vocabulary

Add domain terms or names to ~/.mosheng/vocabulary.csv (one per line) to improve recognition accuracy.
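
A one-term-per-line file like this is easy to read back. The loader below is a minimal sketch, assuming UTF-8 text; the function name `load_vocabulary` is hypothetical, and how MoSheng actually feeds the terms to the ASR engine is not covered here.

```python
from pathlib import Path
from typing import List

VOCAB_PATH = Path.home() / ".mosheng" / "vocabulary.csv"


def load_vocabulary(path: Path = VOCAB_PATH) -> List[str]:
    """Read one term per line, skipping blank lines and duplicates."""
    if not path.exists():
        return []
    terms: List[str] = []
    seen = set()
    # utf-8-sig tolerates the byte-order mark some editors add to CSV files.
    for line in path.read_text(encoding="utf-8-sig").splitlines():
        term = line.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms
```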

Benchmark

Real-world voice test on Apple M4 / 16 GB / macOS 15.3.1:

Test	1.7B	0.6B
Simple Chinese	1.06s ✅	3.43s ✅
Daily conversation	1.26s ✅	3.73s ✅
Numbers & time	1.76s ✅	17.35s ⚠️
Mixed CN/EN	1.58s ✅	16.40s ⚠️
Technical terms	1.78s ✅	19.58s ⚠️
Long sentence	3.64s ✅	32.50s 🐌
English	1.22s ✅	12.01s ⚠️
Punctuation & tone	1.20s ✅	9.45s ⚠️

→ On Apple Silicon, 1.7B is roughly 3× faster than 0.6B on short Chinese utterances and about 8× to 11× faster on the other tests, with identical accuracy.

Full benchmark: results/benchmark.md
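
The per-test speedup can be recomputed directly from the table. The snippet below only does that arithmetic; the latencies are copied from the table above, and the names `LATENCY_S` and `speedups` are chosen for the example.

```python
from typing import Dict, Tuple

# Latencies in seconds from the benchmark table: (1.7B, 0.6B)
LATENCY_S: Dict[str, Tuple[float, float]] = {
    "Simple Chinese": (1.06, 3.43),
    "Daily conversation": (1.26, 3.73),
    "Numbers & time": (1.76, 17.35),
    "Mixed CN/EN": (1.58, 16.40),
    "Technical terms": (1.78, 19.58),
    "Long sentence": (3.64, 32.50),
    "English": (1.22, 12.01),
    "Punctuation & tone": (1.20, 9.45),
}


def speedups(latency: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return how many times faster 1.7B is than 0.6B for each test."""
    return {name: slow / fast for name, (fast, slow) in latency.items()}


for name, ratio in speedups(LATENCY_S).items():
    print(f"{name:20s} {ratio:5.1f}x")
```

The two short Chinese tests come out near 3×, while the remaining tests fall between roughly 8× and 11×.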

Tech Stack

Layer	Technology	Notes
ASR	Qwen3-ASR	1.7B / 0.6B
Speaker Verification	SpeechBrain ECAPA-TDNN	192-dim embeddings
UI	PySide6 (Qt 6)	Cross-platform
Audio Overlay	QML + GLSL Shader	GPU-rendered visualization
GPU	PyTorch (CUDA / MPS)	NVIDIA or Apple Silicon
Package Manager	UV	Fast dependency resolution

Building

Windows

uv run python scripts/build_dist.py

macOS

uv run python scripts/build_macos.py

This produces dist/MoSheng.app. To create a DMG:

hdiutil create -volname MoSheng -srcfolder dist/MoSheng.app -ov -format UDZO dist/MoSheng.dmg

Project Structure

main.py                  Entry point
config.py                Default configuration
i18n.py                  Internationalization (zh/en)
settings_manager.py      Settings persistence
core/
  asr_qwen.py            Qwen3-ASR engine
  audio_recorder.py      Audio recording (sounddevice)
  speaker_verifier.py    Speaker verification (SpeechBrain)
  text_injector.py       Text injection (Ctrl+V / Cmd+V)
  hotkey_manager.py      Hotkey management (Win32 / CGEventTap)
  key_suppression_hook.py  Key suppression (Windows only)
  model_downloader.py    Model download manager
ui/
  app.py                 App coordinator (tray + worker)
  overlay_window.py      Audio overlay (QML Shader)
  overlay.qml            QML scene
  settings_window.py     Settings window
  splash_screen.py       Splash screen
  styles.py              Theme styles
  enrollment_dialog.py   Voice enrollment dialog
utils/
  autostart.py           Autostart (Registry / launchd)
  logger.py              Logging config
scripts/
  build_dist.py          Windows distribution build
  build_macos.py         macOS .app + DMG build
  benchmark_models.py    Model benchmark script
assets/
  shaders/smoke.frag     GLSL fragment shader
results/
  benchmark.md           Performance benchmark report

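Overview

MoSheng (墨声) is a local, intelligent voice input tool for Windows and macOS.

Hold the hotkey and speak → release → the text is pasted automatically into any application.

Built on Qwen3-ASR. Runs 100% locally; no network connection is needed once setup is complete, so your data stays private.

Highlights

🎤 Dual Hotkeys — hold to record (PTT) or press to toggle, both customizable
⚡ Progressive Input — injects the recognized text automatically when you pause, no need to wait until you finish
🔒 Speaker Verification — SpeechBrain ECAPA-TDNN two-stage verification, responds only to the enrolled user
🔄 Multiple Models — Qwen3-ASR-1.7B (accurate) / 0.6B (lightweight)
📖 Custom Vocabulary — CSV/TXT import plus preset terms to improve recognition of domain terms
🎨 GPU Shader Visualization — real-time audio spectrum animation with QML + GLSL
🪟 Native UI — dark theme with platform-native effects
🌐 Bilingual — Chinese / English interface language, auto-detected
🚀 Autostart — one-click toggle
📦 One-click Install — download and run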

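Platform Support

	Windows	macOS
OS	Windows 10 / 11	macOS 13+ (Apple Silicon)
GPU	NVIDIA CUDA 12.8	Apple MPS (Metal)
Default Hotkey (PTT)	`CapsLock`	`Right ⌘`
Default Hotkey (Toggle)	`Right Ctrl`	`Fn + F5`
Autostart	Registry	launchd

System Requirements

Windows

Item	Requirement
GPU	NVIDIA GPU with CUDA 12.8 support (RTX 30 series or above recommended)
VRAM	~4 GB for the 1.7B model / ~2 GB for the 0.6B model
Disk	~5 GB (including model and dependencies)

macOS

Item	Requirement
Chip	Apple Silicon (M1 / M2 / M3 / M4)
RAM	16 GB recommended
Disk	~5 GB (including model and dependencies)
Permissions	Accessibility + Microphone

⚠️ macOS users should use the 1.7B model. Our benchmarks show that on Apple Silicon (MPS), 1.7B is roughly 3× to 11× faster than 0.6B, depending on the test.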

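Installation

Windows

Option A: Download the Distribution Package (Recommended)

Download the latest MoSheng-vX.X.X-win64.zip from Releases
Extract it to any directory
Double-click MoSheng.exe
The first run automatically installs the Python environment and dependencies (requires internet, about 5 minutes)
The first run automatically downloads the ASR model (~3.4 GB)

Option B: Run from Source

git clone https://github.com/bensenx/MoSheng.git
cd MoSheng
uv run python main.py

macOS

Option A: DMG Install (Recommended)

Download MoSheng-v1.1.0-macOS.dmg from Releases
Open the DMG and drag MoSheng into Applications
First launch: right-click MoSheng.app → Open → click Open (one-time Gatekeeper prompt)
Grant Accessibility permission (System Settings → Privacy & Security → Accessibility)
Grant Microphone permission
The first launch automatically installs Python dependencies via uv (about 3 minutes)
The first launch automatically downloads the ASR model (~3.4 GB)

Security note: MoSheng is not notarized by Apple. macOS shows a security warning the first time you open it; right-click → Open bypasses it. You can also run this in Terminal:
xattr -cr /Applications/MoSheng.app

Option B: Run from Source

git clone https://github.com/bensenx/MoSheng.git
cd MoSheng && git checkout macos
uv run python main.py

Option C: Install Script

curl -fsSL https://raw.githubusercontent.com/bensenx/MoSheng/macos/scripts/install_macos.sh | bash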

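Benchmark

Real-world voice test on Apple M4 / 16 GB / macOS 15.3.1:

Test	1.7B	0.6B
Simple Chinese	1.06s ✅	3.43s ✅
Daily conversation	1.26s ✅	3.73s ✅
Numbers & time	1.76s ✅	17.35s ⚠️
Mixed CN/EN	1.58s ✅	16.40s ⚠️
Technical terms	1.78s ✅	19.58s ⚠️
Long sentence	3.64s ✅	32.50s 🐌
English	1.22s ✅	12.01s ⚠️
Punctuation & tone	1.20s ✅	9.45s ⚠️

→ On Apple Silicon, 1.7B is roughly 3× faster than 0.6B on short Chinese utterances and about 8× to 11× faster on the other tests, with identical accuracy.

Full report: results/benchmark.md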

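Configuration

Right-click the system tray icon → Settings to open the settings window.

Hotkeys

Mode	Windows Default	macOS Default	Description
Push-to-talk (PTT)	`CapsLock`	`Right ⌘`	Hold to speak, release to transcribe
Toggle	`Right Ctrl`	`Fn + F5`	Press once to start, press again to stop

Hotkeys can be customized in Settings. PTT mode has a 300 ms long-press threshold to avoid accidental triggers.

Progressive Input

When enabled, the recognized text is injected automatically once you pause for more than 0.8 seconds, so there is no need to wait until you finish speaking. Well suited to long paragraphs.

Speaker Verification

When enabled, only the enrolled user's voice is recognized. In Settings, click Enroll Voice and record 3 audio segments.

Custom Vocabulary

Add domain terms to ~/.mosheng/vocabulary.csv, one per line, to help improve recognition accuracy.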


License
