Automated process documentation for Windows. Perform a process once — OpenScribe captures your clicks, screenshots, and voice narration, then uses AI to generate a polished step-by-step Word document.
1. CAPTURE 2. PROCESS 3. AI ANALYSIS 4. DOCUMENT
Click-triggered OCR text from GPT-4o describes Annotated .docx
screenshots, screenshots, each step from with numbered
mouse/keyboard transcribe audio, screenshots + steps, red-box
hooks, mic audio build timeline metadata highlights
- Start a capture session — OpenScribe installs low-level hooks to detect your clicks and keystrokes
- Perform your process — every click triggers a full-resolution screenshot and logs coordinates, window title, and UI element info
- Stop recording — OpenScribe runs OCR, transcribes audio, and sends each step to Azure OpenAI (GPT-4o with vision)
- Export — generates a professional
.docxwith numbered steps, annotated screenshots, and clear instructions
- Click-triggered screenshots with UI element detection (via Windows UI Automation)
- Global mouse and keyboard capture with scope filtering (entire screen, single monitor, or specific window)
- Optional microphone recording with device picker and real-time level meter
- Optional screen video recording
- On-device OCR (Windows.Media.Ocr) for extracting button labels and field names
- Azure Speech transcription for voice narration
- AI-powered step description generation (Azure OpenAI / OpenAI GPT-4o vision)
- Annotated screenshots with red highlight boxes and step number badges (SkiaSharp)
- Word document generation with customizable templates (Open XML SDK)
- Session review and step editing before export
- Floating overlay toolbar during capture (Record / Pause / Stop)
- Windows 10 (1903+) or Windows 11
- .NET 8 SDK (download)
- Azure OpenAI Service or OpenAI API access for AI-powered step generation
- Azure Speech Service (optional) for voice transcription
git clone https://github.com/YOUR-USERNAME/OpenScribe.git
cd OpenScribe
dotnet builddotnet run --project src/OpenScribe.Appcd build
.\Build-OpenScribe.ps1 -Version "1.0.1" -SkipSignThis produces a self-contained installer in build/artifacts/. Use -Architecture arm64 for ARM builds.
On first launch, go to Settings and configure:
- AI Provider — Azure OpenAI endpoint + deployment name (or OpenAI API key)
- Azure Speech (optional) — subscription key and region for voice transcription
- General — organization name, default author, crop region size
Settings are stored in %LocalAppData%\OpenScribe\usersettings.json.
| Project | Purpose |
|---|---|
OpenScribe.App |
WinUI 3 desktop shell (MVVM with CommunityToolkit.Mvvm) |
OpenScribe.Capture |
Screen capture, global input hooks, audio recording (NAudio) |
OpenScribe.Processing |
OCR, audio transcription, timeline building |
OpenScribe.AI |
Azure OpenAI / OpenAI integration, prompt templates |
OpenScribe.DocGen |
Word document generation (Open XML SDK), screenshot annotation (SkiaSharp) |
OpenScribe.Core |
Shared interfaces, models, configuration |
OpenScribe.Data |
SQLite persistence via EF Core |
See PLAN.md for detailed architecture and design decisions.
- .NET 8 + WinUI 3 (Windows App SDK)
- NAudio for microphone recording
- Win32 low-level hooks (SetWindowsHookEx) for input capture
- Windows.Media.Ocr for on-device text extraction
- Azure.AI.OpenAI SDK for GPT-4o vision calls
- Microsoft.CognitiveServices.Speech for transcription
- DocumentFormat.OpenXml for .docx generation
- SkiaSharp for screenshot annotation
- SQLite + EF Core for local data storage