AI-powered video review automation for screencast commentary analysis. Transform your voice descriptions into actionable findings for engineering teams.
ScreenScribe extracts actionable insights from screencast recordings by transcribing audio commentary, detecting mentions of bugs, changes, and UI issues, capturing relevant screenshots, and generating comprehensive reports with AI-powered semantic analysis.
Status: v0.1.9 — HTML Pro reports + Interactive Review Server + separated review window + ZIP export.
ScreenScribe uses LibraxisAI API by default, but is fully compatible with any provider supporting the Responses API (OpenAI, Anthropic, etc.).
Use different providers for different tasks — e.g., LibraxisAI for cheaper STT, OpenAI for VLM:
# ~/.config/screenscribe/config.env
# Per-endpoint API keys (hybrid setup)
LIBRAXIS_API_KEY=your-libraxis-key # → STT (cheaper transcription)
OPENAI_API_KEY=sk-proj-xxx # → VLM (unified analysis)
# Explicit endpoints (full URLs - recommended)
SCREENSCRIBE_STT_ENDPOINT=https://api.libraxis.cloud/v1/audio/transcriptions
SCREENSCRIBE_LLM_ENDPOINT=https://api.openai.com/v1/responses
SCREENSCRIBE_VISION_ENDPOINT=https://api.openai.com/v1/responses
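With the hybrid setup above, each stage authenticates with its own key. A minimal sketch of how such per-endpoint key fallback can work (illustrative only; config.py's actual precedence rules are not reproduced here):

```python
import os

# Illustrative per-endpoint key fallback (NOT the actual config.py logic):
# prefer a provider-specific key for the endpoint's host, then the generic key.
def resolve_key(endpoint_url: str) -> str | None:
    if "libraxis" in endpoint_url:
        return os.getenv("LIBRAXIS_API_KEY") or os.getenv("SCREENSCRIBE_API_KEY")
    return os.getenv("OPENAI_API_KEY") or os.getenv("SCREENSCRIBE_API_KEY")

# resolve_key("https://api.libraxis.cloud/v1/audio/transcriptions")
# -> value of LIBRAXIS_API_KEY, falling back to SCREENSCRIBE_API_KEY
```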
- Interactive Review Server (review): local web app with synchronized player, browser-based STT, manual frame capture, and disk persistence.
- Separated Review Window: detach the findings/review surface into its own synced browser window while keeping the player open.
- HTML Pro Report: Interactive report with video player, subtitle sync, annotation tools (pen/rect/arrow), and ZIP export.
- Transcript-First Workflow (preprocess): Build an artifact bundle with stable transcripts before running expensive AI analysis.
- Unified VLM Pipeline: A single VLM call analyzes the screenshot and full transcript together (~45% faster than separate LLM+VLM); see the sketch after this list.
- Batch Mode: Process multiple videos with shared context — VLM remembers findings across videos.
- Audio Quality Validation: Detects silent recordings and warns about missing microphone input.
- Resumable Pipeline: Checkpoint system allows resuming interrupted processing.
- i18n Support: Prompts adapt to the selected language (Polish, English).
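A hedged sketch of what a unified VLM call looks like: one Responses API request carrying both the screenshot and the transcript. The payload shape follows the OpenAI Responses API; unified_analysis.py's exact prompt and response parsing are not reproduced here.

```python
import base64
import os

import httpx

# One request sees both modalities, replacing separate LLM + VLM passes.
def analyze_frame(screenshot_path: str, transcript: str) -> dict:
    image_b64 = base64.b64encode(open(screenshot_path, "rb").read()).decode()
    payload = {
        "model": os.getenv("SCREENSCRIBE_VISION_MODEL", "gpt-4o"),
        "input": [{
            "role": "user",
            "content": [
                {"type": "input_text",
                 "text": f"Transcript:\n{transcript}\n\nAnalyze the screenshot in this context."},
                {"type": "input_image",
                 "image_url": f"data:image/jpeg;base64,{image_b64}"},
            ],
        }],
        # Batch mode could chain context across videos via previous_response_id.
    }
    resp = httpx.post(
        os.environ["SCREENSCRIBE_VISION_ENDPOINT"],
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()  # output items contain the model's analysis text
```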
| Component | Technology | Purpose |
|---|---|---|
| CLI Framework | Typer | Modern, type-hinted CLI |
| HTTP Client | httpx | Async-ready, long timeout support |
| Terminal UI | Rich | Progress bars, panels, tables |
| Media Processing | FFmpeg / FFprobe | Audio extraction, frame capture |
| AI Backend | LibraxisAI / OpenAI | STT, LLM, Vision models |
| Package Manager | uv | Fast, modern Python packaging |
- Python 3.11+
- uv package manager
- FFmpeg (for audio/video processing)
- LibraxisAI or OpenAI API key
# Clone the repository
git clone https://github.com/VetCoders/ScreenScribe.git
cd ScreenScribe
# Install globally using uv (recommended)
make install
# Verify installation
screenscribe version

# Full analysis and launch interactive review server
screenscribe review path/to/video.mov
# Fast mode: keyword-based detection only (no LLM pre-filter)
screenscribe review video.mov --keywords-only
# Transcript-first artifact bundle for downstream review
screenscribe preprocess video.mov

# Initialize config and set API key
screenscribe config --init
screenscribe config --set-key YOUR_LIBRAXIS_API_KEY
# Or manually edit ~/.config/screenscribe/config.env

In HTML Pro review mode, use Open Review Window to detach the findings panel into a separate synced window.
# See time estimate before processing
screenscribe review video.mov --estimate
# Dry run: transcribe + detect only (no AI, no screenshots)
screenscribe review video.mov --dry-run
# Output to specific directory
screenscribe review video.mov -o ./my-review
# Skip vision analysis (faster)
screenscribe review video.mov --no-vision
# Resume interrupted processing
screenscribe review video.mov --resume
# Use custom keywords
screenscribe review video.mov --keywords-file my_keywords.yaml
# English language (affects transcription + AI prompts)
screenscribe review video.mov --lang en
# Transcription only
screenscribe transcribe video.mov -o transcript.txt

flowchart TD
A[Video File] --> B[Audio Extraction]
B --> C[Transcription]
C --> D{Detection Mode}
D -->|Default| D1[Semantic Pre-Filter]
D -->|--keywords-only| D2[Keyword Matching]
D1 --> E[Screenshots]
D2 --> E
E --> F[Semantic Analysis]
F --> G[Vision Analysis]
G --> H[Report Generation]
B -.- B1[FFmpeg extracts audio track]
C -.- C1[LibraxisAI STT with timestamps]
D1 -.- D1a[LLM analyzes full transcript]
D2 -.- D2a[Regex keyword matching - PL/EN]
E -.- E1[FFmpeg frame extraction]
F -.- F1[LLM severity + action items]
G -.- G1[VLM screenshot analysis]
H -.- H1[JSON + Markdown output]
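The --resume flag relies on checkpointing between these stages. A minimal sketch of that pattern (the file name "checkpoint.json" and its schema are assumptions for illustration; see checkpoint.py for the real format):

```python
import json
from pathlib import Path

# Persist each completed stage's result so an interrupted run can skip it.
class Checkpoint:
    def __init__(self, output_dir: Path) -> None:
        self.path = output_dir / "checkpoint.json"
        self.state: dict = json.loads(self.path.read_text()) if self.path.exists() else {}

    def done(self, stage: str) -> bool:
        return stage in self.state

    def save(self, stage: str, result: dict) -> None:
        self.state[stage] = result
        self.path.write_text(json.dumps(self.state, indent=2, ensure_ascii=False))

# Pipeline loop sketch: skip stages already recorded, persist new results.
# for stage, run in [("transcribe", ...), ("detect", ...), ("screenshots", ...)]:
#     if not cp.done(stage):
#         cp.save(stage, run())
```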
| Mode | Flag | Description |
|---|---|---|
| Semantic (default) | — | LLM analyzes entire transcript before frame extraction. Finds more issues, including those without explicit keywords. |
| Keywords | --keywords-only | Fast regex-based detection using predefined keyword patterns. Lower API costs, faster processing. |
{video}_review/
├── {video}_transcript.txt # Full transcription
├── {video}_report.json # Machine-readable report
├── {video}_report.md # Human-readable Markdown
├── {video}_report.html # HTML Pro report (only with --pro)
└── screenshots/
├── 01_bug_01-23.jpg
├── 02_change_02-45.jpg
└── ...
Each report includes:
- Executive Summary: AI-generated overview of key issues and priorities
- Statistics: Breakdown by category (bugs, changes, UI) and severity
- Detailed Findings: For each detected issue:
  - Timestamp and category
  - Original transcript text
  - Context (surrounding dialogue)
  - AI Analysis:
    - Issue detection (is_issue: true/false) - distinguishes real problems from confirmations
    - Sentiment (problem/positive/neutral) - tone of the user's statement
    - Severity rating (critical/high/medium/low/none)
    - Summary
    - Affected components
    - Action items
    - Suggested fix
  - Screenshot
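The JSON report is designed for machine consumption. A hedged sketch of filtering it for high-priority findings (top-level key names such as "findings" are assumptions, not a documented schema; inspect your generated {video}_report.json for the actual structure):

```python
import json

# List real, high-priority issues from the machine-readable report.
# NOTE: "findings" and the field names are assumed for illustration.
with open("demo_review/demo_report.json") as f:
    report = json.load(f)

for finding in report.get("findings", []):
    if finding.get("is_issue") and finding.get("severity") in ("critical", "high"):
        print(f"[{finding['severity']}] {finding.get('timestamp')}: {finding.get('summary')}")
```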
Config file location: ~/.config/screenscribe/config.env
# API Key (pick one)
SCREENSCRIBE_API_KEY=your-api-key
# OPENAI_API_KEY=sk-proj-xxx
# LIBRAXIS_API_KEY=xxx
# Explicit Endpoints (full URLs - recommended)
SCREENSCRIBE_STT_ENDPOINT=https://api.openai.com/v1/audio/transcriptions
SCREENSCRIBE_LLM_ENDPOINT=https://api.openai.com/v1/responses
SCREENSCRIBE_VISION_ENDPOINT=https://api.openai.com/v1/responses
# Alternative: Base URL (auto-derives /v1/... paths)
# SCREENSCRIBE_API_BASE=https://api.libraxis.cloud
# Models
SCREENSCRIBE_STT_MODEL=whisper-1
SCREENSCRIBE_LLM_MODEL=gpt-4o
SCREENSCRIBE_VISION_MODEL=gpt-4o
# Processing Options
SCREENSCRIBE_LANGUAGE=pl
SCREENSCRIBE_SEMANTIC=true
SCREENSCRIBE_VISION=true
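When only SCREENSCRIBE_API_BASE is set, the service paths are derived from it. A minimal sketch of that derivation, matching the endpoint paths listed above (illustrative, not config.py verbatim):

```python
# Derive the three service endpoints from a single base URL.
def derive_endpoints(base: str) -> dict[str, str]:
    base = base.rstrip("/")
    return {
        "stt": f"{base}/v1/audio/transcriptions",
        "llm": f"{base}/v1/responses",
        "vision": f"{base}/v1/responses",
    }

# derive_endpoints("https://api.libraxis.cloud")["stt"]
# -> "https://api.libraxis.cloud/v1/audio/transcriptions"
```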
Full video analysis pipeline.
screenscribe review VIDEO [OPTIONS]
Options:
-o, --output PATH Output directory (default: VIDEO_review/)
-l, --lang TEXT Language code for transcription and AI prompts (default: pl)
-k, --keywords-file PATH Custom keywords YAML file
--keywords-only Use fast keyword-based detection instead of semantic pre-filter
--local Use local STT server instead of cloud
--semantic/--no-semantic Enable/disable LLM analysis (default: enabled)
--vision/--no-vision Enable/disable vision analysis (default: enabled)
--json/--no-json Save JSON report (default: enabled)
--markdown/--no-markdown Save Markdown report (default: enabled)
--resume Resume from previous checkpoint if available
--estimate Show time estimate without processing
--dry-run Run transcription and detection only, then stop

Transcription only, without analysis.
screenscribe transcribe VIDEO [OPTIONS]
Options:
-o, --output PATH Output file for transcript
-l, --lang TEXT Language code (default: pl)
--local Use local STT server

Transcript-first preprocessing bundle without semantic or vision analysis.
screenscribe preprocess VIDEO [OPTIONS]
Options:
-o, --output PATH Output directory for preprocess artifacts
-l, --lang TEXT Language code (default: pl)
--local Use local STT server
--audio/--no-audio Include extracted audio in the bundle (default: enabled)
--force Reuse output directory if preprocess artifacts already exist

Manage configuration.
screenscribe config [OPTIONS]
Options:
--show Show current configuration
--init Create default config file
--init-keywords Create keywords.yaml for customization
--set-key TEXT Set API key in config

Show version information.
ScreenScribe detects issues based on keywords in both Polish and English:
Bugs: bug, błąd, nie działa, crash, error, broken, failed, exception...
Changes: zmiana, zmienić, poprawić, update, modify, refactor, rename...
UI Issues: UI, interfejs, wygląd, layout, design, button, margin, padding...
Create a custom keywords file for your project:
# Generate default keywords.yaml
screenscribe config --init-keywords
# Or use with review command
screenscribe review video.mov --keywords-file my_keywords.yaml

Keywords file format (YAML):
bug:
- "nie działa"
- "broken"
- "crash"
change:
- "trzeba zmienić"
- "should fix"
ui:
- "button"
- "layout"ScreenScribe automatically searches for keywords.yaml in the current directory.
Typical processing times for a 15-minute video:
| Step | Duration |
|---|---|
| Audio extraction | ~5s |
| Transcription | ~30s |
| Issue detection | <1s |
| Screenshot extraction | ~10s |
| Semantic analysis (44 issues) | ~8-10 min |
| Vision analysis (optional) | ~20+ min |
# Clone and setup
git clone https://github.com/VetCoders/ScreenScribe.git
cd ScreenScribe
make dev
# Run from source
uv run screenscribe review video.mov
# Quality checks
make lint # ruff check
make typecheck # mypy
make check # all quality checks
# Testing
make test # unit tests (fast, no API needed)
make test-integration # integration tests (requires LIBRAXIS_API_KEY)
make test-all # all tests
# Formatting
make format # ruff format + fix

make install # uv tool install . (global CLI)
make dev # Install dev dependencies
make test # Run unit tests
make test-integration # Run integration tests (requires API key)
make lint # Run ruff linter
make format # Format code with ruff
make typecheck # Run mypy type checker
make check # All quality checks
make clean # Remove caches and artifacts
screenscribe/
├── __init__.py # Version info
├── cli.py # Typer CLI interface
├── config.py # Configuration management (per-endpoint keys)
├── audio.py # FFmpeg audio extraction
├── transcribe.py # LibraxisAI STT integration
├── detect.py # Keyword-based issue detection
├── semantic_filter.py # Semantic pre-filtering pipeline
├── keywords.py # Custom keywords loading (YAML)
├── screenshots.py # Frame extraction
├── unified_analysis.py # Unified VLM pipeline (replaces semantic+vision)
├── image_utils.py # Shared image encoding utilities
├── semantic.py # LLM semantic analysis (legacy)
├── vision.py # Vision model analysis (legacy)
├── report.py # Report generation (JSON/Markdown)
├── prompts.py # i18n prompt templates (PL/EN)
├── api_utils.py # Retry logic, API utilities
├── checkpoint.py # Pipeline checkpointing
├── validation.py # Model availability validation (fail-fast)
└── default_keywords.yaml # Default detection keywords
tests/
├── test_detect.py # Detection unit tests
├── test_semantic_filter.py # Semantic filter unit tests (54 tests)
├── test_validation.py # Validation unit tests
└── test_integration.py # API integration tests
ScreenScribe uses LibraxisAI's unified API:
- STT: POST /v1/audio/transcriptions (OpenAI-compatible)
- LLM: POST /v1/responses (Responses API format)
- Vision: POST /v1/responses with input_image (auto-routed to VLM)
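For reference, a hedged example of calling the STT endpoint directly with httpx (standard OpenAI-compatible multipart upload; the audio file name and the verbose_json choice are assumptions, not ScreenScribe internals):

```python
import os

import httpx

# Direct STT call: OpenAI-compatible multipart form upload.
# verbose_json returns segment timestamps, which the pipeline needs
# for screenshot timing (response_format choice is an assumption).
with open("video_audio.wav", "rb") as audio:
    resp = httpx.post(
        "https://api.libraxis.cloud/v1/audio/transcriptions",
        files={"file": ("video_audio.wav", audio, "audio/wav")},
        data={"model": "whisper-1", "language": "pl", "response_format": "verbose_json"},
        headers={"Authorization": f"Bearer {os.environ['LIBRAXIS_API_KEY']}"},
        timeout=300,  # long timeout for large audio files
    )
resp.raise_for_status()
print(resp.json()["text"])
```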
| Tool | Purpose | Config |
|---|---|---|
| mypy | Type checking | Strict mode enabled |
| Ruff | Linting + formatting | E, W, F, I, B, C4, UP, S, RUF rules |
| Bandit | Security linting | Pre-commit hook |
| Semgrep | Static analysis | Pre-commit hook |
| detect-secrets | Secret detection | Baseline tracking |
All code is fully type-hinted and passes strict mypy checks.
- Audio extraction (FFmpeg)
- Transcription (LibraxisAI STT API)
- Issue detection (keyword-based, PL/EN)
- Screenshot extraction (single + batch)
- Semantic analysis (LLM)
- Vision analysis (VLM)
- Report generation (JSON + Markdown)
- Configuration management (.env, env vars)
- CLI with 4 commands
- Pre-commit hooks
- Type hints (mypy strict)
- Custom keyword configuration (YAML)
- Progress save/resume (checkpointing)
- Retry logic with exponential backoff (see the sketch after this list)
- i18n prompts (PL/EN)
- Test suite (pytest)
- Sentiment detection (is_issue, sentiment fields)
- Audio quality validation (silent recording detection)
- Model availability validation (fail-fast)
- Response API chaining (vision context from semantic)
- AI-optimized report format
- Local model support for LLM/Vision
- Batch processing / queue system
- More languages for prompts
- Web UI
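The retry behavior noted above follows the usual exponential-backoff pattern. A generic sketch (attempt count, delays, and which statuses are retried are assumptions, not api_utils.py's actual policy):

```python
import time

import httpx

# Retry transient failures (5xx, 429, network errors) with exponential backoff.
def post_with_retry(url: str, attempts: int = 4, **kwargs) -> httpx.Response:
    for attempt in range(1, attempts + 1):
        try:
            resp = httpx.post(url, **kwargs)
            if resp.status_code < 500 and resp.status_code != 429:
                return resp
        except httpx.TransportError:
            pass  # network-level failure: retry
        if attempt < attempts:
            time.sleep(2 ** (attempt - 1))  # 1s, 2s, 4s backoff
    raise RuntimeError(f"POST {url} failed after {attempts} attempts")
```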
- Vision analysis can be slow (~20+ min for many issues)
MIT License
Made with (งಠ_ಠ)ง by the ⌜ ScreenScribe ⌟ 𝖙𝖊𝖆𝖒 (c) 2025 Maciej & Monika + Klaudiusz (AI) + Mixerka (AI)