
solid

A single-binary screen-watcher that detects on-screen questions and answers them with an LLM, exposing the result over HTTP.


solid polls the screen, identifies any quiz / homework / interview question, sends it to an LLM, and serves the answer over a tiny HTTP API (REST + Server-Sent Events). It runs as one binary on Windows 10/11 and macOS 14+.



Features

  • Two strategies, one binary
    • Vision (default for LLM_PROVIDER=anthropic) — sends the screenshot directly to Claude Sonnet; no OCR step, handles "All of the above" / "Both A and B" catch-alls correctly.
    • OCR + text (for LLM_PROVIDER=ollama) — Tesseract extracts the question, then a local Ollama model answers it.
  • Question typing — multiple_choice, code, short_answer, and plain markers tune the prompt so MCQ replies stay one-letter-plus-justification while code replies stay in fenced blocks.
  • Live updates over SSE — GET /question/stream pushes a new event the moment the worker writes a fresh answer.
  • Smart de-duplication — perceptual content hash skips unchanged screens; normalized Levenshtein collapses near-identical questions caused by OCR jitter.
  • Fast capture paths — DXGI Output Duplication on Windows (GDI fallback), ScreenCaptureKit on macOS 14+ (CGDisplay fallback).
  • Self-contained release bundles — the *-ocr archives ship Tesseract + Leptonica + eng.traineddata so you can extract and run with no extra install.
  • G2 HUD companion app — mirrors the current Q/A onto Even Realities G2 smart glasses (see G2 companion app).
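
The smart de-duplication above starts with a perceptual content hash on each frame. A minimal average-hash sketch (an illustrative, std-only stand-in — not necessarily what perceptual_hash.rs actually implements) looks like this:

```rust
/// Average hash over a grayscale frame: bucket the image into an
/// 8x8 grid, then set one bit per cell whose mean brightness is
/// above the global mean. `gray` is assumed to be w*h bytes.
fn average_hash(gray: &[u8], w: usize, h: usize) -> u64 {
    let (gw, gh) = (8usize, 8usize);
    let mut cells = [0u64; 64];
    for gy in 0..gh {
        for gx in 0..gw {
            let (x0, x1) = (gx * w / gw, (gx + 1) * w / gw);
            let (y0, y1) = (gy * h / gh, (gy + 1) * h / gh);
            let (mut sum, mut n) = (0u64, 0u64);
            for y in y0..y1 {
                for x in x0..x1 {
                    sum += gray[y * w + x] as u64;
                    n += 1;
                }
            }
            cells[gy * gw + gx] = if n > 0 { sum / n } else { 0 };
        }
    }
    let mean: u64 = cells.iter().sum::<u64>() / 64;
    let mut hash = 0u64;
    for (i, c) in cells.iter().enumerate() {
        if *c > mean {
            hash |= 1u64 << i;
        }
    }
    hash
}
```

Two consecutive frames that produce the same 64-bit hash are treated as "nothing changed" and skipped before any OCR or LLM work happens.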

How it works

  1. Every 2 seconds, the worker captures the primary display.
  2. A perceptual hash skips frames where nothing changed.
  3. Depending on the strategy, the frame is either:
    • shipped to Claude vision, which returns {question, answer} or {none: true}, or
    • OCR'd, classified into a QuestionType, and forwarded to Ollama.
  4. Near-duplicate questions (normalized Levenshtein ≥ 0.90 vs the previous one) are dropped.
  5. Fresh answers are written to SQLite, cached in memory, and broadcast to any open SSE subscribers.
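
Step 4's near-duplicate check can be sketched with a plain normalized Levenshtein similarity (a std-only illustration; text.rs may normalize the strings differently before comparing):

```rust
/// Normalized similarity in [0, 1]; 1.0 means identical strings.
/// Per the pipeline above, a question with similarity >= 0.90 vs
/// the previous one is considered OCR jitter and dropped.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let (m, n) = (a.len(), b.len());
    if m.max(n) == 0 {
        return 1.0;
    }
    // Two-row dynamic-programming Levenshtein distance.
    let mut prev: Vec<usize> = (0..=n).collect();
    let mut cur = vec![0usize; n + 1];
    for i in 1..=m {
        cur[0] = i;
        for j in 1..=n {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    1.0 - prev[n] as f64 / m.max(n) as f64
}
```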

Quick start

The fastest path is the pre-built *-ocr archive from the latest release — Tesseract + Leptonica are linked in and eng.traineddata is staged under tessdata/ next to the binary.

# 1. extract the archive, then:
export ANTHROPIC_API_KEY=sk-ant-...
./solid               # macOS
solid.exe             # Windows

# 2. while a question is on screen
curl http://127.0.0.1:8088/question

# 3. live stream
curl -N http://127.0.0.1:8088/question/stream

The slim archives (without -ocr) are smaller but require a local Tesseract install and your own eng.traineddata per the discovery rules in Tesseract data files.


Installation

From source

git clone https://github.com/0x000NULL/solid.git
cd solid
cargo build --release
./target/release/solid          # macOS
./target/release/solid.exe      # Windows

Requirements:

| Requirement | Source |
| --- | --- |
| Rust 1.75+ | https://rustup.rs/ |
| Windows 10/11 (x86-64) or macOS 14+ (Apple Silicon or Intel) | — |
| Tesseract eng.traineddata (when using OCR) | https://github.com/tesseract-ocr/tessdata_best |
| ANTHROPIC_API_KEY (when LLM_PROVIDER=anthropic) | https://console.anthropic.com/ |
| Ollama (when LLM_PROVIDER=ollama) | https://ollama.ai/ |

SQLite ships via rusqlite with the bundled feature — no separate install. To build without OCR support: cargo build --release --no-default-features.

Tesseract data files (tessdata)

Only required when using the OCR strategy (LLM_PROVIDER=ollama). solid resolves the directory in this order:

  1. TESSDATA_PREFIX — if set, libtesseract uses this path. Point it at the directory that contains tessdata/ (or directly at the tessdata/ directory, depending on your Tesseract version).
  2. tessdata/ next to the executable — drop eng.traineddata into a tessdata/ directory beside the binary.

Windows:

target/release/
├── solid.exe
└── tessdata/
    └── eng.traineddata

macOS (via Homebrew):

brew install tesseract pkgconf
export TESSDATA_PREFIX="$(brew --prefix tesseract)/share/"
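
The two-step lookup order can be sketched as follows (a hypothetical helper, not solid's actual code; the environment variable and executable directory are passed as parameters so the logic is testable in isolation):

```rust
use std::path::{Path, PathBuf};

/// Resolve the tessdata directory using the documented order:
/// 1) an explicit TESSDATA_PREFIX value wins;
/// 2) otherwise look for a tessdata/ directory beside the binary.
fn resolve_tessdata(prefix: Option<&str>, exe_dir: &Path) -> Option<PathBuf> {
    if let Some(p) = prefix {
        return Some(PathBuf::from(p));
    }
    let beside = exe_dir.join("tessdata");
    beside.exists().then_some(beside)
}
```

If neither source yields a usable directory, OCR initialization fails at startup (see Troubleshooting).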

macOS Screen Recording permission

ScreenCaptureKit and CGDisplayCreateImage both require Screen Recording permission. The first capture attempt prompts the user; grant the permission to the launching app (Terminal, iTerm2, your IDE, or the solid binary itself), then restart solid. If permission is missing, the SCK path is disabled for the process lifetime and the binary falls back to a black/empty CG capture.


Configuration

All configuration is via environment variables. A .env file is loaded automatically if present.

| Variable | Purpose | Default | Required |
| --- | --- | --- | --- |
| LLM_PROVIDER | anthropic (vision) or ollama (OCR + text) | anthropic | No |
| ANTHROPIC_API_KEY | Claude API key | — | When LLM_PROVIDER=anthropic |
| OLLAMA_MODEL | Model name for Ollama | llama3.1 | No |
| DATABASE_PATH | Path to SQLite file | conversation.db | No |
| BIND_ADDR | HTTP server bind address | 0.0.0.0:8088 | No |
| TESSDATA_PREFIX | Override Tesseract data lookup | — | No |
| SOLID_FORCE_GDI | Skip DXGI, use GDI capture only (Windows) | unset | No |
| SOLID_FORCE_CG | Skip ScreenCaptureKit, use CG capture only (macOS) | unset | No |
| RUST_LOG | Tracing filter (error, warn, info, debug, trace) | info | No |

HTTP API

Default bind: 0.0.0.0:8088. CORS is permissive. All timestamps are UTC and serialized as Unix seconds in JSON.

| Method | Endpoint | Description | Success | Other statuses |
| --- | --- | --- | --- | --- |
| GET | /question | Most recent question + answer | 200 JSON Answer | 204 when none detected yet |
| GET | /history | Up to 50 most recent records, newest first | 200 JSON Answer[] | 500 on DB error |
| GET | /question/stream | Server-Sent Events feed; emits an Answer event each time a new answer is recorded | 200 text/event-stream | — |

Answer shape:

{
  "question": "What is the capital of France?",
  "answer": "Paris.",
  "confidence": 0.9,
  "timestamp": 1745500000
}
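
On the Rust side, that payload maps naturally onto a plain struct (a sketch of the shape only; solid's real state.rs type may carry serde attributes or extra fields):

```rust
/// Mirror of the Answer JSON payload above.
#[derive(Debug, Clone, PartialEq)]
struct Answer {
    question: String,
    answer: String,
    confidence: f64, // 0.0..=1.0
    timestamp: u64,  // Unix seconds, UTC
}
```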

Example SSE consumer:

const es = new EventSource("http://127.0.0.1:8088/question/stream");
es.onmessage = (ev) => {
  const a = JSON.parse(ev.data);
  console.log(`Q: ${a.question}\nA: ${a.answer}`);
};

G2 companion app

g2-app/ is a Vite + TypeScript app that runs on Even Realities G2 smart glasses via the EvenHub SDK. It subscribes to the local solid server's /question/stream and renders the current Q/A onto the HUD, with scroll-up / scroll-down events from the temple touch surface.

cd g2-app
npm install
npm run dev          # local browser dev
npm run simulate     # run against the EvenHub simulator
npm run pack         # package as solid.ehpk for the glasses

Configure the host (e.g. http://192.168.1.42:8088) on the settings screen the first time the app starts.


Architecture

  ┌──────────────┐     ┌────────────────────────────────┐     ┌──────────────┐
  │ Screen       │────▶│ Strategy                       │────▶│ LLM          │
  │ Capture      │     │  ├── Vision (Claude Sonnet)    │     │ (Anthropic   │
  │ DXGI / SCK   │     │  └── OCR (Tesseract) + text    │     │  or Ollama)  │
  └──────────────┘     └────────────────────────────────┘     └──────┬───────┘
        │                       │                                    │
        │ perceptual hash       │ qtype + dedup                      │ answer
        ▼                       ▼                                    ▼
  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐  ┌──────────────┐
  │ skip if      │     │ AppState     │◀────│ Worker Loop  │─▶│ SQLite       │
  │ unchanged    │     │ (latest)     │     │ tick 2 s     │  │ conversation │
  └──────────────┘     └──────┬───────┘     └──────┬───────┘  └──────────────┘
                              │ read                │ broadcast
                              ▼                     ▼
                      ┌──────────────────────────────────┐
                      │ HTTP API                         │
                      │  /question  /history  /question/ │
                      │                       stream     │
                      └──────────────────────────────────┘

All capture / OCR / LLM work happens on a single Tokio task with per-stage timeouts (capture 3 s, OCR 8 s, LLM 30 s, vision 20 s). The HTTP server runs on the same runtime; the worker pushes new answers over a tokio::sync::broadcast channel that the SSE handler subscribes to.
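
The per-stage timeout idea can be pictured with a std-only stand-in (solid itself runs async on Tokio and would use tokio::time::timeout; the thread-plus-channel version below is only an illustration of the pattern):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `work` on a helper thread and give up after `limit`,
/// returning None on timeout instead of blocking the loop.
fn with_timeout<T: Send + 'static>(
    limit: Duration,
    work: impl FnOnce() -> T + Send + 'static,
) -> Option<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already gave up, the send just fails silently.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}
```

With budgets like the ones above (capture 3 s, OCR 8 s, LLM 30 s), a stuck stage yields None and the worker simply moves on to the next tick instead of stalling the whole pipeline.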


Project structure

solid/
├─ Cargo.toml
├─ src/
│  ├─ main.rs              # entry point, worker loop, server bind
│  ├─ lib.rs               # crate root for tests
│  ├─ state.rs             # in-memory state + JSON model
│  ├─ capture/             # platform-specific screen capture
│  │  ├─ windows.rs        # DXGI Output Duplication + GDI fallback
│  │  └─ macos.rs          # ScreenCaptureKit + CG fallback
│  ├─ ocr/                 # Tesseract wrapper + question typing
│  ├─ llm.rs               # Anthropic + Ollama clients (text + vision)
│  ├─ http.rs              # actix-web handlers, SSE
│  ├─ db.rs                # SQLite persistence
│  ├─ perceptual_hash.rs   # frame-change detection
│  └─ text.rs              # normalize + dedup helpers
├─ tests/                  # integration tests (wiremock + font8x8 fixtures)
├─ g2-app/                 # Even Realities G2 HUD companion (Vite + TS)
└─ spec.md                 # original design notes

Development

cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo test
cargo test --no-default-features    # build without OCR

Notable test fixtures:

  • tests/ uses wiremock to stub the Anthropic / Ollama HTTP endpoints.
  • An OCR round-trip test renders "What is 2+2?" via font8x8 and feeds it to extract_question (gated on the ocr feature, #[ignore]d by default).

CI runs fmt, clippy, and cargo test on Windows + macOS. Tagging vX.Y.Z triggers release.yml, which builds slim and OCR-bundled archives for x86_64-pc-windows-msvc and aarch64-apple-darwin.


Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| failed to open database | Bad DATABASE_PATH or unwritable directory | Use an absolute, writable path |
| Tesseract::Error at startup | tessdata not found | Set TESSDATA_PREFIX or place tessdata/ next to the binary |
| macOS: empty / black captures | Screen Recording permission missing | System Settings → Privacy & Security → Screen Recording → enable for the launching app, then restart solid |
| 401 Unauthorized from Anthropic | Missing or invalid ANTHROPIC_API_KEY | Re-check, regenerate if expired |
| /question always 204 | No question detected yet | Wait for a question on screen, or check RUST_LOG=debug for OCR / vision output |
| Vision strategy never fires | Wrong provider | Confirm LLM_PROVIDER is unset or anthropic |
| Ollama connection refused | Service not running | ollama serve; confirm :11434 is listening |
| Windows: TLS errors on large captures | Schannel limit | The vision path already downsizes to 1568 px; force GDI capture with SOLID_FORCE_GDI=1 if DXGI is misbehaving |

Roadmap & extending

| Goal | How |
| --- | --- |
| Add an LLM backend | New variant on LlmProvider, new branch in query_llm |
| Swap OCR engine | Replace ocr/; keep the extract_question signature |
| More metadata | Add columns to conversation; update Db::insert and Answer |
| Web UI | Mount static files in actix, consume /question/stream |
| Windows service | Wrap with the windows-service crate |
| ?limit=N on /history | Parse via web::Query<Limit>, pass to Db::recent |
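
The "new variant, new branch" extension point for LLM backends looks roughly like this (names follow the table; the bodies are placeholders, and the endpoint strings are the providers' public defaults, shown only for illustration):

```rust
/// Sketch of the provider enum; add a backend as a new variant.
enum LlmProvider {
    Anthropic,
    Ollama,
    // MyBackend,
}

/// ...then add a matching branch wherever requests are dispatched.
fn default_endpoint(provider: &LlmProvider) -> &'static str {
    match provider {
        LlmProvider::Anthropic => "https://api.anthropic.com/v1/messages",
        LlmProvider::Ollama => "http://127.0.0.1:11434/api/generate",
        // LlmProvider::MyBackend => "http://...",
    }
}
```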

See spec.md for the original design doc; some details (binary name, polling interval, port) have evolved since v0.1 — this README is the source of truth for the current behavior.


License

MIT. Fork, modify, distribute. Attribution appreciated.
