Skip to content

08. Inference & AI

Nicolás Baier Quezada edited this page Jun 5, 2026 · 1 revision

08. Inference & AI

DIRD+ runs two kinds of AI entirely on-device:

  1. Computer vision — ONNX detection + segmentation models (WebAssembly, in the WebView).
  2. Language — an optional local LLM (llama.cpp, in the Rust backend) that only polishes report prose.

Neither sends data off the device. Models are downloaded once and stored locally.

Vision pipeline (ONNX)

Implemented in src/lib/ai/ (InferenceService, ONNXModelManager, ModelDownloader) and src/lib/analysis/.

Fundus image (any camera)
  │ 1. Preprocess     resize to 640×640, letterbox, normalize (÷255), RGB
  │ 2. ONNX inference detection → boxes (class + confidence); segmentation → masks
  │                   (ONNX Runtime Web, SIMD + multi-thread, Intel/AMD/ARM profiles)
  │ 3. Post-process   Non-Maximum Suppression (IoU 0.45), confidence threshold
  │ 4. Spatial        quadrant distribution (4 zones + center), macular-edema detection,
  │    analysis       cup/disc ratio (OpenCV.js), spatial calibration from the optic disc
  │ 5. Classify       apply the active clinical guideline (see page 10)
  ▼ 6. Persist        store detections, segmentations, classification, measurements

Detectable classes (reference models): optic_disc, fovea, hard_exudate, hemorrhage, cotton_wool_spot, microhemorrhages, edema, microaneurysm, neovascularization, venous_beading, IRMA.

Models live in the Debaq/dird_models repository (AGPL-3.0) and are downloaded on demand from Settings → AI Models. You can also plug in your own ONNX model — see 09. Model Interface.

Performance metrics

Per-inference timings (preprocess / inference / post-process / NMS / total) are recorded locally, persist across sessions, and can be exported to JSON for benchmarking across devices.

Language model (local LLM)

Implemented in src-tauri/src/llm.rs (Rust, via the llama-cpp-2 crate) and surfaced in Settings → AI Models → Local assistant (src/components/settings/LocalLLMSection.tsx).

  • Curated catalog of small open-weight GGUF models (SmolLM2, TinyLlama, Llama-3.2, Qwen2.5, Gemma-2, Phi-3.5; ~230 MB–2.4 GB, Q4_K_M).
  • User-driven download with resumable progress events (llm:download_progress). No weights are bundled with the app.
  • In-process inference — generation runs inside the Tauri process; per-family chat templating (TinyLlama / ChatML / Llama-3 / Phi-3 / Gemma).
  • Only used for report prose. See 07. Report Pipeline. It never classifies or diagnoses.

100% local: once a model is downloaded, no clinical text ever leaves the device.

Clone this wiki locally