Parse lab results from PDF, CSV, and text into structured biomarker JSON
Every lab report uses a different format. Different names for the same marker. Different units. Different layouts entirely. You end up writing one-off scripts or copying values into a spreadsheet by hand.
labparse reads any of them and gives you clean JSON -- standardized names, units, confidence scores. 256 markers across 27 categories, matched through a 7-step fuzzy normalization pipeline. PDF extraction runs a local vision model on your machine. Your medical data never leaves your device.
Lab reports come in dozens of formats. Quest prints things one way, LabCorp another, and international labs do whatever they want. The marker names are all over the place too -- "Hemoglobin A1c" vs "HbA1c" vs "Glycated Hemoglobin" all mean the same thing. Getting structured data out of any of these means tedious manual work that nobody wants to do twice.
labparse handles it. Point it at a PDF or CSV, or paste the text straight in. You get back standardized JSON with every marker resolved to a canonical name -- under 2 milliseconds for text input.
With Homebrew:

    brew tap 199-biotechnologies/tap
    brew install labparse

With Cargo:

    cargo install --git https://github.com/199-biotechnologies/labparse-cli.git

From source:

    git clone https://github.com/199-biotechnologies/labparse-cli.git
    cd labparse-cli
    cargo install --path .

PDF extraction uses a local Qwen3.5-9B vision model. One-time setup:

    brew install ollama
    ollama pull qwen3.5:9b

                          ┌────────────────────────────┐
    Lab PDF  ──────┐      │          labparse          │
                   │      │                            │
    CSV file ──────┼─────▶│    7-step normalization    │──────▶ Biomarker JSON v2
                   │      │ 256 markers, 1000+ aliases │        (standardized names,
    Raw text ──────┘      │       27 categories        │         units, confidence)
                          └────────────────────────────┘
                                        │
                                        ▼
                               Local vision model
                             (PDFs only, on-device)

Text and CSV go through a regex-based parser. PDFs get rendered to images and read by a local vision model (Qwen3.5-9B via Ollama). Every extracted marker name runs through a 7-step normalization pipeline: lowercase, strip specimen prefix, strip method suffix, remove parentheticals, British-to-American spelling, CamelCase split, noise removal, then exact catalog lookup.
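
As a rough illustration of how a pipeline like this can work, here is a minimal Python sketch. It is not labparse's implementation (labparse is written in Rust). The catalog entries, the prefix and suffix word lists, and the spelling map are invented for the example. Two design choices are also assumptions: case folding happens at lookup time so the CamelCase step can still see capital letters, and the catalog is checked after every step instead of only once at the end.

```python
import re

# Hypothetical mini-catalog: lowercase alias -> canonical name.
CATALOG = {
    "hba1c": "HbA1c",
    "hemoglobin a1c": "HbA1c",
    "glycated hemoglobin": "HbA1c",
    "glucose": "Glucose",
    "ldl-c": "LDL-C",
}

# Illustrative patterns only; the real word lists are not shown in this README.
SPECIMEN_PREFIX = re.compile(r"^(serum|plasma|blood|urine)\s+", re.IGNORECASE)
METHOD_SUFFIX = re.compile(r"\s+(calculated|direct|by hplc)$", re.IGNORECASE)
PARENTHETICAL = re.compile(r"\s*\([^)]*\)")
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")
NOISE = re.compile(r"[^A-Za-z0-9 %-]+")
BRITISH_TO_AMERICAN = {"haemo": "hemo", "oestr": "estr"}


def americanize(s):
    """Replace British spellings, keeping a leading capital if there was one."""
    def fix(match, replacement):
        return replacement.capitalize() if match.group()[0].isupper() else replacement

    for brit, amer in BRITISH_TO_AMERICAN.items():
        s = re.sub(brit, lambda m, a=amer: fix(m, a), s, flags=re.IGNORECASE)
    return s


STEPS = [
    lambda s: s.strip(),                             # 1. lowercase (applied at lookup time)
    lambda s: SPECIMEN_PREFIX.sub("", s),            # 2. strip specimen prefix
    lambda s: METHOD_SUFFIX.sub("", s),              # 3. strip method suffix
    lambda s: PARENTHETICAL.sub("", s),              # 4. remove parentheticals
    americanize,                                     # 5. British-to-American spelling
    lambda s: CAMEL_BOUNDARY.sub(" ", s),            # 6. CamelCase split
    lambda s: " ".join(NOISE.sub(" ", s).split()),   # 7. noise removal
]


def resolve(raw):
    """Return the canonical name for a raw marker name, or None if nothing matches."""
    s = raw
    for step in STEPS:
        s = step(s)
        hit = CATALOG.get(s.lower())  # exact lookup on the case-folded form
        if hit is not None:
            return hit
    return None


for name in ["HbA1c", "Serum Glucose", "GlycatedHaemoglobin (HPLC)", "Unknownium"]:
    print(name, "->", resolve(name))
```

Checking the catalog after each step means a name that is already a known alias (like `HbA1c`) is never damaged by a later transformation such as the CamelCase split.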
The output is JSON v2 with resolved, confidence, and resolution_method fields on each marker. Unmatched markers land in a separate unresolved[] array -- nothing gets silently dropped.
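
A downstream script might use those fields to decide which markers to trust. The Python sketch below is illustrative only: the field names `resolved`, `confidence`, `resolution_method`, and `unresolved` come from the description above, but everything else is an assumption, including the top-level `markers` key, the `name`/`value`/`unit` fields, `resolved` being a boolean, and the sample values. Check real labparse output before relying on any of it.

```python
import json

# Hypothetical sample document; not actual labparse output.
SAMPLE = """
{
  "markers": [
    {"name": "HbA1c", "value": 5.8, "unit": "%",
     "resolved": true, "confidence": 1.0, "resolution_method": "exact"},
    {"name": "ApoB", "value": 95, "unit": "mg/dL",
     "resolved": true, "confidence": 0.7, "resolution_method": "alias"}
  ],
  "unresolved": [
    {"name": "Mystery Marker", "value": 3.2, "unit": "U/L"}
  ]
}
"""


def triage(doc, min_confidence=0.9):
    """Split markers into those accepted as-is and those needing manual review."""
    accepted, review = [], []
    for marker in doc.get("markers", []):
        if marker.get("resolved") and marker.get("confidence", 0.0) >= min_confidence:
            accepted.append(marker)
        else:
            review.append(marker)
    # Unresolved markers are never dropped; they always go to review.
    review.extend(doc.get("unresolved", []))
    return accepted, review


accepted, review = triage(json.loads(SAMPLE))
print("accepted:", [m["name"] for m in accepted])
print("review:  ", [m["name"] for m in review])
```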
Parse text directly:

    labparse --text "HbA1c 5.8%, ApoB 95 mg/dL, LDL-C 118 mg/dL"

Parse a lab PDF:

    labparse bloodwork.pdf

Pipe from stdin:

    echo "Fasting Glucose 92 mg/dL, Triglycerides 68 mg/dL" | labparse --stdin

Get JSON output and pipe to another tool:

    labparse --text "HbA1c 5.8%" --json | labassess --sex male --age 45

Output auto-switches to JSON when piped. Human-readable tables show up on the terminal.
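
A common way to implement this kind of switch is to check whether the output stream is a terminal. The Python sketch below shows the idea; the `render` function and its table layout are hypothetical and say nothing about how labparse does it internally.

```python
import io
import json
import sys


def render(markers, stream=None):
    """Return a plain-text table for a terminal, JSON for a pipe or file."""
    if stream is None:
        stream = sys.stdout
    if stream.isatty():
        # Interactive terminal: align names in a simple two-column table.
        width = max((len(m["name"]) for m in markers), default=0)
        return "\n".join(
            f"{m['name']:<{width}}  {m['value']} {m['unit']}" for m in markers
        )
    # Not a TTY (piped or redirected): emit machine-readable JSON.
    return json.dumps(markers)


rows = [{"name": "HbA1c", "value": 5.8, "unit": "%"}]
# io.StringIO is not a terminal, so this takes the JSON branch.
print(render(rows, io.StringIO()))
```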
List all known biomarkers:

    labparse biomarkers
    labparse biomarkers --category lipid

| Feature | Detail |
|---|---|
| 256 biomarkers | 1000+ aliases across 27 clinical categories |
| PDF extraction | Local Qwen3.5-9B vision model, no cloud API calls |
| 7-step fuzzy matching | Handles OCR errors, alternate spellings, international naming |
| JSON v2 output | Confidence scores, resolution method, unresolved marker array |
| Dual output mode | Human-readable tables on TTY, JSON when piped |
| Fast | ~2ms text parsing, ~5MB memory footprint |
| Agent-friendly | agent-info subcommand for AI tool discovery |
| Composable | Unix-style piping to labassess, labstore, and other tools |
Supported input formats: PDF, CSV, TSV, and free-form text (pasted lab results, OCR output, clinical notes).
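
For free-form text, the regex-based approach can be sketched in a few lines of Python. The pattern below is an illustration, not labparse's actual regex: it assumes entries are separated by commas, semicolons, or newlines, and that each entry looks like a name, a number, and an optional unit. The function name `parse_results` is made up for the example.

```python
import re

ENTRY = re.compile(
    r"""
    (?P<name>[A-Za-z][A-Za-z0-9 /-]*?)   # marker name, matched lazily
    \s+
    (?P<value>\d+(?:\.\d+)?)             # numeric value
    \s*
    (?P<unit>[A-Za-z%/µ^0-9]*)           # optional unit such as % or mg/dL
    """,
    re.VERBOSE,
)


def parse_results(text):
    """Extract (name, value, unit) tuples from pasted lab text."""
    results = []
    for piece in re.split(r"[,;\n]+", text):
        match = ENTRY.fullmatch(piece.strip())
        if match:  # pieces that do not look like a result are skipped
            results.append(
                (match["name"].strip(), float(match["value"]), match["unit"])
            )
    return results


print(parse_results("HbA1c 5.8%, ApoB 95 mg/dL, LDL-C 118 mg/dL"))
```

The names extracted this way are still raw; they would then go through normalization before being matched against the catalog.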
Supported categories: metabolic, lipid, inflammation, hematology, iron, kidney, liver, electrolytes, thyroid, hormone, nutritional, cardiac, cancer markers, immunology, cardiovascular, neurological, coagulation, urinalysis, body composition, functional, sleep, cardiovascular imaging, pulmonary, toxicology, viral serology, cytokine, digestive.
labparse is one tool in a set of composable Rust CLIs for clinical biomarker analysis:

    Lab PDF/CSV/text → labparse → Biomarker JSON
                                    ├→ labstore  → SQLite patient database
                                    └→ labassess → Longevity-scored assessment

Each CLI does one job and pipes JSON to the next. Built by 199 Biotechnologies.
Pull requests are welcome, especially for the biomarker catalog in data/biomarkers.toml -- adding new markers or aliases is the fastest way to contribute. For anything bigger, open an issue first so we can talk it through.
Proprietary -- Copyright (c) 2026 Boris Djordjevic, 199 Biotechnologies & Paperfoot AI