labparse

Parse lab results from PDF, CSV, and text into structured biomarker JSON

Every lab report uses a different format. Different names for the same marker. Different units. Different layouts entirely. You end up writing one-off scripts or copying values into a spreadsheet by hand.

labparse reads any of them and gives you clean JSON -- standardized names, units, confidence scores. 256 markers across 27 categories, matched through a 7-step fuzzy normalization pipeline. PDF extraction runs a local vision model on your machine. Your medical data never leaves your device.

Install | How It Works | Quick Start | Features | Contributing

Why This Exists

Lab reports come in dozens of formats. Quest prints things one way, LabCorp another, and international labs do whatever they want. The marker names are all over the place too -- "Hemoglobin A1c" vs "HbA1c" vs "Glycated Hemoglobin" all mean the same thing. Getting structured data out of any of these means tedious manual work that nobody wants to do twice.

labparse handles it. Point it at a PDF or CSV, or paste the text straight in. You get back standardized JSON with each recognized marker resolved to a canonical name, and anything it cannot match reported separately -- under 2 milliseconds for text input.

Install

Homebrew (macOS)

brew tap 199-biotechnologies/tap
brew install labparse

Cargo

cargo install --git https://github.com/199-biotechnologies/labparse-cli.git

From source

git clone https://github.com/199-biotechnologies/labparse-cli.git
cd labparse-cli
cargo install --path .

PDF vision setup (optional)

PDF extraction uses a local Qwen3.5-9B vision model served through Ollama. Text and CSV parsing work without it. One-time setup:

brew install ollama
ollama pull qwen3.5:9b

How It Works

                         ┌──────────────────────────────┐
   Lab PDF  ──────┐      │           labparse           │
                  │      │                              │
   CSV file ──────┼─────▶│  7-step normalization        │──────▶  Biomarker JSON v2
                  │      │  256 markers, 1000+ aliases  │        (standardized names,
   Raw text ──────┘      │  27 categories               │         units, confidence)
                         └──────────────────────────────┘
                                        │
                                        ▼
                                Local vision model
                              (PDFs only, on-device)

Text and CSV go through a regex-based parser. PDFs get rendered to images and read by a local vision model (Qwen3.5-9B via Ollama). Every extracted marker name then runs through a 7-step normalization pipeline (lowercase, strip specimen prefix, strip method suffix, remove parentheticals, British-to-American spelling, CamelCase split, noise removal), followed by an exact catalog lookup on the normalized name.
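To make the pipeline concrete, here is a minimal Python sketch of the seven steps plus the final lookup. It is not labparse's actual code: the catalog entries, prefix and suffix lists, and spelling map are all illustrative assumptions. One deliberate deviation: the CamelCase split runs before lowercasing here, because the split needs the original casing.

```python
import re

# Hypothetical alias catalog (normalized alias -> canonical name).
# Illustrative entries only; the real catalog is far larger.
CATALOG = {
    "hemoglobin a1c": "HbA1c",
    "hba1c": "HbA1c",
    "glycated hemoglobin": "HbA1c",
    "ldl cholesterol": "LDL-C",
}

# Assumed word lists; the real prefixes, suffixes, and spellings may differ.
SPECIMEN_PREFIXES = ("serum ", "plasma ", "blood ", "urine ")
METHOD_SUFFIXES = (" calculated", " direct", " by hplc")
BRITISH_TO_AMERICAN = {"haemoglobin": "hemoglobin", "oestradiol": "estradiol"}


def normalize(raw):
    """Reduce a raw marker name to a catalog-friendly key."""
    # CamelCase split (done first in this sketch, while casing is still intact).
    name = re.sub(r"(?<=[a-z])(?=[A-Z][a-z])", " ", raw)
    # Lowercase.
    name = name.lower().strip()
    # Strip specimen prefix.
    for prefix in SPECIMEN_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    # Strip method suffix.
    for suffix in METHOD_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Remove parentheticals.
    name = re.sub(r"\([^)]*\)", " ", name)
    # British-to-American spelling, word by word.
    name = re.sub(
        r"[a-z]+",
        lambda m: BRITISH_TO_AMERICAN.get(m.group(0), m.group(0)),
        name,
    )
    # Noise removal: punctuation and repeated whitespace collapse to one space.
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return name.strip()


def resolve(raw):
    """Exact catalog lookup on the normalized name; None if unmatched."""
    return CATALOG.get(normalize(raw))


for sample in ("Serum GlycatedHaemoglobin (NGSP)", "LDL Cholesterol, Calculated", "Mystery Marker"):
    print(sample, "->", resolve(sample))
```

The point of the design is that all the fuzziness lives in `normalize`; the lookup itself stays an exact dictionary hit, which keeps matching fast and predictable.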

The output is JSON v2 with resolved, confidence, and resolution_method fields on each marker. Unmatched markers land in a separate unresolved[] array -- nothing gets silently dropped.
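A downstream script might triage the output along these lines. The field names resolved, confidence, resolution_method, and unresolved come from the description above; everything else in this sketch (the top-level `markers` key, `name`, `value`, `unit`, `raw_name`, treating `resolved` as a boolean, and the sample values) is an assumption made for illustration, not the documented schema.

```python
import json

# Hypothetical document shaped like the description above; the layout is assumed.
SAMPLE = """
{
  "markers": [
    {"name": "HbA1c", "value": 5.8, "unit": "%",
     "resolved": true, "confidence": 1.0, "resolution_method": "exact"},
    {"name": "ApoB", "value": 95, "unit": "mg/dL",
     "resolved": true, "confidence": 0.72, "resolution_method": "fuzzy"}
  ],
  "unresolved": [
    {"raw_name": "Mystery Marker", "value": 12, "unit": "U/L"}
  ]
}
"""


def triage(document, min_confidence=0.9):
    """Split markers into those safe to use and those needing a human look."""
    accepted, review = [], []
    for marker in document.get("markers", []):
        if marker.get("resolved") and marker.get("confidence", 0.0) >= min_confidence:
            accepted.append(marker["name"])
        else:
            review.append(marker["name"])
    # Unresolved markers are never dropped; they always go to review.
    review.extend(item["raw_name"] for item in document.get("unresolved", []))
    return accepted, review


accepted, review = triage(json.loads(SAMPLE))
print("accepted:", accepted)
print("needs review:", review)
```

Check the real output of your installed version before relying on any of these key names.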

Quick Start

Parse text directly:

labparse --text "HbA1c 5.8%, ApoB 95 mg/dL, LDL-C 118 mg/dL"
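For a feel of what regex-based parsing of a line like this involves, here is a small Python sketch. The pattern is an assumption written for this example, not the one labparse uses: it expects comma-separated entries of the form name, number, optional unit.

```python
import re

# Assumed entry shape: "<name> <number><optional unit>", e.g. "ApoB 95 mg/dL".
ENTRY = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z0-9 \-]*?)\s+"   # marker name (letters, digits, spaces, hyphens)
    r"(?P<value>\d+(?:\.\d+)?)\s*"             # numeric value
    r"(?P<unit>[A-Za-z%/]+)?"                  # optional unit such as % or mg/dL
)


def parse_line(text):
    """Return (name, value, unit) tuples for each comma-separated entry that matches."""
    results = []
    for chunk in text.split(","):
        match = ENTRY.fullmatch(chunk.strip())
        if match:
            results.append((match["name"], float(match["value"]), match["unit"]))
    return results


print(parse_line("HbA1c 5.8%, ApoB 95 mg/dL, LDL-C 118 mg/dL"))
```

Splitting on commas is a simplification: it would break values written with thousands separators such as 1,200, which a real parser has to handle.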

Parse a lab PDF:

labparse bloodwork.pdf

Pipe from stdin:

echo "Fasting Glucose 92 mg/dL, Triglycerides 68 mg/dL" | labparse --stdin

Get JSON output and pipe to another tool:

labparse --text "HbA1c 5.8%" --json | labassess --sex male --age 45

Output auto-switches to JSON when piped. Human-readable tables show up on the terminal.
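The usual way to get this behavior is to check whether stdout is attached to a terminal. The Python sketch below shows the general technique; the `render` helper and the marker fields are hypothetical, and labparse's own implementation may differ.

```python
import io
import json
import sys


def render(markers, stream=None):
    """Format markers as a table for a terminal, or as JSON for a pipe or file."""
    stream = sys.stdout if stream is None else stream
    if stream.isatty():
        # Interactive terminal: aligned, human-readable rows.
        rows = [f"{m['name']:<12}{m['value']:>8} {m['unit']}" for m in markers]
        return "\n".join(rows)
    # Piped or redirected: machine-readable JSON.
    return json.dumps(markers)


# Illustrative data; field names are assumptions for this sketch.
markers = [{"name": "HbA1c", "value": 5.8, "unit": "%"}]

# A StringIO is not a TTY, so this path yields JSON, just like a pipe would.
print(render(markers, io.StringIO()))
```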

List all known biomarkers:

labparse biomarkers
labparse biomarkers --category lipid

Features

| Feature | Detail |
| --- | --- |
| 256 biomarkers | 1000+ aliases across 27 clinical categories |
| PDF extraction | Local Qwen3.5-9B vision model, no cloud API calls |
| 7-step fuzzy matching | Handles OCR errors, alternate spellings, international naming |
| JSON v2 output | Confidence scores, resolution method, unresolved marker array |
| Dual output mode | Human-readable tables on TTY, JSON when piped |
| Fast | ~2ms text parsing, ~5MB memory footprint |
| Agent-friendly | `agent-info` subcommand for AI tool discovery |
| Composable | Unix-style piping to labassess, labstore, and other tools |

Supported input formats: PDF, CSV, TSV, and free-form text (pasted lab results, OCR output, clinical notes).

Supported categories: metabolic, lipid, inflammation, hematology, iron, kidney, liver, electrolytes, thyroid, hormone, nutritional, cardiac, cancer markers, immunology, cardiovascular, neurological, coagulation, urinalysis, body composition, functional, sleep, cardiovascular imaging, pulmonary, toxicology, viral serology, cytokine, digestive.

Part of the Longevity CLI Suite

labparse is one tool in a set of composable Rust CLIs for clinical biomarker analysis:

Lab PDF/CSV/text → labparse → Biomarker JSON
                                ├→ labstore  → SQLite patient database
                                └→ labassess → Longevity-scored assessment

Each CLI does one job and pipes JSON to the next. Built by 199 Biotechnologies.
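The same composition works from a script. This Python sketch shells out to labparse using only the `--text` and `--json` flags shown in Quick Start; the helper names are made up for the example, and the shape of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
import shutil
import subprocess


def labparse_command(text):
    """Build the argument list; flags are the ones shown in Quick Start."""
    return ["labparse", "--text", text, "--json"]


def parse_with_labparse(text):
    """Run labparse and decode its JSON output. Requires labparse on PATH."""
    if shutil.which("labparse") is None:
        raise FileNotFoundError("labparse is not on PATH")
    completed = subprocess.run(
        labparse_command(text),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(completed.stdout)


if __name__ == "__main__" and shutil.which("labparse"):
    print(parse_with_labparse("HbA1c 5.8%, ApoB 95 mg/dL"))
```

Passing the arguments as a list avoids shell quoting problems with characters like `%` and `/` in lab text.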

Contributing

Pull requests are welcome, especially for the biomarker catalog in data/biomarkers.toml -- adding new markers or aliases is the fastest way to contribute. For anything bigger, open an issue first so we can talk it through.

License

See the LICENSE file in the repository for terms.

Built by Boris Djordjevic at 199 Biotechnologies | Paperfoot AI

