clinical-rs

Composable Rust crates for clinical data engineering.

What is this?

clinical-rs is a Cargo workspace containing three independent crates for working with clinical healthcare data in Rust:

Crate	Purpose	Status
`medcodes`	Medical code ontologies, hierarchy traversal, and cross-system mapping (ICD-10, ATC, LOINC, SNOMED CT, etc.)	🚧 Pre-release
`mimic-etl`	MIMIC-III/IV CSV parser → Apache Arrow RecordBatches with memory-mapped I/O and parallel processing	🚧 Pre-release
`clinical-tasks`	Task windowing engine — transforms clinical event streams into ML-ready (features, label) Arrow tables	🚧 Pre-release

Each crate is published independently on crates.io and can be used standalone. Together, they form an end-to-end pipeline from raw clinical data to model-ready datasets.

Why?

Clinical ML data pipelines are often bottlenecked by data loading rather than model training. Python-based tools such as PyHealth and pandas can struggle with memory pressure and parallelism on large datasets such as MIMIC-IV (300K+ patients, tens of millions of events).

clinical-rs targets that bottleneck:

Arrow-native — every crate speaks Apache Arrow as its interchange format. Zero-copy interop with PyArrow, Polars, DataFusion, DuckDB, and Spark.
Streaming-first — all ETL crates emit RecordBatch iterators, not materialized collections. Same code path works for batch (collect → Parquet) and streaming (process → infer → emit).
Parallel by default — rayon-based work-stealing parallelism without Python's GIL. Memory-mapped I/O via memmap2 for datasets larger than RAM.
Composable, not monolithic — use medcodes alone for code lookups, mimic-etl alone for data loading, or wire them together through clinical-tasks.
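
To make the streaming-first point concrete, here is a minimal std-only sketch of a lazy batch iterator. `Batch`, `Batches`, `collect_all`, and `count_rows` are hypothetical names invented for illustration; `Batch` merely stands in for an Arrow RecordBatch, and none of this is the crates' actual API.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: one chunk of rows.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub rows: Vec<u64>,
}

/// Lazily groups a row iterator into batches of at most `batch_size` rows.
pub struct Batches<I: Iterator<Item = u64>> {
    source: I,
    batch_size: usize,
}

impl<I: Iterator<Item = u64>> Batches<I> {
    pub fn new(source: I, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { source, batch_size }
    }
}

impl<I: Iterator<Item = u64>> Iterator for Batches<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // Pull at most `batch_size` rows; nothing is read ahead of demand.
        let rows: Vec<u64> = self.source.by_ref().take(self.batch_size).collect();
        if rows.is_empty() {
            None
        } else {
            Some(Batch { rows })
        }
    }
}

/// Batch mode: materialize every batch, e.g. before writing a file.
pub fn collect_all(batches: impl Iterator<Item = Batch>) -> Vec<Batch> {
    batches.collect()
}

/// Streaming mode: consume one batch at a time, keeping only a running total.
pub fn count_rows(batches: impl Iterator<Item = Batch>) -> usize {
    batches.map(|b| b.rows.len()).sum()
}
```

Both consumers take the same iterator type, so the producer does not need to know whether its output will be collected or processed incrementally.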

Quick Start

Add the crate(s) you need:

# Cargo.toml
[dependencies]
medcodes = "0.1"         # medical code ontologies
mimic-etl = "0.1"        # MIMIC-III/IV → Arrow
clinical-tasks = "0.1"   # task windowing for ML

Medical code lookup

use medcodes::icd10cm::Icd10Cm;

let code = Icd10Cm::lookup("A41.9")?;       // Sepsis, unspecified organism
let ancestors = code.ancestors();             // ["A41", "A30-A49", "A00-B99"]
let description = code.description();         // "Sepsis, unspecified organism"
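
For readers curious how hierarchy traversal of this kind can work, the following std-only sketch walks a child-to-parent table. It is not the medcodes implementation; `toy_parents` and `ancestors` are hypothetical helpers, and the table holds only the three links quoted in the example above.

```rust
use std::collections::HashMap;

/// Toy parent table (child code -> parent code). A real table would be
/// built from official source files rather than hard-coded.
pub fn toy_parents() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("A41.9", "A41"),
        ("A41", "A30-A49"),
        ("A30-A49", "A00-B99"),
    ])
}

/// Walks parent links from `code` up to the root, nearest ancestor first.
pub fn ancestors<'a>(parents: &HashMap<&'a str, &'a str>, code: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut current: &str = code;
    while let Some(&parent) = parents.get(current) {
        // Defensive stop in case a malformed table contains a cycle.
        if out.contains(&parent) {
            break;
        }
        out.push(parent);
        current = parent;
    }
    out
}
```

A code with no entry in the table, such as the chapter-level range, simply yields an empty list.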

Cross-system mapping

// Assumed import path: the original snippet does not show where `System` is exported
use medcodes::crossmap::{CrossMap, System};

let icd_to_ccs = CrossMap::load(System::Icd10Cm, System::CcsCm)?;
let mapped = icd_to_ccs.map("A41.9")?;      // ["2"]  (CCS category: Septicemia)
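
Because the mapping result is a list, a source code may map to more than one target. A minimal sketch of such a one-to-many table, using only the standard library, might look like this. `ToyCrossMap` is a hypothetical type for illustration and says nothing about how medcodes stores or loads its mappings.

```rust
use std::collections::HashMap;

/// Hypothetical one-to-many mapping table between two code systems.
pub struct ToyCrossMap {
    table: HashMap<String, Vec<String>>,
}

impl ToyCrossMap {
    /// Builds the table from (source, target) pairs; a source code that
    /// appears in several pairs accumulates all of its targets in order.
    pub fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        let mut table: HashMap<String, Vec<String>> = HashMap::new();
        for &(src, dst) in pairs {
            table.entry(src.to_string()).or_default().push(dst.to_string());
        }
        Self { table }
    }

    /// Returns every target code, or `None` when the source code is unmapped,
    /// so callers can tell a missing mapping apart from an empty one.
    pub fn map(&self, code: &str) -> Option<&[String]> {
        self.table.get(code).map(Vec::as_slice)
    }
}
```

Returning `None` for unmapped codes, rather than an empty list, keeps silent data loss visible to the caller, which matters when a dropped diagnosis changes a cohort.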

MIMIC-IV to Arrow

use mimic_etl::Mimic4Dataset;

let dataset = Mimic4Dataset::open("path/to/mimic-iv/")?;
let batches = dataset
    .tables(&["diagnoses_icd", "prescriptions", "labevents"])
    .into_event_stream()?;  // Iterator<Item = RecordBatch>

// Write to Parquet
mimic_etl::to_parquet(batches, "output/events.parquet")?;
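
The core transformation here is row-oriented CSV into column-oriented data. The sketch below shows that idea with the standard library only: `DiagnosisColumns` and `parse_diagnoses` are hypothetical names, the struct-of-arrays stands in for an Arrow RecordBatch, and the parser assumes no quoted fields or embedded commas. It does not use memory mapping or parallelism and is not how mimic-etl is implemented.

```rust
/// Hypothetical columnar batch for three `diagnoses_icd` columns.
#[derive(Debug, Default, PartialEq)]
pub struct DiagnosisColumns {
    pub subject_id: Vec<i64>,
    pub hadm_id: Vec<i64>,
    pub icd_code: Vec<String>,
}

/// Parses header-plus-rows CSV text into columns.
/// Assumes plain comma-separated fields without quoting.
pub fn parse_diagnoses(csv: &str) -> Result<DiagnosisColumns, String> {
    let mut lines = csv.lines();
    let header: Vec<&str> = lines
        .next()
        .ok_or("empty input")?
        .split(',')
        .map(str::trim)
        .collect();
    // Resolve column positions by name so the file's column order does not matter.
    let find = |name: &str| -> Result<usize, String> {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column {name}"))
    };
    let (subject, hadm, icd) = (find("subject_id")?, find("hadm_id")?, find("icd_code")?);

    let mut cols = DiagnosisColumns::default();
    for (i, line) in lines.enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        let row = i + 2; // 1-based line number, counting the header line
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        let field = |idx: usize| -> Result<&str, String> {
            fields
                .get(idx)
                .copied()
                .ok_or_else(|| format!("line {row}: too few fields"))
        };
        let int = |idx: usize| -> Result<i64, String> {
            field(idx)?
                .parse::<i64>()
                .map_err(|e| format!("line {row}: {e}"))
        };
        // Append one value per column; each column stays contiguous in memory.
        cols.subject_id.push(int(subject)?);
        cols.hadm_id.push(int(hadm)?);
        cols.icd_code.push(field(icd)?.to_string());
    }
    Ok(cols)
}
```

Any malformed row aborts the parse with an error naming the line, in keeping with the correctness-first stance described under Design Principles.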

Task windowing

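use std::fs::File;

use arrow::ipc::reader::FileReader;
// Assumed: the original does not say which Duration type TaskConfig expects;
// chrono::Duration is one type that provides an hours() constructor
use chrono::Duration;
use clinical_tasks::{MortalityPrediction, TaskConfig};

// FileReader::try_new takes an optional column projection; None reads every column
let events = FileReader::try_new(File::open("events.arrow")?, None)?;
let task = MortalityPrediction::new(TaskConfig {
    observation_window: Duration::hours(48),
    prediction_window: Duration::hours(24),
    ..Default::default()
});

let samples = task.apply(events)?;  // Iterator<Item = RecordBatch> with features + label columns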
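
The two windows can be read in more than one way, so the following std-only sketch spells out one common convention: features come from events in the first `observation_h` hours after admission, and the label is whether death occurs within the following `prediction_h` hours. This reading is an assumption, not confirmed clinical-tasks behavior, and `Admission`, `Sample`, and `mortality_sample` are hypothetical names.

```rust
/// Hypothetical per-admission record; all times are hours since admission.
pub struct Admission {
    /// Timestamps of observed clinical events.
    pub event_hours: Vec<f64>,
    /// Time of death, or `None` if the patient survived.
    pub death_hour: Option<f64>,
}

/// One (features, label) pair; the single feature here is a toy event count.
pub struct Sample {
    pub n_events: usize,
    pub label: bool,
}

/// Builds a sample, or returns `None` when death falls inside the observation
/// window, since the outcome would then already be known at prediction time.
pub fn mortality_sample(adm: &Admission, observation_h: f64, prediction_h: f64) -> Option<Sample> {
    if let Some(d) = adm.death_hour {
        if d <= observation_h {
            return None;
        }
    }
    // Features may only use events from inside the observation window.
    let n_events = adm
        .event_hours
        .iter()
        .filter(|&&t| t >= 0.0 && t < observation_h)
        .count();
    // Positive label: death after observation ends but before the prediction window closes.
    let label = matches!(adm.death_hour, Some(d) if d <= observation_h + prediction_h);
    Some(Sample { n_events, label })
}
```

Excluding events at or after the observation cutoff is what prevents label leakage: nothing recorded during the prediction window can reach the features.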

Note: The API examples are illustrative and will evolve before the 0.1.0 release.

Design Principles

Arrow is the contract. Crates communicate via Arrow RecordBatches. No custom serialization formats, no framework lock-in.
Each crate stands alone. medcodes has zero dependencies on mimic-etl. A consumer building a FHIR pipeline can use medcodes + clinical-tasks without ever touching MIMIC data.
Correctness over cleverness. Medical code mappings are validated against official source files (CMS, WHO, NLM). Wrong mappings in clinical contexts cause harm.
No model training. This project handles everything before and after the GPU. Train models in PyTorch/JAX, export to ONNX, run inference in Rust via the ort crate.

Project Structure

clinical-rs/
├── crates/
│   ├── medcodes/             # Medical code ontologies + cross-mapping
│   │   ├── src/
│   │   ├── data/             # Embedded code tables (build-time)
│   │   └── Cargo.toml
│   ├── mimic-etl/            # MIMIC-III/IV → Arrow ETL
│   │   ├── src/
│   │   └── Cargo.toml
│   └── clinical-tasks/       # Task windowing engine
│       ├── src/
│       └── Cargo.toml
├── ARCHITECTURE.md
├── TODO.md
├── CONTRIBUTING.md
├── LICENSE-MIT
├── LICENSE-APACHE
└── Cargo.toml                # Workspace manifest

Relationship to Existing Tools

Tool	Language	Focus	How clinical-rs differs
PyHealth	Python	End-to-end clinical ML toolkit (data + models + training)	We do data only — faster, Arrow-native, no model training
MedModels	Rust + Python	Graph-based RWE analysis (treatment effects, propensity matching)	We use a columnar Arrow data model rather than a graph, and target ML data loading rather than RWE analytics
MEDS	Python	Medical event data standard	Complementary — we could emit MEDS-compatible schemas

Requirements

Rust 1.94+ (2024 edition)
MIMIC-III/IV access via PhysioNet credentialed access (for mimic-etl)

License

Licensed under either the MIT license or the Apache License 2.0, at your option.

Citation

If you use clinical-rs in academic work, please cite:

@software{clinical_rs,
  author       = {Kresna Sucandra},
  title        = {clinical-rs: Composable Rust crates for clinical data engineering},
  url          = {https://github.com/SHA888/clinical-rs},
  license      = {MIT OR Apache-2.0},
}
