A universal mass spectrometry file reader. Fast, cross-platform, no .NET required.
Oxion reads six vendor formats directly from their binary layouts, plus the open mzML standard, for seven supported formats in total. It decodes scans up to 700x faster than the official .NET RawFileReader library. It provides a CLI tool, a desktop GUI converter, and Python bindings with NumPy integration.
Supported vendors: Thermo, Bruker, Waters, Agilent, Shimadzu, Sciex, plus the open mzML standard.
Download the latest binary for your platform from Releases:
| Platform | File | Install |
|---|---|---|
| Linux x86_64 | oxion-x86_64-unknown-linux-gnu.tar.gz | tar xzf oxion-*.tar.gz |
| Linux aarch64 | oxion-aarch64-unknown-linux-gnu.tar.gz | tar xzf oxion-*.tar.gz |
| macOS Intel | oxion-x86_64-apple-darwin.tar.gz | tar xzf oxion-*.tar.gz |
| macOS Apple Silicon | oxion-aarch64-apple-darwin.tar.gz | tar xzf oxion-*.tar.gz |
| Windows x64 | oxion-x86_64-pc-windows-msvc.zip | Extract zip |
After extracting, optionally move oxion to a directory on your PATH.
pip install oxion
Requires Python 3.11+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows (x64). Wheels can also be downloaded from Releases for offline install:
pip install oxion-*.whl
Download the desktop converter from Releases:
| Platform | File |
|---|---|
| macOS (Intel + Apple Silicon) | oxion-gui-*.tar.gz |
| Linux | oxion-gui-*.deb |
| Windows | oxion-gui-*.msi or *-setup.exe |
| Format | Extension | Read | Convert to mzML | Notes |
|---|---|---|---|---|
| Thermo RAW | .raw | Full | Yes | v57-66, all Orbitrap/LTQ/Astral instruments |
| Bruker TDF/TSF | .d | Full | Yes | timsTOF (4D ion mobility), QTOF |
| Waters .raw | .raw (directory) | Full | Yes | MassLynx SQD2, ZQ, SIR/MRM |
| Agilent .d | .d (directory) | Full | Yes | MassHunter + ChemStation |
| Shimadzu LCD | .lcd | Full | Yes | MRM, triple-quad |
| Sciex WIFF | .wiff | Metadata | Planned | OLE2 metadata extraction |
| mzML | .mzml, .mzml.gz | Full | N/A | Indexed + non-indexed, gzip |
- Thermo RAW: Format versions 57-66, centroid/profile/FT/LT decoders, trailer metadata (86+ fields). No .NET required.
- Bruker .d: SQLite + zstd-compressed binary blobs, TOF-to-m/z quadratic calibration, full ion mobility (1/K0), ddaPASEF and diaPASEF support.
- Waters .raw: All 4 MassLynx binary encodings (2/4/6/8-byte), polynomial m/z calibration, multi-function support.
- Agilent .d: MassHunter (MSScan.bin + MSPeak.bin/MSProfile.bin) and ChemStation (DATA.MS big-endian) sub-formats.
- Sciex WIFF: OLE2 metadata extraction; scan data decoding pending (format is not publicly documented).
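Several of these formats share an extension (.raw is a single file for Thermo but a directory for Waters, and both Bruker and Agilent use .d), so extension alone cannot identify every input. The sketch below shows one way such inputs could be told apart. It is not Oxion's detection logic: `guess_format` is a hypothetical helper, and the rule that a Bruker directory contains `analysis.tdf` or `analysis.tsf` while any other .d directory is treated as Agilent is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def guess_format(path):
    """Guess a format label from the extension and on-disk layout (hypothetical helper)."""
    p = Path(path)
    name = p.name.lower()
    if name.endswith((".mzml", ".mzml.gz")):
        return "mzML"
    if name.endswith(".lcd"):
        return "Shimadzu LCD"
    if name.endswith(".wiff"):
        return "Sciex WIFF"
    if name.endswith(".raw"):
        # Thermo RAW is a single file; Waters .raw is a directory.
        return "Waters .raw" if p.is_dir() else "Thermo RAW"
    if name.endswith(".d"):
        # Assumption: Bruker directories hold analysis.tdf or analysis.tsf;
        # any other .d directory is treated as Agilent in this sketch.
        if (p / "analysis.tdf").exists() or (p / "analysis.tsf").exists():
            return "Bruker TDF/TSF"
        return "Agilent .d"
    raise ValueError(f"unrecognized format: {path}")


# Build a throwaway directory tree that mimics each layout, then classify it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "thermo.raw").write_bytes(b"")
    (root / "waters.raw").mkdir()
    (root / "bruker.d").mkdir()
    (root / "bruker.d" / "analysis.tdf").write_bytes(b"")
    (root / "agilent.D").mkdir()
    detected = {p.name: guess_format(p) for p in sorted(root.iterdir())}

for name, fmt in detected.items():
    print(f"{name}: {fmt}")
```

In practice `oxion.open()` performs the detection for you; the sketch only illustrates why the table lists "(directory)" next to some extensions.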
oxion <COMMAND> [OPTIONS]
Display instrument model, scan count, RT range, mass range, and trailer field names.
oxion info sample.raw
oxion info data.mzML
Export a single scan as JSON with m/z and intensity arrays.
oxion scan sample.raw -n 1 # First scan
oxion scan sample.raw -n 5000 # Scan number 5000
Export the TIC as a two-column CSV (rt, intensity). Sub-millisecond extraction from the scan index (no scan decoding needed).
oxion tic sample.raw # Print to stdout
oxion tic sample.raw -o tic.csv # Save to file
Extract one or more XIC traces from a file.
# Single target at 5 ppm (default)
oxion xic sample.raw --mz 524.2644
# Multiple targets in one pass (shared scan iteration)
oxion xic sample.raw --mz 524.2644 --mz 445.12 --mz 302.05
# MS1 only — skips MS2 scans, much faster for DDA data
oxion xic sample.raw --mz 524.2644 --ms1-only
# Custom tolerance
oxion xic sample.raw --mz 524.2644 --ppm 10.0
# Save to file
oxion xic sample.raw --mz 524.2644 -o xic.csv
Extract XIC traces across multiple files, align them to a common RT grid, and output a CSV matrix. Uses a memory-bounded two-pass pipeline (mmap prescan → chunked extraction) that scales to hundreds of files on NAS.
oxion batch-xic \
-f file1.raw -f file2.raw -f file3.raw \
--mz 524.2644 --mz 445.12 \
--ppm 5.0 \
--rt-resolution 0.01 \
-o batch_output.csv
# With RT range filter
oxion batch-xic \
-f *.raw \
--mz 524.2644 \
--rt-range "2.0,15.0" \
-o filtered.csv
# Control parallelism and timeout for NAS/network storage
oxion batch-xic \
-f *.raw --mz-file targets.txt \
--max-concurrent 4 \
--timeout 120 \
-o output.csv
| Option | Default | Description |
|---|---|---|
| --max-concurrent | 4 (auto) | Max files processed in parallel. Lower = less memory. |
| --timeout | 120 | Per-file read timeout in seconds. Skips stalled NAS reads. 0 = disable. |
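The "common RT grid" that batch-xic aligns to can be pictured with a small standalone sketch. It is illustrative only: it assumes linear interpolation between neighboring points and zero fill outside each file's RT range, neither of which is confirmed here as Oxion's actual behavior, and `make_rt_grid` and `resample_onto_grid` are hypothetical helper names, not part of the Oxion API. Only the standard library is used so the example runs anywhere.

```python
from bisect import bisect_right


def make_rt_grid(start, end, resolution):
    """Evenly spaced RT grid from start to end (inclusive), in minutes."""
    n = int(round((end - start) / resolution)) + 1
    return [start + i * resolution for i in range(n)]


def resample_onto_grid(rt, intensity, rt_grid):
    """Linearly interpolate one trace onto rt_grid; zero outside the trace.

    Assumes rt is sorted and strictly increasing.
    """
    out = []
    for t in rt_grid:
        if not rt or t < rt[0] or t > rt[-1]:
            out.append(0.0)  # grid point not covered by this file
            continue
        j = bisect_right(rt, t)
        if j == len(rt):  # t coincides with the last sampled point
            out.append(float(intensity[-1]))
            continue
        t0, t1 = rt[j - 1], rt[j]
        frac = (t - t0) / (t1 - t0)
        out.append(intensity[j - 1] + frac * (intensity[j] - intensity[j - 1]))
    return out


# Two traces sampled at different times end up as rows of equal length.
rt_grid = make_rt_grid(0.0, 2.0, 0.5)
trace_a = resample_onto_grid([0.0, 1.0, 2.0], [0.0, 10.0, 0.0], rt_grid)
trace_b = resample_onto_grid([0.5, 1.5], [4.0, 8.0], rt_grid)
print(rt_grid)
print(trace_a)
print(trace_b)
```

With NumPy, the same resampling step is a single call: `np.interp(rt_grid, rt, intensity, left=0.0, right=0.0)`. Once every file shares one grid, the traces stack into the rectangular matrix written to the CSV.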
Benchmark file-open (I/O) vs XIC extraction (CPU) at different concurrency levels to find the optimal --max-concurrent for your storage.
oxion bench-concurrency /path/to/raw/files \
--targets 2500 \
--concurrency 1,2,4,8,16 \
--max-files 20
Find all MS2 scans matching a precursor m/z and export their fragment spectra.
oxion ms2-spectra sample.raw --mz 524.2644 --ppm 10.0 -o fragments.csv
Convert Thermo RAW files to indexed mzML format.
# Single file (output: sample.mzML in same directory)
oxion convert sample.raw
# Specify output path
oxion convert sample.raw -o output.mzML
# Folder conversion (parallel, all .raw files)
oxion convert ./raw_files/ -o ./mzml_output/
# Options
oxion convert sample.raw \
--mz-bits 32 \
--intensity-bits 32 \
--compression zlib \
--ms1-only \
--min-intensity 100 \
--no-index
Display raw trailer metadata for a specific scan (Thermo RAW only).
oxion trailer sample.raw -n 1
Benchmark scan decoding speed.
oxion benchmark sample.raw --mmap # Sequential decode
oxion benchmark sample.raw --mmap --parallel # Parallel decode
oxion benchmark sample.raw --mmap --xic # XIC extraction benchmark
import oxion
# Auto-detect format from extension (works for all 7 formats)
raw = oxion.open("sample.raw") # Thermo RAW
mzml = oxion.open("data.mzML") # mzML
lcd = oxion.open("sample.lcd") # Shimadzu LCD
bruker = oxion.open("data.d") # Bruker TDF/TSF
waters = oxion.open("data.raw") # Waters .raw directory
agilent = oxion.open("data.D") # Agilent .d directory
wiff = oxion.open("data.wiff") # Sciex WIFF
# RAW file with memory-mapped I/O (faster for large files)
raw = oxion.open("sample.raw", mmap=True)
# Or use the format-specific class directly
raw = oxion.RawFile("sample.raw", mmap=True)
raw = oxion.RawFile("sample.raw")
raw.n_scans # Total number of scans
raw.first_scan # First scan number
raw.last_scan # Last scan number
raw.start_time # Start RT in minutes
raw.end_time # End RT in minutes
raw.instrument_model # Instrument model string
raw.sample_name # Sample name from acquisition
raw.version # RAW format version (57-66)
# Get scan data as NumPy arrays
mz, intensity = raw.scan(1)
# Get scan metadata (no array decoding)
info = raw.scan_info(1)
info.scan_number # 1
info.rt # Retention time in minutes
info.ms_level # 1, 2, 3, ...
info.polarity # "positive" or "negative"
info.tic # Total ion current
info.base_peak_mz # Base peak m/z
info.base_peak_intensity # Base peak intensity
info.filter_string # e.g. "FTMS + p NSI Full ms [100.00-1000.00]"
info.precursor_mz # Precursor m/z (MS2+ only, None for MS1)
info.precursor_charge # Charge state (MS2+ only, None for MS1)
# Read all MS1 scans in parallel
all_ms1 = raw.all_ms1_scans(progress=True) # list of (mz, intensity) tuples
# TIC (sub-millisecond, from scan index)
rt, intensity = raw.tic()
# XIC (single target)
rt, intensity = raw.xic(524.2644, ppm=5.0)
# XIC restricted to MS1 scans (faster for DDA)
rt, intensity = raw.xic_ms1(524.2644, ppm=5.0)
# Batch XIC (multiple targets, single scan pass)
targets = [(524.2644, 5.0), (445.12, 5.0), (302.05, 10.0)]
results = raw.xic_batch_ms1(targets, progress=True)
for rt, intensity in results:
    print(f" {len(rt)} points")
# Acquisition type detection
raw.acquisition_type() # "dda", "dia", "ms1_only", or "mixed"
# MS level queries
raw.ms_level_of_scan(100) # 1 or 2
raw.is_ms2_scan(100) # True/False
raw.scan_numbers_by_level(2) # [4, 5, 6, 8, ...]
# Precursor information
precursors = raw.precursor_list() # Unique precursor m/z as NumPy array
parent = raw.parent_ms1_scan(5000) # Parent MS1 scan number for scan 5000
# Find MS2 scans for a precursor
ms2_scans = raw.ms2_scans_for_precursor(524.2644, tolerance_ppm=10.0)
for s in ms2_scans:
    print(f" Scan {s.scan_number}, RT={s.rt:.2f}, CE={s.collision_energy}")
# All MS2 scan metadata (fast, no decoding)
all_ms2 = raw.all_ms2_scan_info()
# Get unique isolation windows
windows = raw.isolation_windows()
for w in windows:
    print(w) # IsolationWindow(center_mz=500.0, width=25.0, ce=30.0, activation=HCD)
# Get MS2 scans for a window
scans = raw.scans_for_window(windows[0])
# XIC within a DIA window
rt, intensity = raw.xic_ms2_window(524.2644, ppm=5.0, window=windows[0])
# Available trailer field names
fields = raw.trailer_fields() # ["Charge State", "Ion Injection Time (ms)", ...]
# Trailer data for a specific scan
data = raw.trailer_extra(1) # dict: {"Charge State": "2", "Ion Injection Time (ms)": "35.00", ...}
Extract aligned chromatograms across many files with bounded memory.
import oxion
files = ["sample1.raw", "sample2.raw", "sample3.raw"]
targets = [(524.2644, 5.0), (445.12, 5.0)]
# Standard batch — returns 3D tensor (samples × targets × timepoints)
tensor, rt_grid, sample_names = oxion.batch_xic(
    files, targets,
    progress=True, # tqdm progress bar
    max_concurrent=4, # parallel files (tune for NAS)
    timeout_secs=120, # skip stalled files after 2 min
)
# tensor.shape: (3, 2, n_timepoints), dtype=float64
# Half-memory mode — f32 intensities (sufficient for XIC data)
tensor, rt_grid, names = oxion.batch_xic(
    files, targets,
    use_f32=True, # returns float32 tensor
    max_concurrent=4,
)
# tensor.dtype: float32
For maximum memory control (hundreds of files × thousands of targets), use the two-step streaming API:
import oxion
import numpy as np
files = [f"sample_{i}.raw" for i in range(270)]
targets = [(mz, 5.0) for mz in np.linspace(70, 1050, 2500)]
# Step 1: Prescan — lightweight RT grid computation (mmap, ~1s for 270 files)
rt_grid, valid_files, sample_names = oxion.prescan_batch_xic(files)
# Step 2: Process one file at a time — you control memory
for i, path in enumerate(valid_files):
    data = oxion.extract_xic_onto_grid(path, targets, rt_grid)
    # data.shape: (2500, n_timepoints), dtype=float32
    # Save incrementally, build sparse matrix, stream to HDF5, etc.
    # Only ~1 GB in memory at a time instead of ~40 GB
Most long-running operations support tqdm progress bars:
# Install tqdm for progress bar support
# pip install tqdm
rt, intensity = raw.xic(524.2644, progress=True)
results = raw.xic_batch_ms1(targets, progress=True)
scans = raw.all_ms1_scans(progress=True)
The desktop application provides drag-and-drop RAW-to-mzML conversion with a visual interface.
- Drag one or more .raw files onto the window
- Configure output settings (precision, compression, MS1-only filter)
- Click convert and monitor progress
- Output mzML files are written alongside the source files or to a chosen directory
Benchmarked with a 796 MB Thermo Orbitrap Astral file (228,790 scans). The CLI is a 2 MB static binary with no runtime dependencies.
Tested on Apple M5 Pro (macOS), Windows x86_64, and Linux EPYC 9554.
Measured with hyperfine (wall-clock time including process startup and file open).
| Operation | Oxion (macOS) | .NET (macOS) | Oxion (Windows) | .NET (Windows) | Oxion (Linux) | .NET (Linux) |
|---|---|---|---|---|---|---|
| Full scan decode (228K scans) | 84 ms | — | 804 ms | — | 393 ms | — |
| RAW → mzML conversion | 5.5 s | 12.1 s | 9.4 s | 9.8 s | 13.8 s | 22.8 s |
Internal timing after file open — measures only the operation itself.
| Operation | Oxion (macOS) | .NET (macOS) | Oxion (Linux) | .NET (Linux) |
|---|---|---|---|---|
| Full scan decode (228K) | 40 ms | 396 ms | 213 ms | 772 ms |
| TIC (228K scans) | 0.4 ms | 57 ms | 0.3 ms | 109 ms |
| XIC (single target) | 0.9 ms | 154 ms | 3.0 ms | 250 ms |
| XIC (2,000 targets) | 9.6 ms | 2,589 ms | 94 ms | 3,044 ms |
| | Oxion | .NET RawFileReader |
|---|---|---|
| Binary size | 2 MB | ~500 MB (.NET SDK) |
| Runtime dependencies | None | .NET 8 runtime |
| Platform | Linux, macOS, Windows | Windows (native), Linux (Mono) |
- Direct binary parsing — reads the RAW format natively, no .NET runtime
- Memory-mapped I/O — zero-copy file access via OS virtual memory
- Zero-allocation hot paths — XIC reads raw bytes in-place, no heap allocations
- Buffer reuse — pre-allocated decode buffers, eliminating ~220K allocations per file
- Parallel processing — work-stealing across all cores for scan decode and folder conversion
- Bounded-memory batch pipeline — two-pass chunked extraction scales to 270+ files on NAS without exceeding available RAM
- LTO + codegen-units=1 — whole-program link-time optimization
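To make the memory-mapped, in-place reading idea concrete, here is a minimal Python sketch using only the standard library. It is not Oxion's implementation: the flat file layout of native float64 (m/z, intensity) pairs is a hypothetical stand-in for the far more involved vendor formats, and `windowed_intensity` is an illustrative name. The point it shows is that the mapped bytes are reinterpreted as doubles and scanned without first copying the file into a separate buffer.

```python
import mmap
import os
import tempfile
from array import array


def windowed_intensity(path, target_mz, ppm):
    """Sum intensities whose m/z lies within +/- ppm of target_mz.

    Assumes a hypothetical flat file of native float64 (m/z, intensity) pairs.
    """
    tol = target_mz * ppm * 1e-6  # ppm tolerance expressed in m/z units
    lo, hi = target_mz - tol, target_mz + tol
    total = 0.0
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # cast() reinterprets the mapped bytes as doubles without copying;
            # both views are released before the mapping is closed.
            with memoryview(mm) as raw, raw.cast("d") as values:
                for i in range(0, len(values), 2):
                    if lo <= values[i] <= hi:
                        total += values[i + 1]
    return total


# Write four made-up peaks to a temporary file and query a 5 ppm window.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "peaks.bin")
    with open(path, "wb") as fh:
        array("d", [524.2640, 100.0, 524.2644, 250.0,
                    524.2700, 999.0, 445.12, 50.0]).tofile(fh)
    total = windowed_intensity(path, 524.2644, ppm=5.0)

print(total)
```

At 5 ppm the window around 524.2644 is about ±0.0026 m/z, so the first two peaks fall inside it and the peak at 524.2700 does not. The operating system pages in only the parts of the file that are actually touched, which is what makes this approach attractive for large files.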
Proprietary. Distributed as pre-compiled binaries and Python wheels.
For questions or bug reports, please open an issue.
