Oxion

A universal mass spectrometry file reader. Fast, cross-platform, no .NET required.

Oxion reads mass spectrometry data in 7 formats (six vendor formats plus the open mzML standard) by parsing the native files directly, achieving up to 700x faster scan decoding than the official .NET RawFileReader library. It provides a CLI tool, a desktop GUI converter, and Python bindings with NumPy integration.

Supported vendors: Thermo, Bruker, Waters, Agilent, Shimadzu, Sciex, plus the open mzML standard.

Installation

CLI

Download the latest binary for your platform from Releases:

Platform	File	Install
Linux x86_64	`oxion-x86_64-unknown-linux-gnu.tar.gz`	`tar xzf oxion-*.tar.gz`
Linux aarch64	`oxion-aarch64-unknown-linux-gnu.tar.gz`	`tar xzf oxion-*.tar.gz`
macOS Intel	`oxion-x86_64-apple-darwin.tar.gz`	`tar xzf oxion-*.tar.gz`
macOS Apple Silicon	`oxion-aarch64-apple-darwin.tar.gz`	`tar xzf oxion-*.tar.gz`
Windows x64	`oxion-x86_64-pc-windows-msvc.zip`	Extract zip

After extracting, optionally move oxion to a directory on your PATH.

Python

pip install oxion

Requires Python 3.11+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows (x64). Wheels can also be downloaded from Releases for offline install:

pip install oxion-*.whl

GUI

Download the desktop converter from Releases:

Platform	File
macOS (Intel + Apple Silicon)	`oxion-gui-*.tar.gz`
Linux	`oxion-gui-*.deb`
Windows	`oxion-gui-*.msi` or `*-setup.exe`

Supported Formats

Format	Extension	Read	Convert to mzML	Notes
Thermo RAW	`.raw`	Full	Yes	v57-66, all Orbitrap/LTQ/Astral instruments
Bruker TDF/TSF	`.d`	Full	Yes	timsTOF (4D ion mobility), QTOF
Waters .raw	`.raw` (directory)	Full	Yes	MassLynx SQD2, ZQ, SIR/MRM
Agilent .d	`.d` (directory)	Full	Yes	MassHunter + ChemStation
Shimadzu LCD	`.lcd`	Full	Yes	MRM, triple-quad
Sciex WIFF	`.wiff`	Metadata	Planned	OLE2 metadata extraction
mzML	`.mzml`, `.mzml.gz`	Full	N/A	Indexed + non-indexed, gzip

Vendor Format Details

Thermo RAW: Format versions 57-66, centroid/profile/FT/LT decoders, trailer metadata (86+ fields). No .NET required.
Bruker .d: SQLite + zstd-compressed binary blobs, TOF-to-m/z quadratic calibration, full ion mobility (1/K0), ddaPASEF and diaPASEF support.
Waters .raw: All 4 MassLynx binary encodings (2/4/6/8-byte), polynomial m/z calibration, multi-function support.
Agilent .d: MassHunter (MSScan.bin + MSPeak.bin/MSProfile.bin) and ChemStation (DATA.MS big-endian) sub-formats.
Sciex WIFF: OLE2 metadata extraction; scan data decoding pending (format is not publicly documented).
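
Because Thermo and Waters both use `.raw`, and Bruker and Agilent both use `.d`, the extension alone cannot identify every format. The sketch below shows one way such detection could work; it is not Oxion's implementation. `guess_format` is a hypothetical helper, and the marker names `analysis.tdf`, `analysis.tsf`, and `AcqData` are assumptions based on common vendor directory layouts (`DATA.MS` is the ChemStation file named above).

```python
import tempfile
from pathlib import Path

def guess_format(path):
    """Guess the format of `path` (hypothetical helper, not Oxion's API)."""
    p = Path(path)
    name = p.name.lower()
    if name.endswith(".mzml") or name.endswith(".mzml.gz"):
        return "mzml"
    suffix = p.suffix.lower()
    if suffix == ".lcd":
        return "shimadzu"
    if suffix == ".wiff":
        return "sciex"
    if suffix == ".raw":
        # Thermo RAW is a single file; Waters .raw is a directory.
        return "waters" if p.is_dir() else "thermo"
    if suffix == ".d" and p.is_dir():
        children = {child.name.lower() for child in p.iterdir()}
        # Assumed marker files for Bruker TDF/TSF data.
        if "analysis.tdf" in children or "analysis.tsf" in children:
            return "bruker"
        # Assumed markers for Agilent MassHunter / ChemStation data.
        if "acqdata" in children or "data.ms" in children:
            return "agilent"
    return "unknown"

# Demo on throwaway paths that mimic the layouts described above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run1.d").mkdir()
    (root / "run1.d" / "analysis.tdf").touch()
    (root / "run2.raw").mkdir()
    (root / "run3.raw").touch()
    detected = [guess_format(root / n) for n in ("run1.d", "run2.raw", "run3.raw")]

print(detected)  # ['bruker', 'waters', 'thermo']
```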

CLI Reference

oxion <COMMAND> [OPTIONS]

`info` - File Information

Display instrument model, scan count, RT range, mass range, and trailer field names.

oxion info sample.raw
oxion info data.mzML

`scan` - Single Scan Export

Export a single scan as JSON with m/z and intensity arrays.

oxion scan sample.raw -n 1          # First scan
oxion scan sample.raw -n 5000       # Scan number 5000

`tic` - Total Ion Chromatogram

Export the TIC as a two-column CSV (rt, intensity). Extraction takes under a millisecond because the values come from the scan index, so no scan decoding is needed.

oxion tic sample.raw                # Print to stdout
oxion tic sample.raw -o tic.csv     # Save to file
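
The resulting CSV can be consumed with any standard tooling. Below is a minimal sketch using Python's `csv` module, assuming two numeric columns in (rt, intensity) order. Whether the file starts with a header row is not specified here, so the hypothetical `read_tic_csv` helper simply skips any row that is not numeric. The sample data is made up.

```python
import csv
import io

def read_tic_csv(handle):
    """Parse (rt, intensity) rows; non-numeric rows such as a header are skipped."""
    rts, intensities = [], []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        try:
            rt, intensity = float(row[0]), float(row[1])
        except ValueError:
            continue  # header or malformed row
        rts.append(rt)
        intensities.append(intensity)
    return rts, intensities

# Made-up sample standing in for the contents of tic.csv.
sample = io.StringIO("rt,intensity\n0.01,1200.0\n0.02,3400.5\n0.03,2100.0\n")
rts, intensities = read_tic_csv(sample)

# Retention time of the most intense point.
apex_rt = rts[intensities.index(max(intensities))]
print(apex_rt)  # 0.02
```

With a real file, pass `open("tic.csv", newline="")` in place of the `StringIO` object.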

`xic` - Extracted Ion Chromatogram

Extract one or more XIC traces from a file.

# Single target at 5 ppm (default)
oxion xic sample.raw --mz 524.2644

# Multiple targets in one pass (shared scan iteration)
oxion xic sample.raw --mz 524.2644 --mz 445.12 --mz 302.05

# MS1 only — skips MS2 scans, much faster for DDA data
oxion xic sample.raw --mz 524.2644 --ms1-only

# Custom tolerance
oxion xic sample.raw --mz 524.2644 --ppm 10.0

# Save to file
oxion xic sample.raw --mz 524.2644 -o xic.csv
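
The `--ppm` tolerance translates into an absolute m/z window around each target: the half-width is m/z × ppm / 1,000,000, so 5 ppm at m/z 524.2644 is about ±0.0026. The sketch below shows that arithmetic. `ppm_window` and `xic_point` are illustrative helpers, not part of Oxion, and summing the in-window intensities is only one common way to reduce a scan to an XIC point; the rule Oxion applies is not stated here.

```python
def ppm_window(mz, ppm=5.0):
    """Return the (low, high) m/z bounds for a target at the given ppm tolerance."""
    half_width = mz * ppm / 1e6
    return mz - half_width, mz + half_width

def xic_point(scan_mz, scan_intensity, target_mz, ppm=5.0):
    """Sum the intensities of all peaks in one scan that fall inside the window."""
    low, high = ppm_window(target_mz, ppm)
    return sum(i for m, i in zip(scan_mz, scan_intensity) if low <= m <= high)

low, high = ppm_window(524.2644, ppm=5.0)
print(f"{low:.4f} - {high:.4f}")  # 524.2618 - 524.2670

# Made-up scan: two peaks inside the window, one far outside.
point = xic_point([524.2640, 524.2650, 524.30], [10.0, 20.0, 99.0], 524.2644)
print(point)  # 30.0
```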

`batch-xic` - Multi-File Batch XIC

Extract XIC traces across multiple files, align to a common RT grid, and output a CSV matrix. Uses a memory-bounded two-pass pipeline (mmap prescan → chunked extraction) that scales to hundreds of files on NAS.

oxion batch-xic \
    -f file1.raw -f file2.raw -f file3.raw \
    --mz 524.2644 --mz 445.12 \
    --ppm 5.0 \
    --rt-resolution 0.01 \
    -o batch_output.csv

# With RT range filter
oxion batch-xic \
    -f *.raw \
    --mz 524.2644 \
    --rt-range "2.0,15.0" \
    -o filtered.csv

# Control parallelism and timeout for NAS/network storage
oxion batch-xic \
    -f *.raw --mz-file targets.txt \
    --max-concurrent 4 \
    --timeout 120 \
    -o output.csv

Option	Default	Description
`--max-concurrent`	4 (auto)	Max files processed in parallel. Lower = less memory.
`--timeout`	120	Per-file read timeout in seconds. Skips stalled NAS reads. 0 = disable.
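
Aligning to a common RT grid means resampling each file's trace onto the same time axis so that the columns of the output matrix are comparable across files. The sketch below uses linear interpolation onto a uniform grid with step `rt_resolution`. This is an assumption made for illustration, since the resampling method Oxion uses is not documented here, and `make_grid` and `resample` are hypothetical names.

```python
from bisect import bisect_right

def make_grid(rt_start, rt_end, rt_resolution):
    """Uniform RT grid from rt_start to rt_end (inclusive) in steps of rt_resolution."""
    n = int(round((rt_end - rt_start) / rt_resolution)) + 1
    return [rt_start + k * rt_resolution for k in range(n)]

def resample(rt, intensity, grid):
    """Linearly interpolate a trace onto grid; points outside the trace become 0."""
    out = []
    for t in grid:
        if t < rt[0] or t > rt[-1]:
            out.append(0.0)
            continue
        j = bisect_right(rt, t)
        if j == len(rt):  # t equals the last RT exactly
            out.append(intensity[-1])
            continue
        t0, t1 = rt[j - 1], rt[j]
        frac = (t - t0) / (t1 - t0)
        out.append(intensity[j - 1] + frac * (intensity[j] - intensity[j - 1]))
    return out

# Two made-up traces with different sampling, placed on one shared grid.
grid = make_grid(0.0, 2.0, 0.5)
file_a = resample([0.0, 1.0, 2.0], [0.0, 10.0, 0.0], grid)
file_b = resample([0.5, 1.5], [4.0, 8.0], grid)
print(file_a)  # [0.0, 5.0, 10.0, 5.0, 0.0]
print(file_b)  # [0.0, 4.0, 6.0, 8.0, 0.0]
```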

`bench-concurrency` - I/O vs CPU Profiling

Benchmark file-open (I/O) vs XIC extraction (CPU) at different concurrency levels to find the optimal --max-concurrent for your storage.

oxion bench-concurrency /path/to/raw/files \
    --targets 2500 \
    --concurrency 1,2,4,8,16 \
    --max-files 20

`ms2-spectra` - DDA Fragment Extraction

Find all MS2 scans matching a precursor m/z and export their fragment spectra.

oxion ms2-spectra sample.raw --mz 524.2644 --ppm 10.0 -o fragments.csv

`convert` - RAW to mzML Conversion

Convert Thermo RAW files to indexed mzML format.

# Single file (output: sample.mzML in same directory)
oxion convert sample.raw

# Specify output path
oxion convert sample.raw -o output.mzML

# Folder conversion (parallel, all .raw files)
oxion convert ./raw_files/ -o ./mzml_output/

# Options
oxion convert sample.raw \
    --mz-bits 32 \
    --intensity-bits 32 \
    --compression zlib \
    --ms1-only \
    --min-intensity 100 \
    --no-index
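
The `--mz-bits`, `--intensity-bits`, and `--compression` options correspond to how mzML stores peak arrays: each array is packed as little-endian 32- or 64-bit floats, optionally zlib-compressed, and then base64-encoded. The sketch below illustrates that encoding with the standard library only. `encode_array` and `decode_array` are illustrative names, not part of Oxion.

```python
import base64
import struct
import zlib

def encode_array(values, bits=64, compress=True):
    """Pack floats little-endian, optionally zlib-compress, then base64-encode."""
    fmt = "<%d%s" % (len(values), "d" if bits == 64 else "f")
    payload = struct.pack(fmt, *values)
    if compress:
        payload = zlib.compress(payload)
    return base64.b64encode(payload).decode("ascii")

def decode_array(text, bits=64, compressed=True):
    """Reverse encode_array and return a list of Python floats."""
    payload = base64.b64decode(text)
    if compressed:
        payload = zlib.decompress(payload)
    width = 8 if bits == 64 else 4
    fmt = "<%d%s" % (len(payload) // width, "d" if bits == 64 else "f")
    return list(struct.unpack(fmt, payload))

mz = [100.0, 524.2644, 1000.5]
roundtrip64 = decode_array(encode_array(mz, bits=64), bits=64)
roundtrip32 = decode_array(encode_array(mz, bits=32), bits=32)

# 64-bit storage is lossless; 32-bit storage rounds to about 7 significant digits.
max_error32 = max(abs(a - b) for a, b in zip(mz, roundtrip32))
print(roundtrip64 == mz, max_error32 < 1e-4)  # True True
```

The trade-off is file size against precision: 32-bit arrays take half the space before compression, at the cost of a small rounding error in each value.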

`trailer` - Trailer Extra Data

Display raw trailer metadata for a specific scan (Thermo RAW only).

oxion trailer sample.raw -n 1

`benchmark` - Performance Test

Benchmark scan decoding speed.

oxion benchmark sample.raw --mmap              # Sequential decode
oxion benchmark sample.raw --mmap --parallel    # Parallel decode
oxion benchmark sample.raw --mmap --xic         # XIC extraction benchmark

Python API

Opening Files

import oxion

# Auto-detect format from extension (works for all 7 formats)
raw = oxion.open("sample.raw")           # Thermo RAW
mzml = oxion.open("data.mzML")           # mzML
lcd = oxion.open("sample.lcd")            # Shimadzu LCD
bruker = oxion.open("data.d")             # Bruker TDF/TSF
waters = oxion.open("data.raw")           # Waters .raw directory
agilent = oxion.open("data.D")            # Agilent .d directory
wiff = oxion.open("data.wiff")            # Sciex WIFF

# RAW file with memory-mapped I/O (faster for large files)
raw = oxion.open("sample.raw", mmap=True)

# Or use the format-specific class directly
raw = oxion.RawFile("sample.raw", mmap=True)

File Metadata

raw = oxion.RawFile("sample.raw")

raw.n_scans              # Total number of scans
raw.first_scan           # First scan number
raw.last_scan            # Last scan number
raw.start_time           # Start RT in minutes
raw.end_time             # End RT in minutes
raw.instrument_model     # Instrument model string
raw.sample_name          # Sample name from acquisition
raw.version              # RAW format version (57-66)

Reading Scans

# Get scan data as NumPy arrays
mz, intensity = raw.scan(1)

# Get scan metadata (no array decoding)
info = raw.scan_info(1)
info.scan_number         # 1
info.rt                  # Retention time in minutes
info.ms_level            # 1, 2, 3, ...
info.polarity            # "positive" or "negative"
info.tic                 # Total ion current
info.base_peak_mz        # Base peak m/z
info.base_peak_intensity # Base peak intensity
info.filter_string       # e.g. "FTMS + p NSI Full ms [100.00-1000.00]"
info.precursor_mz        # Precursor m/z (MS2+ only, None for MS1)
info.precursor_charge    # Charge state (MS2+ only, None for MS1)

# Read all MS1 scans in parallel
all_ms1 = raw.all_ms1_scans(progress=True)  # list of (mz, intensity) tuples

Chromatograms

# TIC (sub-millisecond, from scan index)
rt, intensity = raw.tic()

# XIC (single target)
rt, intensity = raw.xic(524.2644, ppm=5.0)

# XIC restricted to MS1 scans (faster for DDA)
rt, intensity = raw.xic_ms1(524.2644, ppm=5.0)

# Batch XIC (multiple targets, single scan pass)
targets = [(524.2644, 5.0), (445.12, 5.0), (302.05, 10.0)]
results = raw.xic_batch_ms1(targets, progress=True)
for rt, intensity in results:
    print(f"  {len(rt)} points")
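
The `(rt, intensity)` pairs returned by these calls can be passed straight to downstream numeric code. As a hedged example, the sketch below integrates a peak with the trapezoidal rule. `peak_area` is a hypothetical helper, not an Oxion function, and the trace is synthetic rather than read from a file.

```python
def peak_area(rt, intensity, rt_min, rt_max):
    """Trapezoidal area of the trace, using only points with rt_min <= rt <= rt_max."""
    pts = [(t, i) for t, i in zip(rt, intensity) if rt_min <= t <= rt_max]
    return sum((t1 - t0) * (i0 + i1) / 2.0
               for (t0, i0), (t1, i1) in zip(pts, pts[1:]))

# Synthetic triangular peak standing in for the output of raw.xic(...).
rt = [1.0, 1.1, 1.2, 1.3, 1.4]
intensity = [0.0, 50.0, 100.0, 50.0, 0.0]
area = peak_area(rt, intensity, 1.0, 1.4)
print(round(area, 6))  # 20.0
```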

DDA / DIA Analysis

# Acquisition type detection
raw.acquisition_type()     # "dda", "dia", "ms1_only", or "mixed"

# MS level queries
raw.ms_level_of_scan(100)  # 1 or 2
raw.is_ms2_scan(100)       # True/False
raw.scan_numbers_by_level(2)  # [4, 5, 6, 8, ...]

# Precursor information
precursors = raw.precursor_list()     # Unique precursor m/z as NumPy array
parent = raw.parent_ms1_scan(5000)    # Parent MS1 scan number for scan 5000

# Find MS2 scans for a precursor
ms2_scans = raw.ms2_scans_for_precursor(524.2644, tolerance_ppm=10.0)
for s in ms2_scans:
    print(f"  Scan {s.scan_number}, RT={s.rt:.2f}, CE={s.collision_energy}")

# All MS2 scan metadata (fast, no decoding)
all_ms2 = raw.all_ms2_scan_info()

DIA Windows

# Get unique isolation windows
windows = raw.isolation_windows()
for w in windows:
    print(w)  # IsolationWindow(center_mz=500.0, width=25.0, ce=30.0, activation=HCD)

# Get MS2 scans for a window
scans = raw.scans_for_window(windows[0])

# XIC within a DIA window
rt, intensity = raw.xic_ms2_window(524.2644, ppm=5.0, window=windows[0])

Trailer Metadata

# Available trailer field names
fields = raw.trailer_fields()  # ["Charge State", "Ion Injection Time (ms)", ...]

# Trailer data for a specific scan
data = raw.trailer_extra(1)    # dict: {"Charge State": "2", "Ion Injection Time (ms)": "35.00", ...}
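
As the example values above suggest, trailer entries come back as strings. Below is a small sketch of converting them to numbers where possible. `parse_trailer` is a hypothetical helper, the first two sample entries are copied from the comment above, and "Example Text Field" is a made-up key used to show the fallback.

```python
def parse_trailer(data):
    """Convert numeric-looking trailer strings to int or float; keep the rest as text."""
    parsed = {}
    for key, value in data.items():
        text = value.strip()
        try:
            parsed[key] = int(text)
        except ValueError:
            try:
                parsed[key] = float(text)
            except ValueError:
                parsed[key] = text
    return parsed

sample = {
    "Charge State": "2",
    "Ion Injection Time (ms)": "35.00",
    "Example Text Field": "n/a",  # made-up entry, not a real trailer field
}
parsed = parse_trailer(sample)
print(parsed)
```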

Multi-File Batch XIC

Extract aligned chromatograms across many files with bounded memory.

import oxion

files = ["sample1.raw", "sample2.raw", "sample3.raw"]
targets = [(524.2644, 5.0), (445.12, 5.0)]

# Standard batch — returns 3D tensor (samples × targets × timepoints)
tensor, rt_grid, sample_names = oxion.batch_xic(
    files, targets,
    progress=True,          # tqdm progress bar
    max_concurrent=4,       # parallel files (tune for NAS)
    timeout_secs=120,       # skip stalled files after 2 min
)
# tensor.shape: (3, 2, n_timepoints), dtype=float64

# Half-memory mode — f32 intensities (sufficient for XIC data)
tensor, rt_grid, names = oxion.batch_xic(
    files, targets,
    use_f32=True,           # returns float32 tensor
    max_concurrent=4,
)
# tensor.dtype: float32
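
To judge whether `use_f32=True` is worthwhile, it helps to estimate the tensor size up front. The arithmetic below assumes a dense array with no overhead; `tensor_bytes` is an illustrative helper rather than an Oxion function, and the grid length of 1500 timepoints is a made-up number.

```python
def tensor_bytes(n_samples, n_targets, n_timepoints, use_f32=False):
    """Size in bytes of a dense (samples x targets x timepoints) array."""
    itemsize = 4 if use_f32 else 8  # float32 vs float64
    return n_samples * n_targets * n_timepoints * itemsize

# 3 files and 2 targets as in the example above; 1500 timepoints is hypothetical.
full = tensor_bytes(3, 2, 1500)
half = tensor_bytes(3, 2, 1500, use_f32=True)
print(full, half)  # 72000 36000
```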

Streaming Batch XIC

For maximum memory control (hundreds of files × thousands of targets), use the two-step streaming API:

import oxion
import numpy as np

files = [f"sample_{i}.raw" for i in range(270)]
targets = [(mz, 5.0) for mz in np.linspace(70, 1050, 2500)]

# Step 1: Prescan — lightweight RT grid computation (mmap, ~1s for 270 files)
rt_grid, valid_files, sample_names = oxion.prescan_batch_xic(files)

# Step 2: Process one file at a time — you control memory
for i, path in enumerate(valid_files):
    data = oxion.extract_xic_onto_grid(path, targets, rt_grid)
    # data.shape: (2500, n_timepoints), dtype=float32
    
    # Save incrementally, build sparse matrix, stream to HDF5, etc.
    # Only ~1 GB in memory at a time instead of ~40 GB

Progress Bars

Most long-running operations support tqdm progress bars:

# Install tqdm for progress bar support
# pip install tqdm

rt, intensity = raw.xic(524.2644, progress=True)
results = raw.xic_batch_ms1(targets, progress=True)
scans = raw.all_ms1_scans(progress=True)

GUI Converter

The desktop application provides drag-and-drop RAW-to-mzML conversion with a visual interface.

1. Drag one or more .raw files onto the window.
2. Configure output settings (precision, compression, MS1-only filter).
3. Click convert and monitor progress.
4. Output mzML files are written alongside the source files or to a chosen directory.

Performance

Benchmarked with a 796 MB Thermo Orbitrap Astral file (228,790 scans). The CLI is a 2 MB static binary with no runtime dependencies.

Tested on Apple M5 Pro (macOS), Windows x86_64, and Linux EPYC 9554.

End-to-end operations

Measured with hyperfine (wall-clock time including process startup and file open).

Operation	Oxion (macOS)	.NET (macOS)	Oxion (Windows)	.NET (Windows)	Oxion (Linux)	.NET (Linux)
Full scan decode (228K scans)	84 ms	—	804 ms	—	393 ms	—
RAW → mzML conversion	5.5 s	12.1 s	9.4 s	9.8 s	13.8 s	22.8 s

Data access operations

Internal timing after file open — measures only the operation itself.

Operation	Oxion (macOS)	.NET (macOS)	Oxion (Linux)	.NET (Linux)
Full scan decode (228K)	40 ms	396 ms	213 ms	772 ms
TIC (228K scans)	0.4 ms	57 ms	0.3 ms	109 ms
XIC (single target)	0.9 ms	154 ms	3.0 ms	250 ms
XIC (2,000 targets)	9.6 ms	2,589 ms	94 ms	3,044 ms

Runtime comparison

	Oxion	.NET RawFileReader
Binary size	2 MB	~500 MB (.NET SDK)
Runtime dependencies	None	.NET 8 runtime
Platform	Linux, macOS, Windows	Windows (native), Linux (Mono)

Why is Oxion fast?

Direct binary parsing — reads the RAW format natively, no .NET runtime
Memory-mapped I/O — zero-copy file access via OS virtual memory
Zero-allocation hot paths — XIC reads raw bytes in-place, no heap allocations
Buffer reuse — pre-allocated decode buffers, eliminating ~220K allocations per file
Parallel processing — work-stealing across all cores for scan decode and folder conversion
Bounded-memory batch pipeline — two-pass chunked extraction scales to 270+ files on NAS without exceeding available RAM
LTO + codegen-units=1 — whole-program link-time optimization
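
To illustrate the memory-mapped, in-place reading idea from the list above, here is a small sketch using Python's `mmap` and `struct` modules. The file layout (a flat run of little-endian float32 values) is invented for the demo and is unrelated to any vendor format; it shows the general technique only, not Oxion's native implementation.

```python
import mmap
import os
import struct
import tempfile

# Write a small demo file of little-endian float32 values (invented layout).
values = [1.5, 2.5, 3.5, 4.5]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<4f", *values))

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # unpack_from decodes directly from the mapping at a byte offset,
        # so the file is never copied into a separate bytes object first.
        third = struct.unpack_from("<f", mm, 2 * 4)[0]
        all_values = struct.unpack_from("<4f", mm, 0)

os.remove(path)
print(third, all_values)  # 3.5 (1.5, 2.5, 3.5, 4.5)
```

Random access by offset is what makes this pattern attractive for indexed files: only the pages that are actually touched need to be loaded by the operating system.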

License

Proprietary. Distributed as pre-compiled binaries and Python wheels.

For questions or bug reports, please open an issue.
