EstrellaXD/oxion

Oxion


A universal mass spectrometry file reader. Fast, cross-platform, no .NET required.

Oxion reads seven mass spectrometry file formats natively: six vendor formats, parsed directly from their binary layouts, plus the open mzML standard. It decodes scans up to 700x faster than the official .NET RawFileReader library and ships as a CLI tool, a desktop GUI converter, and Python bindings with NumPy integration.

Supported vendors: Thermo, Bruker, Waters, Agilent, Shimadzu, Sciex, plus the open mzML standard.


Installation

CLI

Download the latest binary for your platform from Releases:

| Platform | File | Install |
|---|---|---|
| Linux x86_64 | oxion-x86_64-unknown-linux-gnu.tar.gz | tar xzf oxion-*.tar.gz |
| Linux aarch64 | oxion-aarch64-unknown-linux-gnu.tar.gz | tar xzf oxion-*.tar.gz |
| macOS Intel | oxion-x86_64-apple-darwin.tar.gz | tar xzf oxion-*.tar.gz |
| macOS Apple Silicon | oxion-aarch64-apple-darwin.tar.gz | tar xzf oxion-*.tar.gz |
| Windows x64 | oxion-x86_64-pc-windows-msvc.zip | Extract zip |

After extracting, optionally move oxion to a directory on your PATH.

Python

pip install oxion

Requires Python 3.11+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows (x64). Wheels can also be downloaded from Releases for offline install:

pip install oxion-*.whl

GUI

Download the desktop converter from Releases:

| Platform | File |
|---|---|
| macOS (Intel + Apple Silicon) | oxion-gui-*.tar.gz |
| Linux | oxion-gui-*.deb |
| Windows | oxion-gui-*.msi or *-setup.exe |

Supported Formats

| Format | Extension | Read | Convert to mzML | Notes |
|---|---|---|---|---|
| Thermo RAW | .raw | Full | Yes | v57-66, all Orbitrap/LTQ/Astral instruments |
| Bruker TDF/TSF | .d | Full | Yes | timsTOF (4D ion mobility), QTOF |
| Waters .raw | .raw (directory) | Full | Yes | MassLynx SQD2, ZQ, SIR/MRM |
| Agilent .d | .d (directory) | Full | Yes | MassHunter + ChemStation |
| Shimadzu LCD | .lcd | Full | Yes | MRM, triple-quad |
| Sciex WIFF | .wiff | Metadata | Planned | OLE2 metadata extraction |
| mzML | .mzml, .mzml.gz | Full | N/A | Indexed + non-indexed, gzip |

Vendor Format Details

  • Thermo RAW: Format versions 57-66, centroid/profile/FT/LT decoders, trailer metadata (86+ fields). No .NET required.
  • Bruker .d: SQLite + zstd-compressed binary blobs, TOF-to-m/z quadratic calibration, full ion mobility (1/K0), ddaPASEF and diaPASEF support.
  • Waters .raw: All 4 MassLynx binary encodings (2/4/6/8-byte), polynomial m/z calibration, multi-function support.
  • Agilent .d: MassHunter (MSScan.bin + MSPeak.bin/MSProfile.bin) and ChemStation (DATA.MS big-endian) sub-formats.
  • Sciex WIFF: OLE2 metadata extraction; scan data decoding pending (format is not publicly documented).

CLI Reference

oxion <COMMAND> [OPTIONS]

info - File Information

Display instrument model, scan count, RT range, mass range, and trailer field names.

oxion info sample.raw
oxion info data.mzML

scan - Single Scan Export

Export a single scan as JSON with m/z and intensity arrays.

oxion scan sample.raw -n 1          # First scan
oxion scan sample.raw -n 5000       # Scan number 5000

tic - Total Ion Chromatogram

Export the TIC as a two-column CSV (rt, intensity). Sub-millisecond extraction from the scan index (no scan decoding needed).

oxion tic sample.raw                # Print to stdout
oxion tic sample.raw -o tic.csv     # Save to file
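
The exported CSV can be read back with Python's standard library. The following is a minimal sketch that assumes the two columns appear in the order rt, intensity; whether the file starts with a header row is an assumption, so rows that do not parse as numbers are skipped. `load_tic` and `tic_apex` are hypothetical helper names, and the sample text stands in for the contents of `tic.csv`.

```python
import csv
import io


def load_tic(text):
    """Parse two-column (rt, intensity) CSV text into two lists of floats.

    Rows that do not parse as numbers (such as a header) are skipped.
    """
    rts, intensities = [], []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            rt, inten = float(row[0]), float(row[1])
        except ValueError:
            continue  # header or other non-numeric row
        rts.append(rt)
        intensities.append(inten)
    return rts, intensities


def tic_apex(rts, intensities):
    """Return (rt, intensity) at the most intense point of the trace."""
    i = max(range(len(intensities)), key=intensities.__getitem__)
    return rts[i], intensities[i]


# Synthetic stand-in for the contents of tic.csv
sample = "rt,intensity\n0.01,1200.0\n0.02,8800.0\n0.03,4100.0\n"
rts, intensities = load_tic(sample)
apex = tic_apex(rts, intensities)
```

For a real file, pass `open("tic.csv").read()` to `load_tic` instead of the synthetic string.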

xic - Extracted Ion Chromatogram

Extract one or more XIC traces from a file.

# Single target at 5 ppm (default)
oxion xic sample.raw --mz 524.2644

# Multiple targets in one pass (shared scan iteration)
oxion xic sample.raw --mz 524.2644 --mz 445.12 --mz 302.05

# MS1 only — skips MS2 scans, much faster for DDA data
oxion xic sample.raw --mz 524.2644 --ms1-only

# Custom tolerance
oxion xic sample.raw --mz 524.2644 --ppm 10.0

# Save to file
oxion xic sample.raw --mz 524.2644 -o xic.csv
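
To make the `--ppm` tolerance concrete, the sketch below converts a ppm value into an absolute m/z window and computes one XIC point from a single scan. It is an illustration in NumPy, not Oxion's implementation: `ppm_window` and `xic_point` are hypothetical names, and summing the matching peaks is an assumption (taking the maximum would be an equally plausible choice).

```python
import numpy as np


def ppm_window(target_mz, ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    half_width = target_mz * ppm * 1e-6
    return target_mz - half_width, target_mz + half_width


def xic_point(mz, intensity, target_mz, ppm=5.0):
    """Sum the intensities of one scan that fall inside the ppm window.

    Summing is an assumption made for illustration.
    """
    low, high = ppm_window(target_mz, ppm)
    mask = (mz >= low) & (mz <= high)
    return float(intensity[mask].sum())


# Synthetic centroided scan
mz = np.array([445.1200, 524.2630, 524.2650, 524.2700])
intensity = np.array([100.0, 50.0, 200.0, 999.0])
low, high = ppm_window(524.2644, 5.0)
point = xic_point(mz, intensity, 524.2644, ppm=5.0)
```

At m/z 524.2644, 5 ppm corresponds to a half-width of about 0.0026 m/z, so the peak at 524.2700 falls outside the window while the two peaks next to the target are included.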

batch-xic - Multi-File Batch XIC

Extract XIC traces across multiple files, align to a common RT grid, and output a CSV matrix. Uses a memory-bounded two-pass pipeline (mmap prescan → chunked extraction) that scales to hundreds of files on NAS.

oxion batch-xic \
    -f file1.raw -f file2.raw -f file3.raw \
    --mz 524.2644 --mz 445.12 \
    --ppm 5.0 \
    --rt-resolution 0.01 \
    -o batch_output.csv

# With RT range filter
oxion batch-xic \
    -f *.raw \
    --mz 524.2644 \
    --rt-range "2.0,15.0" \
    -o filtered.csv

# Control parallelism and timeout for NAS/network storage
oxion batch-xic \
    -f *.raw --mz-file targets.txt \
    --max-concurrent 4 \
    --timeout 120 \
    -o output.csv

| Option | Default | Description |
|---|---|---|
| --max-concurrent | 4 (auto) | Max files processed in parallel. Lower = less memory. |
| --timeout | 120 | Per-file read timeout in seconds. Skips stalled NAS reads. 0 = disable. |
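
Aligning traces to a common RT grid can be pictured as resampling each file's chromatogram onto one shared time axis. The sketch below uses linear interpolation with `numpy.interp`; how Oxion actually aligns traces (interpolation, binning, or something else) is not stated here, so treat this as an assumption. `make_rt_grid` and `align_to_grid` are hypothetical names, and the grid step plays the role of `--rt-resolution`.

```python
import numpy as np


def make_rt_grid(rt_min, rt_max, resolution):
    """Build an evenly spaced RT axis covering [rt_min, rt_max]."""
    n_points = int(round((rt_max - rt_min) / resolution)) + 1
    return rt_min + resolution * np.arange(n_points)


def align_to_grid(rt, intensity, rt_grid):
    """Linearly interpolate one trace onto rt_grid; zero outside its RT range."""
    return np.interp(rt_grid, rt, intensity, left=0.0, right=0.0)


# Two synthetic traces sampled at different retention times (minutes)
trace_a = (np.array([2.0, 2.5, 3.0]), np.array([0.0, 10.0, 0.0]))
trace_b = (np.array([2.2, 2.6, 3.2]), np.array([4.0, 8.0, 2.0]))

grid = make_rt_grid(2.0, 3.0, 0.5)  # three points: 2.0, 2.5, 3.0
matrix = np.vstack([align_to_grid(rt, y, grid) for rt, y in (trace_a, trace_b)])
```

Each row of `matrix` is one file's trace on the shared grid, which is the shape a CSV matrix with one column per file and target needs.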

bench-concurrency - I/O vs CPU Profiling

Benchmark file-open (I/O) vs XIC extraction (CPU) at different concurrency levels to find the optimal --max-concurrent for your storage.

oxion bench-concurrency /path/to/raw/files \
    --targets 2500 \
    --concurrency 1,2,4,8,16 \
    --max-files 20

ms2-spectra - DDA Fragment Extraction

Find all MS2 scans matching a precursor m/z and export their fragment spectra.

oxion ms2-spectra sample.raw --mz 524.2644 --ppm 10.0 -o fragments.csv

convert - RAW to mzML Conversion

Convert Thermo RAW files to indexed mzML format.

# Single file (output: sample.mzML in same directory)
oxion convert sample.raw

# Specify output path
oxion convert sample.raw -o output.mzML

# Folder conversion (parallel, all .raw files)
oxion convert ./raw_files/ -o ./mzml_output/

# Options
oxion convert sample.raw \
    --mz-bits 32 \
    --intensity-bits 32 \
    --compression zlib \
    --ms1-only \
    --min-intensity 100 \
    --no-index
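
The precision and compression options are easier to reason about with the mzML storage scheme in mind: each m/z or intensity array is serialized as little-endian floats, optionally zlib-compressed, then base64-encoded. The sketch below reproduces that scheme in NumPy to show the size effect of 32-bit versus 64-bit values. It is not Oxion's writer, and `encode_binary_array` and `decode_binary_array` are hypothetical names.

```python
import base64
import zlib

import numpy as np


def encode_binary_array(values, bits=64, compress=True):
    """Encode values the way mzML stores a binary data array."""
    dtype = "<f4" if bits == 32 else "<f8"  # little-endian float
    raw = np.asarray(values, dtype=dtype).tobytes()
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(text, bits=64, compressed=True):
    """Reverse encode_binary_array."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return np.frombuffer(raw, dtype="<f4" if bits == 32 else "<f8")


mz = np.linspace(100.0, 1000.0, 1000)
encoded_64 = encode_binary_array(mz, bits=64, compress=False)
encoded_32 = encode_binary_array(mz, bits=32, compress=False)
roundtrip = decode_binary_array(encode_binary_array(mz, bits=32), bits=32)
```

Storing 32-bit values halves the uncompressed payload, at the cost of keeping only about seven significant decimal digits per value, which is worth weighing before lowering the m/z precision on high-resolution data.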

trailer - Trailer Extra Data

Display raw trailer metadata for a specific scan (Thermo RAW only).

oxion trailer sample.raw -n 1

benchmark - Performance Test

Benchmark scan decoding speed.

oxion benchmark sample.raw --mmap              # Sequential decode
oxion benchmark sample.raw --mmap --parallel    # Parallel decode
oxion benchmark sample.raw --mmap --xic         # XIC extraction benchmark

Python API

Opening Files

import oxion

# Auto-detect format from extension (works for all 7 formats)
raw = oxion.open("sample.raw")           # Thermo RAW
mzml = oxion.open("data.mzML")           # mzML
lcd = oxion.open("sample.lcd")            # Shimadzu LCD
bruker = oxion.open("data.d")             # Bruker TDF/TSF
waters = oxion.open("data.raw")           # Waters .raw directory
agilent = oxion.open("data.D")            # Agilent .d directory
wiff = oxion.open("data.wiff")            # Sciex WIFF

# RAW file with memory-mapped I/O (faster for large files)
raw = oxion.open("sample.raw", mmap=True)

# Or use the format-specific class directly
raw = oxion.RawFile("sample.raw", mmap=True)

File Metadata

raw = oxion.RawFile("sample.raw")

raw.n_scans              # Total number of scans
raw.first_scan           # First scan number
raw.last_scan            # Last scan number
raw.start_time           # Start RT in minutes
raw.end_time             # End RT in minutes
raw.instrument_model     # Instrument model string
raw.sample_name          # Sample name from acquisition
raw.version              # RAW format version (57-66)

Reading Scans

# Get scan data as NumPy arrays
mz, intensity = raw.scan(1)

# Get scan metadata (no array decoding)
info = raw.scan_info(1)
info.scan_number         # 1
info.rt                  # Retention time in minutes
info.ms_level            # 1, 2, 3, ...
info.polarity            # "positive" or "negative"
info.tic                 # Total ion current
info.base_peak_mz        # Base peak m/z
info.base_peak_intensity # Base peak intensity
info.filter_string       # e.g. "FTMS + p NSI Full ms [100.00-1000.00]"
info.precursor_mz        # Precursor m/z (MS2+ only, None for MS1)
info.precursor_charge    # Charge state (MS2+ only, None for MS1)

# Read all MS1 scans in parallel
all_ms1 = raw.all_ms1_scans(progress=True)  # list of (mz, intensity) tuples

Chromatograms

# TIC (sub-millisecond, from scan index)
rt, intensity = raw.tic()

# XIC (single target)
rt, intensity = raw.xic(524.2644, ppm=5.0)

# XIC restricted to MS1 scans (faster for DDA)
rt, intensity = raw.xic_ms1(524.2644, ppm=5.0)

# Batch XIC (multiple targets, single scan pass)
targets = [(524.2644, 5.0), (445.12, 5.0), (302.05, 10.0)]
results = raw.xic_batch_ms1(targets, progress=True)
for rt, intensity in results:
    print(f"  {len(rt)} points")
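
Because the chromatogram methods return plain `(rt, intensity)` arrays, downstream peak metrics need only NumPy. The sketch below computes the apex and a trapezoidal area for one trace; `summarize_trace` is a hypothetical helper, and the arrays are synthetic stand-ins for the output of `raw.xic(...)`.

```python
import numpy as np


def summarize_trace(rt, intensity):
    """Return apex RT, apex intensity, and trapezoidal area of a chromatogram."""
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    apex = int(np.argmax(intensity))
    # Trapezoidal rule written out so it runs on any NumPy version
    area = float(np.sum(0.5 * (intensity[1:] + intensity[:-1]) * np.diff(rt)))
    return float(rt[apex]), float(intensity[apex]), area


# Synthetic stand-in for `rt, intensity = raw.xic(524.2644, ppm=5.0)`
rt = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
intensity = np.array([0.0, 50.0, 100.0, 50.0, 0.0])
apex_rt, apex_intensity, area = summarize_trace(rt, intensity)
```

The area is in intensity × minutes, since RT values are reported in minutes.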

DDA / DIA Analysis

# Acquisition type detection
raw.acquisition_type()     # "dda", "dia", "ms1_only", or "mixed"

# MS level queries
raw.ms_level_of_scan(100)  # 1 or 2
raw.is_ms2_scan(100)       # True/False
raw.scan_numbers_by_level(2)  # [4, 5, 6, 8, ...]

# Precursor information
precursors = raw.precursor_list()     # Unique precursor m/z as NumPy array
parent = raw.parent_ms1_scan(5000)    # Parent MS1 scan number for scan 5000

# Find MS2 scans for a precursor
ms2_scans = raw.ms2_scans_for_precursor(524.2644, tolerance_ppm=10.0)
for s in ms2_scans:
    print(f"  Scan {s.scan_number}, RT={s.rt:.2f}, CE={s.collision_energy}")

# All MS2 scan metadata (fast, no decoding)
all_ms2 = raw.all_ms2_scan_info()

DIA Windows

# Get unique isolation windows
windows = raw.isolation_windows()
for w in windows:
    print(w)  # IsolationWindow(center_mz=500.0, width=25.0, ce=30.0, activation=HCD)

# Get MS2 scans for a window
scans = raw.scans_for_window(windows[0])

# XIC within a DIA window
rt, intensity = raw.xic_ms2_window(524.2644, ppm=5.0, window=windows[0])
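
When choosing which window to pass to `xic_ms2_window`, it helps to check which isolation windows cover a given precursor. The sketch below assumes `width` is the full window width and that `center_mz` and `width` are readable as attributes, as the printed representation suggests; neither is confirmed here. `Window` is a hypothetical stand-in class so the example runs without a data file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in exposing the two fields used below."""
    center_mz: float
    width: float


def window_bounds(window):
    """Return (low, high) m/z, assuming width is the full window width."""
    half = window.width / 2.0
    return window.center_mz - half, window.center_mz + half


def windows_containing(mz, windows):
    """Return every window whose m/z range covers mz."""
    hits = []
    for w in windows:
        low, high = window_bounds(w)
        if low <= mz <= high:
            hits.append(w)
    return hits


windows = [Window(475.0, 25.0), Window(500.0, 25.0), Window(525.0, 25.0)]
hits = windows_containing(524.2644, windows)
```

With overlapping DIA schemes more than one window can match, which is why the helper returns a list rather than a single window.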

Trailer Metadata

# Available trailer field names
fields = raw.trailer_fields()  # ["Charge State", "Ion Injection Time (ms)", ...]

# Trailer data for a specific scan
data = raw.trailer_extra(1)    # dict: {"Charge State": "2", "Ion Injection Time (ms)": "35.00", ...}
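
The example values above are strings, so numeric fields need converting before they can be used in calculations. A minimal sketch, assuming every value arrives as a string; `parse_trailer` is a hypothetical helper and "Example Text Field" is a made-up field name included only to show the fallback.

```python
def parse_trailer(trailer):
    """Convert numeric-looking string values to int or float; keep the rest."""
    parsed = {}
    for key, value in trailer.items():
        text = value.strip()
        try:
            parsed[key] = int(text)
        except ValueError:
            try:
                parsed[key] = float(text)
            except ValueError:
                parsed[key] = value  # leave non-numeric text untouched
    return parsed


# Two values from the example above, plus one hypothetical text field
trailer = {
    "Charge State": "2",
    "Ion Injection Time (ms)": "35.00",
    "Example Text Field": "On",  # hypothetical field name
}
parsed = parse_trailer(trailer)
```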

Multi-File Batch XIC

Extract aligned chromatograms across many files with bounded memory.

import oxion

files = ["sample1.raw", "sample2.raw", "sample3.raw"]
targets = [(524.2644, 5.0), (445.12, 5.0)]

# Standard batch — returns 3D tensor (samples × targets × timepoints)
tensor, rt_grid, sample_names = oxion.batch_xic(
    files, targets,
    progress=True,          # tqdm progress bar
    max_concurrent=4,       # parallel files (tune for NAS)
    timeout_secs=120,       # skip stalled files after 2 min
)
# tensor.shape: (3, 2, n_timepoints), dtype=float64

# Half-memory mode — f32 intensities (sufficient for XIC data)
tensor, rt_grid, names = oxion.batch_xic(
    files, targets,
    use_f32=True,           # returns float32 tensor
    max_concurrent=4,
)
# tensor.dtype: float32
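
The (samples × targets × timepoints) layout lends itself to vectorized reductions along the time axis. The sketch below derives an apex RT and apex intensity for every sample and target pair; `apex_table` is a hypothetical helper, and the small tensor is synthetic data standing in for the `batch_xic` output.

```python
import numpy as np


def apex_table(tensor, rt_grid):
    """Return (apex_rt, apex_intensity), each shaped (samples, targets)."""
    apex_idx = tensor.argmax(axis=2)  # index along the time axis
    apex_rt = rt_grid[apex_idx]
    apex_intensity = tensor.max(axis=2)
    return apex_rt, apex_intensity


# Synthetic tensor: 2 samples x 2 targets x 4 timepoints
rt_grid = np.array([1.0, 1.5, 2.0, 2.5])
tensor = np.array([
    [[0.0, 5.0, 1.0, 0.0], [2.0, 0.0, 0.0, 9.0]],
    [[0.0, 0.0, 7.0, 0.0], [3.0, 1.0, 0.0, 0.0]],
])
apex_rt, apex_intensity = apex_table(tensor, rt_grid)
```

Row `i` of each result corresponds to `sample_names[i]` and column `j` to `targets[j]`.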

Streaming Batch XIC

For maximum memory control (hundreds of files × thousands of targets), use the two-step streaming API:

import oxion
import numpy as np

files = [f"sample_{i}.raw" for i in range(270)]
targets = [(mz, 5.0) for mz in np.linspace(70, 1050, 2500)]

# Step 1: Prescan — lightweight RT grid computation (mmap, ~1s for 270 files)
rt_grid, valid_files, sample_names = oxion.prescan_batch_xic(files)

# Step 2: Process one file at a time — you control memory
for i, path in enumerate(valid_files):
    data = oxion.extract_xic_onto_grid(path, targets, rt_grid)
    # data.shape: (2500, n_timepoints), dtype=float32
    
    # Save incrementally, build sparse matrix, stream to HDF5, etc.
    # Only ~1 GB in memory at a time instead of ~40 GB
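
One way to save incrementally is to write each per-file block into a memory-mapped `.npy` file, so the full tensor lives on disk rather than in RAM. The sketch below uses NumPy's `open_memmap`; `stream_to_npy` is a hypothetical helper, and `fake_blocks` is a synthetic generator standing in for the `extract_xic_onto_grid` calls in the loop above.

```python
import os
import tempfile

import numpy as np


def stream_to_npy(blocks, out_path, n_files, n_targets, n_timepoints):
    """Write per-file (targets x timepoints) blocks into one on-disk .npy tensor."""
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32,
        shape=(n_files, n_targets, n_timepoints),
    )
    for i, block in enumerate(blocks):
        out[i] = block  # only one block is held in RAM at a time
    out.flush()
    del out  # release the mapping so the file can be reopened


def fake_blocks(n_files, n_targets, n_timepoints):
    """Synthetic stand-in for oxion.extract_xic_onto_grid(path, targets, rt_grid)."""
    for i in range(n_files):
        yield np.full((n_targets, n_timepoints), float(i), dtype=np.float32)


tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "xic_tensor.npy")
stream_to_npy(fake_blocks(3, 4, 5), out_path, 3, 4, 5)
loaded = np.load(out_path, mmap_mode="r")  # lazily mapped, not read into RAM
```

Reopening with `mmap_mode="r"` lets later analysis slice individual samples or targets without loading the whole tensor.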

Progress Bars

Most long-running operations support tqdm progress bars:

# Install tqdm for progress bar support
# pip install tqdm

rt, intensity = raw.xic(524.2644, progress=True)
results = raw.xic_batch_ms1(targets, progress=True)
scans = raw.all_ms1_scans(progress=True)

GUI Converter

The desktop application provides drag-and-drop RAW-to-mzML conversion with a visual interface.

  • Drag one or more .raw files onto the window
  • Configure output settings (precision, compression, MS1-only filter)
  • Click convert and monitor progress
  • Output mzML files are written alongside the source files or to a chosen directory

Performance

Benchmarked with a 796 MB Thermo Orbitrap Astral file (228,790 scans). The CLI is a 2 MB static binary with no runtime dependencies.

Tested on Apple M5 Pro (macOS), Windows x86_64, and Linux EPYC 9554.

End-to-end operations

Measured with hyperfine (wall-clock time including process startup and file open).

Operation Oxion (macOS) .NET (macOS) Oxion (Windows) .NET (Windows) Oxion (Linux) .NET (Linux)
Full scan decode (228K scans) 84 ms 804 ms 393 ms
RAW → mzML conversion 5.5 s 12.1 s 9.4 s 9.8 s 13.8 s 22.8 s

Data access operations

Internal timing after file open — measures only the operation itself.

| Operation | Oxion (macOS) | .NET (macOS) | Oxion (Linux) | .NET (Linux) |
|---|---|---|---|---|
| Full scan decode (228K) | 40 ms | 396 ms | 213 ms | 772 ms |
| TIC (228K scans) | 0.4 ms | 57 ms | 0.3 ms | 109 ms |
| XIC (single target) | 0.9 ms | 154 ms | 3.0 ms | 250 ms |
| XIC (2,000 targets) | 9.6 ms | 2,589 ms | 94 ms | 3,044 ms |

Figure: Oxion benchmark, 796 MB Orbitrap Astral file.

Runtime comparison

| | Oxion | .NET RawFileReader |
|---|---|---|
| Binary size | 2 MB | ~500 MB (.NET SDK) |
| Runtime dependencies | None | .NET 8 runtime |
| Platform | Linux, macOS, Windows | Windows (native), Linux (Mono) |

Why is Oxion fast?

  • Direct binary parsing — reads the RAW format natively, no .NET runtime
  • Memory-mapped I/O — zero-copy file access via OS virtual memory
  • Zero-allocation hot paths — XIC reads raw bytes in-place, no heap allocations
  • Buffer reuse — pre-allocated decode buffers, eliminating ~220K allocations per file
  • Parallel processing — work-stealing across all cores for scan decode and folder conversion
  • Bounded-memory batch pipeline — two-pass chunked extraction scales to 270+ files on NAS without exceeding available RAM
  • LTO + codegen-units=1 — whole-program link-time optimization
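
The memory-mapped, zero-copy idea from the list above can be illustrated in Python. This sketch maps a small temporary file and wraps part of it in a NumPy array without copying the bytes. It demonstrates the general technique only; the file layout here is invented and has nothing to do with any vendor format or with Oxion's internals.

```python
import mmap
import os
import tempfile

import numpy as np

# Write a small binary file of little-endian float32 values to map
values = np.arange(8, dtype="<f4")
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as handle:
    handle.write(values.tobytes())

with open(path, "rb") as handle:
    mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
    # frombuffer wraps the mapped pages without copying them
    view = np.frombuffer(mapped, dtype="<f4", count=4, offset=16)
    window_sum = float(view.sum())
    shares_memory = not view.flags.owndata
    del view  # drop the NumPy view before closing the map
    mapped.close()

os.remove(path)
```

Only the pages actually touched are read from disk, which is why this access pattern suits operations that need a small slice of a large file.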

License

Proprietary. Distributed as pre-compiled binaries and Python wheels.

For questions or bug reports, please open an issue.
