MS File ToolBox - tools for parsing some mass-spectrometry related file formats (mzML, mzXML, pep.xml, prot.xml, etc.)


The acronym stands for Mass Spectrometry File Toolbox.
This is a Java library for reading some common mass-spectrometry/proteomics data formats.
This library is what drives BatMass.


  • Parsers for mzML/mzXML with unified API
    • Very fast, multi-threaded
    • Rich standardized API for contents of those files (scan and run meta-info, not just spectra).
    • msNumpress compression support for mzML
    • Automated LC/MS run structure determination:
      • Data structures for parent-child relationship between spectra
      • Indexes for scans based on scan numbers and retention times, both globally and for each MS level separately
      • Convenient methods to get next-previous scans at the same MS level
    • Tolerant to malformed data
      • Can handle MS2 scan tags nested inside MS1 scans
      • Tolerant to missing or broken file index
      • Reindexing on the fly
    • Memory management
      • Automated spectra parsing on demand
        • You can parse just the structure of an LC/MS run without the spectral data, in which case the memory footprint is very small. Spectra are parsed only when they are requested.
        • Soft referencing of spectral data, so the GC can reclaim it
      • Tracking of loaded data that is no longer used by any component, with automated unloading.
  • Upcoming support for Thermo RAW files on Windows
  • pepXML parser and writer
  • protXML parser and writer
  • mzIdentML parser
  • GPMdb XML files parser
  • Agilent .cef files parser
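
To make the run-structure and memory-management ideas above more concrete, here is a minimal, self-contained sketch. It is not the library's API: every name in it (`ScanIndexSketch`, `Scan`, `ScanIndex`, `PARSE_COUNT`, `demoIndex`) is hypothetical, and the parent lookup assumes that the parent of an MSn scan is the closest preceding scan at level n-1, which is only a heuristic. It shows per-level indexes built on `TreeMap`, next/previous navigation at the same MS level, and spectra that are loaded on demand and held through a `SoftReference`.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ScanIndexSketch {
    /** Counts how often the stand-in "parser" runs (illustration only). */
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    /** Hypothetical scan: meta-info stays in memory, the spectrum is loaded lazily. */
    static final class Scan {
        final int num;
        final int msLevel;
        final double rtMinutes;
        private final Supplier<double[]> loader; // stands in for on-demand parsing
        private SoftReference<double[]> spectrum = new SoftReference<>(null);

        Scan(int num, int msLevel, double rtMinutes, Supplier<double[]> loader) {
            this.num = num;
            this.msLevel = msLevel;
            this.rtMinutes = rtMinutes;
            this.loader = loader;
        }

        /** Returns the spectrum, parsing it again if it was never loaded or was reclaimed. */
        synchronized double[] spectrum() {
            double[] data = spectrum.get();
            if (data == null) {
                data = loader.get();
                spectrum = new SoftReference<>(data);
            }
            return data;
        }
    }

    /** Hypothetical index: by scan number and retention time, globally and per MS level. */
    static final class ScanIndex {
        private final TreeMap<Integer, Scan> byNum = new TreeMap<>();
        // Assumes retention times are unique; a real index would have to handle ties.
        private final TreeMap<Double, Scan> byRt = new TreeMap<>();
        private final Map<Integer, TreeMap<Integer, Scan>> byLevel = new TreeMap<>();

        void add(Scan s) {
            byNum.put(s.num, s);
            byRt.put(s.rtMinutes, s);
            byLevel.computeIfAbsent(s.msLevel, k -> new TreeMap<>()).put(s.num, s);
        }

        Scan byNum(int num) { return byNum.get(num); }

        /** First scan at or after the given retention time, or null. */
        Scan atOrAfterRt(double rtMinutes) { return value(byRt.ceilingEntry(rtMinutes)); }

        Scan nextAtSameLevel(Scan s) { return value(level(s.msLevel).higherEntry(s.num)); }

        Scan prevAtSameLevel(Scan s) { return value(level(s.msLevel).lowerEntry(s.num)); }

        /** Heuristic: closest preceding scan one MS level up, or null. */
        Scan parentOf(Scan s) { return value(level(s.msLevel - 1).lowerEntry(s.num)); }

        private TreeMap<Integer, Scan> level(int msLevel) {
            return byLevel.getOrDefault(msLevel, new TreeMap<>());
        }

        private static <K> Scan value(Map.Entry<K, Scan> e) {
            return e == null ? null : e.getValue();
        }
    }

    /** Builds a tiny made-up run: MS1, MS2, MS2, MS1, MS2. No spectra are parsed here. */
    static ScanIndex demoIndex() {
        int[] levels = {1, 2, 2, 1, 2};
        double[] rts = {0.10, 0.11, 0.12, 0.20, 0.21};
        ScanIndex idx = new ScanIndex();
        for (int i = 0; i < levels.length; i++) {
            final int num = i + 1;
            idx.add(new Scan(num, levels[i], rts[i], () -> {
                PARSE_COUNT.incrementAndGet();
                return new double[] {100.0 + num, 200.0 + num}; // fake m/z values
            }));
        }
        return idx;
    }

    public static void main(String[] args) {
        ScanIndex idx = demoIndex();
        Scan ms2 = idx.byNum(3);
        System.out.println("parent of scan 3: " + idx.parentOf(ms2).num);
        System.out.println("next MS2 after scan 3: " + idx.nextAtSameLevel(ms2).num);
        System.out.println("peaks in scan 3: " + ms2.spectrum().length);
    }
}
```

Keeping one sorted map per MS level makes "next scan at the same level" a single `higherEntry` lookup, and the soft reference lets the JVM drop spectral data under memory pressure while the lightweight scan meta-info stays indexed.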


You can find pre-compiled binaries here.

JAR: You can load the MSFileToolbox subdirectory as a project into IntelliJ IDEA and build the jar from there: Main Menu -> Build -> Build Artifacts.
NetBeans Module: Open the root directory in NetBeans as a project. You will see the MSFTBX module suite, which consists of 3 modules: MSFileToolbox Module (this is the main thing), MSFileToolbox Libx (these are the dependencies), and Auto Update (MSFTBX), which is the update center for NetBeans Platform projects (you definitely don't need this).

I will mavenize the project in the near future.