MS File ToolBox - tools for parsing some mass-spectrometry related file formats (mzML, mzXML, pep.xml, prot.xml, etc.)
MSFTBX stands for Mass Spectrometry File Toolbox.
It is a Java library for reading common mass-spectrometry and proteomics data formats.
This is the library that drives BatMass.


  • Parsers for mzML/mzXML with unified API
    • Very fast, multi-threaded
    • Rich standardized API for contents of those files (scan and run meta-info, not just spectra).
    • msNumpress compression support for mzML
    • Automated LC/MS run structure determination:
      • Data structures for parent-child relationship between spectra
      • Indexes of scans by scan number and by retention time, both globally and for each MS level separately
      • Convenient methods to get the next/previous scan at the same MS level
    • Tolerant to malformed data
      • Can handle MS2 scan tags nested inside MS1 scans
      • Tolerant to missing or broken file index
      • Reindexing on the fly
    • Memory management
      • Automated spectra parsing on demand
        • You can parse just the structure of an LC/MS run without the spectral data, which keeps the memory footprint very small. Spectra are parsed only when they are requested.
        • Spectral data is held through soft references, so the garbage collector can reclaim it under memory pressure
      • Tracking of loaded data that is no longer used by any component, with automated unloading
  • Upcoming support for Thermo RAW files on Windows
  • pepXML parser and writer
  • protXML parser and writer
  • mzIdentML parser
  • GPMdb XML files parser
  • Agilent .cef files parser
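
To illustrate the run-structure features listed above (per-level scan indexes, next/previous navigation and parent-child lookup), here is a minimal sketch built only on the JDK's `TreeMap`. It is not the library's API: `ScanIndexSketch`, `ScanMeta` and every method name are hypothetical, and the parent of a scan is assumed to be the nearest preceding scan one MS level lower.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanIndexSketch {
    /** Hypothetical scan descriptor; not the library's actual scan type. */
    public static final class ScanMeta {
        public final int num;       // scan number
        public final int msLevel;   // 1 for MS1, 2 for MS2, ...
        public final double rt;     // retention time, minutes
        public ScanMeta(int num, int msLevel, double rt) {
            this.num = num; this.msLevel = msLevel; this.rt = rt;
        }
    }

    // Global index: scan number -> scan
    private final NavigableMap<Integer, ScanMeta> byNum = new TreeMap<>();
    // Global index: retention time -> scan (assumes retention times are unique)
    private final NavigableMap<Double, ScanMeta> byRt = new TreeMap<>();
    // Per-level index: MS level -> (scan number -> scan)
    private final Map<Integer, NavigableMap<Integer, ScanMeta>> byLevel = new TreeMap<>();

    public void add(ScanMeta s) {
        byNum.put(s.num, s);
        byRt.put(s.rt, s);
        byLevel.computeIfAbsent(s.msLevel, k -> new TreeMap<>()).put(s.num, s);
    }

    public ScanMeta byNum(int num) { return byNum.get(num); }

    /** Next scan at the same MS level, or null if there is none. */
    public ScanMeta nextAtSameLevel(ScanMeta s) {
        return value(byLevel.get(s.msLevel).higherEntry(s.num));
    }

    /** Previous scan at the same MS level, or null if there is none. */
    public ScanMeta prevAtSameLevel(ScanMeta s) {
        return value(byLevel.get(s.msLevel).lowerEntry(s.num));
    }

    /** First scan at or after the given retention time, or null. */
    public ScanMeta atOrAfterRt(double rt) {
        return value(byRt.ceilingEntry(rt));
    }

    /** Assumed parent: nearest preceding scan one MS level lower, or null. */
    public ScanMeta parentOf(ScanMeta s) {
        NavigableMap<Integer, ScanMeta> parents = byLevel.get(s.msLevel - 1);
        return parents == null ? null : value(parents.lowerEntry(s.num));
    }

    private static <K> ScanMeta value(Map.Entry<K, ScanMeta> e) {
        return e == null ? null : e.getValue();
    }

    /** Small made-up run: two MS1 scans, each followed by MS2 scans. */
    public static ScanIndexSketch demo() {
        ScanIndexSketch idx = new ScanIndexSketch();
        idx.add(new ScanMeta(1, 1, 0.10));
        idx.add(new ScanMeta(2, 2, 0.11));
        idx.add(new ScanMeta(3, 2, 0.12));
        idx.add(new ScanMeta(4, 1, 0.20));
        idx.add(new ScanMeta(5, 2, 0.21));
        return idx;
    }

    public static void main(String[] args) {
        ScanIndexSketch idx = demo();
        System.out.println(idx.nextAtSameLevel(idx.byNum(1)).num); // next MS1 after scan 1
        System.out.println(idx.parentOf(idx.byNum(5)).num);        // assumed parent of scan 5
    }
}
```

Sorted maps keep each lookup logarithmic in the number of scans, and keeping one map per MS level makes same-level navigation a single `higherEntry`/`lowerEntry` call.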

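The memory-management bullets above (parsing spectra on demand and soft referencing) can be pictured with the following sketch. It uses only `java.lang.ref.SoftReference` from the JDK and is an illustration of the general technique, not the library's implementation: `LazySpectrumSketch`, the `parser` supplier and `getParseCount` are hypothetical names, and the supplier stands in for whatever code re-reads the peaks from the file.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

public class LazySpectrumSketch {
    // Hypothetical stand-in for "parse this scan's peaks from the file".
    private final Supplier<double[]> parser;
    // Softly referenced cache: the GC may clear it when memory runs low.
    private SoftReference<double[]> cache;
    private int parseCount = 0;

    public LazySpectrumSketch(Supplier<double[]> parser) {
        this.parser = parser;
    }

    /** Returns the m/z values, parsing them only if they are not cached. */
    public synchronized double[] getMzs() {
        double[] mzs = (cache == null) ? null : cache.get();
        if (mzs == null) {
            // Never loaded, or reclaimed by the garbage collector: parse again.
            mzs = parser.get();
            parseCount++;
            cache = new SoftReference<>(mzs);
        }
        return mzs;
    }

    /** Number of times the spectrum has actually been parsed. */
    public synchronized int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazySpectrumSketch scan =
                new LazySpectrumSketch(() -> new double[] {100.05, 200.10, 300.15});
        System.out.println("parsed before access: " + scan.getParseCount());
        double[] mzs = scan.getMzs();
        System.out.println("peaks: " + mzs.length + ", parsed: " + scan.getParseCount());
    }
}
```

Creating the object costs nothing beyond the supplier itself, which mirrors loading only the run structure; the peaks are parsed on first access and re-parsed transparently if the cached array has been collected.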

You can find pre-compiled binaries here.

JAR: Load the MSFileToolbox subdirectory as a project into IntelliJ IDEA and build the jar from there: Main Menu -> Build -> Build Artifacts.
NetBeans Module: Open the root directory in NetBeans as a project. You will see the MSFTBX module suite, which consists of 3 modules:
  • MSFileToolbox Module: the main module
  • MSFileToolbox Libx: the dependencies
  • Auto Update (MSFTBX): the update center for NetBeans Platform projects (you definitely don't need this)

I will mavenize the project in the near future.