Open-source Java library for handling the different file format standards used in proteomics. ms-data-core-api is particularly well suited to metadata representation.




The primary purpose of the ms-data-core-api library is to provide commonly used classes and an object model for proteomics experiments. You may also find it useful for your own computational proteomics projects.

Main Features

  • Common Object Model for different proteomics experiments, with classes to represent proteins, peptides, PSMs and spectra
  • DataAccessControllers for mzTab, mzIdentML, PRIDE XML, PRIDE Database, mzML, mzXML, mgf, pkl, apl, ms2, dta files
  • Proteomics Standard compliant Data model with classes for Ontologies and User params
  • Read different file formats in proteomics in a common Object Model
  • Export the current model to mzTab files for Identification Experiments.
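As a rough illustration of the "many formats, one model" idea, the sketch below guesses which of the formats listed above a file belongs to from its extension. `FormatGuess` and `guessFormat` are hypothetical names, not part of ms-data-core-api, and the extension table is an assumption. PRIDE XML files usually end in plain `.xml`, so they cannot be recognised this way and would need content inspection.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of ms-data-core-api.
public class FormatGuess {

    // Assumed extension-to-format table, limited to formats named in this README.
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();
    static {
        BY_EXTENSION.put(".mztab", "mzTab");
        BY_EXTENSION.put(".mzid", "mzIdentML");
        BY_EXTENSION.put(".mzml", "mzML");
        BY_EXTENSION.put(".mzxml", "mzXML");
        BY_EXTENSION.put(".mgf", "mgf");
        BY_EXTENSION.put(".pkl", "pkl");
        BY_EXTENSION.put(".apl", "apl");
        BY_EXTENSION.put(".ms2", "ms2");
        BY_EXTENSION.put(".dta", "dta");
    }

    // Returns the format name, or "unknown" when the extension is not in the table.
    public static String guessFormat(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : BY_EXTENSION.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guessFormat("sample.mzid"));
        System.out.println(guessFormat("results.MZTAB"));
        System.out.println(guessFormat("pride.xml"));
    }
}
```

In a real application the returned name would be used to pick the matching DataAccessController; that wiring is left out here because it depends on the library version in use.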

Note: the library is still evolving. We are committed to expanding it and adding more useful classes.

Getting ms-data-core-api

The zip file in the releases section contains the ms-data-core-api jar file and all other required libraries.

Maven Dependency

The PRIDE Utilities library can be used in Maven projects. Include the following snippets in your Maven pom file.

<!-- EBI repo -->

<!-- EBI SNAPSHOT repo -->

Note: you need to change the version number to the latest version.

For developers, the latest source code is available from our SVN repository.

How to use ms-data-core-api

Using ms-data-core-api

Reading a mzIdentML file:

This example shows how to read an mzIdentML file and retrieve information from it:


// Open the mzIdentML file inputFile, using memory
MzIdentMLControllerImpl mzIdentMlController = new MzIdentMLControllerImpl(inputFile, true);

// Print the size of the sample list
List<Sample> samples = mzIdentMlController.getSamples();
System.out.println(samples.size());

// Print the Id of the first sample (the list must not be empty)
System.out.println(samples.get(0).getId());

// Get the list of software entries from the mzIdentMlController
List<Software> software = mzIdentMlController.getSoftwares();

// Print the size of the software list
System.out.println(software.size());

// Print the name of the first software entry (the list must not be empty)
System.out.println(software.get(0).getName());

// Retrieve the identification metadata
IdentificationMetaData experiment = mzIdentMlController.getIdentificationMetaData();
// Retrieve the search databases
List<SearchDataBase> databases = experiment.getSearchDataBases();
// Retrieve the spectrum identification protocols
List<SpectrumIdentificationProtocol> spectrumIdentificationProtocol = experiment.getSpectrumIdentificationProtocols();

// Retrieve the protein detection protocol
Protocol proteinDetectionProtocol = experiment.getProteinDetectionProtocol();

// Retrieve the Ids of all protein identifications
List<Comparable> identifications = new ArrayList<Comparable>(mzIdentMlController.getProteinIds());

Reading a PRIDE XML file:

This example shows how to read a PRIDE XML file and retrieve information from it:

// Open the PRIDE XML file inputFile
PrideXmlControllerImpl prideXMLController = new PrideXmlControllerImpl(inputFile);

// The same methods shown in the mzIdentML example are available on this controller, for example:
//Print size of the Sample List
List<Sample> samples = prideXMLController.getSamples();

Using tools in ms-data-core-api:

File format conversion

Convert from mzIdentML to mzTab

java -jar ms-data-core-api<version>.jar -c -mzid <input.mzid> -outputfile <output.mztab>
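If the conversion needs to be launched from another Java program, the command above can be assembled as an argument list and handed to `ProcessBuilder`. The following is a minimal sketch that only uses the flags shown in that command; `ConvertCommand`, `mzidToMzTab` and the jar path are hypothetical placeholders, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the command line shown in this README.
public class ConvertCommand {

    // Builds the argument list for an mzIdentML-to-mzTab conversion.
    public static List<String> mzidToMzTab(String jarPath, String inputMzid, String outputMzTab) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);        // path to the ms-data-core-api jar (placeholder)
        cmd.add("-c");           // conversion mode
        cmd.add("-mzid");
        cmd.add(inputMzid);
        cmd.add("-outputfile");
        cmd.add(outputMzTab);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = mzidToMzTab("ms-data-core-api.jar", "input.mzid", "output.mztab");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires the jar on disk):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list rather than a single string avoids quoting problems when file paths contain spaces.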

Convert from PRIDE XML to mzTab

java -jar ms-data-core-api<version>.jar -c -pridexml <pride.xml> -outputfile <output.mztab>

Convert from annotated mzTab to (sorted, filtered*) proBed

java -jar ms-data-core-api<version>.jar -c -mztab <input.mztab> -chromsizes <chrom.txt> -outputformat probed

Convert from annotated mzIdentML to (sorted, filtered*) proBed

java -jar ms-data-core-api<version>.jar -c -mzid <input.mzid> -chromsizes <chrom.txt> -outputformat probed

Convert from (sorted, filtered*) proBed to bigBed

java -jar ms-data-core-api<version>.jar -c -mztab <> -chromsizes <chrom.txt> -asqlfile <> -bigbedconverter <bedToBigBed>

File Validation

MzIdentML validation

java -jar ms-data-core-api<version>.jar -v -mzid <sample.mzid> -peak <spectra.mgf> -skipserialization -reportfile <outputReport.txt>

MzTab validation

java -jar ms-data-core-api<version>.jar -v -mztab <input.mztab> -peaks <spectra1.mgf>##<spectra2.mgf> -skipserialization -reportfile <outputReport.txt>
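In the mzTab validation command above, several peak files are passed as a single value separated by `##`. A small sketch of building that value from a list of paths follows; `PeakFilesArg` and `join` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for building the multi-file peak argument.
public class PeakFilesArg {

    // Joins peak file paths with the "##" separator used on the command line.
    public static String join(List<String> peakFiles) {
        return String.join("##", peakFiles);
    }

    public static void main(String[] args) {
        List<String> peaks = Arrays.asList("spectra1.mgf", "spectra2.mgf");
        System.out.println(join(peaks));
    }
}
```

Because `#` starts a comment in many shells, the joined value is best passed as one element of an argument list (or quoted) rather than pasted into an unquoted shell string.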

PRIDE XML validation

java -jar ms-data-core-api<version>.jar -v -pridexml <input.pride.xml> -skipserialization -reportfile <outputReport.txt>

XML schema validation

MzIdentML schema validation and normal validation

java -jar ms-data-core-api<version>.jar -v -mzid <input.mzid> -peak <spectra.mgf> -schema -skipserialization -reportfile <outputReport.txt>

PRIDE XML schema validation only, without normal validation

java -jar ms-data-core-api<version>.jar -v -pridexml <input.pride.xml> -schemaonly -skipserialization -reportfile <outputReport.txt>

ProBed validation

ProBed validation with the default schema

java -jar ms-data-core-api<version>.jar -v -proBed <> -reportfile <outputReport.txt>

ProBed validation with a custom schema

java -jar ms-data-core-api<version>.jar -v -proBed <> -asqlfile <> -reportfile <outputReport.txt>


Check Results Files

java -jar ms-data-core-api<version>.jar -check -inputfile <inputfile>

Convert PRIDE or mzIdentML file to MzTab

java -jar ms-data-core-api<version>.jar -convert -inputfile <inputfile> -format <format>

Print Error/Warn detail message based on code

java -jar ms-data-core-api<version>.jar -error -code <code>


Print the help message (-h or --help)

java -jar ms-data-core-api<version>.jar -h

How to cite it:

  • Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. (2015). ms-data-core-api: An open-source, metadata-oriented library for computational proteomics. Bioinformatics. 2015 Apr 24.

This library has been used in:

  • Wang, R., Fabregat, A., Ríos, D., Ovelleiro, D., Foster, J. M., Côté, R. G., ... & Vizcaíno, J. A. (2012). PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nature biotechnology, 30(2), 135-137.
  • Vizcaíno, J. A., Côté, R. G., Csordas, A., Dianes, J. A., Fabregat, A., Foster, J. M., ... & Hermjakob, H. (2013). The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic acids research, 41(D1), D1063-D1069.

Getting Help

If you have questions or need additional help, please contact the PRIDE Helpdesk at the EBI: pride-support at (replace at with @).

Please send us your feedback, including error reports, improvement suggestions, new feature requests and any other things you might want to suggest to the PRIDE team.


ms-data-core-api is a PRIDE API licensed under Apache License 2.0.