The primary purpose of the ms-data-core-api library is to provide commonly used classes and an object model for proteomics experiments. You may also find it useful for your own computational proteomics projects.

Main Features

  • Common Object Model for different proteomics experiments, with classes to represent proteins, peptides, PSMs and spectra
  • DataAccessControllers for mzTab, mzIdentML, PRIDE XML, PRIDE Database, mzML, mzXML, mgf, pkl, apl, ms2 and dta files
  • Proteomics-standard-compliant data model, with classes for ontologies and user params
  • Read different proteomics file formats into a common Object Model
  • Export the current model to mzTab files for identification experiments

Note: the library is still evolving. We are committed to expanding it and adding more useful classes.

Getting ms-data-core-api

The zip file in the releases section contains the ms-data-core-api jar file and all other required libraries.

Maven Dependency

ms-data-core-api can be used in Maven projects; include the following snippets in your Maven pom file.

<!-- EBI repo -->

<!-- EBI SNAPSHOT repo -->

Note: you need to change the version number to the latest version.

For developers, the latest source code is available from our SVN repository.

How to use ms-data-core-api

Using ms-data-core-api

Reading a mzIdentML file:

This example shows how to read an mzIdentML file and retrieve information from it:


//Open an inputFile mzIdentml File using memory 
MzIdentMLControllerImpl mzIdentMlController = new MzIdentMLControllerImpl(inputFile, true);

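//Get the list of samples and print its size
List<Sample> samples = mzIdentMlController.getSamples();
System.out.println(samples.size());

//Print the Id of the first sample (assumes Sample exposes getId())
System.out.println(samples.get(0).getId());

//Get the list of software from the mzIdentMlController
List<Software> software = mzIdentMlController.getSoftwares();

//Print size of the Software List
System.out.println(software.size());

//Print the Name of the first Software (assumes Software exposes getName())
System.out.println(software.get(0).getName());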

//Retrieve the Identification Metadata
IdentificationMetaData experiment = mzIdentMlController.getIdentificationMetaData();
//Retrieve the search databases
List<SearchDataBase> databases = experiment.getSearchDataBases();
//Retrieve the spectrum identification protocols
List<SpectrumIdentificationProtocol> spectrumIdentificationProtocol = experiment.getSpectrumIdentificationProtocols();

// Retrieve the Protein Identification Protocol
Protocol proteinDetectionProtocol = experiment.getProteinDetectionProtocol();

//Retrieve all Protein Identifications
List<Comparable> identifications = new ArrayList<Comparable>(mzIdentMlController.getProteinIds());

Reading a PRIDE XML file:

This example shows how to read a PRIDE XML file and retrieve information from it:

//Open an inputFile PRIDE XML file
PrideXmlControllerImpl prideXMLController = new PrideXmlControllerImpl(inputFile);

// The same methods used in the example above also work with this controller, for example:
//Get the list of samples
List<Sample> samples = prideXMLController.getSamples();
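
Because both controllers expose the same accessor methods, client code can be written once against a shared type. The sketch below illustrates the idea with stand-in types only: `SampleSource`, `InMemorySource` and `ControllerAgnosticExample` are hypothetical names made up for this illustration and are not part of ms-data-core-api. In real code the parameter would be whichever common controller type the library provides.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a common controller type; NOT a library class.
interface SampleSource {
    List<String> getSampleIds();
}

// Hypothetical in-memory source, used here in place of a real file-backed controller.
final class InMemorySource implements SampleSource {
    private final List<String> ids;

    InMemorySource(String... ids) {
        this.ids = Arrays.asList(ids);
    }

    @Override
    public List<String> getSampleIds() {
        return ids;
    }
}

final class ControllerAgnosticExample {
    // Works for any source, whatever file format backs it.
    static String sampleSummary(SampleSource source) {
        List<String> ids = source.getSampleIds();
        if (ids.isEmpty()) {
            return "0 sample(s)";
        }
        return ids.size() + " sample(s), first: " + ids.get(0);
    }
}
```

The summary method also guards against an empty list before reading the first element, which is worth doing in the examples above whenever a file may contain no samples.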

Using tools in ms-data-core-api:

File format conversion

Convert from mzIdentML to mzTab

java -jar ms-data-core-api<version>.jar -c -mzid <input.mzid> -outputfile <output.mztab>
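
When the conversion has to be triggered from Java code rather than from a shell, the same command line can be assembled as an argument list for `ProcessBuilder`. The following is a minimal sketch, not part of the library: the class `MzTabConversionCommand` is hypothetical, the jar path is whatever file name your release uses, and only the flags shown in the command above (`-c`, `-mzid`, `-outputfile`) are assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not a class shipped with ms-data-core-api.
final class MzTabConversionCommand {
    // Builds: java -jar <jarPath> -c -mzid <inputMzid> -outputfile <outputMzTab>
    static List<String> build(String jarPath, String inputMzid, String outputMzTab) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add("-c");
        command.add("-mzid");
        command.add(inputMzid);
        command.add("-outputfile");
        command.add(outputMzTab);
        return command;
    }
}
```

The resulting list can be passed to `new ProcessBuilder(command).inheritIO().start()` and awaited with `waitFor()`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.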

Convert from PRIDE XML to mzTab

java -jar ms-data-core-api<version>.jar -c -pridexml <pride.xml> -outputformat <output.mztab>

Convert from annotated mzTab to (sorted, filtered*) proBed

java -jar ms-data-core-api<version>.jar -c -mztab <input.mztab> -chromsizes <chrom.txt> -outputformat probed

Convert from annotated mzIdentML to (sorted, filtered*) proBed

java -jar ms-data-core-api<version>.jar -c -mzid <input.mzid> -chromsizes <chrom.txt> -outputformat probed

Convert from (sorted, filtered*) proBed to bigBed

java -jar ms-data-core-api<version>.jar -c -mztab <> -chromsizes <chrom.txt> -asqlfile <> -bigbedconverter <bedToBigBed>

File Validation

MzIdentML validation

java -jar ms-data-core-api<version>.jar -v -mzid <sample.mzid> -peak <spectra.mgf> -skipserialization -reportfile <outputReport.txt>

MzTab validation

java -jar ms-data-core-api<version>.jar -v -mztab <input.mztab> -peaks <spectra1.mgf>##<spectra2.mgf> -skipserialization -reportfile <outputReport.txt>
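
In the command above, several peak files are passed to `-peaks` as a single argument, separated by `##`. A small helper can build that argument from a list of paths. This is an illustrative sketch: `PeakFileArgument` and `join` are hypothetical names, and rejecting paths that already contain `##` is a precaution assumed here, not documented behaviour of the tool.

```java
// Hypothetical helper; not a class shipped with ms-data-core-api.
final class PeakFileArgument {
    private static final String SEPARATOR = "##";

    // Joins peak file paths into the single "##"-separated value passed after -peaks.
    static String join(String... peakFiles) {
        if (peakFiles.length == 0) {
            throw new IllegalArgumentException("at least one peak file is required");
        }
        for (String file : peakFiles) {
            // Assumed precaution: a path containing the separator would be split incorrectly.
            if (file.contains(SEPARATOR)) {
                throw new IllegalArgumentException("path must not contain " + SEPARATOR + ": " + file);
            }
        }
        return String.join(SEPARATOR, peakFiles);
    }
}
```

For example, `PeakFileArgument.join("spectra1.mgf", "spectra2.mgf")` yields `spectra1.mgf##spectra2.mgf`, matching the form used in the command.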

PRIDE XML validation

java -jar ms-data-core-api<version>.jar -v -pridexml <input.pride.xml> -skipserialization -reportfile <outputReport.txt>
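
The three validation commands above differ mainly in the input flag. When validating a batch of mixed files from Java, the flag can be chosen from the file name. The sketch below rests on stated assumptions: `ValidationInputFlag` is a hypothetical helper, and the extension mapping (`.mzid`, `.mztab`, `.xml`) is inferred from the placeholder file names in those commands rather than from anything the tool is known to enforce.

```java
import java.util.Locale;

// Hypothetical helper; not a class shipped with ms-data-core-api.
final class ValidationInputFlag {
    // Returns the input flag used in the validation commands above for a given file name.
    static String forFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".mzid")) {
            return "-mzid";
        }
        if (lower.endsWith(".mztab")) {
            return "-mztab";
        }
        if (lower.endsWith(".xml")) {
            return "-pridexml"; // assumption: .xml inputs are PRIDE XML files
        }
        throw new IllegalArgumentException("unsupported file type: " + fileName);
    }
}
```

Peak-file options are deliberately left out of this helper; add them as shown in the commands above where your files need them.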

XML schema validation

MzIdentML schema validation and normal validation

java -jar ms-data-core-api<version>.jar -v -mzid <input.mzid> -peak <spectra.mgf> -schema -skipserialization -reportfile <outputReport.txt>

PRIDE XML schema validation only, without normal validation

java -jar ms-data-core-api<version>.jar -v -pridexml <input.pride.xml> -schemaonly -skipserialization -reportfile <outputReport.txt>

ProBed validation

ProBed validation with the default schema

java -jar ms-data-core-api<version>.jar -v -proBed <> -reportfile <outputReport.txt>

ProBed validation with a custom schema

java -jar ms-data-core-api<version>.jar -v -proBed <> -asqlfile <> -reportfile <outputReport.txt>


Check Results Files

java -jar ms-data-core-api<version>.jar -check -inputfile <inputfile>

Convert a PRIDE or mzIdentML file to mzTab

java -jar ms-data-core-api<version>.jar -convert -inputfile <inputfile> -format <format>

Print Error/Warn detail message based on code

java -jar ms-data-core-api<version>.jar -error -code <code>


Print the help message (use either -h or --help)

java -jar ms-data-core-api<version>.jar -h

How to cite it:

  • Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. (2015). ms-data-core-api: An open-source, metadata-oriented library for computational proteomics. Bioinformatics. 2015 Apr 24.

This library has been used in:

  • Wang, R., Fabregat, A., Ríos, D., Ovelleiro, D., Foster, J. M., Côté, R. G., ... & Vizcaíno, J. A. (2012). PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nature biotechnology, 30(2), 135-137.
  • Vizcaíno, J. A., Côté, R. G., Csordas, A., Dianes, J. A., Fabregat, A., Foster, J. M., ... & Hermjakob, H. (2013). The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic acids research, 41(D1), D1063-D1069.

Getting Help

If you have questions or need additional help, please contact the PRIDE Helpdesk at the EBI: pride-support at (replace at with @).

Please send us your feedback, including error reports, improvement suggestions, new feature requests and any other things you might want to suggest to the PRIDE team.


ms-data-core-api is a PRIDE API licensed under Apache License 2.0.


