Manipulating and exploring protein and proteomics data




It is advised to install Pbase from Bioconductor.


Alternatively, Pbase can be installed from GitHub using devtools::install_github.



See the DESCRIPTION file for a complete list of dependencies.

Getting started

Currently, the best way to get started is ?Proteins and the Pbase-data vignette. More documentation is on its way.


Pbase is under heavy development and is likely to change considerably in the near future. Suggestions and bug reports are welcome and can be filed as GitHub issues.

If you would like to contribute, please send pull requests directly for minor contributions and typos. For major contributions, we suggest first getting in touch with the package maintainers.


Assessing the redundancy of a protein FASTA database

Given a protein FASTA file, what is the maximal sensitivity that can be expected from a mass spectrometry experiment with 0, 1, ... missed cleavages? This should probably also include a filtering step for peptide flyability.
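One way to make this question concrete is sketched below in Python, independently of the Pbase API. It assumes tryptic digestion (cleavage after K or R unless followed by P) and reads "maximal sensitivity" as the fraction of proteins that yield at least one peptide found in no other protein. Both the cleavage rule and this definition are assumptions for illustration, as are the function names and the toy sequences; the minimum length of 6 follows the requirements quoted in the literature section.

```python
from collections import defaultdict


def cleavage_fragments(sequence):
    """Split a protein at every tryptic site (after K/R, not before P)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, missed_cleavages=0):
    """Return the set of peptides with 0..missed_cleavages missed sites."""
    fragments = cleavage_fragments(sequence)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


def unique_peptide_coverage(proteins, missed_cleavages=0, min_length=6):
    """Fraction of proteins with at least one peptide unique to them."""
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in digest(sequence, missed_cleavages):
            if len(peptide) >= min_length:
                owners[peptide].add(name)
    identifiable = {next(iter(names)) for names in owners.values()
                    if len(names) == 1}
    return len(identifiable) / len(proteins)


# Toy database (hypothetical sequences): P3 shares all its peptides.
toy_db = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "MKTAYIAKGGGGGGR",
    "P3": "MKTAYIAK",
}
for mc in (0, 1, 2):
    print(mc, unique_peptide_coverage(toy_db, missed_cleavages=mc))
```

In this toy database, P3 can never be distinguished because every one of its peptides also occurs in P1 and P2, which is the kind of redundancy the question is about. A flyability filter would be applied to the peptides before counting owners.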


Some literature about estimating detectability:

Liu et al. 2011:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

Logistic regression based on hydrophobicity, isoelectric point, length, molecular weight, average hydrophobicity, average isoelectric point

Webb-Robertson et al. 2007:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

35 features: length, weight, # of (non-)polar, # of (un)charged, # of pos./neg. charged residues, hydrophobicity (different models), polarity (different models), bulkiness, AA singlet counts

Sanders et al. 2007:

Requirements for in-silico created peptides: length(peptides) >= 6

Features: Length, Charge, Isoelectric Point, Molecular Weight, Hydropathicity, Counts of each AA (20 Features), Percent composition of each AA (20 Features), Percent of polar, positive, negative, hydrophobic AA

Take-home message: a model of one species/dataset could not be transferred to another dataset (without dramatically decreasing the performance)

Mallick et al. 2007:

~1000 Features.

Some of the most discriminating properties: Total/Average net/positive charge, hydrophobic moment, isoelectric point, Histidine composition

Take-home message: the model of one species is comparable to another if the evolutionary distance is small (e.g. yeast and human), but you can't compare different devices/datasets (e.g. MALDI vs ESI)
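The sequence-derived features used by these models are straightforward to compute. The following is a minimal Python sketch of the composition features listed for Sanders et al. (length, counts and percent composition of each amino acid, percent of polar, positive, negative and hydrophobic residues). The function name and the residue class memberships are assumptions chosen for illustration; the cited papers may group residues differently.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed residue classes; published models may use other groupings.
RESIDUE_CLASSES = {
    "polar": set("STNQCY"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "hydrophobic": set("AVLIMFW"),
}


def peptide_features(peptide):
    """Return a dict of simple composition features for one peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    n = len(peptide)
    features = {"length": n}
    for aa in AMINO_ACIDS:
        features[f"count_{aa}"] = counts[aa]
        features[f"percent_{aa}"] = 100.0 * counts[aa] / n
    for name, members in RESIDUE_CLASSES.items():
        in_class = sum(counts[aa] for aa in members)
        features[f"percent_{name}"] = 100.0 * in_class / n
    return features


features = peptide_features("TAYIAK")
print(features["length"], features["count_A"], features["percent_hydrophobic"])
```

A feature table built this way could serve as input to a classifier such as the logistic regression mentioned above; charge, isoelectric point and hydropathicity would need additional scales that are not included here.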

Simple Rules

Mass: 500:4500

Length: 5:40

95% of all peptides are of length 5:30.

Average Isoelectric point: seq(0, 1.4)

Hydropathy/Hydrophobicity: Kyte, Jack, and Russell F. Doolittle. "A simple method for displaying the hydropathic character of a protein." Journal of Molecular Biology 157.1 (1982): 105-132.
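The mass and length rules above, together with the Kyte-Doolittle scale, can be sketched as a simple peptide filter. The Python example below is illustrative only and does not use the Pbase API. It assumes the mass bounds are in Da and refer to monoisotopic masses of unmodified peptides, which the rules above do not state; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) and the mass of one water molecule.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide in Da."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def passes_simple_rules(peptide, mass_range=(500.0, 4500.0),
                        length_range=(5, 40)):
    """Apply the mass (500:4500) and length (5:40) rules."""
    length_ok = length_range[0] <= len(peptide) <= length_range[1]
    mass_ok = mass_range[0] <= peptide_mass(peptide) <= mass_range[1]
    return length_ok and mass_ok


for pep in ("GG", "TAYIAK"):
    print(pep, round(peptide_mass(pep), 4), passes_simple_rules(pep))
```

The isoelectric point rule is left out because it requires a pKa table, and the choice of table is not specified above.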

Selection of optimal heavy peptides for absolute quantitation

See Pavel's idea.

Protein domains

Available through the integration with the ensembldb package. See the Pbase-with-ensembldb vignette.

Mapping a Protein Sequence to a Genome Sequence

See the mapping vignette.

See also this document for additional examples and integration with RNA-seq data.


The package makes it easy to interact with AAString and AAStringSet instances, protein databases such as UniProt (and possibly biomaRt in the future) using protein identifiers, protein identification results (mzID or (devel) mzR packages) and possibly also MSnExp and MSnSet instances.