BpForms: unambiguous representation of modified DNA, RNA, and proteins

BpForms is a set of tools for unambiguously representing the structures of modified forms of biopolymers such as DNA, RNA, and protein.

  • The BpForms notation can unambiguously represent the structure of modified forms of biopolymers. For example, the following represents a modified DNA molecule that contains a deoxyinosine monomer at the fourth position.
    ACG[id: "dI" 
         | structure: InChI=1S
            /C10H12N4O4
            /c15-2-6-5(16)1-7(18-6)14-4-13-8-9(14)11-3-12-10(8)17
            /h3-7,15-16H,1-2H2,(H,11,12,17)
            /t5-,6+,7+
            /m0
            /s1
            ]T
    
  • This concrete representation of modified biopolymers enables the BpForms software tools to calculate the chemical formulae, molecular weights, and charges of biopolymers, as well as to automatically calculate the major protonation and tautomerization state of biopolymers at specific pHs.
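
To make the notation above more tangible, here is a minimal Python sketch that splits the deoxyinosine example into its monomers and reads the attributes of the bracketed one. It is not the BpForms parser or its grammar. `split_monomers` and `parse_attributes` are hypothetical helpers, and the sketch assumes what the example suggests: bracketed monomers hold `key: value` attributes separated by `|`, brackets are not nested, and whitespace inside a value (such as the wrapped InChI) can be discarded.

```python
def split_monomers(sequence):
    """Split a sequence into single-character monomers and bracketed monomers.

    Bracketed monomers are returned as the raw text between '[' and ']'.
    """
    monomers = []
    i = 0
    while i < len(sequence):
        char = sequence[i]
        if char == "[":
            end = sequence.find("]", i)
            if end == -1:
                raise ValueError("unclosed '[' at position %d" % i)
            monomers.append(sequence[i + 1:end])
            i = end + 1
        elif char.isspace():
            i += 1  # ignore whitespace between monomers
        else:
            monomers.append(char)
            i += 1
    return monomers


def parse_attributes(body):
    """Parse 'key: value | key: value' into a dict (assumed layout)."""
    attributes = {}
    for field in body.split("|"):
        key, _, value = field.partition(":")
        # Assumption: whitespace inside values is only line wrapping.
        value = "".join(value.split())
        attributes[key.strip()] = value.strip('"')
    return attributes


example = '''ACG[id: "dI"
     | structure: InChI=1S
        /C10H12N4O4
        /c15-2-6-5(16)1-7(18-6)14-4-13-8-9(14)11-3-12-10(8)17
        /h3-7,15-16H,1-2H2,(H,11,12,17)
        /t5-,6+,7+
        /m0
        /s1
        ]T'''

monomers = split_monomers(example)
modified = parse_attributes(monomers[3])  # the fourth monomer
print(len(monomers), modified["id"])
```

The fourth element of `monomers` is the modified monomer, and `modified["structure"]` holds its InChI on a single line.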

BpForms encompasses five tools:

BpForms was motivated by the need to concretely represent the biochemistry of DNA modification, DNA repair, post-transcriptional processing, and post-translational processing in whole-cell computational models. BpForms is also a valuable tool for experimental proteomics. We developed BpForms because there were no notations, schemas, data models, or file formats for concretely representing modified forms of biopolymers, despite the existence of several databases and ontologies of DNA, RNA, and protein modifications and the ProForma Proteoform Notation.

The BpForms syntax was inspired by the ProForma Proteoform Notation. BpForms improves upon this syntax in several ways:

  • BpForms separates the representation of modified biopolymers from the chemical processes which generate them.
  • BpForms clarifies the representation of multiply modified monomers. This is necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms can be customized to represent any modification and, therefore, is not limited to previously enumerated modifications. This is also necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms supports two additional types of uncertainty in the structures of biopolymers: uncertainty in the position of a modified nucleotide/amino acid within the polymer sequence, and uncertainty in the chemical identity of a modified nucleotide/amino acid, expressed as a deviation from its expected mass or charge.
  • BpForms has a concrete grammar. This enables error checking, as well as the calculation of chemical formulae, masses, and charges, which is essential for modeling.
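
As a rough illustration of why a concrete structure makes such calculations possible, the sketch below derives a molecular weight from the formula layer of the deoxyinosine InChI shown earlier in this README (C10H12N4O4). It is a standalone example and not BpForms code. `formula_layer`, `parse_formula`, and `molecular_weight` are hypothetical helpers, the atomic weights are rounded standard values, and the parser assumes a flat formula without parentheses or charges.

```python
import re

# Rounded standard atomic weights (assumption: enough elements for this example).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}


def formula_layer(inchi):
    """Return the formula layer, the second '/'-separated field of an InChI."""
    return inchi.split("/")[1]


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C10H12N4O4'."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(count or 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


inchi = "InChI=1S/C10H12N4O4/c15-2-6-5(16)1-7(18-6)14-4-13-8-9(14)11-3-12-10(8)17"
formula = formula_layer(inchi)
weight = molecular_weight(formula)
print(formula, round(weight, 2))  # about 252.23 g/mol for deoxyinosine
```

An element missing from the table raises a `KeyError`, which is a small instance of the error checking that an unambiguous representation allows.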

Installation

  1. Install the third-party dependencies listed below. Detailed installation instructions are available in An Introduction to Whole-Cell Modeling.

  2. To use Marvin to calculate major protonation and tautomerization states, set JAVA_HOME to the path to your Java virtual machine (JVM):

    export JAVA_HOME=/usr/lib/jvm/default-java
    
  3. To use Marvin to calculate major protonation and tautomerization states, add Marvin to the Java class path:

    export CLASSPATH=$CLASSPATH:/opt/chemaxon/marvinsuite/lib/MarvinBeans.jar
    
  4. Install this package

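    • Install the latest release from PyPI. For most environments, the --process-dependency-links option is needed to install some of the dependencies from GitHub. Note that pip removed this option in version 19.0, so the command requires an older version of pip.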

      pip install --process-dependency-links bpforms[all]
      
    • Install the latest revision from GitHub. For most environments, the --process-dependency-links option is needed to install some of the dependencies from GitHub.

      pip install --process-dependency-links git+git://github.com/KarrLab/bpforms#egg=bpforms[all]
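
Before using the Marvin-dependent features, it can help to confirm that the environment variables from steps 2 and 3 are in place. The following Python sketch is an illustrative check only and is not part of BpForms. `check_marvin_environment` is a hypothetical helper, and it only inspects the variables; it does not verify that Java or Marvin actually run.

```python
import os


def check_marvin_environment(environ=None):
    """Return a list of problems; the list is empty if both variables look set."""
    environ = os.environ if environ is None else environ
    problems = []

    # Step 2: JAVA_HOME should point to the JVM.
    if not environ.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")

    # Step 3: CLASSPATH should contain an entry ending in MarvinBeans.jar.
    entries = [e for e in environ.get("CLASSPATH", "").split(os.pathsep) if e]
    if not any(os.path.basename(e) == "MarvinBeans.jar" for e in entries):
        problems.append("CLASSPATH does not include MarvinBeans.jar")

    return problems


for problem in check_marvin_environment():
    print("warning:", problem)
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to exercise without changing the real environment.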
      

Examples, tutorial, and documentation

Please see the documentation. An interactive tutorial is also available in the whole-cell modeling sandbox.

License

The package is released under the MIT license.

Development team

This package was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.

Questions and comments

Please contact the Karr Lab with any questions or comments.