`BpForms` is a toolkit for unambiguously describing the primary sequences of biopolymers such as DNA, RNA, and proteins, including their modified forms. This tutorial illustrates how to use the `BpForms` Python API. Please see the [documentation](https://docs.karrlab.org/bpforms/) for more information about the `BpForms` notation and for instructions on using the `BpForms` website, JSON REST API, and command line interface.

### 1. Import `BpForms`

In [5]:
import bpforms

### 2. Create an instance of `BpForm`
Use the [`BpForm` notation](https://docs.karrlab.org/bpforms) and the `BpForm.from_str` method to create an instance of `BpForm` to represent a form of a biopolymer.

`BpForms` includes six predefined alphabets and six predefined subclasses of `BpForm`:
* Canonical DNA (`bpforms.canonical_dna_alphabet`, `bpforms.CanonicalDnaForm`): four canonical bases
* Canonical RNA (`bpforms.canonical_rna_alphabet`, `bpforms.CanonicalRnaForm`): four canonical bases
* Canonical Protein (`bpforms.canonical_protein_alphabet`, `bpforms.CanonicalProteinForm`): 20 canonical amino acids
* DNA (`bpforms.dna_alphabet`, `bpforms.DnaForm`): four canonical bases plus the non-canonical bases defined in [DNAmod](https://dnamod.hoffmanlab.org)
* RNA (`bpforms.rna_alphabet`, `bpforms.RnaForm`): four canonical bases plus the non-canonical bases defined in [MODOMICS](http://modomics.genesilico.pl/modifications/)
* Protein (`bpforms.protein_alphabet`, `bpforms.ProteinForm`): 20 canonical amino acids plus the non-canonical amino acids defined in [RESID](https://pir.georgetown.edu/resid/)

#### Create a `BpForm` composed of canonical bases
Bases defined in the alphabets can be referenced by their single character codes.

In [6]:
dna_form = bpforms.DnaForm().from_str('ACGT')

#### Create a `BpForm` that includes non-canonical bases 
Some of the non-canonical bases in the alphabets are represented by multiple characters. Their character codes must be delimited by curly brackets.

In [7]:
dna_form = bpforms.DnaForm().from_str('A{m2C}GT')
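
To make the two conventions above concrete (one character per code, or a multi-character code inside curly brackets), here is a minimal sketch of how such a string could be split into base codes. It is an illustration only and not the parser that `BpForms` uses. The names `TOKEN_PATTERN` and `split_base_codes` are hypothetical, and the sketch deliberately rejects the square-bracket syntax for user-defined bases.

```python
import re

# Hypothetical helper, not part of the BpForms API: a single character is one
# code, and anything inside curly brackets is one multi-character code.
TOKEN_PATTERN = re.compile(r'\{([^{}]+)\}|([^{}\[\]])')


def split_base_codes(sequence):
    """Split a string such as 'A{m2C}GT' into a list of base codes."""
    codes = []
    position = 0
    while position < len(sequence):
        match = TOKEN_PATTERN.match(sequence, position)
        if match is None:
            # Unbalanced bracket or unsupported syntax (e.g. square brackets)
            raise ValueError('Unexpected character at position {}'.format(position))
        codes.append(match.group(1) or match.group(2))
        position = match.end()
    return codes


print(split_base_codes('ACGT'))      # ['A', 'C', 'G', 'T']
print(split_base_codes('A{m2C}GT'))  # ['A', 'm2C', 'G', 'T']
```

The sketch only tokenizes the string; checking that each code exists in the chosen alphabet would be a separate step.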

#### Create a `BpForm` that includes a base that is not defined in the alphabet
Additional bases can be described in square brackets using one or more attributes separated by vertical pipes ("|"):
* `id`
* `name`
* `synonym`
* `identifier`
* `structure`: InChI-encoded string that represents the structure of the base
* `delta-mass`: additional mass in Daltons beyond that described by the `structure` attribute; used to represent uncertainty in the structure of the base.
* `delta-charge`: additional charge beyond that described by the `structure` attribute; used to represent uncertainty in the structure of the base.
* `position`: represents uncertainty in the location of a non-canonical base.
* `comments`

In [8]:
dna_form = bpforms.DnaForm().from_str(
    '''A[
         id: "m2C" 
         | name: "2-O-methylcytosine"
         | synonym: "4-amino-2-methoxypyrimidine"
         | synonym: "o-2-methylcytosine"
         | identifier: "ChEBI" / "CHEBI:70854"
         | structure: InChI=1S/C5H7N3O/c1-9-5-7-3-2-4(6)8-5/h2-3H,1H3,(H2,6,7,8)
         | comments: "Methylation of deoxycytidine"
         | position: 2-3
        ]GT'''.replace('\n', '').replace(' ', ''))
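
Note that the chained `replace` calls in the cell above remove every space, including the spaces inside quoted values such as the `comments` text. If those spaces matter, one option is to strip whitespace only outside double quotes. The helper below is a hypothetical sketch, not part of `BpForms`, and assumes that quoted values contain no escaped double quotes.

```python
def strip_unquoted_whitespace(text):
    """Remove whitespace from text except inside double-quoted values."""
    result = []
    in_quotes = False
    for char in text:
        if char == '"':
            # Toggle on every double quote (assumes no escaped quotes)
            in_quotes = not in_quotes
        if char.isspace() and not in_quotes:
            continue
        result.append(char)
    return ''.join(result)


raw = '''A[
     id: "m2C"
     | comments: "Methylation of deoxycytidine"
    ]GT'''

print(raw.replace('\n', '').replace(' ', ''))
# A[id:"m2C"|comments:"Methylationofdeoxycytidine"]GT
print(strip_unquoted_whitespace(raw))
# A[id:"m2C"|comments:"Methylation of deoxycytidine"]GT
```

The result of `strip_unquoted_whitespace` could then be passed to `from_str` in place of the chained `replace` calls.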

### 3. Get and set bases and slices of bases of the biopolymer
Individual bases and slices of bases can be retrieved and assigned with the same indexing syntax as Python lists.

In [9]:
dna_form[0]

<bpforms.core.Base at 0x7f2d4644ecc0>

In [10]:
dna_form[1] = bpforms.dna_alphabet.bases.A

In [11]:
dna_form[1:3]

[<bpforms.core.Base at 0x7f2d4644ecc0>, <bpforms.core.Base at 0x7f2d462eee48>]

In [12]:
dna_form[1:3] = bpforms.DnaForm().from_str('TA')
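
For readers unfamiliar with how a class can support this list-like syntax, the following sketch shows the general Python mechanism (`__getitem__`, `__setitem__`, and `__len__`). `SimpleForm` is a hypothetical stand-in that stores single-character strings. It is not the `BpForm` class and makes no claim about how `BpForms` implements indexing internally.

```python
class SimpleForm:
    """Hypothetical stand-in for a biopolymer form; not the BpForms class."""

    def __init__(self, bases=()):
        self.bases = list(bases)

    def __len__(self):
        return len(self.bases)

    def __getitem__(self, index):
        # index may be an int or a slice; the underlying list handles both
        return self.bases[index]

    def __setitem__(self, index, value):
        # Accept another SimpleForm on the right-hand side of a slice assignment
        if isinstance(value, SimpleForm):
            value = value.bases
        self.bases[index] = value

    def __str__(self):
        return ''.join(self.bases)


form = SimpleForm('AAGT')
form[1] = 'C'                 # replace a single base
form[1:3] = SimpleForm('TA')  # replace a slice with the bases of another form
print(str(form), len(form))   # ATAT 4
```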

### 4. Calculate the major protonation state of the form at pH 8.0

In [13]:
dna_form.protonate(8.)
str(dna_form)

'ATAT'

### 5. Calculate physical properties of the form
`BpForms` can calculate the length, formula, molecular weight, and charge of a biopolymer form.

In [14]:
len(dna_form)

4

In [15]:
str(dna_form.get_formula())

'C40H45N14O28P4'

In [16]:
dna_form.get_mol_wt()

1293.765047992

In [17]:
dna_form.get_charge()

-7
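
As a cross-check of how the formula and the molecular weight relate, the weight can be recomputed by hand as the sum of each element count times its atomic weight. The sketch below assumes standard abridged atomic weights, which were not taken from `BpForms`. The helpers `parse_formula` and `molecular_weight` are hypothetical and only handle flat formulas without parentheses or charges.

```python
import re

# Assumed standard atomic weights in g/mol (abridged values)
ATOMIC_WEIGHTS = {
    'C': 12.011,
    'H': 1.008,
    'N': 14.007,
    'O': 15.999,
    'P': 30.973761998,
}


def parse_formula(formula):
    """Parse a flat formula such as 'C40H45N14O28P4' into element counts."""
    counts = {}
    for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        # A missing count means a single atom of that element
        counts[element] = counts.get(element, 0) + int(count or 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weight times count over all elements in the formula."""
    return sum(ATOMIC_WEIGHTS[element] * count
               for element, count in parse_formula(formula).items())


print(parse_formula('C40H45N14O28P4'))
# {'C': 40, 'H': 45, 'N': 14, 'O': 28, 'P': 4}
print(round(molecular_weight('C40H45N14O28P4'), 3))
# 1293.765
```

With these assumed atomic weights, the sum agrees with the value returned by `get_mol_wt()` above to within floating-point rounding.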

### 6. Determine if the biopolymer is equal to another biopolymer
Use the `is_equal` method to check if two biopolymers are equal.

In [18]:
dna_form_1 = bpforms.DnaForm().from_str('ACGT')
dna_form_2 = bpforms.DnaForm().from_str('ACGT')
dna_form_3 = bpforms.DnaForm().from_str('GCTC')

In [19]:
dna_form_1.is_equal(dna_form_2)

True

In [20]:
dna_form_1.is_equal(dna_form_3)

False