# Sequence Molecules (DNA, RNA & Protein Sequences)

Sequence molecules inherit from the `Polynucleotide` and `Polypeptide` classes in the `bioseq` module, which in turn inherit from `chem2.Molecule`. They add a sequence property to molecules that can be queried and operated upon. Note that each molecule shown in this example defines a single polymeric strand. For objects like double-stranded DNA or DNA-RNA complexes, each of the two strands has to be represented as a separate molecule.

## Instantiation

Just like molecules (see Example 01), sequence molecules are instantiated by calling the respective constructor (`DNA()`, `RNA()`, `Protein()`) from `bioseq`, then using `set_sequence()` to set the sequence.

In [9]:
from wc_rules.bioseq import DNA, RNA, Protein
inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'
dna1 = DNA(ambiguous=False).set_sequence(inputstr)

## Accessing (Sub)-Sequences 

Sequences are indexed like Python strings, i.e., a sequence L bases long has bases indexed from 0 to L-1. The position L is also valid as an end position and refers to the position just after the last base, so `end=L` includes the final base. To read a sequence or subsequence from a sequence molecule, use `get_sequence()`, which has the input signature

`get_sequence(start=None,end=None,length=None,as_string=False)`

In [10]:
print(dna1.get_sequence())

TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA


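Called with no arguments, as above, `get_sequence()` returns the full sequence. To get a subsequence, provide either (`start`, `end`) or (`start`, `length`), as keyword or positional arguments.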

In [11]:
print(dna1.get_sequence(start=90,end=99))
print(dna1.get_sequence(90,99))

CAATACAGA
CAATACAGA


In [12]:
print(dna1.get_sequence(start=90,length=9))
print(dna1.get_sequence(90,None,9))

CAATACAGA
CAATACAGA
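
The two ways of specifying a subsequence can be pictured as resolving to the same half-open Python slice. The following is a minimal sketch in plain Python, not the `wc_rules` implementation: `resolve_span` is a hypothetical helper, and its handling of the case where both `end` and `length` are given (raising an error) is an assumption, since the behavior of `get_sequence()` in that case is not described here.

```python
def resolve_span(seq_len, start=None, end=None, length=None):
    """Resolve (start, end) or (start, length) to half-open slice bounds.

    Hypothetical helper for illustration only; not part of wc_rules.
    """
    if start is None:
        start = 0
    if end is not None and length is not None:
        # Assumption: treat the over-specified case as an error.
        raise ValueError("provide either end or length, not both")
    if length is not None:
        end = start + length
    if end is None:
        end = seq_len
    # Position seq_len is valid: it points just after the last base.
    if not 0 <= start <= end <= seq_len:
        raise ValueError("span lies outside the sequence")
    return start, end


seq = "TTGTTATCGTTA"  # first 12 bases of the example sequence

s, e = resolve_span(len(seq), start=3, end=9)
print(seq[s:e])

s, e = resolve_span(len(seq), start=3, length=6)
print(seq[s:e])
```

Both calls resolve to the same bounds, so both lines print the same six bases.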


`get_sequence()` returns a `Bio.Seq.Seq` object by default (`Bio` = Biopython). To get a plain string, use `as_string=True`. The method `get_sequence_length()` returns an `int` equal to the length of the sequence or subsequence.

## Converting Sequences

To convert between DNA, RNA and protein sequences, the `DNA` and `RNA` objects provide the methods `get_dna()`, `get_rna()` and `get_protein()`. These methods take an `option` keyword argument that selects which sequence is converted: the `coding` sequence read directly from the strand (default), the `complementary` sequence obtained by replacing each base with its complement, or the `reverse_complementary` sequence, which is the complement read in reverse order.

Below we show an example of the `get_dna()` method applied in its three modes to a `DNA` sequence molecule.

In [13]:
print('1: '+ dna1.get_sequence(as_string=True))
print('2: '+ dna1.get_dna(option='coding',as_string=True))
print('3: '+ dna1.get_dna(option='complementary',as_string=True))
print('4: '+ dna1.get_dna(option='reverse_complementary',as_string=True))

1: TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA
2: TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA
3: AACAATAGCAATGGCCCTCACTCCGCAGGCGCAGGGAAAGTCCAGTTCGCTGACTTTTTGGAACGTCAACTAAAATTTCGCATATCTTCTGTTATGTCT
4: TCTGTATTGTCTTCTATACGCTTTAAAATCAACTGCAAGGTTTTTCAGTCGCTTGACCTGAAAGGGACGCGGACGCCTCACTCCCGGTAACGATAACAA
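
To make the three modes concrete, here is a minimal plain-Python sketch of what each one computes for an unambiguous DNA string. It is an illustration only and does not reproduce how `wc_rules` or Biopython implement the conversion; `convert_dna` and `COMPLEMENT` are hypothetical names.

```python
# Translation table mapping each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_dna(seq, option="coding"):
    """Return seq as read in one of the three modes (illustrative only)."""
    if option == "coding":
        return seq
    if option == "complementary":
        return seq.translate(COMPLEMENT)
    if option == "reverse_complementary":
        # Complement every base, then reverse the result.
        return seq.translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown option: {option!r}")


prefix = "TTGTTATCG"  # first 9 bases of the example sequence
for mode in ("coding", "complementary", "reverse_complementary"):
    print(mode, convert_dna(prefix, mode))
```

For this 9-base prefix, the complementary result matches the start of output line 3 above, and the reverse-complementary result matches the end of output line 4.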


Similar to `get_sequence()`, the methods `get_dna()`, `get_rna()` and `get_protein()` can operate on subsequences defined by (`start`,`end`) or (`start`,`length`). For example, let us try to get the reverse-complementary RNA coded in the first 66 bases of `dna1` and instantiate a new `RNA` molecule.

In [14]:
seq = dna1.get_rna(option='reverse_complementary',start=0,length=66,as_string=True)
rna1 = RNA().set_sequence(seq)
print(rna1.get_sequence())

UGCAAGGUUUUUCAGUCGCUUGACCUGAAAGGGACGCGGACGCCUCACUCCCGGUAACGAUAACAA


Now let us try to get the protein sequence coded in the first 66 bases of `dna1` and instantiate a new `Protein` molecule.

In [15]:
seq = dna1.get_protein(option='coding',start=0,length=66,as_string=True)
prot1 = Protein().set_sequence(seq)
print(prot1.get_sequence())

LLSLPGVRRPRPFQVKRLKNLA
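
The protein sequence above is 22 residues long because the 66 bases are read as 22 consecutive codons. The sketch below illustrates that reading in plain Python using the standard genetic code; it is not the code path used by `get_protein()`, and `CODON_TABLE` and `translate` are hypothetical names introduced for illustration.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a coding DNA string codon by codon ('*' marks a stop)."""
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(translate("TTGTTATCGTTA"))  # first four codons of the example sequence
```

The four codons `TTG`, `TTA`, `TCG`, `TTA` give the first four residues of the protein shown above.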
