##### [Full pydna documentation](https://bjornfjohansson.github.io/pydna/)


For reference, the Dseq class is a subclass of `Bio.Seq.Seq`, whose documentation can be found [here](https://biopython.org/wiki/Seq). A `Bio.Seq.Seq` object represents a single-stranded sequence. pydna extends it to model double-stranded sequences that can be manipulated with modern molecular biology methods (e.g. cloning and cutting).


# Dseq Class

The Dseq class can create a double-stranded DNA object using two approaches. First, the DNA sequence can be passed directly as a single string representing the sense/`watson` strand. The anti-sense/`crick` strand is then generated automatically. This can be the sequence provided in .gb files, for instance. **Note that in pydna, ***all*** DNA sequences must be passed in the 5'-3' direction**. In the printed representation, the number denotes the length of the Dseq object and the "-" denotes linearity. A circular Dseq object (described later) is shown with "o". For example:

In [78]:
from pydna.dseq import Dseq
my_seq = Dseq("aatat")
my_seq

Dseq(-5)
aatat
ttata
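
To make the automatic generation of the `crick` strand concrete, here is a minimal plain-Python sketch of how a reverse complement can be derived and why the second line of the representation reads `ttata`. This is not pydna's own code: the helper name `reverse_complement` is hypothetical and it only handles the four standard nucleotides.

```python
# Illustrative only: a hand-rolled reverse complement, not pydna's implementation.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]


watson = "aatat"
crick = reverse_complement(watson)  # 5'-3', the orientation pydna expects as input
crick_displayed = crick[::-1]       # 3'-5', as printed under the watson strand

print(crick)
print(crick_displayed)
```

The 5'-3' `crick` string is the form that would be passed as a second argument, while the reversed string is what appears on the bottom line of the representation.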

Alternatively, two DNA sequences can be passed as strings to represent the `watson` and `crick` strands. `watson` denotes the sense strand, and `crick` denotes the anti-sense strand. Again, both the `watson` and `crick` strands must be passed in the 5'-3' direction.

In [79]:
my_seq = Dseq("aATAt", "atatt")
my_seq

Dseq(-5)
aATAt
ttata

Of note, DNA sequences can be passed in lower case or upper case, and they are not restricted to the conventional ATCG nucleotides: the class supports the IUPAC ambiguous nucleotide code.
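
As a rough illustration of what complementing ambiguous nucleotides involves, the sketch below builds a complement table for the IUPAC codes in plain Python. It is an assumption-laden example, not pydna's implementation; the names `IUPAC_COMPLEMENT` and `complement` are hypothetical.

```python
# Illustrative only: complementing IUPAC ambiguity codes by hand.
# Pairs follow the IUPAC nucleotide code, e.g. R (A/G) pairs with Y (C/T).
_CODES = "ACGTRYKMBVDHSWN"
_COMPS = "TGCAYRMKVBHDSWN"
IUPAC_COMPLEMENT = str.maketrans(_CODES + _CODES.lower(), _COMPS + _COMPS.lower())


def complement(seq: str) -> str:
    """Complement a sequence base by base, preserving upper/lower case."""
    return seq.translate(IUPAC_COMPLEMENT)


print(complement("aRYn"))
```

Codes such as S, W and N are their own complement, so they pass through unchanged.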

## Dseq Class Parameters

The Dseq object can model various properties of real biological sequences. For a linear DNA sequence, overhangs (`ovhg`) can be added. The overhang at the 5' end of the `watson` strand (the left side of the molecule) is specified, and the overhang at the other end follows automatically from the `watson` and `crick` strands. A `crick` strand must be provided. For example:

In [80]:
my_seq = Dseq("aatat", "atatt", ovhg=-2)
my_seq

Dseq(-7)
aatat
  ttata

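pydna defines the overhang as the offset, in nucleotides, between the 3' end of the `crick` strand and the 5' end of the `watson` strand: positive when the `crick` strand protrudes, negative when the `watson` strand protrudes. The best way to understand how the overhangs work is to visualise the possible scenarios (the `watson` strand is on top):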


        dsDNA       overhang

          nnn...    2
        nnnnn...

         nnnn...    1
        nnnnn...

        nnnnn...    0
        nnnnn...

        nnnnn...   -1
         nnnn...

        nnnnn...   -2
          nnn...
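
The scenarios above can be reproduced with a few lines of plain Python. The function below is a hypothetical helper (not part of pydna) that takes both strands in the 5'-3' direction plus an `ovhg` value and returns the two-line picture.

```python
# Illustrative only: reproduce the two-line picture from watson, crick and ovhg.
def render(watson: str, crick: str, ovhg: int = 0) -> str:
    """Both strands are given 5'-3'; the crick strand is printed 3'-5'."""
    top = " " * max(ovhg, 0) + watson           # watson is indented when ovhg > 0
    bottom = " " * max(-ovhg, 0) + crick[::-1]  # crick is indented when ovhg < 0
    return top + "\n" + bottom


print(render("aatat", "atatt", ovhg=-2))
print(render("tat", "atatt", ovhg=2))
```

The first call mirrors the `ovhg=-2` example shown earlier; the second shows the opposite case, where the `crick` strand protrudes on the left.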

Another way to pass the overhangs is to use the `from_full_sequence_and_overhangs` classmethod, which only needs the `watson`/sense strand. This is useful if your file provides only the `watson` strand, or if you want to specify overhangs on both sides of the double stranded DNA.

Both `watson_ovhg` and `crick_ovhg` can be passed following the same rules as above. Specifically, the `crick_ovhg` argument is identical to the conventional `ovhg` argument. The `watson_ovhg` argument is basically the `ovhg` argument applied to the reverse complementary sequence. This can be visualised in a basic drawing here:

      (-3)--(-2)--(-1)--(x)--(x)--(x)--(-1)--(-2)
      
    5'( a)--( a)--( a)--(t)--(t)--(a)--( a)--( a)3'
    3'( t)--( t)--( t)--(a)--(a)--(t)--( t)--( t)5'

    5'( a)--( a)--( a)--(t)--(t)--(a)--(  )--(  )3'
    3'(  )--(  )--(  )--(a)--(a)--(t)--( t)--( t)5'

And this is shown in the code below, too:


In [81]:
my_seq = Dseq.from_full_sequence_and_overhangs("aaattaaa", crick_ovhg=-3, watson_ovhg=-2)
my_seq

Dseq(-8)
aaatta
   aattt
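
For readers who want to see the trimming logic spelled out, the following plain-Python sketch derives both strands from a full sequence and the two overhang values. It is written to reproduce the outputs shown in this section and is not copied from pydna; `strands_from_full` is a hypothetical name, and only the four standard nucleotides are handled.

```python
# Illustrative only: trim a full sequence into two strands.
# Written to match the outputs in this section, not taken from pydna.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def strands_from_full(full: str, crick_ovhg: int, watson_ovhg: int):
    """Return (watson, crick), both 5'-3', after trimming the overhangs."""
    watson = full
    crick = full.translate(COMPLEMENT)[::-1]
    # Left side of the molecule
    if crick_ovhg < 0:
        crick = crick[:crick_ovhg]     # crick is shorter: watson protrudes
    elif crick_ovhg > 0:
        watson = watson[crick_ovhg:]   # watson is shorter: crick protrudes
    # Right side of the molecule
    if watson_ovhg < 0:
        watson = watson[:watson_ovhg]  # watson is shorter: crick protrudes
    elif watson_ovhg > 0:
        crick = crick[watson_ovhg:]    # crick is shorter: watson protrudes
    return watson, crick


watson, crick = strands_from_full("aaattaaa", crick_ovhg=-3, watson_ovhg=-2)
print(watson)       # top line of the representation
print(crick[::-1])  # bottom line, before indentation
```

With `crick_ovhg=-3` and `watson_ovhg=-2`, the sketch yields the same two strands as the `Dseq` output above.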

The possible scenarios that arise from applying positive and negative `crick_ovhg` and `watson_ovhg` values to a `Dseq` object are visualised in the output of the code below:

In [82]:
for crick_ovhg in [-2, 2]:
    for watson_ovhg in [-3, 3]:
        print(f"watson_ovhg is {watson_ovhg}, crick_ovhg is {crick_ovhg}")
        my_seq = Dseq.from_full_sequence_and_overhangs("aaattaaa", crick_ovhg, watson_ovhg)
        print(repr(my_seq) + "\n")

watson_ovhg is -3, crick_ovhg is -2
Dseq(-8)
aaatt
  taattt

watson_ovhg is 3, crick_ovhg is -2
Dseq(-8)
aaattaaa
  taa

watson_ovhg is -3, crick_ovhg is 2
Dseq(-8)
  att
tttaattt

watson_ovhg is 3, crick_ovhg is 2
Dseq(-8)
  attaaa
tttaa



If you would like to check the overhangs of a `Dseq` object, call the methods `five_prime_end` and `three_prime_end`. They describe the left end (the 5' end of the `watson` strand) and the right end (the 3' end of the `watson` strand) of the molecule, respectively, and each returns a tuple with the type of overhang and its sequence. An example of a `Dseq` object and the print-out of both methods is shown here:

In [83]:
my_seq = Dseq("aatat", "atatt", ovhg=-2)
print(repr(my_seq))
print(my_seq.five_prime_end())
print(my_seq.three_prime_end())

Dseq(-7)
aatat
  ttata
("5'", 'aa')
("5'", 'at')
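
The logic behind these two tuples can be sketched in plain Python. The functions below are hypothetical helpers written to match the output above, not pydna's source; in particular, the `"blunt"` label returned for flush ends is an assumption of this sketch.

```python
# Illustrative only: classify both ends of a linear duplex from watson, crick
# (both 5'-3') and ovhg. Written to match the output above, not taken from pydna.
def left_end(watson: str, crick: str, ovhg: int):
    if ovhg < 0:
        return "5'", watson[:-ovhg].lower()   # watson protrudes with its 5' end
    if ovhg > 0:
        return "3'", crick[-ovhg:].lower()    # crick protrudes with its 3' end
    return "blunt", ""                        # label assumed for this sketch


def right_end(watson: str, crick: str, ovhg: int):
    # The stagger on the right side follows from the strand lengths and ovhg.
    right = len(watson) - len(crick) + ovhg
    if right < 0:
        return "5'", crick[:-right].lower()   # crick protrudes with its 5' end
    if right > 0:
        return "3'", watson[-right:].lower()  # watson protrudes with its 3' end
    return "blunt", ""                        # label assumed for this sketch


print(left_end("aatat", "atatt", -2))
print(right_end("aatat", "atatt", -2))
```

Both ends of this molecule carry a 5' overhang, which is why `three_prime_end` also reports `"5'"`: the method name refers to the side of the molecule, not to the type of overhang.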


To deal with circular sequences (e.g. plasmids), the argument `circular=True` can also be passed. Note that the ends of the DNA fragment must be blunt to set `circular=True`. In other words, `ovhg` must be 0.

In [84]:
my_seq = Dseq("aatat", circular=True)
my_seq

Dseq(o5)
aatat
ttata

## `__getitem__`, `__repr__`, and `__str__` methods

### `__getitem__`

The `__getitem__` method is modified in pydna to deal with `Dseq` objects. For the unfamiliar, `__getitem__` is the method Python calls behind the `[]` notation, including *slices*. In pydna, `__getitem__` returns a specific tract of the `Dseq` object, defined by a start value and a stop value. In other words, `__getitem__` indexes `Dseq`. In practice, the `[]` notation is recommended for convenience, as shown below. Note that `__getitem__` (and, consequently, `[]`) uses Python's zero-based counting, and the stop position is excluded.

In [85]:
my_seq = Dseq("aatataa")
my_seq[2:5]


Dseq(-3)
tat
ata

`__getitem__` also works with overhangs, whether they were provided through the `ovhg` parameter or through the `from_full_sequence_and_overhangs` class method.

In [86]:
my_seq = Dseq.from_full_sequence_and_overhangs("aatataa", crick_ovhg=0, watson_ovhg=-1)
my_seq[2:7]

Dseq(-5)
tata
atatt

When applying `__getitem__` to circular `Dseq` objects, the method reads the sequence as if it is circular, looping around position 0 of the sequence. The sequence returned is linear. If you would like to make the sequence circular again, please refer to the `looped` method on the Dseq_protocols page.

In [87]:
my_seq = Dseq("aatataa", circular=True)
my_seq[5:2]

Dseq(-4)
aaaa
tttt
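
The wrap-around behaviour can be illustrated on a plain string. The helper below, `circular_slice`, is hypothetical and only mimics the case shown above, where the start lies after the stop; it is not how pydna implements slicing.

```python
# Illustrative only: wrap-around slicing on a plain string.
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Read from start to stop, continuing past the end of seq if needed."""
    if start < stop:
        return seq[start:stop]
    return seq[start:] + seq[:stop]  # wrap around position 0


print(circular_slice("aatataa", 5, 2))
```

Positions 5 and 6 contribute `aa`, and positions 0 and 1 contribute another `aa`, giving the four-nucleotide watson strand in the output above.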

### `__repr__` and `__str__`

The `__repr__` and `__str__` methods are used in Python to produce text representations of an object. In pydna, it is highly recommended to use `__repr__` (i.e. evaluating the variable on its own), which returns the double-stranded representation of the `Dseq` object that you have seen so far. `__str__`, which is what the `print` function calls, returns only a plain sequence string and does not yield the useful representation. An example of this less useful output is shown below:

In [88]:
my_seq = Dseq("aatataa")
print(my_seq)

aatataa


A corrected example using `__repr__` is shown here:

In [89]:
my_seq = Dseq("aatataa")
print(repr(my_seq))

Dseq(-7)
aatataa
ttatatt

Which is equivalent to:

In [90]:
my_seq = Dseq("aatataa")
my_seq

Dseq(-7)
aatataa
ttatatt
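
The difference between the two methods is a general Python feature and can be demonstrated with a toy class. `ToyDuplex` below is a hypothetical stand-in used only for illustration; it does not reflect how `Dseq` is implemented.

```python
# Illustrative only: a toy class (hypothetical name) showing why the two differ.
class ToyDuplex:
    def __init__(self, watson: str, displayed_crick: str):
        self.watson = watson
        self.displayed_crick = displayed_crick  # already written 3'-5'

    def __str__(self) -> str:
        # Used by print() and str(): just the top-strand sequence
        return self.watson

    def __repr__(self) -> str:
        # Used by the interactive prompt and repr(): the two-line figure
        return f"ToyDuplex(-{len(self.watson)})\n{self.watson}\n{self.displayed_crick}"


toy = ToyDuplex("aatataa", "ttatatt")
print(toy)        # calls __str__
print(repr(toy))  # calls __repr__
```

Inside a script, where there is no interactive prompt to echo the variable, `print(repr(obj))` is the way to get the two-line figure.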