# Cookbook for pydna

Björn Johansson
CBMA
University of Minho
Braga
Portugal

![logo](logo.png "logo")

## What is pydna?

Pydna is a Python package that provides functions and data types for working with double-stranded DNA. It depends on Biopython (a Python bioinformatics package), networkx (a graph theory package) and numpy (a numerical mathematics package).

## What does pydna provide?

Pydna provides classes and functions for molecular biology in Python. Notably, it supports PCR, cut-and-paste cloning (sub-cloning) and homologous recombination between linear DNA fragments. Most functionality is implemented as methods of the double-stranded DNA sequence record classes "Dseq" and "Dseqrecord", which are subclasses of the Biopython Seq and SeqRecord classes, respectively.
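To make the idea of a double-stranded data type concrete, the following is a minimal pure-Python sketch. It does not use pydna and is not how Dseq is implemented; the `DoubleStrand` class and its field names are hypothetical and only illustrate storing both strands of a blunt molecule.

```python
from dataclasses import dataclass

# Watson-Crick pairing for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a single strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class DoubleStrand:
    """Hypothetical blunt duplex; both strands are stored 5'->3'."""

    watson: str
    crick: str

    @classmethod
    def from_watson(cls, watson: str) -> "DoubleStrand":
        # The lower strand of a blunt duplex is fully determined by the upper one.
        return cls(watson, reverse_complement(watson))

    def reverse_complement(self) -> "DoubleStrand":
        # Turning a duplex around simply swaps the two strands.
        return DoubleStrand(self.crick, self.watson)


ds = DoubleStrand.from_watson("ATGGCC")
flipped = ds.reverse_complement()
print(ds.watson, ds.crick)
```

A real double-stranded type also has to track overhangs and circularity, which this sketch leaves out.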

Pydna was designed to semantically imitate how sub-cloning experiments are typically documented in the scientific literature. One use case for pydna is to create executable documentation for a sub-cloning experiment. The pydna code unambiguously describes the experiment and can be executed to yield the sequence of the resulting DNA molecule(s) and of all intermediate steps. Pydna code describing a sub-cloning experiment is reasonably compact and is meant to be easily readable.

Further examples are available in the [pydna-examples](https://github.com/BjornFJohansson/pydna-examples) repository.

In [1]:
from pydna.genbank import Genbank
gb = Genbank("myemail@mydomain.com")
YEp24PGK = gb.nucleotide("KC562906")
YEp24PGK

In [2]:
cyc1 = gb.nucleotide('NM_001181706.1')
cyc1

In [3]:
from pydna.design import primer_design
cyc1_prd = primer_design(cyc1[:-3])

In [4]:
gfp = gb.nucleotide('AF298787 REGION: 2271..2987').reverse_complement()
gfp

Dseqrecord(-717)

In [5]:
gfp_prd = primer_design(gfp)

In [6]:
from Bio.Restriction import BglII
yep_bgl = YEp24PGK.linearize(BglII)
yep_bgl

Dseqrecord(-9641)

In [7]:
yep_bgl.seq

Dseq(-9641)
GATCTCCC..AAAA    
    AGGG..TTTTCTAG
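The staggered ends in the output above follow from the way BglII cuts its recognition site (A^GATCT), leaving a four-base 5' overhang on each end. The pure-Python sketch below reproduces that layout for a short invented circular sequence. The function `linearize_bglii` is hypothetical and is not part of pydna or Biopython.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE = "AGATCT"  # BglII recognition site, cut as A^GATCT on both strands


def linearize_bglii(circular: str) -> Tuple[str, str, str]:
    """Cut a circular sequence at its single BglII site.

    Returns (top, bottom, overhang). `top` is written 5'->3'; `bottom` is
    written 3'->5' and left-padded so that it lines up under `top`.
    """
    # Append the first bases again so that a site spanning the origin is found.
    doubled = circular + circular[: len(SITE) - 1]
    hits = [i for i in range(len(circular)) if doubled.startswith(SITE, i)]
    if len(hits) != 1:
        raise ValueError("expected exactly one BglII site, found %d" % len(hits))
    i = hits[0]
    # The top strand is cut after the A, so the linear molecule starts at GATCT.
    top = circular[i + 1:] + circular[: i + 1]
    overhang = top[:4]
    # The bottom strand leaves the first four top bases unpaired and extends
    # four bases past the 3' end of the top strand.
    bottom = " " * 4 + (top[4:] + overhang).translate(COMPLEMENT)
    return top, bottom, overhang


# Invented 16 bp "plasmid" used only for illustration.
top, bottom, overhang = linearize_bglii("GGGAAAAAGATCTCCC")
print(top)
print(bottom)
```

The printed strands show the same pattern as the `yep_bgl.seq` output: the top strand starts with GATCT and the bottom strand ends in CTAG.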

In [8]:
from pydna.design import assembly_fragments
vec, cyc1_prd, gfp_prd, vec = assembly_fragments( (yep_bgl, cyc1_prd, gfp_prd, yep_bgl) )

In [9]:
cyc1_prd

In [10]:
cyc1_prd.figure()

                                   5ATGACTGAATTCAAGGC...TGAAAAAAGCCTGTGAG3
                                                        ||||||||||||||||| tm 50.8 (dbd) 56.5
                                                       3ACTTTTTTCGGACACTCTACAGATTTCCACTTCTT5
5ATTATCTACTTTTTACAACAAATATAAAACCAAAAATGACTGAATTCAAGGC3
                                    ||||||||||||||||| tm 50.2 (dbd) 55.0
                                   3TACTGACTTAAGTTCCG...ACTTTTTTCGGACACTC5
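In the figure above, each primer consists of a 3' part that anneals to the template and a 5' tail that does not. The tail of the reverse primer is the reverse complement of the first bases of the next fragment in the assembly (compare it with the start of the GFP sequence shown further down). The sketch below illustrates this pattern in plain Python. It is a simplification and not the algorithm used by `assembly_fragments`: the function `tailed_primers` is hypothetical, the annealing and tail lengths are fixed rather than chosen by melting temperature, and all sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primers(upstream: str, target: str, downstream: str,
                   anneal: int, tail: int):
    """Return (forward, reverse) primers, both written 5'->3'.

    The forward primer carries the last `tail` bases of `upstream`;
    the reverse primer carries the first `tail` bases of `downstream`.
    """
    forward = upstream[-tail:] + target[:anneal]
    reverse = reverse_complement(target[-anneal:] + downstream[:tail])
    return forward, reverse


# Invented sequences, for illustration only.
upstream = "GGGTTTAAACCC"
target = "ATGCATGCATGCTTGACCGTAA"
downstream = "CCATGGTTAACC"

fwd, rev = tailed_primers(upstream, target, downstream, anneal=6, tail=4)

# The PCR product gains both tails and now overlaps with its neighbours.
product = upstream[-4:] + target + downstream[:4]
print(fwd, rev, len(product))
```

The resulting product shares its first bases with the end of the upstream fragment and its last bases with the start of the downstream fragment, which is what allows the fragments to be joined later.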

In [11]:
cyc1_prd.program()


Taq (rate 30 nt/s) 35 cycles             |380bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 52.3°C/ 0min11s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 40%
|         |         30s           |      |4-12°C

In [12]:
gfp_prd

In [14]:
gfp_prd.figure()

                  5ATGTCTAAAGGTGAAGAATTATT...GTATGGATGAATTGTACAAATAA3
                                             ||||||||||||||||||||||| tm 50.1 (dbd) 56.4
                                            3CATACCTACTTAACATGTTTATTCTAGAGGGTACAGAGATGACCACCACCACGAAGAA5
5TTGAAAAAAGCCTGTGAGATGTCTAAAGGTGAAGAATTATT3
                   ||||||||||||||||||||||| tm 50.9 (dbd) 55.9
                  3TACAGATTTCCACTTCTTAATAA...CATACCTACTTAACATGTTTATT5

In [15]:
gfp_prd.program()


Taq (rate 30 nt/s) 35 cycles             |770bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 51.9°C/ 0min23s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 36%
|         |         30s           |      |4-12°C

In [16]:
from pydna.assembly import Assembly
asm = Assembly((yep_bgl, cyc1_prd, gfp_prd))

In [17]:
asm

Assembly:
Sequences........................: [9641] [380] [770]
Sequences with shared homologies.: [9641] [380] [770]
Homology limit (bp)..............: 25
Number of overlaps...............: 3
Nodes in graph(incl. 5' & 3')....: 5
Only terminal overlaps...........: No
Circular products................: [10681]
Linear products..................: [10720] [10717] [10716] [10376] [9982] [1114] [39] [36] [35]
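The report above lists a homology limit and the overlaps found between the fragments. As a rough illustration of how shared terminal sequences allow fragments to be joined into a circle, here is a pure-Python sketch. It is much simpler than pydna's `Assembly`, which builds a graph of shared sequences: the sketch only checks suffix-to-prefix overlaps for fragments given in a fixed order. The function names and the short invented sequences are hypothetical, and the limit is lowered to 4 to keep the example small.

```python
def terminal_overlap(left: str, right: str, limit: int) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`.

    Returns 0 if the shared stretch is shorter than `limit` (limit >= 1).
    """
    for n in range(min(len(left), len(right)), limit - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def join(left: str, right: str, limit: int) -> str:
    """Fuse two fragments through their terminal overlap."""
    n = terminal_overlap(left, right, limit)
    if n == 0:
        raise ValueError("no terminal overlap of at least %d bp" % limit)
    return left + right[n:]


def circularize(fragments, limit: int) -> str:
    """Join fragments in order and close the circle on the first fragment."""
    linear = fragments[0]
    for fragment in fragments[1:]:
        linear = join(linear, fragment, limit)
    n = terminal_overlap(linear, fragments[0], limit)
    if n == 0:
        raise ValueError("the ends do not overlap, cannot circularize")
    # Drop the duplicated bases so that each overlap is counted once.
    return linear[:-n]


# Invented fragments: each one ends with the first four bases of the next.
vector = "ACGTTGCAAGGCTTAC"
insert1 = "TTACGGGAAACCCTGTG"
insert2 = "TGTGCATCATACGT"

circle = circularize([vector, insert1, insert2], limit=4)
print(len(circle), circle)
```

The circular product is shorter than the sum of the fragment lengths by the total length of the overlaps, which is also why the circular product in the report is smaller than 9641 + 380 + 770.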

In [18]:
cnt = asm.assemble_circular()[0]

In [19]:
cnt

In [23]:
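from pydna.all import *
from Bio.Restriction import BamHI, BglII

gb = Genbank("myemail@mydomain.com")

# Primers for the XKS1 gene; ds=False parses them as single-stranded sequences
p1, p3 = parse('''
>primer1
GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG
>primer3
AGATCTGGATCCTTAGATGAGAGTCTTTTCCAG''', ds=False)

# Download the template and use its reverse complement
XKS1 = gb.nucleotide("Z72979").rc()
PCR_prod = pcr( p1, p3, XKS1 )

# Both primers carry a BamHI site, so the digest gives the insert and two short end fragments
stuffer1, insert, stuffer2 = PCR_prod.cut(BamHI)

# Linearize the vector with BglII, which leaves ends compatible with BamHI
YEp24PGK = gb.nucleotide("KC562906")
YEp24PGK_BglII = YEp24PGK.linearize(BglII)

# Ligate vector and insert, circularize, and align the origin with the parent vector
YEp24PGK_XK = ( YEp24PGK_BglII + insert ).looped()
YEp24PGK_XK = YEp24PGK_XK.synced(YEp24PGK)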
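The recipe above digests the PCR product with BamHI but opens the vector with BglII. This works because both enzymes leave the same GATC overhang. The sketch below derives the overhangs from the recognition sites (G^GATCC for BamHI, A^GATCT for BglII) and shows the hybrid junctions formed on ligation. The helper functions are hypothetical and independent of pydna and Biopython.

```python
# Recognition site and cut position on the top strand.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut symmetrically."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left_enzyme: str, right_enzyme: str) -> str:
    """Sequence formed when an end cut by `left_enzyme` is ligated
    to an end cut by `right_enzyme`."""
    if overhang(left_enzyme) != overhang(right_enzyme):
        raise ValueError("incompatible cohesive ends")
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


sites = {site for site, _ in ENZYMES.values()}
vector_to_insert = junction("BglII", "BamHI")
insert_to_vector = junction("BamHI", "BglII")
print(overhang("BamHI"), overhang("BglII"), vector_to_insert, insert_to_vector)
```

Neither hybrid junction is a BamHI or a BglII site, so the ligated product is not cut again by either enzyme at these positions.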

In [26]:
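from pydna.all import *
from Bio.Restriction import SalI

gb = Genbank("myemail@mydomain.com")

# Primers for amplifying GUP1, parsed as single-stranded sequences
GUP1rec1sens, GUP1rec2AS = parse('''
>GUP1rec1sens
gaattcgatatcaagcttatcgataccgatgtcgctgatcagcatcctgtc
>GUP1rec2AS
gacataactaattacatgactcgaggtcgactcagcattttaggtaaattccg
''', ds=False)

# The vector is read from a local GenBank file; the template is downloaded
pGREG505 = read("pGREG505.gb")
GUP1_locus = gb.nucleotide("Z72606")
insert = pcr(GUP1rec1sens, GUP1rec2AS, GUP1_locus)

# Digest the vector with SalI to separate it from the stuffer fragment
lin_vect, his3_stuffer = pGREG505.cut(SalI)

# Join vector and insert by homologous recombination and keep the first circular product
asm = Assembly( (lin_vect, insert) )
pGUP1 = asm.assemble_circular()[0]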