# Cookbook for pydna

Björn Johansson
CBMA
University of Minho
Braga
Portugal

![logo](logo.png "logo")

## What is pydna?

Pydna is a Python package that provides functions and data types for working with double-stranded DNA. It depends on Biopython (a Python bioinformatics package), networkx (a graph theory package) and numpy (a numerical computing package).

## What does pydna provide?

Pydna provides classes and functions for molecular biology using Python. Notably, PCR, cut-and-paste cloning (sub-cloning) and homologous recombination between linear DNA fragments are supported. Most functionality is implemented as methods of the double-stranded DNA sequence record classes "Dseq" and "Dseqrecord", which are subclasses of the Biopython Seq and SeqRecord classes, respectively.

Pydna was designed to semantically imitate how sub-cloning experiments are typically documented in the scientific literature. One use case for pydna is to create executable documentation for a sub-cloning experiment. The pydna code unambiguously describes the experiment and can be executed to yield the sequence of the resulting DNA molecule(s) and of all intermediate steps. Pydna code describing a sub-cloning experiment is reasonably compact and is meant to be easily readable.

Typical usage at the command line could look like this:

In [5]:
import pydna
seq = pydna.Dseq("GGATCCAAA","TTTGGATCC", ovhg=0)
seq

Dseq(-9)
GGATCCAAA
CCTAGGTTT

The example above uses the Dseq class, which is a double-stranded version of the Biopython Seq class. Dseq is one of the two main pydna data types; the other is the Dseqrecord class, a double-stranded version of the Biopython SeqRecord class.

The Dseq object was initialized with two strings, one for each strand, and a value for the stagger (ovhg) between the DNA strands at the left (5') end. Here ovhg=0, so the left end is blunt. This is rarely a practical way of creating a Dseq object, but there are more convenient methods, as we will see further on.
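To make the role of the two strings and the stagger more concrete, the following plain-Python sketch draws the same two-line picture that Dseq prints. It is not pydna's implementation. The function name `render` is hypothetical, and the sign convention (a negative `ovhg` indents the bottom strand, a positive one indents the top strand) is an assumption chosen to match the outputs shown in this cookbook.

```python
def render(watson, crick, ovhg=0):
    """Return a two-line text picture of a double-stranded fragment.

    Both strands are given 5'->3', as in the Dseq call above.
    Assumed convention (hypothetical helper, not pydna code):
      ovhg > 0  the top strand is indented by ovhg positions
      ovhg < 0  the bottom strand is indented by -ovhg positions
    """
    top = " " * max(ovhg, 0) + watson
    # The bottom strand is drawn 3'->5', so the 5'->3' string is reversed.
    bottom = " " * max(-ovhg, 0) + crick[::-1]
    return top + "\n" + bottom


print(render("GGATCCAAA", "TTTGGATCC", ovhg=0))
print(render("GATCCAAA", "TTTG", ovhg=-4))
```

The first call reproduces the blunt fragment shown above. The second draws a fragment whose top strand protrudes by four nucleotides on the left.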

The Dseq object comes with a cut method that takes one or more restriction enzymes as arguments. A list is returned with the fragments produced in the digestion:

In [6]:
from Bio.Restriction import BamHI
a,b = seq.cut(BamHI)

In [7]:
a

Dseq(-5)
G
CCTAG

In [8]:
b

Dseq(-8)
GATCCAAA
    GTTT
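The digestion above can be mimicked in plain Python to show where the two strands are cut. This is only a sketch under simplifying assumptions: the input is a blunt, linear, fully base-paired fragment, only the first BamHI site (G^GATCC) is cut, and each fragment is returned as a hypothetical `(watson, crick, ovhg)` tuple rather than a Dseq object.

```python
def complement(seq):
    """Complement an upper-case DNA string base by base."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def cut_bamhi(watson):
    """Cut a blunt linear duplex at its first BamHI site (G^GATCC).

    watson is the top strand 5'->3'; the bottom strand is assumed to be
    fully complementary. Returns two (watson, crick, ovhg) tuples with
    both strands written 5'->3' (hypothetical representation).
    """
    pos = watson.find("GGATCC")
    if pos == -1:
        raise ValueError("no BamHI site found")
    n = len(watson)
    crick = complement(watson)[::-1]   # bottom strand, 5'->3'
    top_cut = pos + 1                  # top strand is cut after the first G
    bottom_cut = pos + 5               # bottom strand is cut 4 nt further right
    left = (watson[:top_cut], crick[n - bottom_cut:], 0)
    # The right fragment starts with a 4 nt top-strand overhang: ovhg = -4.
    right = (watson[top_cut:], crick[:n - bottom_cut], top_cut - bottom_cut)
    return left, right


a, b = cut_bamhi("GGATCCAAA")
print(a)
print(b)
```

The two tuples correspond to the fragments a and b shown above: the bottom strand of a, read 3' to 5', is CCTAG, and that of b is GTTT.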

The fragments a and b produced by the BamHI digestion can be religated with the addition operator:

In [9]:
a+b

Dseq(-9)
GGATCCAAA
CCTAGGTTT

In [10]:
b+a

Dseq(-13)
GATCCAAAG
    GTTTCCTAG

In [11]:
b+a+b

Dseq(-17)
GATCCAAAGGATCCAAA
    GTTTCCTAGGTTT

The Dseq objects keep track of the structure of the DNA ends and only allow ligation of compatible fragments:

In [12]:
b+a+a

TypeError: sticky ends not compatible!
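The end-compatibility rule can be sketched in a few lines of plain Python. This illustrates the idea and is not pydna's actual code: fragments are hypothetical `(watson, crick, ovhg)` tuples with both strands written 5'->3', a negative `ovhg` is assumed to mean that the top strand protrudes on the left, and the function `ligate` is an invented name.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def right_stagger(frag):
    """Stagger at the right end: negative if the bottom strand protrudes."""
    watson, crick, ovhg = frag
    return ovhg + len(watson) - len(crick)


def ligate(a, b):
    """Join the right end of a to the left end of b, or raise TypeError."""
    wa, ca, oa = a
    wb, cb, ob = b
    k = right_stagger(a)
    if k != ob:                        # overhangs differ in length or polarity
        raise TypeError("sticky ends not compatible!")
    if k < 0 and wb[:-k] != revcomp(ca[:-k]):    # 5' overhangs must pair
        raise TypeError("sticky ends not compatible!")
    if k > 0 and wa[-k:] != revcomp(cb[-k:]):    # 3' overhangs must pair
        raise TypeError("sticky ends not compatible!")
    return (wa + wb, cb + ca, oa)


a = ("G", "GATCC", 0)          # fragment a from the digestion above
b = ("GATCCAAA", "TTTG", -4)   # fragment b from the digestion above

print(ligate(a, b))
print(ligate(b, a))
try:
    ligate(ligate(b, a), a)    # corresponds to b+a+a
except TypeError as err:
    print("TypeError:", err)
```

In this sketch, b+a+a fails because the right end of b+a carries a four-nucleotide overhang while the left end of a is blunt.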

Two examples are given in this tutorial (Examples 1 and 2). The data files referred to in this document can be found in the folder “cookbook_files” that was downloaded together with this file. Alternatively, the examples can be solved online using pydna live.

# pydna live

Python 2.7.3 with pydna and Biopython is available for interactive testing online at http://pydna-shell.appspot.com/.

In [5]:
from IPython.display import HTML
HTML('<iframe src=http://pydna-shell.appspot.com width=800 height=350></iframe>')

The Biopython package is not completely supported by pydna live, since pydna live runs on Google App Engine, which currently does not permit C extensions. However, all functionality needed for pydna is provided.

All files referred to in this cookbook are provided in the subdirectory “cookbook_files”. This means that you can execute the statements given here exactly as written by copying and pasting them (leaving out the prompt “>>>”).

If you perform these examples on your own system, you have to adjust file paths when reading and writing files.

# Example 1: Sub-cloning by restriction digestion and ligation

The construction of the vector YEp24PGK_XK is described on page 4250 in the publication below:

In [6]:
HTML('<iframe src=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC93154 width=800 height=300></iframe>')

Briefly, the XKS1 gene from Saccharomyces cerevisiae is amplified by PCR using two primers called primer1 and primer3. The primers add BamHI restriction sites to both ends of the XKS1 gene. The PCR product is digested with BamHI and ligated to the YEp24PGK plasmid, which has previously been digested with BglII, an enzyme that cuts the plasmid at a single location. BamHI and BglII produce compatible cohesive ends, so fragments cut with either enzyme can be ligated together. Fig 1 shows an image outlining the strategy.
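The compatibility of the two enzymes can be checked with a short plain-Python sketch. BamHI cuts G^GATCC and BglII cuts A^GATCT, so both leave the same four-nucleotide 5' overhang. The dictionary `SITES` and the helper names below are hypothetical and are not part of pydna or Biopython.

```python
# Recognition site and cut position in the top strand (hypothetical table).
SITES = {
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "BglII": ("AGATCT", 1),   # A^GATCT
}


def five_prime_overhang(site, cut):
    """Overhang left by a palindromic site cut `cut` bases from each 5' end."""
    return site[cut:len(site) - cut]


def junction(upstream, downstream):
    """Sequence formed when an end cut by `upstream` is ligated to one cut by `downstream`."""
    site_up, cut_up = SITES[upstream]
    site_down, cut_down = SITES[downstream]
    return site_up[:cut_up] + site_down[cut_down:]


overhangs = {name: five_prime_overhang(*spec) for name, spec in SITES.items()}
print(overhangs)

hybrid = junction("BamHI", "BglII")
print(hybrid, hybrid in [site for site, _ in SITES.values()])
```

Both enzymes yield the overhang GATC. The hybrid junction matches neither recognition site, so neither enzyme can recut a ligated BamHI/BglII junction.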