# pYPK0_TDH3_SsXYL2_PGI

This notebook describes the assembly of the Saccharomyces cerevisiae 
single gene expression vector pYPK0_TDH3_SsXYL2_PGI.

It is made by _in-vivo_ homologous recombination between three PCR products:
a promoter generated from a pYPKa_Z vector, a gene from a pYPKa_A vector and 
a terminator from a pYPKa_E vector. The three PCR products are joined to
a linearized pYPKpw backbone vector that has the URA3 marker and a S. cerevisiae 2 micron ori. 

The four linear DNA fragments are joined by homologous recombination in a Saccharomyces ura3 mutant.

![pYPK0_promoter_gene_terminator](tp_g_tp.png "pYPK0_promoter_gene_terminator")

The [pydna](https://pypi.python.org/pypi/pydna/) package is imported in the code cell below. 
There is a [publication](http://www.biomedcentral.com/1471-2105/16/142) describing pydna as well as
[documentation](http://pydna.readthedocs.org/en/latest/) available online. 
Pydna is developed on [Github](https://github.com/BjornFJohansson/pydna).

In [1]:
import pydna

Read the standard primers into a dictionary keyed by primer id. The primers are read from [this](primers.fasta) file.

In [2]:
p = { x.id: x for x in pydna.parse("primers.fasta") }
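To make the shape of that dictionary concrete, the following is a minimal standard-library sketch of the same idea: FASTA text is parsed into a mapping from record id to sequence. The helper `parse_fasta` and the two demo records are hypothetical placeholders. They are not part of pydna and do not reflect the contents of primers.fasta.

```python
def parse_fasta(text):
    """Return {record_id: sequence} for FASTA-formatted text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first word after ">".
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[current] += line
    return records


# Placeholder records for illustration only.
example = """>demo1 made-up primer
ACGTACGTAC
>demo2 made-up primer
TTGACCA
GGT
"""

primers = parse_fasta(example)
print(primers["demo1"])
```

In the notebook the values are pydna sequence objects rather than plain strings, but the lookup by id (for example `p['577']`) works the same way.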

The backbone vector is read from this file: [pYPKpw](pYPKpw.gb)

In [3]:
pYPKpw = pydna.read("pYPKpw.gb")

The backbone vector is linearized by digestion with [EcoRV](http://rebase.neb.com/rebase/enz/EcoRV.html).

In [4]:
from Bio.Restriction import EcoRV

pYPK_EcoRV = pYPKpw.linearize(EcoRV)
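As a rough illustration of what linearization means, the sketch below cuts a circular sequence at a single EcoRV site (GATATC, blunt cut between GAT and ATC) using plain strings. The function `linearize_once` and the toy plasmid are hypothetical. This is not how pydna or Biopython implement the digestion.

```python
ECORV_SITE = "GATATC"  # EcoRV recognition sequence (palindromic)
CUT_OFFSET = 3         # blunt cut between GAT and ATC


def linearize_once(circular_seq, site=ECORV_SITE, offset=CUT_OFFSET):
    """Open a circular sequence at its single recognition site."""
    seq = circular_seq.upper()
    # Extend the string so a site spanning the origin is also found.
    doubled = seq + seq[: len(site) - 1]
    hits = [i for i in range(len(seq)) if doubled.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError("expected exactly one site, found %d" % len(hits))
    cut = (hits[0] + offset) % len(seq)
    return seq[cut:] + seq[:cut]


toy_plasmid = "AACCGGTTGATATCTTGGCCAA"  # made-up 22 bp circle
linear = linearize_once(toy_plasmid)
print(linear)
```

The linear molecule keeps every base of the circle. Only the position of the ends changes, so one half of the site ends up at each terminus.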

The pYPKa E. coli plasmids from which the [promoter](pYPKa_Z_TDH3.gb) and [terminator](pYPKa_E_PGI.gb) are PCR amplified,
together with the gene template, are read into three variables below.

In [5]:
promoter_template   = pydna.read("pYPKa_Z_TDH3.gb")
gene_template       = pydna.read("SsXYL2.gb")
terminator_template = pydna.read("pYPKa_E_PGI.gb")

The construction of the two vectors above is described in the [pYPKa_ZE_TDH3](pYPKa_ZE_TDH3.ipynb) notebook.

The promoter was amplified from [pYPKa_Z_TDH3](pYPKa_Z_TDH3.gb) with the standard primers 577 and 567.

In [6]:
prom = pydna.pcr( p['577'], p['567'], promoter_template)

Primer tails are needed for the recombination with the gene. The tails are designed to 
provide 33 bp of homology to the promoter and terminator PCR products.

In [7]:
fp_tail = "tgcccactttctcactagtgacctgcagccgacAA"
rp_tail = "AAatcctgatgcgtttgtctgcacagatggCAC"

Primers carrying the tails above are designed.

In [8]:
fp, rp = pydna.cloning_primers(gene_template, fp_tail=fp_tail, rp_tail=rp_tail)

The primers are given the names below. These primers are included in the primer list at the end of the [pathway notebook](pw.ipynb).

In [9]:
fp.id = "SsXYL2fw"
rp.id = "SsXYL2rv"

Primers have the following sequences:

In [10]:
print(fp.format("tab"))
print(rp.format("tab"))

SsXYL2fw	tgcccactttctcactagtgacctgcagccgacAAATGACTGCTAACCCTTCCTT

SsXYL2rv	AAatcctgatgcgtttgtctgcacagatggCACTTACTCAGGGCCGTCA
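Each printed primer is simply a tail followed by the part that anneals to the gene template. The short check below splits the two sequences shown above using plain strings. The helper `revcomp` is written here for illustration and is not a pydna function.

```python
fp_tail = "tgcccactttctcactagtgacctgcagccgacAA"
rp_tail = "AAatcctgatgcgtttgtctgcacagatggCAC"

# Full primer sequences as printed above.
fp_seq = "tgcccactttctcactagtgacctgcagccgacAAATGACTGCTAACCCTTCCTT"
rp_seq = "AAatcctgatgcgtttgtctgcacagatggCACTTACTCAGGGCCGTCA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (case is preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Everything after the tail is the template-annealing part.
fp_anneal = fp_seq[len(fp_tail):]
rp_anneal = rp_seq[len(rp_tail):]

print(len(fp_tail), fp_anneal)
print(len(rp_tail), rp_anneal)
print(revcomp(rp_anneal))
```

The forward annealing part begins with ATG and the reverse complement of the reverse annealing part ends with TAA, which is consistent with primers that flank an open reading frame.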



The gene was amplified with the primers.

In [11]:
gene = pydna.pcr( fp, rp, gene_template)

The terminator was amplified from [pYPKa_E_PGI](pYPKa_E_PGI.gb).

In [12]:
term = pydna.pcr( p['568'], p['578'], terminator_template)

The four linear DNA fragments are mixed and transformed
into a Saccharomyces cerevisiae ura3 mutant.

The fragments will be assembled by in-vivo homologous recombination:

In [13]:
asm = pydna.Assembly( (pYPK_EcoRV, prom, gene, term), limit=31 )

asm

Assembly:
Sequences........................: [5603] [929] [1160] [1339]
Sequences with shared homologies.: [5603] [929] [1339] [1160]
Homology limit (bp)..............: 31
Number of overlaps...............: 4
Nodes in graph(incl. 5' & 3')....: 6
Only terminal overlaps...........: No
Circular products................: [8580]
Linear products..................: [8824] [8721] [8613] [8613] [7825] [7518] [7486] [6698] [6391] [3362] [2466] [2056] [244] [141] [33] [33]

The representation of the asm object above should normally indicate one circular product only.  
More than one circular product might indicate an incorrect assembly strategy or might represent
by-products that might arise in the assembly process.  
The largest circular recombination product is chosen as the candidate for the pYPK0_TDH3_SsXYL2_PGI vector.

In [14]:
candidate = asm.circular_products[0]

candidate.figure()

 -|pYPKpw|124
|         \/
|         /\
|         124|929bp_PCR_prod|33
|                            \/
|                            /\
|                            33|1160bp_PCR_prod|33
|                                               \/
|                                               /\
|                                               33|1339bp_PCR_prod|242
|                                                                  \/
|                                                                  /\
|                                                                  242-
|                                                                     |
 ---------------------------------------------------------------------
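The figure shows four fragments joined head to tail through shared end sequences. A much simplified sketch of that joining step is given below. It assumes the fragment order is already known and that every overlap sits exactly at the fragment termini. pydna's `Assembly` searches for shared sequences more generally, so the functions `terminal_overlap` and `join_circular` are hypothetical stand-ins, and the toy fragments are made up.

```python
def terminal_overlap(left, right, limit):
    """Longest suffix of `left` equal to a prefix of `right` (0 if < limit)."""
    for n in range(min(len(left), len(right)), limit - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def join_circular(fragments, limit):
    """Join ordered fragments into a circle through terminal overlaps."""
    product = ""
    for i, frag in enumerate(fragments):
        nxt = fragments[(i + 1) % len(fragments)]  # last wraps to first
        n = terminal_overlap(frag, nxt, limit)
        if n == 0:
            raise ValueError("fragment %d has no overlap with the next" % i)
        # Drop the overlap here; the next fragment supplies it.
        product += frag[:-n]
    return product


# Toy fragments sharing 5 bp ends (made-up sequences).
backbone = "TTCAG" + "AAAAAAAA" + "ACGTA"
prom = "ACGTA" + "CCCC" + "CCTGA"
gene = "CCTGA" + "GGGG" + "GGATC"
term = "GGATC" + "TTTT" + "TTCAG"

circle = join_circular([backbone, prom, gene, term], limit=4)
print(len(circle), circle)
```

Each overlap is counted once in the product, so the circle is shorter than the sum of the fragment lengths by the total overlap length.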

The candidate vector is synchronized to the backbone vector. This means that
the plasmid origin is shifted so that it matches the original.

In [15]:
result = candidate.synced(pYPKpw)
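Shifting the origin of a circular sequence is a rotation of its linear representation. The sketch below shows the idea with plain strings: the candidate is rotated so that it starts with the first bases of the reference. It assumes both sequences are written on the same strand and that the anchor occurs in the candidate. The function `sync_to_reference` and the `anchor_len` parameter are hypothetical and do not describe how `synced` is implemented.

```python
def sync_to_reference(circular_seq, reference, anchor_len=8):
    """Rotate circular_seq so it starts like the reference does."""
    anchor = reference[:anchor_len]
    # Extend the string so an anchor spanning the origin is found.
    doubled = circular_seq + circular_seq[: anchor_len - 1]
    pos = doubled.find(anchor)
    if pos == -1:
        raise ValueError("anchor not found in circular sequence")
    return circular_seq[pos:] + circular_seq[:pos]


# Made-up example: the candidate is the reference plus an insert,
# written starting from an arbitrary position on the circle.
reference = "ATGCCGTA" + "AGCTTGCA"
with_insert = "ATGCCGTA" + "CCCGGG" + "AGCTTGCA"
candidate = with_insert[10:] + with_insert[:10]

synced = sync_to_reference(candidate, reference)
print(candidate)
print(synced)
```

After rotation the two sequences begin at the same position, which makes the new vector easier to compare with the backbone.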

### Diagnostic PCR confirmation

The structure of the final vector is confirmed by two
separate PCR reactions, one for the promoter and gene and
one for the gene and terminator.

PCR using standard primers 577 and 467 to amplify promoter and gene.

In [16]:
product = pydna.pcr( p['577'], p['467'], result)

A correct clone should give this size:

In [17]:
print(len(product))

2060


If the promoter is missing from the assembly, the expected size is:

In [18]:
print(len(product) - len(prom))

1131


If the gene is missing from the assembly, the expected size is:

In [19]:
print(len(product) - len(gene))

900


PCR using standard primers 468 and 578 to amplify gene and terminator.

In [20]:
product2 = pydna.pcr( p['468'], p['578'], result)

A correct clone should give this size:

In [21]:
print(len(product2))

2483


If the gene is missing from the assembly, the expected size is:

In [22]:
print(len(product2) - len(gene))

1323


If the terminator is missing from the assembly, the expected size is:

In [23]:
print(len(product2) - len(term))

1144
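The six sizes computed above can be collected into a small lookup for reading a gel. The sketch below uses only the numbers printed in this notebook. The function `interpret_band` and the 50 bp tolerance are assumptions made for illustration and should be adjusted to the resolution of the gel actually used.

```python
# Expected product sizes (bp) taken from the cells above.
EXPECTED = {
    "577/467": {2060: "correct clone",
                1131: "promoter missing",
                900: "gene missing"},
    "468/578": {2483: "correct clone",
                1323: "gene missing",
                1144: "terminator missing"},
}


def interpret_band(primer_pair, observed_bp, tolerance=50):
    """Match an observed band size to the nearest expected product."""
    candidates = EXPECTED[primer_pair]
    nearest = min(candidates, key=lambda size: abs(size - observed_bp))
    if abs(nearest - observed_bp) > tolerance:
        return "unexpected size"
    return candidates[nearest]


print(interpret_band("577/467", 2050))
print(interpret_band("468/578", 1150))
```

A clone is accepted only when both reactions report the correct size, since each reaction covers a different pair of junctions.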


Calculate the cseguid checksum of the resulting plasmid for future reference.
This is a seguid checksum that uniquely describes a circular double-stranded 
sequence.

In [24]:
result.cseguid()

UdTwzMX6S8kBTgDnzL1y_glnGCM
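A checksum for a circular double-stranded sequence has to give the same value regardless of where the sequence is opened and which strand is written down. The sketch below shows one way to get that property: pick a canonical form (the lexicographically smallest rotation over both strands) and hash it with SHA-1, encoded as URL-safe base64. This is an illustration of the principle under those assumptions. The function `circular_checksum` is hypothetical and is not guaranteed to reproduce the value pydna reports.

```python
import base64
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def smallest_rotation(seq):
    """Lexicographically smallest rotation (simple quadratic version)."""
    doubled = seq + seq
    return min(doubled[i:i + len(seq)] for i in range(len(seq)))


def circular_checksum(seq):
    """Strand- and rotation-independent checksum of a circular sequence."""
    seq = seq.upper()
    canonical = min(smallest_rotation(seq), smallest_rotation(revcomp(seq)))
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    # URL-safe base64 without the trailing padding character.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


toy = "ATGCCGTAAGCTTGCA"        # made-up circular sequence
rotated = toy[5:] + toy[:5]     # same circle, different origin
print(circular_checksum(toy) == circular_checksum(rotated))
print(circular_checksum(toy) == circular_checksum(revcomp(toy)))
```

A 20-byte SHA-1 digest encodes to 27 characters once the padding is stripped, the same length as the checksum shown above.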

The sequence record is named after the promoter, gene and terminator.

In [25]:
result.locus = "pYPK0_tp_g_tp"
result.definition = "pYPK0_TDH3_SsXYL2_PGI"

Stamp the sequence with the cseguid checksum. This can be used to verify the 
integrity of the sequence file.

In [26]:
result.stamp()

cSEGUID_UdTwzMX6S8kBTgDnzL1y_glnGCM_2015-07-22T17:28:35.513523

Write the sequence to a local file.

In [27]:
result.write("pYPK0_TDH3_SsXYL2_PGI.gb")

### [pYPK0_TDH3_SsXYL2_PGI](pYPK0_TDH3_SsXYL2_PGI.gb)

# Download [pYPK0_TDH3_SsXYL2_PGI](pYPK0_TDH3_SsXYL2_PGI.gb)

In [28]:
import pydna
reloaded = pydna.read("pYPK0_TDH3_SsXYL2_PGI.gb")
reloaded.verify_stamp()


cSEGUID_UdTwzMX6S8kBTgDnzL1y_glnGCM