# Pathway pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1

This notebook describes the assembly of four single-gene expression cassettes into a single pathway.
Notebooks describing the single-gene expression vectors are linked at the end of this document, as are notebooks
describing the pYPKa promoter, gene and terminator vectors. The specific primers needed are also listed below.

![pathway with N genes](pw.png "pathway with N genes")

The [pydna](https://pypi.python.org/pypi/pydna/) package is imported in the code cell below. 
There is a [publication](http://www.biomedcentral.com/1471-2105/16/142) describing pydna as well as
[documentation](http://pydna.readthedocs.org/en/latest/) available online. 
Pydna is developed on [Github](https://github.com/BjornFJohansson/pydna).

The assembly performed here is based on the content of the [INDATA_pth6.txt](INDATA_pth6.txt) text file.
The assembly log can be viewed [here](log.txt).

In [1]:
import pydna

Load the standard primers needed to amplify each cassette.
The first cassette in the pathway is amplified with standard
primers 577 and 778, the last with 775 and 578, and all
others with 775 and 778.
Standard primers are listed [here](primers.fasta).

In [2]:
p = { x.id: x for x in pydna.parse("primers.fasta") }
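The primer rule described above can be written down as a small helper. This is only an illustrative sketch: the function name `primer_pairs` is hypothetical and not part of pydna, and it assumes a pathway of at least two cassettes, since the text does not say which primers a single-cassette pathway would use.

```python
def primer_pairs(n_cassettes):
    """Return the (forward, reverse) standard primer names for each cassette.

    Hypothetical helper, not part of pydna. It encodes the rule stated in
    the text: 577/778 for the first cassette, 775/578 for the last and
    775/778 for every cassette in between.
    """
    if n_cassettes < 2:
        # Assumption: the rule is only defined for two or more cassettes.
        raise ValueError("a pathway needs at least two cassettes")
    pairs = []
    for position in range(n_cassettes):
        forward = "577" if position == 0 else "775"
        reverse = "578" if position == n_cassettes - 1 else "778"
        pairs.append((forward, reverse))
    return pairs


print(primer_pairs(4))
# [('577', '778'), ('775', '778'), ('775', '778'), ('775', '578')]
```

The names returned here are the same keys used to look up primers in the `p` dictionary.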

The backbone vector is linearized with [EcoRV](http://rebase.neb.com/rebase/enz/EcoRV.html).

In [3]:
from Bio.Restriction import EcoRV, NotI, PacI

pYPKpw = pydna.read("pYPKpw.gb")

The assembly_fragments variable holds the list of DNA fragments to
be assembled.

In [4]:
assembly_fragments = [ pYPKpw.linearize(EcoRV) ]
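To make the linearization step concrete, the sketch below opens a circular sequence at a single blunt-cutting site using plain Python strings. EcoRV recognizes GATATC and cuts in the middle (GAT^ATC), leaving blunt ends. The function `linearize_blunt` and the toy plasmid are hypothetical and are not how pydna implements `linearize`; the sketch assumes a palindromic blunt cutter and exactly one site.

```python
def linearize_blunt(circular_seq, site="GATATC", cut_offset=3):
    """Open a circular sequence at a unique blunt-cutting restriction site.

    Hypothetical helper, not pydna's implementation. The defaults describe
    EcoRV (GAT^ATC). Because the site is palindromic, searching one strand
    is sufficient.
    """
    seq = circular_seq.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    hits = [i for i in range(n) if wrapped.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one site, found {len(hits)}")
    cut = (hits[0] + cut_offset) % n
    # The linear molecule starts right after the cut and ends right before it.
    return seq[cut:] + seq[:cut]


toy_plasmid = "AAGATATCCC"  # made-up sequence with one EcoRV site
print(linearize_blunt(toy_plasmid))  # ATCCCAAGAT
```

The linear product begins with ATC and ends with GAT, the two halves of the recognition site.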

The expression cassettes come from a series of single-gene expression vectors
held in the template_vectors list.

In [5]:
cas_vectors ='''
             
             pYPK0_TEF1_SsXYL1_TDH3.gb
             
             pYPK0_TDH3_SsXYL2_PGI.gb
             
             pYPK0_PGI_ScXKS1_FBA1.gb
             
             pYPK0_FBA1_ScTAL1_PDC1.gb'''.splitlines()

template_vectors = [pydna.read(v.strip()) for v in cas_vectors if v.strip()]

template_vectors

[Dseqrecord(o8024), Dseqrecord(o8580), Dseqrecord(o9223), Dseqrecord(o8383)]

The first cassette in the pathway is amplified with standard primers 577 and 778.

In [6]:
assembly_fragments.append( pydna.pcr( p['577'], p['778'],  template_vectors[0] ) )

Cassettes in the middle of the pathway are amplified with standard primers 775 and 778.

In [7]:
assembly_fragments.extend( pydna.pcr( p['775'], p['778'], v) for v in template_vectors[1:-1] ) 

The last cassette in the pathway is amplified with standard primers 775 and 578.

In [8]:
assembly_fragments.append( pydna.pcr( p['775'], p['578'], template_vectors[-1] ) )

Cassettes and plasmid backbone are joined by homologous recombination in a Saccharomyces cerevisiae ura3 host
which selects for the URA3 gene in pYPKpw.

In [9]:
asm = pydna.Assembly( assembly_fragments, limit=167-47-10)
asm

Assembly:
Sequences........................: [5603] [2524] [2924] [3567] [3009]
Sequences with shared homologies.: [5603] [2524] [3009] [2924] [3567]
Homology limit (bp)..............: 110
Number of overlaps...............: 5
Nodes in graph(incl. 5' & 3')....: 7
Only terminal overlaps...........: No
Circular products................: [14800]
Linear products..................: [15838] [15536] [15468] [15044] [14941] [13650] [13153] [12939] [12703] [11267] [10751] [10174] [9582] [8368] [7986] [7794] [7241] [5908] [5453] [4712] [1038] [736] [668] [244] [141]
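The role of the homology limit can be illustrated with a toy overlap search. The sketch below is not pydna's algorithm: it uses the standard library's `difflib` to find the single longest stretch shared by two fragments on one strand and reports it only if it is at least `limit` bases long. The function name `shared_homology` and the example sequences are made up for illustration.

```python
from difflib import SequenceMatcher


def shared_homology(frag_a, frag_b, limit):
    """Return the longest shared stretch of two fragments, or None.

    Hypothetical helper, not pydna's implementation. Only one strand and
    only the single longest match are considered. A match shorter than
    `limit` is ignored, which is the purpose of a homology limit.
    """
    a, b = frag_a.upper(), frag_b.upper()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    if match.size < limit:
        return None
    return a[match.a : match.a + match.size]


# Made-up fragments sharing a 14 bp stretch at their junction.
upstream = "CCCCCCCC" + "ATGGCTAGCTAGGA"
downstream = "ATGGCTAGCTAGGA" + "TTTTTTTT"

print(shared_homology(upstream, downstream, limit=10))  # ATGGCTAGCTAGGA
print(shared_homology(upstream, downstream, limit=20))  # None
```

Raising the limit discards short, possibly accidental matches, so fewer alternative recombination products are reported.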

Normally, only one circular product should be formed, since the
homology limit is quite large (110 bp, see the assembly cell above).
More than one circular product might indicate an incorrect strategy.
The largest circular recombination product is chosen as the candidate for
the pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1 pathway.

In [10]:
candidate = asm.circular_products[0]

This assembly figure shows how the fragments came together.

In [11]:
candidate.figure()

 -|pYPKpw|124
|         \/
|         /\
|         124|2524bp_PCR_prod|711
|                             \/
|                             /\
|                             711|2924bp_PCR_prod|1013
|                                                 \/
|                                                 /\
|                                                 1013|3567bp_PCR_prod|643
|                                                                      \/
|                                                                      /\
|                                                                      643|3009bp_PCR_prod|242
|                                                                                          \/
|                                                                                          /\
|                                                                                          242-
|                                                                                             |
 

The final pathway is synchronized to the backbone vector. This means that
the plasmid origin is shifted so that it matches the original.

In [12]:
pw = candidate.synced(pYPKpw)
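What shifting the origin means can be shown with plain strings. The sketch below rotates a circular sequence so that it starts with the same bases as a reference. It is a simplified illustration, not pydna's `synced` method: the function name `sync_to_reference` and the anchor length `k` are assumptions, only one strand is examined, and the first `k` bases of the reference must occur in the circular sequence.

```python
def sync_to_reference(circular_seq, reference, k=12):
    """Rotate a circular sequence so it starts like the reference.

    Hypothetical helper, not pydna's implementation. The first k bases of
    the reference serve as an anchor. The reverse strand is not examined.
    """
    seq = circular_seq.upper()
    anchor = reference.upper()[:k]
    # Append k - 1 bases so an anchor spanning the current origin is found.
    wrapped = seq + seq[: k - 1]
    position = wrapped.find(anchor)
    if position == -1:
        raise ValueError("anchor from reference not found")
    return seq[position:] + seq[:position]


# Made-up backbone with a 10 bp insert, then rotated by 7 bases.
backbone = "ATGCCGTAGCTAAGCTTGGA"
product = backbone[:10] + "TTTTTTTTTT" + backbone[10:]
rotated = product[7:] + product[:7]

print(sync_to_reference(rotated, backbone, k=8) == product)  # True
```

After synchronization, features located at the start of the backbone keep their coordinates in the assembled pathway.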

Calculate the cseguid checksum of the resulting plasmid for future reference.
This is a seguid checksum that uniquely describes a circular double-stranded
sequence.

In [13]:
pw.cseguid()

3RjS0AfdAqVibyeJfuLPrPqWkl4
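The idea behind a checksum for a circular double-stranded sequence can be sketched as follows. This is an approximation for illustration only: it assumes the checksum is the URL-safe base64 encoding of the SHA-1 digest of a canonical form, namely the lexicographically smallest rotation taken over both strands. The function names are hypothetical, and the result is not guaranteed to reproduce the value pydna prints above.

```python
import base64
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def smallest_rotation(seq):
    """Lexicographically smallest rotation (naive, fine for short examples)."""
    seq = seq.upper()
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def circular_checksum(seq):
    """Checksum independent of origin and strand (assumed scheme, see text)."""
    canonical = min(
        smallest_rotation(seq),
        smallest_rotation(reverse_complement(seq)),
    )
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


toy = "ATGCCGTAGCTAAGCTTGGA"  # made-up circular sequence
rotated = toy[5:] + toy[:5]

# The same circular molecule gives the same checksum however it is written.
print(circular_checksum(toy) == circular_checksum(rotated))                  # True
print(circular_checksum(toy) == circular_checksum(reverse_complement(toy)))  # True
```

Because the canonical form ignores where the sequence starts and which strand is written, the checksum identifies the molecule rather than one particular text representation of it.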

The file is given a name based on the sequence of expressed genes.

In [14]:
pw.locus = "pw"
pw.definition = "pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1"

Stamp sequence with cseguid checksum. This can be used to verify the 
integrity of the sequence file.

In [15]:
pw.stamp()

cSEGUID_3RjS0AfdAqVibyeJfuLPrPqWkl4_2015-07-22T17:28:41.092493

Write sequence to a local file.

In [16]:
pw.write("pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1.gb")

### [pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1](pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1.gb)

The pathway can be extended by digestion with NotI, PacI, or both, provided that each enzyme used cuts exactly once in the final pathway sequence.

In [17]:
print("NotI cuts {} time(s) and PacI cuts {} time(s) in the final pathway.".format(len(pw.cut(NotI)), len(pw.cut(PacI))))

NotI cuts 1 time(s) and PacI cuts 1 time(s) in the final pathway.
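The same check can be illustrated without a restriction enzyme library by counting recognition sites in a circular sequence. NotI recognizes GCGGCCGC and PacI recognizes TTAATTAA. Both sites are palindromic, so scanning one strand is enough. The function `count_sites_circular` and the toy sequence are hypothetical and only sketch the idea.

```python
def count_sites_circular(seq, site):
    """Count occurrences of a recognition site in a circular sequence.

    Hypothetical helper for illustration. Sites spanning the origin are
    included. Intended for palindromic sites, where one strand suffices.
    """
    seq, site = seq.upper(), site.upper()
    # Append len(site) - 1 bases so sites across the origin are counted.
    wrapped = seq + seq[: len(site) - 1]
    return sum(wrapped.startswith(site, i) for i in range(len(seq)))


SITES = {"NotI": "GCGGCCGC", "PacI": "TTAATTAA"}

toy_pathway = "AAGCGGCCGCAATTAATTAACC"  # made-up sequence with one site each
for enzyme, site in SITES.items():
    print(enzyme, count_sites_circular(toy_pathway, site))
# NotI 1
# PacI 1
```

A count of exactly one for an enzyme means that digestion with it yields a single linear molecule.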


## DOWNLOAD [pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1](pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1.gb)

In [18]:
import pydna

reloaded = pydna.read("pYPK0_SsXYL1_SsXYL2_ScXKS1_ScTAL1.gb")

reloaded.verify_stamp()

cSEGUID_3RjS0AfdAqVibyeJfuLPrPqWkl4

### New Primers needed for assembly.

This list contains all needed primers that are not in the standard primer [list](primers.fasta) above.

FBA1fw	ttaaatATAACAATACTGACAGTACTAAA

FBA1rv	taattaaTTTGAATATGTATTACTTGGT

PDC1fw	ttaaatAGGGTAGCCTCCCCAT

PDC1rv	taattaaTTTGATTGATTTGACTGT

PGIfw	ttaaatAATTCAGTTTTCTGACTGA

PGIrv	taattaaTTTTAGGCTGGTATCTTG

TDH3fw	ttaaatATAAAAAACACGCTTTTTC

TDH3rv	taattaaTTTGTTTGTTTATGTGTGTT

TEF1fw	ttaaatACAATGCATACTTTGTAC

TEF1rv	taattaaTTTGTAATTAAAACTTAGATTA

ScTAL1fw	tgcccactttctcactagtgacctgcagccgacAAATGTCTGAACCAGCTC

ScTAL1rv	AAatcctgatgcgtttgtctgcacagatggCACTTAAGCGGTAACTTTCTT

ScXKS1fw	tgcccactttctcactagtgacctgcagccgacAAATGTTGTGTTCAGTAATTC

ScXKS1rv	AAatcctgatgcgtttgtctgcacagatggCACTTAGATGAGAGTCTTTTCC

SsXYL2fw	tgcccactttctcactagtgacctgcagccgacAAATGACTGCTAACCCTTCCTT

SsXYL2rv	AAatcctgatgcgtttgtctgcacagatggCACTTACTCAGGGCCGTCA

SsXYL1fw	tgcccactttctcactagtgacctgcagccgacAAATGCCTTCTATTAAGTTGAAC

SsXYL1rv	AAatcctgatgcgtttgtctgcacagatggCACTTAGACGAAGATAGGAATCT


### New single gene expression vectors (pYPK0_prom_gene_term) needed for assembly.

Hyperlinks to notebook files describing the single-gene expression plasmids needed for the assembly.

[pYPK0_TEF1_SsXYL1_TDH3](pYPK0_TEF1_SsXYL1_TDH3.ipynb)  
[pYPK0_TDH3_SsXYL2_PGI](pYPK0_TDH3_SsXYL2_PGI.ipynb)  
[pYPK0_PGI_ScXKS1_FBA1](pYPK0_PGI_ScXKS1_FBA1.ipynb)  
[pYPK0_FBA1_ScTAL1_PDC1](pYPK0_FBA1_ScTAL1_PDC1.ipynb)  


### New pYPKa vectors needed for assembly of the single gene expression vectors above.

Hyperlinks to notebook files describing the pYPKa plasmids needed for the assembly of the single gene clones listed above.

[pYPKa_ZE_TEF1](pYPKa_ZE_TEF1.ipynb)  
[pYPKa_ZE_TDH3](pYPKa_ZE_TDH3.ipynb)  
[pYPKa_ZE_PGI](pYPKa_ZE_PGI.ipynb)  
[pYPKa_ZE_FBA1](pYPKa_ZE_FBA1.ipynb)  
[pYPKa_ZE_PDC1](pYPKa_ZE_PDC1.ipynb)