## Plan for assembly of the Dominant Del Assay plasmid pPS1

In [1]:
from IPython.display import Image
Image(url='http://cancerres.aacrjournals.org/content/66/7/3480/F1.medium.gif')

### The del assay

The image above depicts the principle of the original del assay ([Kirpnick-Sobol et al. 2006](https://www.ncbi.nlm.nih.gov/pubmed/16585171)).

The RS112 yeast strain contains a plasmid (pRS6) carrying the LEU2 gene and an internal fragment of the yeast HIS3 gene, integrated into the HIS3 locus. This results in two partial, inactive copies of the his3 gene, one with a terminal deletion at the 3'-end and the other with a terminal deletion at the 5'-end. There are 410 bp of homology between the two (shown as a striped region).

### The pPS1 plasmid

This cassette consists of two dominant markers: HphMX4 and "kan", the kanamycin resistance gene from the E. coli transposon Tn903. The HphMX4 marker is the Hygromycin B resistance gene from an E. coli [plasmid](http://www.ncbi.nlm.nih.gov/pubmed/6319235) under control of the Ashbya gossypii TEF1 promoter and terminator.

The idea is to split the HphMX4 marker in two pieces in a way that produces a shared homology between the fragments. The G418 resistance gene kan is placed between the HphMX4 fragments and will be controlled by the promoter and terminator from the Kluyveromyces lactis TEF1 homolog.

The circular construct is made by in vivo gap repair in one reaction, as a recombination between seven linear DNA fragments:
    
    1   URA3_2micron
    2               HphMX4(5'part)
    3                             KlTEF1p
    4                                    kan
    5                                       KlTEF1t
    6                                              YIplac128_smaI    
    7                                                            HphMX4(3'part)
    
1. The URA3_2micron fragment contains a URA3 marker and the 2 micron sequence for plasmid replication.
2. The Ashbya gossypii TEF1 promoter with a little more than half of the Hygromycin B resistance ORF.
3. The TEF1 promoter from Kluyveromyces lactis.
4. The kan resistance ORF.
5. The TEF1 terminator from Kluyveromyces lactis.
6. YIplac128_smaI, a linearized vector containing E. coli replicative sequences and a LEU2 marker.
7. The second half of the Hygromycin B resistance ORF and the Ashbya gossypii TEF1 terminator.
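
Because the product is circular, seven fragments give seven junctions, and each junction needs shared homology for gap repair to work. A minimal sketch in plain Python (no pydna involved) that lists the junctions from the fragment order above; the variable names are made up for this illustration:

```python
# Fragment order taken from the diagram above.
fragment_order = [
    "URA3_2micron",
    "HphMX4(5'part)",
    "KlTEF1p",
    "kan",
    "KlTEF1t",
    "YIplac128_smaI",
    "HphMX4(3'part)",
]

n = len(fragment_order)
# Pair each fragment with its downstream neighbour; the modulo closes the circle.
junctions = [(fragment_order[i], fragment_order[(i + 1) % n]) for i in range(n)]

for upstream, downstream in junctions:
    print(f"{upstream} -> {downstream}")
```

The last pair printed joins HphMX4(3'part) back to URA3_2micron, which is the junction that closes the circle.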

### Material

|DNA      | Source (-80 °C stock) |
|---------|-----------------------|
|pAG32    | box 3, pos 45         |
|pSU0     |                       |
|pUG6     | box 3, pos 55         |
|YIplac128| box 1, pos 81         |

In [2]:
# This notebook requires pydna version 2.0.2 or later
import pydna
print( pydna.__version__ )
del pydna
from pydna.all import *

2.0.2


### 1. URA3_2micron

This fragment is PCR amplified from the pSU0 vector described by [Iizasa and Nagano 2006](https://www.ncbi.nlm.nih.gov/pubmed/16454044). This plasmid is available from Genbank under the accession number AB215109.

In [3]:
gb = Genbank("bjornjobb@gmail.com")
pSU0 = gb.nucleotide("AB215109.1")

In [4]:
URA3_2micron = pSU0[1041:3620]

In [5]:
URA3_2micron

Dseqrecord(-2579)

### 2. HphMX4(5'part) and 7. HphMX4(3'part)

The plasmid pAG32 contains the HphMX4 marker gene. It is available from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30106). It was constructed by [Goldstein & McCusker](http://www.ncbi.nlm.nih.gov/pubmed/10514571).

The sequence is not available from Genbank, but the EUROSCARF website provides it. Unfortunately, the LOCUS line is malformed in this record (genbank format). 

For this reason we deposited a corrected copy of the sequence [here](https://gist.github.com/BjornFJohansson/c5424b7ebbf553c52053). The size of the plasmid is 4160 bp.

In [6]:
text  = download_text("https://gist.githubusercontent.com/BjornFJohansson/c5424b7ebbf553c52053/raw/64318ead495bc7ade8bb598ab192e76a3569a724/pAG32.gb")
pAG32 = read(text)

In [7]:
pAG32

Dseqrecord(o4160)

In [8]:
pAG32.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   90  | 458  |    368 | <unknown id> | misc_feature |  no  |
| 1        |    -->    |  458  | 1487 |   1029 | <unknown id> | CDS          | yes  |
| 2        |    -->    |  1487 | 1727 |    240 | <unknown id> | misc_feature |  no  |
| 3        |    <--    |  2832 | 3693 |    861 | <unknown id> | CDS          | yes  |
+----------+-----------+-------+------+--------+--------------+--------------+------+

We can inspect the features in the table above to conclude that the HphMX4 cassette starts at position 90 (start of feature 0) and ends at position 1727 (end of feature 2). We slice out this part of the sequence and assign it to the variable "hyg_cassette".

In [9]:
hyg_cassette = pAG32[90:1727]

In [10]:
hyg_cassette

Dseqrecord(-1637)

In [11]:
hyg_cassette.write("hyg_cassette.gb")

The HphMX4 cassette is 1637 bp. We will split it in two parts in such a way that the parts share a stretch of homology in the middle. Each part extends 200 bp past the midpoint, so the shared region is 400 bp, close to the 410 bp of homology in the original del cassette.

In [12]:
middle = len(hyg_cassette) // 2  # midpoint of the cassette (integer division)
overlap = 200                    # each part extends this far past the midpoint

We split the HphMX4 in two parts:

In [13]:
hphMX4_5_part = hyg_cassette[:middle+overlap]
hphMX4_3_part = hyg_cassette[(middle-overlap):]

The last 400 bp of the first part are equal to the first 400 bp of the second part. 

In [14]:
eq( hphMX4_5_part[-400:], hphMX4_3_part[:400] )

True
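
The arithmetic behind the 400 bp figure can be reproduced without pydna. The sketch below uses a random 1637 nt string as a stand-in for the cassette (an assumption made only so the example runs on its own) and the same slicing as above:

```python
import random

random.seed(0)
# Stand-in for hyg_cassette: a random sequence of the same length (1637 nt).
cassette = "".join(random.choice("ACGT") for _ in range(1637))

middle = len(cassette) // 2   # 818
flank = 200                   # distance each part extends past the midpoint

five_prime_part = cassette[:middle + flank]
three_prime_part = cassette[middle - flank:]

# The region covered by both parts is twice the flank.
shared = len(five_prime_part) + len(three_prime_part) - len(cassette)
print(len(five_prime_part), len(three_prime_part), shared)
```

The shared region is always `2 * flank`, regardless of whether the cassette length is odd or even.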

The overlap sequence can be seen below.

In [15]:
print( str(hphMX4_5_part[-400:].seq) )

TCAGCGAGAGCCTGACCTATTGCATCTCCCGCCGTGCACAGGGTGTCACGTTGCAAGACCTGCCTGAAACCGAACTGCCCGCTGTTCTGCAGCCGGTCGCGGAGGCCATGGATGCGATCGCTGCGGCCGATCTTAGCCAGACGAGCGGGTTCGGCCCATTCGGACCGCAAGGAATCGGTCAATACACTACATGGCGTGATTTCATATGCGCGATTGCTGATCCCCATGTGTATCACTGGCAAACTGTGATGGACGACACCGTCAGTGCGTCCGTCGCGCAGGCTCTCGATGAGCTGATGCTTTGGGCCGAGGACTGCCCCGAAGTCCGGCACCTCGTGCACGCGGATTTCGGCTCCAACAATGTCCTGACGGACAATGGCCGCATAACAGCGGTCATTGA


### 3. KlTEF1p and 5. KlTEF1t

The Kluyveromyces lactis TEF1 promoter and terminator will be used to express the kan gene. This promoter and terminator pair has not been proven to function in S. cerevisiae.

K. lactis sequences can be found at the [Yeast Gene Order Browser](http://ygob.ucd.ie/)

The Kl TEF1 promoter has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted)

The Kl TEF1 terminator has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted)

In [16]:
promoter_link ="http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted" 

In [17]:
html = download_text(promoter_link)

The links above go to HTML documents that contain the sequences. We use the [BeautifulSoup library](https://www.crummy.com/software/BeautifulSoup/) to extract the sequence.

In [18]:
from bs4 import BeautifulSoup

In [19]:
KlTEF1p = read( ''.join( BeautifulSoup( html, "lxml").findAll( text = True ) ) )

In [20]:
KlTEF1p

Dseqrecord(-1421)
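
The extraction step joins all text nodes of the page and hands the result to a sequence parser. As a rough illustration of the same idea using only the Python standard library (a swapped-in technique, not what the notebook uses), the sketch below collects text nodes with `html.parser`. The HTML snippet, the FASTA layout inside it, and the names `TextCollector` and `visible_text` are assumptions made for this example; the real YGOB page may look different.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the concatenated text nodes of `html`."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical page: a FASTA record wrapped in HTML tags.
example_html = (
    "<html><body><pre>&gt;example_intergenic\n"
    "ACGTACGT\nTTGACA\n</pre></body></html>"
)

lines = visible_text(example_html).strip().splitlines()
header = lines[0].lstrip(">")
sequence = "".join(lines[1:])
print(header, sequence)
```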

The KlTEF1p sequence retrieved above is the whole intergenic region between the KlTEF1 gene and its upstream neighbour. About 400 bp, taken from the end closest to the KlTEF1 gene, should be sufficient for the promoter to give efficient expression.

In [21]:
KlTEF1p = KlTEF1p[-400:]

We establish the terminator in the same manner.

In [22]:
terminator_link = "http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted"

In [23]:
html = download_text(terminator_link)

In [24]:
KlTEF1t = read( ''.join( BeautifulSoup( html, "lxml").findAll( text = True ) ) )

In [25]:
KlTEF1t

Dseqrecord(-457)

Likewise, the first 400 bp should be enough for the terminator.

In [26]:
KlTEF1t = KlTEF1t[:400]

### 4. kan

The kan gene was amplified from the pUG6 plasmid, which was constructed by [Güldener et al. 1996](http://nar.oxfordjournals.org/content/24/13/2519.full).

The sequence is available from [Genbank](http://www.ncbi.nlm.nih.gov/nuccore/AF298793.1). The plasmid itself can be obtained from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30114).

We will download the sequence from Genbank.

In [27]:
pUG6 = gb.nucleotide("AF298793")

The size should be 4009 bp.

In [28]:
len(pUG6)

4009

In [29]:
pUG6

We inspect features to obtain the coding sequence of the kan gene.

In [30]:
pUG6.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   0   | 4009 |   4009 | <unknown id> | source       |  no  |
| 1        |    -->    |   52  |  86  |     34 | <unknown id> | misc_feature |  no  |
| 2        |    -->    |   86  | 484  |    398 | <unknown id> | regulatory   |  no  |
| 3        |    -->    |  484  | 1294 |    810 | <unknown id> | gene         | yes  |
| 4        |    -->    |  484  | 1294 |    810 | <unknown id> | CDS          | yes  |
| 5        |    -->    |  1294 | 1559 |    265 | <unknown id> | regulatory   |  no  |
| 6        |    -->    |  1559 | 1593 |     34 | <unknown id> | misc_feature |  no  |
| 7        |    <--    |  2681 | 3542 |    861 | <unknown id> | gene         | yes  |
| 8        |    <--    |  2681 | 3542 |    861 | <unkn

Feature number 4 is the coding sequence of the kan gene:

In [31]:
kan = pUG6.extract_feature(4)

### 6. YIplac128_smaI

The final DNA fragment is the YIplac128 vector. This vector was constructed by [Gietz and Sugino](https://www.ncbi.nlm.nih.gov/pubmed/3073106). The sequence can be found in Genbank [here](https://www.ncbi.nlm.nih.gov/nuccore/X75463.1). YIplac128 will be linearized with [SmaI](http://rebase.neb.com/rebase/enz/SmaI.html), which leaves blunt ends.

The reason for including this fragment is twofold. We can in theory rescue the construct to E. coli, since this plasmid has sequences that allow selection and replication in E. coli. It also serves to extend the region where a double-stranded DNA break can be expected to result in recombination.

In [32]:
YIplac128 = gb.nucleotide("X75463").looped() # The sequence in Genbank is wrongly marked as linear 
from Bio.Restriction import SmaI
YIplac128_smaI = YIplac128.linearize(SmaI)
YIplac128_smaI

Dseqrecord(-4302)
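
Linearizing a circular sequence with a blunt cutter amounts to rotating the sequence so that it starts right after the cut. SmaI recognizes CCCGGG and cuts in the middle (CCC^GGG). The sketch below shows the idea on a toy 12 bp circle in plain Python; it is not how pydna or Biopython implement it, and `linearize_blunt` is a hypothetical helper.

```python
def linearize_blunt(circular, site="CCCGGG", cut_offset=3):
    """Cut a circular sequence once at a blunt site and return the linear form.

    The search also covers a site that spans the arbitrary origin of the
    circular sequence. A ValueError is raised unless the site occurs exactly
    once. The default site is palindromic, so one strand is enough to search.
    """
    n = len(circular)
    wrapped = circular + circular[:len(site) - 1]
    positions = [i for i in range(n) if wrapped.startswith(site, i)]
    if len(positions) != 1:
        raise ValueError(f"expected one site, found {len(positions)}")
    cut = (positions[0] + cut_offset) % n
    return circular[cut:] + circular[:cut]


toy_plasmid = "AAACCCGGGTTT"   # toy sequence, not YIplac128
linear = linearize_blunt(toy_plasmid)
print(linear)
```

The linear product has the same length as the circle, starts with GGG and ends with CCC, which is what a single blunt SmaI cut leaves.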

In [33]:
YIplac128_smaI.name = "YIplac128"

We have now established all seven necessary linear DNA fragments for the assembly.

In [34]:
from Bio.Restriction import XhoI, SpeI

Two restriction sites, XhoI and SpeI, are added to the assembly to facilitate analysis of the final construct. The primer_design function designs annealing primers for each template, and the assembly_fragments function extends them with the tails needed for the assembly. The YIplac128_smaI sequence is listed both first and last since we want a circular assembly.

In [35]:
fragments = assembly_fragments((YIplac128_smaI,
                                primer_design(hphMX4_3_part),
                                Dseqrecord(XhoI.site),
                                primer_design(URA3_2micron),
                                primer_design(hphMX4_5_part),
                                Dseqrecord(SpeI.site),
                                primer_design(KlTEF1p),
                                primer_design(kan),
                                primer_design(KlTEF1t),
                                YIplac128_smaI))

In [36]:
fragments

[Dseqrecord(-4302),
 Amplicon(1075),
 Amplicon(2618),
 Amplicon(1057),
 Amplicon(439),
 Amplicon(846),
 Amplicon(453),
 Dseqrecord(-4302)]
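
The effect of the XhoI linker can be seen in the two primers that flank it, dda2_2nd_r and dda3_URA3_2my_f, whose sequences are copied here from the primer list printed later in this notebook. The sketch below is plain Python with a hypothetical `reverse_complement` helper; it checks that the XhoI site (CTCGAG) sits inside the forward primer and that the two primers are reverse complements of each other, so the two PCR products share the whole primer length.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


xhoi_site = "CTCGAG"
dda2_2nd_r = "AGCGGTATTCGCAATCTCGAGTCGACACTGGATGGC"
dda3_URA3_2my_f = "GCCATCCAGTGTCGACTCGAGATTGCGAATACCGCT"

# Locate the linker inside the forward primer and measure what flanks it.
site_start = dda3_URA3_2my_f.find(xhoi_site)
left_flank = dda3_URA3_2my_f[:site_start]
right_flank = dda3_URA3_2my_f[site_start + len(xhoi_site):]

primers_are_complementary = reverse_complement(dda3_URA3_2my_f) == dda2_2nd_r
print(len(left_flank), len(right_flank), primers_are_complementary)
```

The site is flanked by 15 nt on either side, presumably the end of the Hph-term product on the left and the start of URA3_2micron on the right.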

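We set the maximum primer length to 40 nt for the primers guiding the assembly of the internal fragments, to keep the cost down. The slice below skips the vector copies at both ends of the list, which have no primers, and the two amplicons next to the vector, whose primers must carry the homology to the vector.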

In [37]:
for prd in fragments[2:-2]:
    prd.forward_primer = prd.forward_primer[-40:]  # keep the 3' end, shorten the 5' tail
    prd.reverse_primer = prd.reverse_primer[-40:]

The primers that guide recombination with the YIplac128 vector are allowed to be longer (50 nt), as they carry the entire homology that allows recombination with the vector.

In [38]:
# fragments[1] and fragments[-2] are the amplicons flanking the vector
for prd in (fragments[1], fragments[-2]):
    prd.forward_primer = prd.forward_primer[-50:]
    prd.reverse_primer = prd.reverse_primer[-50:]
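
Slicing a primer from the right keeps its 3' end, which anneals to the template, and shortens only the 5' tail that provides homology to the neighbouring fragment. A minimal sketch with made-up sequences (a 35 nt tail and a 26 nt annealing part, both hypothetical):

```python
# Hypothetical tailed primer: 5' tail (homology to the neighbour) + 3' annealing part.
tail = "T" * 35
anneal = "ACGTTGCA" * 3 + "AC"   # 26 nt
primer = tail + anneal           # 61 nt in total

trimmed = primer[-50:]           # keep the last 50 nt, i.e. the 3' end
remaining_tail = len(trimmed) - len(anneal)
print(len(primer), len(trimmed), remaining_tail)
```

In this example the annealing part is untouched while the tail shrinks from 35 to 24 nt, so trimming trades homology length for primer cost.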

Since we changed the primers, the PCR products should be re-simulated. This can be done by looping over the PCR products and calling the pcr function with each old Amplicon object as argument.

We can now give proper names to the pcr products and primers.

In [39]:
names = ''' Hph-term      dda1_2nd_f         dda2_2nd_r
            URA3_2my      dda3_URA3_2my_f    dda4_URA3_2my_r
            prom-Hph      dda5_1st_f         dda6_1st_r
            KlTEF1prom    dda7_Kl_pr_f       dda8_Kl_pr_r
            kan_orf       dda9_kan_f         dda10_kan_r
            KlTEF1term    dda11_Kl_tr_f      dda12_Kl_tr_r '''

In [40]:
for f, n in zip(fragments[1:-1], names.splitlines()):
    f.name, f.forward_primer.id, f.reverse_primer.id = n.split()

We use the pydna Assembly functionality to simulate the in vivo homologous recombination.

In [41]:
asm = Assembly( fragments[1:], limit = 26)

In [42]:
asm

Assembly:
Sequences........................: [1075] [2618] [1057] [439] [846] [453] [4302]
Sequences with shared homologies.: [1075] [2618] [1057] [4302] [439] [846] [453]
Homology limit (bp)..............: 26
Number of overlaps...............: 8
Nodes in graph(incl. 5' & 3')....: 10
Only terminal overlaps...........: No
Circular products................: [10540] [6318] [4222]
Linear products..................: [10792] [10620] [10576] [10576] [10576] [10576] [10576] [10575] [10575] [10173] [10158] [9982] [9766] [9564] [9555] [9536] [9363] [9348] [9152] [8945] [8342] [7994] [7924] [7393] [7375] [7164] [6973] [6954] [6909] [6840] [6771] [6757] [6570] [6492] [6354] [6354] [6354] [6353] [6353] [6308] [5951] [5936] [5933] [5891] [5760] [5682] [5544] [5530] [5342] [5297] [5279] [5269] [5141] [5126] [5081] [4852] [4723] [4720] [4678] [4258] [4258] [4042] [3657] [3639] [2687] [2270] [2086] [1676] [1669] [1666] [1460] [1263] [1249] [859] [456] [36] [36] [36] [36] [36] [35] [35]

In [43]:
candidate = asm.circular_products[0]
candidate.figure()

 -|Hph-term|36
|           \/
|           /\
|           36|URA3_2my|36
|                       \/
|                       /\
|                       36|prom-Hph|36
|                                   \/
|                                   /\
|                                   36|KlTEF1prom|36
|                                                 \/
|                                                 /\
|                                                 36|kan_orf|36
|                                                            \/
|                                                            /\
|                                                            36|KlTEF1term|35
|                                                                          \/
|                                                                          /\
|                                                                          35|YIplac128|35
|                                                                                

In [44]:
pPS1 = candidate

In [45]:
pPS1.cseguid()

vXelWQ46lP0x8856rBYXpzFaSTk

In [46]:
for prd in fragments[1:-1]:
    print(prd.forward_primer.format("tab"))
    print(prd.reverse_primer.format("tab"))

dda1_2nd_f	CTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCTCAGCGAGAGCCTG

dda2_2nd_r	AGCGGTATTCGCAATCTCGAGTCGACACTGGATGGC

dda3_URA3_2my_f	GCCATCCAGTGTCGACTCGAGATTGCGAATACCGCT

dda4_URA3_2my_r	ACCCGGCGGGGATAATAACTGATATAATTAAATTGAAGCT

dda5_1st_f	TAATTATATCAGTTATTATCCCCGCCGGGTC

dda6_1st_r	ATTGACCCAGTGTTACTAGTTCAATGACCGCTGTTATGCG

dda7_Kl_pr_f	AACAGCGGTCATTGAACTAGTAACACTGGGTCAATCATAG

dda8_Kl_pr_r	AGTCTTTTCCTTACCCATTTTTAATGTTACTTCTCTTGCA

dda9_kan_f	AGAGAAGTAACATTAAAAATGGGTAAGGAAAAGACTC

dda10_kan_r	AGTAGTATCAAGTTAAACTTAGAAAAACTCATCGAGC

dda11_Kl_tr_f	CTCGATGAGTTTTTCTAAGTTTAACTTGATACTACTAGATTTTTT

dda12_Kl_tr_r	CCAGTGAATTCGAGCTCGGTACCCTTAGTATTAGTAAATTTGTTGACAAT
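
The junction lengths shown in the assembly figure can be cross-checked from the primer list alone. The sketch below is plain Python, not pydna. It assumes that the KlTEF1prom product ends with the reverse complement of dda8_Kl_pr_r and that the kan_orf product starts with dda9_kan_f, and it looks for the longest suffix of the former that is also a prefix of the latter. The helpers `reverse_complement` and `terminal_overlap` are made up for this illustration; the default limit of 26 mirrors the `limit` passed to Assembly above.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def terminal_overlap(upstream, downstream, limit=26):
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`; 0 if no such overlap of at least `limit` nt exists."""
    longest_possible = min(len(upstream), len(downstream))
    for size in range(longest_possible, limit - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


dda8_Kl_pr_r = "AGTCTTTTCCTTACCCATTTTTAATGTTACTTCTCTTGCA"
dda9_kan_f = "AGAGAAGTAACATTAAAAATGGGTAAGGAAAAGACTC"

end_of_promoter_product = reverse_complement(dda8_Kl_pr_r)
start_of_kan_product = dda9_kan_f

shared = terminal_overlap(end_of_promoter_product, start_of_kan_product)
print(shared)
```

For this pair the result is 36, the same number the figure shows for the junction between KlTEF1prom and kan_orf.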



In [47]:
pPS1.name = "pPS1"
pPS1.description=""

In [48]:
pPS1.stamp()

cSEGUID_vXelWQ46lP0x8856rBYXpzFaSTk

In [49]:
pPS1.write("pPS1.gb")

## PCR conditions

In [50]:
for prd in fragments[1:-1]:
    print("product name:", prd.name)
    print("template:", prd.template.name)
    print(prd.program())
    print()
    print("----------------------------------------------------------------------")
    print()

product name: Hph-term
template: pAG32

Taq (rate 30 nt/s) 35 cycles             |1075bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 58.3°C/ 0min32s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 54%
|         |         30s           |      |4-12°C

----------------------------------------------------------------------

product name: URA3_2my
template: AB215109

Taq (rate 30 nt/s) 35 cycles             |2618bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 53.3°C/ 1min19s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 39%
|         |         30s           |      |4-12°C

----------------------------------------------------------------------

pro