## Plan for assembly of the Dominant Del Assay plasmid pPS1

In [49]:
from IPython.display import Image
Image(url='http://cancerres.aacrjournals.org/content/66/7/3480/F1.medium.gif')

### The del assay

The image above depicts the principle of the original del assay.

http://cancerres.aacrjournals.org/content/66/7/3480.abstract?sid=9297eb2f-00bd-466f-89d8-b38af175d67f

A, the RS112 yeast strain contains a plasmid carrying the LEU2 gene and an internal fragment of the yeast HIS3 gene integrated into the genome at the HIS3 locus. This results in two copies of the his3 gene, one with a terminal deletion at the 3'-end and the other with a terminal deletion at the 5'-end. There are ~400 bp of homology between the two copies (striped region). B, DNA strand breakage leads to bidirectional degradation until homologous single-stranded regions are exposed. C, annealing of homologous regions. D, reversion to HIS+ phenotype and deletion of plasmid.

### The pPS1 plasmid

The cassette in pPS1 consists of two dominant markers: HphMX4 and "kan", the kanamycin resistance gene from the E. coli transposon Tn903.

The HphMX4 marker is the Hygromycin B resistance gene from the plasmid [pAG32](http://www.ncbi.nlm.nih.gov/pubmed/6319235) under control of the Ashbya gossypii TEF1 promoter and terminator.

The idea is to split the HphMX4 marker into two pieces that share a region of homology, as with the HIS3 gene in the original del cassette. The kan gene will be controlled by the promoter and terminator from the Kluyveromyces lactis TEF1 homolog.

The TEF1 promoter-kan-TEF1 terminator fragments are cloned inside the HphMX4 marker in such a way that there is a region of homology on each side by which the TEF1 promoter-kan-TEF1 terminator can be lost and the HphMX4 gene reconstituted. 
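The intended pop-out event can be illustrated with plain Python strings. This is only a toy sketch: the sequences and the helper `pop_out` are made up for illustration and are not part of pydna, and recombination in the cell is of course not an exact string operation.

```python
def pop_out(construct, repeat):
    """Delete everything between two direct copies of `repeat`, keeping one copy."""
    first = construct.find(repeat)
    last = construct.rfind(repeat)
    if first == -1 or first == last:
        raise ValueError("the construct needs two copies of the repeat")
    return construct[:first] + construct[last:]

# Hypothetical 32 nt "marker" split into two parts sharing 8 nt in the middle.
marker = "AAAACCCCGGGGTTTTACGTACGTTGCATGCA"
five_part = marker[:20]    # ends with the shared region
three_part = marker[12:]   # starts with the shared region
shared = marker[12:20]

# Hypothetical insert standing in for the promoter-kan-terminator unit.
insert = "GAGAGAGAGA"
construct = five_part + insert + three_part

recombinant = pop_out(construct, shared)
print(recombinant == marker)   # True
```

Losing the insert and restoring the intact marker happen in the same event, which is what makes the reconstituted HphMX4 gene usable as a readout.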

The circular construct is made by in vivo gap repair in a single reaction, as a recombination between seven linear DNA fragments:
    
    1   URA3_2micron
    2               HphMX4(5'part)
    3                             KlTEF1p
    4                                    kan
    5                                       KlTEF1t
    6                                              YIplac128_smaI    
    7                                                            HphMX4(3'part)
    
1. The URA3_2micron fragment contains a URA3 marker and the 2 micron sequence for plasmid replication.
2. The Ashbya gossypii TEF1 promoter with a little more than half of the hygromycin B resistance ORF.
3. The TEF1 promoter from Kluyveromyces lactis.
4. The kan resistance ORF.
5. The TEF1 terminator from Kluyveromyces lactis.
6. YIplac128_smaI is a linearized vector containing E. coli replicative sequences and a LEU2 marker.
7. The second half of the hygromycin B resistance ORF and the Ashbya gossypii TEF1 terminator.

### Material

|DNA      | Source (-80 °C stock) |
|---------|-----------------------|
|pAG32    | box 3, pos 45         |
|pSU0     |                       |
|pUG6     | box 3, pos 55         |
|YIplac128| box 1, pos 81         |

In [50]:
# This notebook requires pydna version 1.2.0 or earlier
import pydna
pydna.__version__

'1.2.0'

### 1. URA3_2micron

This fragment is PCR amplified from the [pSU0 vector](https://www.ncbi.nlm.nih.gov/pubmed/16454044).

In [51]:
gb = pydna.Genbank("bjornjobb@gmail.com")
pSU0 = gb.nucleotide("AB215109.1")

In [52]:
URA3_2micron = pSU0[1041:3620]

In [53]:
URA3_2micron

Dseqrecord(-2579)

### 2. HphMX4(5'part) and 7. HphMX4(3'part)

The plasmid pAG32 contains the HphMX4 marker gene. It is available from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30106). It was constructed by [Goldstein & McCusker](http://www.ncbi.nlm.nih.gov/pubmed/10514571).

The sequence is not available from Genbank, but the EUROSCARF website provides it. Unfortunately, the LOCUS line is malformed in this record (genbank format). 

For this reason we made our own copy of the sequence [here](https://gist.github.com/BjornFJohansson/c5424b7ebbf553c52053). The size of the plasmid is 4160 bp.

In [54]:
text  = pydna.download_text("https://gist.githubusercontent.com/BjornFJohansson/c5424b7ebbf553c52053/raw/64318ead495bc7ade8bb598ab192e76a3569a724/pAG32.gb")
pAG32 = pydna.read(text)

In [55]:
pAG32

Dseqrecord(o4160)

In [56]:
pAG32.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   90  | 458  |    368 | <unknown id> | misc_feature |  no  |
| 1        |    -->    |  458  | 1487 |   1029 | <unknown id> | CDS          | yes  |
| 2        |    -->    |  1487 | 1727 |    240 | <unknown id> | misc_feature |  no  |
| 3        |    <--    |  2832 | 3693 |    861 | <unknown id> | CDS          | yes  |
+----------+-----------+-------+------+--------+--------------+--------------+------+

From the feature table above we can conclude that the HphMX4 cassette starts at position 90 (start of feature 0) and ends at position 1727 (end of feature 2).

In [57]:
hyg_cassette = pAG32[90:1727]

In [58]:
hyg_cassette

Dseqrecord(-1637)

The HphMX4 cassette is 1637 bp. We will split it into two parts that share a region of homology in the middle. Each part extends 200 bp past the midpoint, so the shared homology is 400 bp, similar to the ~400 bp of homology in the original del cassette.

In [59]:
middle = len(hyg_cassette) // 2   # midpoint of the cassette
overlap = 200                     # bp that each part extends past the midpoint

We split the HphMX4 in two parts:

In [60]:
hphMX4_5_part = hyg_cassette[:middle+overlap]
hphMX4_3_part = hyg_cassette[(middle-overlap):]

The last 400 bp of the first part are equal to the first 400 bp of the second part. 

In [61]:
pydna.eq( hphMX4_5_part[-400:], hphMX4_3_part[:400] )

True
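The arithmetic behind the split can be checked with an ordinary string in place of the Dseqrecord. The function `split_with_overlap` and the random toy sequence are illustrative assumptions; only the length (1637) and the 200 bp extension are taken from the cells above.

```python
import random

def split_with_overlap(seq, flank):
    """Split seq into two parts that each extend `flank` past the midpoint."""
    middle = len(seq) // 2
    return seq[:middle + flank], seq[middle - flank:]

rng = random.Random(0)                       # fixed seed, toy data only
toy = "".join(rng.choice("ACGT") for _ in range(1637))

five, three = split_with_overlap(toy, 200)
print(len(five), len(three))                 # 1018 1019

# The parts share 2 * 200 = 400 positions ...
assert five[-400:] == three[:400]
# ... and joining them while dropping one copy restores the original.
assert five + three[400:] == toy
```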

The overlap sequence can be seen below.

In [62]:
print( str(hphMX4_5_part[-400:].seq) )

TCAGCGAGAGCCTGACCTATTGCATCTCCCGCCGTGCACAGGGTGTCACGTTGCAAGACCTGCCTGAAACCGAACTGCCCGCTGTTCTGCAGCCGGTCGCGGAGGCCATGGATGCGATCGCTGCGGCCGATCTTAGCCAGACGAGCGGGTTCGGCCCATTCGGACCGCAAGGAATCGGTCAATACACTACATGGCGTGATTTCATATGCGCGATTGCTGATCCCCATGTGTATCACTGGCAAACTGTGATGGACGACACCGTCAGTGCGTCCGTCGCGCAGGCTCTCGATGAGCTGATGCTTTGGGCCGAGGACTGCCCCGAAGTCCGGCACCTCGTGCACGCGGATTTCGGCTCCAACAATGTCCTGACGGACAATGGCCGCATAACAGCGGTCATTGA


### 3. KlTEF1p and 5. KlTEF1t

Now we need to define the K. lactis TEF1 promoter and terminator to use for the kan gene.

K. lactis sequences can be found at the [Yeast Gene Order Browser](http://ygob.ucd.ie/)

The Kl TEF1 promoter has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted)

The Kl TEF1 terminator has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted)

In [63]:
promoter_link ="http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted" 

In [64]:
html = pydna.download_text(promoter_link)

The links above go to HTML documents that contain the sequences. We use the [BeautifulSoup library](https://www.crummy.com/software/BeautifulSoup/) to extract the sequence.

In [65]:
from bs4 import BeautifulSoup

In [66]:
KlTEF1p = pydna.read( ''.join( BeautifulSoup( html, "lxml").findAll( text = True ) ) )

In [67]:
KlTEF1p

Dseqrecord(-1421)
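The same idea, collecting all text nodes of an HTML page and passing them on to a sequence parser, can be sketched with only the standard library. Here `html.parser` is swapped in for BeautifulSoup, and the HTML snippet is invented: the real YGOB page layout is not reproduced, and FASTA-style content is an assumption.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical page content, for illustration only.
toy_html = ("<html><body><pre>&gt;toy_intergenic\n"
            "ACGTACGT\nTTGCA\n</pre></body></html>")

collector = TextCollector()
collector.feed(toy_html)
collector.close()
text = "".join(collector.chunks)

# Minimal FASTA handling: first line is the header, the rest is sequence.
header, *seq_lines = text.strip().splitlines()
sequence = "".join(seq_lines)
print(header, len(sequence))   # >toy_intergenic 13
```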

The K. lactis TEF1 promoter record contains the whole intergenic sequence between the KlTEF1 gene and the upstream gene. The last 400 bp are probably sufficient for the promoter to give efficient expression.

In [68]:
KlTEF1p = KlTEF1p[-400:]

We establish the terminator in the same manner.

In [69]:
terminator_link = "http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted"

In [70]:
html = pydna.download_text(terminator_link)

In [71]:
KlTEF1t = pydna.read( ''.join( BeautifulSoup( html, "lxml").findAll( text = True ) ) )

In [72]:
KlTEF1t

Dseqrecord(-457)

Likewise, 400 bp is more than enough for the terminator.

In [73]:
KlTEF1t = KlTEF1t[:400]

### 4.  kan

The kan gene is amplified from the pUG6 plasmid, which was constructed by [Güldener et al.](http://nar.oxfordjournals.org/content/24/13/2519.full).

The sequence is available from [Genbank](http://www.ncbi.nlm.nih.gov/nuccore/AF298793.1). The plasmid itself can be obtained from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30114).

We will download the sequence from Genbank.

In [74]:
pUG6 = gb.nucleotide("AF298793")

The size should be 4009bp.

In [75]:
len(pUG6)

4009

In [76]:
pUG6

We can inspect features to obtain the coding sequence of the kan gene.

In [77]:
pUG6.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   0   | 4009 |   4009 | <unknown id> | source       |  no  |
| 1        |    -->    |   52  |  86  |     34 | <unknown id> | misc_feature |  no  |
| 2        |    -->    |   86  | 484  |    398 | <unknown id> | regulatory   |  no  |
| 3        |    -->    |  484  | 1294 |    810 | <unknown id> | gene         | yes  |
| 4        |    -->    |  484  | 1294 |    810 | <unknown id> | CDS          | yes  |
| 5        |    -->    |  1294 | 1559 |    265 | <unknown id> | regulatory   |  no  |
| 6        |    -->    |  1559 | 1593 |     34 | <unknown id> | misc_feature |  no  |
| 7        |    <--    |  2681 | 3542 |    861 | <unknown id> | gene         | yes  |
| 8        |    <--    |  2681 | 3542 |    861 | <unkn

The feature number 4 is the coding sequence for the kan gene:

In [78]:
kan = pUG6.extract_feature(4)

Apart from the 2.6 kb URA3_2micron fragment, we have now defined five DNA fragments between 0.4 and 1.1 kb.

### 6. YIplac128_smaI

The final DNA fragment is the YIplac128 vector. This vector was constructed by [Gietz and Sugino](https://www.ncbi.nlm.nih.gov/pubmed/3073106). The sequence can be found in Genbank [here](https://www.ncbi.nlm.nih.gov/nuccore/X75463.1). YIplac128 will be linearized with [SmaI](http://rebase.neb.com/rebase/enz/SmaI.html), which leaves blunt ends.

The reason for including this fragment is twofold. We can in theory rescue the construct to E. coli, since this plasmid has sequences that allow selection and replication in E. coli. It also serves to extend the region where a double-stranded DNA break can be expected to result in recombination.

In [79]:
YIplac128 = gb.nucleotide("X75463").looped() # The sequence in Genbank is wrongly marked as linear 
from Bio.Restriction import SmaI
YIplac128_smaI = YIplac128.linearize(SmaI)
YIplac128_smaI

Dseqrecord(-4302)
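What linearizing a circular plasmid with a blunt cutter amounts to can be sketched with plain strings. The helper `linearize_blunt` and the toy plasmid are hypothetical and only model the top strand; the one fact taken as given is that SmaI recognizes CCCGGG and cuts in the middle (CCC^GGG).

```python
def linearize_blunt(circular, site="CCCGGG", cut_offset=3):
    """Open a circular sequence at a unique blunt-cutting site.

    cut_offset is the number of site bases left of the cut (3 for CCC^GGG).
    """
    n = len(circular)
    # Append the first len(site) - 1 bases so a site spanning the origin is found.
    doubled = circular + circular[:len(site) - 1]
    hits = [i for i in range(n) if doubled.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError("expected exactly one site, found %d" % len(hits))
    cut = (hits[0] + cut_offset) % n
    return circular[cut:] + circular[:cut]

toy_plasmid = "AAATTTCCCGGGTTAACC"   # made-up circular sequence with one site
linear = linearize_blunt(toy_plasmid)
print(linear)                        # GGGTTAACCAAATTTCCC
```

The linear molecule has the same length as the circle and carries one half of the site at each end, which is why the flanking sequences of the cut site can serve as recombination targets.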

We have now established all seven necessary linear DNA fragments for the assembly.

In [80]:
from Bio.Restriction import XhoI, SpeI

# There is a bug below! The SpeI site was added to the end of the "hphMX4_5_part" and not to the beginning!

In [81]:
((p1,  p2),
 (p3,  p4), 
 (p5,  p6), 
 (p7,  p8), 
 (p9,  p10),
 (p11, p12))= pydna.assembly_primers(( hphMX4_3_part,
                                       pydna.Dseqrecord( XhoI.site ),
                                       URA3_2micron,
                                       pydna.Dseqrecord( SpeI.site ),
                                       hphMX4_5_part,                                      
                                       KlTEF1p, 
                                       kan, 
                                       KlTEF1t),                                       
                                       vector=YIplac128_smaI, target_tm=50)
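How tailed primers create the overlaps needed for gap repair can be sketched without pydna. The function `junction_primers` below is a hypothetical, simplified stand-in for the kind of design done above: it uses fixed lengths, does no Tm calculation, and works on toy sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def junction_primers(left, right, anneal=6, tail=4):
    """Primers that make two PCR products overlap across the left/right junction."""
    # Reverse primer for `left`: anneals to its 3' end, tail copies the start of `right`.
    rev_left = revcomp(left[-anneal:] + right[:tail])
    # Forward primer for `right`: tail copies the end of `left`, then anneals to `right`.
    fwd_right = left[-tail:] + right[:anneal]
    return rev_left, fwd_right

# Made-up neighboring fragments.
left = "ATGGCCATTGTAATGGGCCGC"
right = "TGAAAGGGTGCCCGATAGTTAA"
rev_left, fwd_right = junction_primers(left, right)

# Each tail extends a PCR product into its neighbor.
left_product = left + revcomp(rev_left)[-4:]
right_product = fwd_right[:4] + right
overlap = left[-4:] + right[:4]
assert left_product.endswith(overlap)
assert right_product.startswith(overlap)
print(len(overlap))   # 8
```

With tails on both primers of a junction, the shared region is twice the tail length, which is how relatively short primers can still give overlaps long enough for recombination.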



In [82]:
p1.id=  "dda1_2nd_f"
p2.id=  "dda2_2nd_r"

p3.id=  "dda3_URA3_2my_f"
p4.id=  "dda4_URA3_2my_r"

p5.id=  "dda5_1st_f"
p6.id=  "dda6_1st_r"

p7.id=  "dda7_Kl_pr_f"
p8.id=  "dda8_Kl_pr_r"

p9.id=  "dda9_kan_f"
p10.id= "dda10_kan_r"

p11.id= "dda11_Kl_tr_f"
p12.id= "dda12_Kl_tr_r"

In [83]:
(p2, p3, p4, p5, p6, p7, p8, p9, p10, p11) = [p[-40:] for p in (p2, p3, p4, p5, p6, p7, p8, p9, p10, p11)]    
p1=p1[-50:]; p12=p12[-50:]

In [87]:
hphMX4_3_part_prd  = pydna.pcr(p1, p2,  pAG32)
URA3_2micron_prd   = pydna.pcr(p3, p4,  pSU0)
hphMX4_5_part_prd  = pydna.pcr(p5, p6,  pAG32)
KlTEF1p_prd        = pydna.pcr(p7, p8,  KlTEF1p)
kan_prd            = pydna.pcr(p9, p10, pUG6)
KlTEF1t_prd        = pydna.pcr(p11,p12, KlTEF1t)

In [88]:
prods = (   URA3_2micron_prd,
            hphMX4_5_part_prd,
            KlTEF1p_prd,
            kan_prd,
            KlTEF1t_prd,
            hphMX4_3_part_prd )

names = ("URA3_2my",
         "prom-Hph",
         "KlTEF1prom",
         "kan_orf",
         "KlTEF1term",
         "Hph-term")

for f,n in zip(prods, names):
    f.name = n

In [90]:
asm = pydna.Assembly(( URA3_2micron_prd,
                       hphMX4_5_part_prd,
                       KlTEF1p_prd,
                       kan_prd,
                       KlTEF1t_prd,
                       YIplac128_smaI,
                       hphMX4_3_part_prd), limit = 26)

In [91]:
asm

Assembly:
Sequences........................: [2617] [1053] [439] [846] [442] [4302] [1074]
Sequences with shared homologies.: [2617] [1053] [1074] [439] [846] [442] [4302]
Homology limit (bp)..............: 26
Number of overlaps...............: 8
Nodes in graph(incl. 5' & 3')....: 10
Only terminal overlaps...........: No
Circular products................: [10540] [6318] [4222]
Linear products..................: [10792] [10620] [10576] [10576] [10575] [10574] [10574] [10572] [10566] [10169] [10158] [9980] [9764] [9564] [9554] [9536] [9357] [9348] [9151] [8941] [8339] [7994] [7923] [7392] [7371] [7164] [6973] [6954] [6900] [6839] [6760] [6757] [6570] [6492] [6354] [6352] [6352] [6350] [6344] [6298] [5947] [5936] [5933] [5890] [5758] [5682] [5542] [5530] [5342] [5296] [5275] [5260] [5135] [5126] [5080] [4852] [4719] [4718] [4673] [4258] [4257] [4042] [3655] [3635] [2678] [2270] [2076] [1676] [1668] [1657] [1460] [1254] [1249] [858] [451] [36] [36] [35] [34] [34] [32] [26]

In [92]:
candidate = asm.circular_products[0]
candidate.figure()

 -|URA3_2my|35
|           \/
|           /\
|           35|prom-Hph|32
|                       \/
|                       /\
|                       32|KlTEF1prom|36
|                                     \/
|                                     /\
|                                     36|kan_orf|34
|                                                \/
|                                                /\
|                                                34|KlTEF1term|26
|                                                              \/
|                                                              /\
|                                                              26|X75463|34
|                                                                        \/
|                                                                        /\
|                                                                        34|Hph-term|36
|                                                                                
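As a sanity check, the size of the circular candidate should equal the sum of the fragment lengths minus the overlaps, since each overlap is counted once in each of two neighboring fragments. The numbers below are copied from the Assembly report and the figure above; the variable names are only illustrative.

```python
# (name, fragment length in bp, overlap with the next fragment in bp)
fragments = [
    ("URA3_2my",   2617, 35),
    ("prom-Hph",   1053, 32),
    ("KlTEF1prom",  439, 36),
    ("kan_orf",     846, 34),
    ("KlTEF1term",  442, 26),
    ("X75463",     4302, 34),
    ("Hph-term",   1074, 36),   # last overlap closes the circle back to URA3_2my
]

total_length = sum(length for _, length, _ in fragments)
total_overlap = sum(overlap for _, _, overlap in fragments)
circular_size = total_length - total_overlap
print(total_length, total_overlap, circular_size)   # 10773 233 10540
```

The result agrees with the largest circular product reported by the assembly, 10540 bp.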

In [93]:
pPS1 = candidate

In [94]:
pPS1.cseguid()

vXelWQ46lP0x8856rBYXpzFaSTk

In [95]:
primers = (p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12)
[len(p) for p in primers]

[50, 37, 37, 40, 34, 40, 37, 37, 34, 34, 40, 50]

In [96]:
for p in primers:
    print(p.format("tab"))

dda1_2nd_f	TTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCTCAGCGAGAGCCTGAC

dda2_2nd_r	AGCGGTATTCGCAATCTCGAGTCGACACTGGATGGCG

dda3_URA3_2my_f	GCCATCCAGTGTCGACTCGAGATTGCGAATACCGCTT

dda4_URA3_2my_r	GGGTGACCCGGCGGGGATAATAACTGATATAATTAAATTG

dda5_1st_f	TAATTATATCAGTTATTATCCCCGCCGGGTCACC

dda6_1st_r	GACCCAGTGTTACTAGTTCAATGACCGCTGTTATGCGGCC

dda7_Kl_pr_f	AACAGCGGTCATTGAACTAGTAACACTGGGTCAATCA

dda8_Kl_pr_r	AGTCTTTTCCTTACCCATTTTTAATGTTACTTCTCTT

dda9_kan_f	AGAGAAGTAACATTAAAAATGGGTAAGGAAAAGA

dda10_kan_r	AGTAGTATCAAGTTAAACTTAGAAAAACTCATCG

dda11_Kl_tr_f	CGATGAGTTTTTCTAAGTTTAACTTGATACTACTAGATTT

dda12_Kl_tr_r	GGCCAGTGAATTCGAGCTCGGTACCCTTAGTATTAGTAAATTTGTTGACA
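The note about the SpeI site can be checked against the primer list above by scanning each primer for the XhoI (CTCGAG) and SpeI (ACTAGT) recognition sequences. This is a plain-string sketch with the primer sequences copied from the output above; both sites are palindromic, so only one strand needs to be searched.

```python
primers_text = {
    "dda1_2nd_f": "TTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCTCAGCGAGAGCCTGAC",
    "dda2_2nd_r": "AGCGGTATTCGCAATCTCGAGTCGACACTGGATGGCG",
    "dda3_URA3_2my_f": "GCCATCCAGTGTCGACTCGAGATTGCGAATACCGCTT",
    "dda4_URA3_2my_r": "GGGTGACCCGGCGGGGATAATAACTGATATAATTAAATTG",
    "dda5_1st_f": "TAATTATATCAGTTATTATCCCCGCCGGGTCACC",
    "dda6_1st_r": "GACCCAGTGTTACTAGTTCAATGACCGCTGTTATGCGGCC",
    "dda7_Kl_pr_f": "AACAGCGGTCATTGAACTAGTAACACTGGGTCAATCA",
    "dda8_Kl_pr_r": "AGTCTTTTCCTTACCCATTTTTAATGTTACTTCTCTT",
    "dda9_kan_f": "AGAGAAGTAACATTAAAAATGGGTAAGGAAAAGA",
    "dda10_kan_r": "AGTAGTATCAAGTTAAACTTAGAAAAACTCATCG",
    "dda11_Kl_tr_f": "CGATGAGTTTTTCTAAGTTTAACTTGATACTACTAGATTT",
    "dda12_Kl_tr_r": "GGCCAGTGAATTCGAGCTCGGTACCCTTAGTATTAGTAAATTTGTTGACA",
}

sites = {"XhoI": "CTCGAG", "SpeI": "ACTAGT"}

# For each enzyme, list the primers whose sequence contains its site.
found = {
    enzyme: [name for name, seq in primers_text.items() if site in seq]
    for enzyme, site in sites.items()
}
for enzyme, names in found.items():
    print(enzyme, names)
# XhoI ['dda2_2nd_r', 'dda3_URA3_2my_f']
# SpeI ['dda6_1st_r', 'dda7_Kl_pr_f']
```

In these primers the SpeI site sits at the junction between the reverse primer of the first HphMX4 part and the forward primer of the K. lactis promoter, that is, at the end of `hphMX4_5_part`, which matches the note about the bug.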



In [97]:
pPS1.name = "pPS1"
pPS1.description=""

In [98]:
pPS1.stamp()

cSEGUID_vXelWQ46lP0x8856rBYXpzFaSTk

In [100]:
pPS1.write("pPS1.gb")

## PCR conditions

In [101]:
for prd in prods:
    print("product name:", prd.name)
    print("template:", prd.template.name)
    print(prd.program())
    print("----------------------------------------------------------")

product name: URA3_2my
template: AB215109

Taq (rate 30 nt/s) 35 cycles             |2617bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 51.1°C/ 1min19s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 39%
|         |         30s           |      |4-12°C
----------------------------------------------------------
product name: prom-Hph
template: pAG32

Taq (rate 30 nt/s) 35 cycles             |1053bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 61.7°C/ 0min32s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 54%
|         |         30s           |      |4-12°C
----------------------------------------------------------
product name: KlTEF1prom
templa