## Plan for Dominant Del Assay plasmid pPS1

In [1]:
from IPython.display import Image
Image(url='http://cancerres.aacrjournals.org/content/66/7/3480/F1.medium.gif')

### The del assay

The image above depicts the principle of the original del assay.

http://cancerres.aacrjournals.org/content/66/7/3480.abstract?sid=9297eb2f-00bd-466f-89d8-b38af175d67f

A, the RS112 yeast strain contains a plasmid carrying the LEU2 gene and an internal fragment of the yeast HIS3 gene integrated into the genome at the HIS3 locus. This resulted in two copies of the his3 gene, one with a terminal deletion at the 3'-end, and the other with a terminal deletion at the 5'-end. There are ~400 bp of homology between the two copies (striped region). B, DNA strand breakage leads to bidirectional degradation until homologous single-stranded regions are exposed. C, annealing of homologous regions. D, reversion to HIS+ phenotype and deletion of plasmid.

### The pPS1 plasmid

This cassette consists of two dominant markers: HphMX4 and "kan", the kanamycin resistance gene from the E. coli transposon Tn903.

The HphMX4 marker is the Hygromycin B resistance gene from an E. coli [plasmid](http://www.ncbi.nlm.nih.gov/pubmed/6319235) under control of the Ashbya gossypii TEF1 promoter and terminator.

The idea is to split the HphMX4 marker into two pieces that share a region of homology, like the two his3 copies of the del assay. The kan gene will be controlled by the promoter and terminator from the Kluyveromyces lactis TEF1 homolog.

The TEF1 promoter-kan-TEF1 terminator fragments are cloned inside the HphMX4 marker in such a way that there is a region of homology on each side by which the TEF1 promoter-kan-TEF1 terminator can be lost and the HphMX4 gene reconstituted. 
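The loss of the kan cassette by recombination between the two homologous regions can be illustrated with plain strings. This is only a toy sketch: the function name `pop_out` and all sequences are hypothetical placeholders, not the real HphMX4 or kan sequences, and real recombination is of course not a string operation.

```python
def pop_out(construct, repeat):
    """Collapse two direct copies of `repeat` into one, deleting what lies between."""
    first = construct.find(repeat)
    second = construct.find(repeat, first + 1)
    if first == -1 or second == -1:
        raise ValueError("construct must contain two copies of the repeat")
    # Keep everything up to the end of the first copy,
    # then resume directly after the second copy.
    return construct[:first + len(repeat)] + construct[second + len(repeat):]

# Hypothetical toy sequences (placeholders only)
hph_5prime = "AAAA"
homology = "GATTACA"
kan_cassette = "ccccccccc"
hph_3prime = "TTTT"

split_marker = hph_5prime + homology + kan_cassette + homology + hph_3prime
restored = pop_out(split_marker, homology)
print(restored)
```

After the pop-out, the interrupting cassette is gone and a single copy of the homologous region joins the two halves of the marker.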

The whole construct is made by gap repair in one reaction.


### Material

|DNA      | Source (-80 °C)  |
|---------|------------------|
|pAG32    | box 3, pos 45    |
|pMEC1030 | Filipa #114      |
|pUG6     | box 3, pos 55    |
|YIplac128| box 1, pos 81    |


K. lactis is available on a plate.

In [2]:
import pydna

In [3]:
web = pydna.web()

The plasmid pAG32 contains the HphMX4 marker gene. It is available from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30106). It was constructed by [Goldstein & McCusker](http://www.ncbi.nlm.nih.gov/pubmed/10514571).

The sequence is not available from GenBank, but the EUROSCARF website provides it in GenBank format. Unfortunately, the LOCUS line of that record is malformed, so I made my own copy of the sequence [here](https://gist.github.com/BjornFJohansson/c5424b7ebbf553c52053). The size of the plasmid is 4160 bp.

In [4]:
text  = web.download("https://gist.githubusercontent.com/BjornFJohansson/c5424b7ebbf553c52053/raw/64318ead495bc7ade8bb598ab192e76a3569a724/pAG32.gb")
pAG32 = pydna.read(text)

In [5]:
pAG32

Dseqrecord(o4160)

In [6]:
pAG32.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   90  | 458  |    368 | <unknown id> | misc_feature |  no  |
| 1        |    -->    |  458  | 1487 |   1029 | <unknown id> | CDS          | yes  |
| 2        |    -->    |  1487 | 1727 |    240 | <unknown id> | misc_feature |  no  |
| 3        |    <--    |  2832 | 3693 |    861 | <unknown id> | CDS          | yes  |
+----------+-----------+-------+------+--------+--------------+--------------+------+

We can inspect the features to see that the HphMX4 cassette starts at 90 and ends at 1727.

In [7]:
hyg_cassette = pAG32[90:1727]

This makes the HphMX4 cassette 1637 bp long.

In [8]:
hyg_cassette

Dseqrecord(-1637)

In [9]:
middle = len(hyg_cassette)//2  # integer division, so the result can be used as a slice index
overlap = 200

We split the HphMX4 cassette into two parts that overlap by 400 bp:

In [10]:
first_part = hyg_cassette[:middle+overlap]
second_part = hyg_cassette[(middle-overlap):]

In [11]:
pydna.eq( first_part[-400:], second_part[:400] )

True

In [12]:
str(first_part[-400:].seq)

'TCAGCGAGAGCCTGACCTATTGCATCTCCCGCCGTGCACAGGGTGTCACGTTGCAAGACCTGCCTGAAACCGAACTGCCCGCTGTTCTGCAGCCGGTCGCGGAGGCCATGGATGCGATCGCTGCGGCCGATCTTAGCCAGACGAGCGGGTTCGGCCCATTCGGACCGCAAGGAATCGGTCAATACACTACATGGCGTGATTTCATATGCGCGATTGCTGATCCCCATGTGTATCACTGGCAAACTGTGATGGACGACACCGTCAGTGCGTCCGTCGCGCAGGCTCTCGATGAGCTGATGCTTTGGGCCGAGGACTGCCCCGAAGTCCGGCACCTCGTGCACGCGGATTTCGGCTCCAACAATGTCCTGACGGACAATGGCCGCATAACAGCGGTCATTGA'
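The arithmetic behind the split can be checked without pydna. The sketch below uses a placeholder string of the same length as the cassette (1637 characters); the helper name `split_with_overlap` is hypothetical and not part of pydna.

```python
def split_with_overlap(seq, overlap):
    """Split seq near its midpoint into two parts that share 2 * overlap characters."""
    middle = len(seq) // 2
    return seq[:middle + overlap], seq[middle - overlap:]

# Placeholder sequence with the same length as the 1637 bp cassette
toy = "".join("ACGT"[(i * 7 + i // 5) % 4] for i in range(1637))

first, second = split_with_overlap(toy, 200)
shared = first[-400:]

print("%d %d %s" % (len(first), len(second), shared == second[:400]))
```

With a 1637 bp input the midpoint is 818, so the two parts are 1018 and 1019 bp long, and the last 400 bp of the first part equal the first 400 bp of the second.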

Now we need to define the promoter and terminator to use for the kan gene.

K. lactis sequences are from the [Yeast Gene Order Browser](http://ygob.ucd.ie/)


The Kl TEF1 promoter has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted)

The Kl TEF1 terminator has the following [sequence](http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted)

In [13]:
promoter_link = "http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B09020g&org=klac&nbr=KLLA0B08998g&dir=inverted"

In [14]:
terminator_link = "http://ygob.ucd.ie/cgi/browser/intergenic.pl?ver=Latest&gene=KLLA0B08998g&org=klac&nbr=KLLA0B08976g&dir=inverted"

In [15]:
from bs4 import BeautifulSoup

In [16]:
html = web.download(promoter_link)

In [18]:
TEF1prom = pydna.read( ''.join( BeautifulSoup( html ).findAll( text = True ) ) )

In [19]:
TEF1prom

Dseqrecord(-1421)
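The BeautifulSoup call above simply joins all text nodes of the page, leaving the sequence record for `pydna.read` to parse. As a rough illustration of that step, the same idea can be sketched with the standard library alone. This sketch is written for Python 3 (unlike the rest of the notebook), the class and function names are hypothetical, and the example page is an assumed layout, not the actual YGOB response.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect every text node and discard the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def visible_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)

# Assumed page layout for illustration only
page = "<html><body><pre>&gt;Intergenic-example\nACGTACGT\nTTGACA\n</pre></body></html>"
text = visible_text(page)
print(text)
```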

About 400 bp is sufficient for the promoter, so we keep the last 400 bp of the intergenic region.

In [20]:
TEF1prom = TEF1prom[-400:]

We obtain the terminator in the same manner.

In [22]:
html = web.download(terminator_link)
TEF1term = pydna.read( ''.join( BeautifulSoup( html ).findAll( text = True ) ) )

In [23]:
TEF1term

Dseqrecord(-457)

Likewise, 400 bp is more than enough for the terminator, so we keep the first 400 bp.

In [24]:
TEF1term = TEF1term[:400]

The kan gene can be found in the pUG6 plasmid. It was constructed by [Güldener et al.](http://nar.oxfordjournals.org/content/24/13/2519.full).
The sequence is available from [Genbank](http://www.ncbi.nlm.nih.gov/nuccore/AF298793.1). The plasmid itself can be obtained from [EUROSCARF](http://www.euroscarf.de/plasmid_details.php?accno=P30114).

We will download the sequence from Genbank:

In [25]:
gb = pydna.Genbank("bjornjobb@gmail.com")

In [26]:
pUG6 = gb.nucleotide("AF298793")

The size is 4009 bp.

In [27]:
pUG6

Dseqrecord(o4009)

We can inspect features to obtain the coding sequence:

In [28]:
pUG6.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   0   | 4009 |   4009 | <unknown id> | source       |  no  |
| 1        |    -->    |   52  |  86  |     34 | <unknown id> | misc_feature |  no  |
| 2        |    -->    |   86  | 484  |    398 | <unknown id> | regulatory   |  no  |
| 3        |    -->    |  484  | 1294 |    810 | <unknown id> | gene         | yes  |
| 4        |    -->    |  484  | 1294 |    810 | <unknown id> | CDS          | yes  |
| 5        |    -->    |  1294 | 1559 |    265 | <unknown id> | regulatory   |  no  |
| 6        |    -->    |  1559 | 1593 |     34 | <unknown id> | misc_feature |  no  |
| 7        |    <--    |  2681 | 3542 |    861 | <unknown id> | gene         | yes  |
| 8        |    <--    |  2681 | 3542 |    861 | <unkn

Feature number 4 is the coding sequence of the kan gene:

In [29]:
kan_orf = pUG6.extract_feature(4)
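Extracting a feature amounts to slicing the sequence between the feature's start and end coordinates (here 484 to 1294, giving 810 bp) and reverse-complementing the slice when the feature lies on the bottom strand (direction `<--` in the table). A minimal sketch with plain strings follows; the function names and the toy plasmid are hypothetical and this is not how pydna implements `extract_feature`.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def extract(seq, start, end, forward=True):
    """Return seq[start:end]; reverse-complement it for bottom-strand features."""
    sub = seq[start:end]
    return sub if forward else reverse_complement(sub)

# Toy plasmid: one ORF on the top strand (4..13), one on the bottom strand (17..26)
toy_plasmid = "GGGG" + "ATGAAATAA" + "CCCC" + "TTACATCAT" + "GG"

top_orf = extract(toy_plasmid, 4, 13)
bottom_orf = extract(toy_plasmid, 17, 26, forward=False)
print(top_orf)
print(bottom_orf)
```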

Now we have defined five DNA fragments between 0.4 and 1.1 kb: first_part, second_part, TEF1prom, TEF1term and kan_orf.

In [56]:
pMEC1030 = pydna.read("pMEC1030.gb")

## We could not prep pMEC1030, so we use pSU0 instead. The name is still pMEC1030 in the code below.

In [31]:
pSU0 = gb.nucleotide("AB215109.1")

In [32]:
pMEC1030 = pSU0

In [57]:
URA3_2micron = pMEC1030[1041:3620]

In [58]:
frags = (URA3_2micron,
         first_part,
         TEF1prom,
         kan_orf,
         TEF1term,
         second_part)

In [59]:
frags

(Dseqrecord(-2579),
 Dseqrecord(-1018),
 Dseqrecord(-400),
 Dseqrecord(-810),
 Dseqrecord(-400),
 Dseqrecord(-1019))

We will also need a vector backbone for the construction. We will use YIplac128.

In [60]:
YIplac128 = gb.nucleotide("X75463").looped()
from Bio.Restriction import SmaI
YIplac128_smaI = YIplac128.linearize(SmaI)

In [61]:
from Bio.Restriction import XhoI, SpeI

In [62]:
((p1,  p2), 
 (p3,  p4), 
 (p5,  p6), 
 (p7,  p8), 
 (p9,  p10),
 (p11, p12))= pydna.assembly_primers((second_part,
                                      pydna.Dseqrecord( XhoI.site ),
                                      URA3_2micron,
                                      pydna.Dseqrecord( SpeI.site ),
                                      first_part, 
                                      TEF1prom, 
                                      kan_orf, 
                                      TEF1term),                                       
                                      vector=YIplac128_smaI, target_tm=50)

In [63]:
p1.id=  "dda1_2nd_f"
p2.id=  "dda2_2nd_r"
p3.id=  "dda3_URA3_2my_f"
p4.id=  "dda4_URA3_2my_r"
p5.id=  "dda5_1st_f"
p6.id=  "dda6_1st_r"
p7.id=  "dda7_Kl_pr_f"
p8.id=  "dda8_Kl_pr_r"
p9.id=  "dda9_kan_f"
p10.id= "dda10_kan_r"
p11.id= "dda11_Kl_tr_f"
p12.id= "dda12_Kl_tr_r"

In [64]:
(p2, p3, p4, p5, p6, p7, p8, p9, p10, p11) = [p[-40:] for p in (p2, p3, p4, p5, p6, p7, p8, p9, p10, p11)]    

In [65]:
p1=p1[-50:]
p12=p12[-50:]
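Primers are written 5' to 3', so slicing from the end (`p[-40:]`) keeps the 3' part that anneals to the template and shortens only the 5' tail that provides homology to the neighbouring fragment. The sketch below illustrates this with toy sequences. The function name is hypothetical, and the fixed annealing length is an assumption made for simplicity; `pydna.assembly_primers` chooses the annealing part from a target melting temperature instead.

```python
def tailed_forward_primer(upstream, fragment, tail_len, anneal_len):
    """5' tail copied from the end of the upstream fragment,
    followed by a 3' part that anneals to the start of this fragment."""
    return upstream[-tail_len:] + fragment[:anneal_len]

upstream = "AAAACCCCGGGGTTTT"   # hypothetical neighbouring fragment
fragment = "ATGGCTAGCTAGGATCC"  # hypothetical fragment to amplify

primer = tailed_forward_primer(upstream, fragment, tail_len=8, anneal_len=10)
trimmed = primer[-14:]  # shortens the 5' tail, leaves the annealing 3' end intact

print(primer)
print(trimmed)
```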

In [68]:
second_part_prd = pydna.pcr(p1,p2, pAG32)
URA3_2micron_prd = pydna.pcr(p3,p4, pMEC1030)
first_part_prd = pydna.pcr(p5,p6, pAG32)
prom_prd = pydna.pcr(p7, p8, TEF1prom)
kan_prd = pydna.pcr(p9, p10, pUG6)
term_prd = pydna.pcr(p11, p12, TEF1term)

In [67]:
ape URA3_2micron_prd

In [69]:
prods = (URA3_2micron_prd,
         first_part_prd,
         prom_prd,
         kan_prd,
         term_prd,
         second_part_prd)

In [70]:
names = ("URA3_2my",
         "prom-Hph",
         "KlTEF1prom",
         "kan_orf",
         "KlTEF1term",
         "Hph-term")

In [71]:
for f,n in zip(prods, names):
    f.name = n

In [72]:
#import os
#os.environ["pydna_cache"] = "refresh"

In [73]:
asm = pydna.Assembly(( YIplac128_smaI,
                       URA3_2micron_prd,
                       first_part_prd,
                       prom_prd,
                       kan_prd,
                       term_prd,
                       second_part_prd), limit = 30)

In [74]:
asm

Assembly:
Sequences........................: [4302] [2617] [1057] [439] [846] [451] [1074]
Sequences with shared homologies.: [4302] [451] [1074] [2617] [1057] [439] [846]
Homology limit (bp)..............: 30
Number of overlaps...............: 8
Nodes in graph(incl. 5' & 3')....: 10
Only terminal overlaps...........: No
Circular products................: [10540] [6318] [4222]
Linear products..................: [10792] [10620] [10576] [10576] [10576] [10575] [10575] [10574] [10574] [10173] [10158] [9981] [9765] [9564] [9554] [9536] [9362] [9348] [9151] [8945] [8340] [7994] [7923] [7392] [7375] [7164] [6973] [6954] [6908] [6839] [6769] [6757] [6570] [6492] [6354] [6354] [6353] [6352] [6352] [6306] [5951] [5936] [5933] [5890] [5759] [5682] [5543] [5530] [5342] [5296] [5279] [5268] [5140] [5126] [5080] [4852] [4723] [4719] [4677] [4258] [4257] [4042] [3655] [3639] [2686] [2270] [2084] [1676] [1668] [1665] [1460] [1262] [1249] [858] [455] [36] [36] [36] [35] [35] [34] [34]
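The assembly above relies on shared sequences of at least `limit` bp between fragments. As a simplified illustration, the sketch below finds only terminal overlaps, that is, a suffix of one fragment that equals a prefix of the next. The function name and sequences are hypothetical; pydna's `Assembly` also considers internal homologies (note "Only terminal overlaps: No" in the report), which this sketch does not.

```python
def terminal_overlap(left, right, limit):
    """Length of the longest suffix of `left` that equals a prefix of `right`.
    Returns 0 if no overlap of at least `limit` characters exists (limit >= 1)."""
    for size in range(min(len(left), len(right)), limit - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0

a = "TTTTTT" + "ACGTACGGA"
b = "ACGTACGGA" + "CCCCCC"

found = terminal_overlap(a, b, limit=5)
too_strict = terminal_overlap(a, b, limit=12)
print("%d %d" % (found, too_strict))
```

Raising the limit above the length of the shared sequence makes the overlap disappear, which is why the homology limit directly controls how many products an assembly reports.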

In [98]:
candidate = asm.circular_products[1]
candidate.figure()

 -|X75463|34
|         \/
|         /\
|         34|Hph-term|400
|                     \/
|                     /\
|                     400|prom-Hph|36
|                                  \/
|                                  /\
|                                  36|KlTEF1prom|36
|                                                \/
|                                                /\
|                                                36|kan_orf|35
|                                                           \/
|                                                           /\
|                                                           35|KlTEF1term|34
|                                                                         \/
|                                                                         /\
|                                                                         34-
|                                                                            |
 -------------------------------------

In [77]:
pPS1 = candidate

In [86]:
pPS1.cseguid()

vXelWQ46lP0x8856rBYXpzFaSTk
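A checksum for a circular sequence should not depend on where the sequence is opened or on which strand is written down. One way to achieve this is to hash a canonical form: the lexicographically smallest rotation over both strands. The sketch below follows that idea with SHA-1 and URL-safe base64; whether pydna's `cseguid` uses exactly this canonical form and encoding is an assumption here, and the function names are hypothetical.

```python
import base64
import hashlib

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def smallest_rotation(seq):
    """Lexicographically smallest rotation (brute force, fine for short examples)."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))

def circular_checksum(seq):
    seq = seq.upper()
    # Canonical form: smallest rotation, considering both strands
    canonical = min(smallest_rotation(seq),
                    smallest_rotation(reverse_complement(seq)))
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

plasmid = "GATTACAGGCT"                 # toy circular sequence
rotated = plasmid[4:] + plasmid[:4]     # same circle, different origin
flipped = reverse_complement(plasmid)   # same circle, other strand

same = (circular_checksum(plasmid)
        == circular_checksum(rotated)
        == circular_checksum(flipped))
print(same)
```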

In [87]:
primers = (p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12)
[len(p) for p in primers]

[50, 37, 37, 40, 34, 37, 37, 37, 34, 34, 40, 50]

In [88]:
for p in primers:
    print p.format("tab")

dda1_2nd_f	TTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCTCAGCGAGAGCCTGAC

dda2_2nd_r	AGCGGTATTCGCAATCTCGAGTCGACACTGGATGGCG

dda3_URA3_2my_f	GCCATCCAGTGTCGACTCGAGATTGCGAATACCGCTT

dda4_URA3_2my_r	GGGTGACCCGGCGGGGATAATAACTGATATAATTAAATTG

dda5_1st_f	TAATTATATCAGTTATTATCCCCGCCGGGTCACC

dda6_1st_r	GATTGACCCAGTGTTACTAGTTCAATGACCGCTGTTA

dda7_Kl_pr_f	AACAGCGGTCATTGAACTAGTAACACTGGGTCAATCA

dda8_Kl_pr_r	AGTCTTTTCCTTACCCATTTTTAATGTTACTTCTCTT

dda9_kan_f	AGAGAAGTAACATTAAAAATGGGTAAGGAAAAGA

dda10_kan_r	AGTAGTATCAAGTTAAACTTAGAAAAACTCATCG

dda11_Kl_tr_f	TCGATGAGTTTTTCTAAGTTTAACTTGATACTACTAGATT

dda12_Kl_tr_r	AAAACGACGGCCAGTGAATTCGAGCTCGGTACCCTTAGTATTAGTAAATT



## PCR conditions

In [89]:
for prd in prods:
    print "product name:", prd.name
    print "template:", prd.template.name
    print prd.program()
    print "----------------------------------------------------------"

product name: URA3_2my
template: pMEC1030

Taq (rate 30 nt/s) 35 cycles             |2617bp
95.0°C    |95.0°C                 |      |SantaLucia 1998
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|
|         |      \ 50.0°C/ 1min18s| 5min |
|         |       \_____/         |      |
|         |         30s           |      |4-12°C
----------------------------------------------------------
product name: prom-Hph
template: pAG32

Taq (rate 30 nt/s) 35 cycles             |1057bp
95.0°C    |95.0°C                 |      |SantaLucia 1998
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|
|         |      \ 56.0°C/ 0min31s| 5min |
|         |       \_____/         |      |
|         |         30s           |      |4-12°C
----------------------------------------------------------
product name: KlTEF1prom
template: Intergenic-KLLA0

Taq (rate 30 nt/s) 35 cycles             |439bp
95.0°C    |95.0°C     

In [90]:
pPS1.cseguid()

vXelWQ46lP0x8856rBYXpzFaSTk

In [91]:
pPS1.name = "pPS1"
pPS1.description=""

In [92]:
pPS1.stamp()

cSEGUID_vXelWQ46lP0x8856rBYXpzFaSTk_2015-06-01T10:42:12.839945

In [93]:
pPS1.write("pPS1.gb")

# [DOWNLOAD](pPS1.gb)

In [94]:
r = pydna.read("pPS1.gb")

In [95]:
r.verify_stamp()

cSEGUID_vXelWQ46lP0x8856rBYXpzFaSTk

# Use

In [99]:
large, small = pPS1.cut(SpeI, XhoI)
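Cutting a circular molecule at two unique sites always yields two linear fragments whose lengths add up to the size of the circle. The sketch below shows this with a toy sequence. SpeI (A^CTAGT) and XhoI (C^TCGAG) both cut the top strand after the first base of their site; the sketch ignores the resulting overhangs and assumes each site occurs exactly once. The function name is hypothetical and this is not pydna's `cut` implementation.

```python
def cut_circular(seq, sites):
    """Cut a circular sequence after the first base of each recognition site.
    Returns the fragments sorted from longest to shortest."""
    longest = max(len(site) for site in sites)
    doubled = seq + seq[:longest - 1]  # lets a site span the origin
    cuts = []
    for site in sites:
        pos = doubled.find(site)
        if pos == -1:
            raise ValueError("site %s not found" % site)
        cuts.append((pos + 1) % len(seq))
    cuts.sort()
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    fragments.append(seq[cuts[-1]:] + seq[:cuts[0]])  # fragment spanning the origin
    return sorted(fragments, key=len, reverse=True)

SPEI_SITE = "ACTAGT"
XHOI_SITE = "CTCGAG"

toy = "GGG" + SPEI_SITE + "A" * 10 + XHOI_SITE + "CC"
toy_large, toy_small = cut_circular(toy, [SPEI_SITE, XHOI_SITE])

print("%d %d" % (len(toy_large), len(toy_small)))
```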

In [100]:
ape large