# The structure of the pRS6 plasmid

The RS112 strain is a diploid yeast strain containing a complete deletion of the HIS3 ORF on one chromosome and an inactivating integration of the pRS6 plasmid in the HIS3 locus of the other. The integrated pRS6 plasmid can recombine so that an active HIS3 locus is gained and the LEU2 marker is lost.

This notebook provides a detailed analysis of the construction of the pRS6 plasmid, its integration and its recombination, based on cloning strategies found in the literature. The notebook is also a piece of executable documentation that simulates the cloning and recombination steps using [pydna](http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0544-x).

    Complete genotype of the RS112 strain
    
    MATa            MATalfa
    ura3-52         ura3-52                
    leu2-3,112      leu2-D98                
    trp5-27         TRP 
    arg4-3          ARG
    ade2-40         ade2-101
    ilv-92          ILV
    HIS3::pRS6      his3-D200 
    LYS             lys2-801

The sequence of pRS6 does not seem to have been deposited in any database. The physical plasmid has been deposited in an E. coli strain at ATCC according to this [patent](http://www.google.com/patents/US4997757). The patent also contains figures describing the cloning strategy used to create pRS6.

The images below (Fig 7, 8 and 9) were extracted from the patent.

Fig 7 shows the construction of the pRS5 vector from pBR322 and pSZ515.
Fig 8 shows the construction of the pRS6 vector from pRS5 and YEp13.
Fig 9 shows the integration of the pRS6 plasmid into the HIS3 locus of Saccharomyces cerevisiae.

![fig7](fig7.png)

In [1]:
from pydna.all import *

In [2]:
pBR322 = genbank("J01749")

In [3]:
pBR322

In [4]:
from Bio.Restriction import ClaI, BamHI

In [5]:
stuffer, pBR322_cla_bam = pBR322.cut(ClaI, BamHI)

In [6]:
stuffer, pBR322_cla_bam

(Dseqrecord(-355), Dseqrecord(-4012))

The pSZ515 plasmid sequence does not seem to be available in any sequence database. The plasmid is described in [Orr-Weaver et al. 1983](Orr-Weaver and Szostak 1983 - Yeast recombination - the association between double-strand gap repair and crossing-over.pdf). Fragment #106 in Fig 7 seems to originate from DNA around the HIS3 locus in the yeast genome. Related plasmids and strategies are also described in [Orr-Weaver et al. 1981](Orr-Weaver et al. 1981 - Yeast transformation - a model system for the study of recombination.pdf). The HIS3 fragment has also been described, along with a detailed restriction map, in [Jones and Prakash 1990](jones1990.pdf).

Figure 7 indicates that fragment #86, from which fragment #106 is derived, is flanked by BamHI sites and has an internal HpaII site. We will use this information to try to identify the fragment in the yeast genome.

In [7]:
from pygenome import sg

In [8]:
HIS3 = sg.stdgene["HIS3"]

In [9]:
HIS3_locus = Dseqrecord( HIS3.locus() )

In [10]:
HIS3_locus

Dseqrecord(-2663)

We start with the HIS3 locus sequence from the Saccharomyces cerevisiae genome sequencing project. The HIS3_locus variable contains the HIS3 ORF plus 1000 bp of upstream and 1000 bp of downstream sequence. The total size is 2663 bp.

In [11]:
HIS3_locus.cut(BamHI)

(Dseqrecord(-535), Dseqrecord(-1771), Dseqrecord(-365))
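
The three fragments follow from a simple rule: a linear molecule cut at n sites gives n + 1 fragments. The pure-Python sketch below illustrates this on a toy sequence. It does not use pydna; the helper `linear_digest` and the toy sequence are hypothetical, and each cut is treated as blunt at the position where the top strand is cut, so overhangs are ignored.

```python
def linear_digest(seq, site, cut_offset):
    """Cut a linear top-strand sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (1 for BamHI, G^GATCC). Overhangs are ignored.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical toy sequence with two BamHI sites (GGATCC)
toy = "AAAAGGATCCTTTTTTTTGGATCCAAA"
pieces = linear_digest(toy, "GGATCC", 1)
lengths = [len(p) for p in pieces]
print(lengths)  # [5, 14, 8]: two sites, three fragments
```

Because overhangs are ignored here, the lengths add up to the length of the input, which is not the case for the pydna fragment sizes if single-stranded overhangs are counted on both neighbouring fragments.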

The locus has two BamHI sites, which we assume are the ones flanking fragment #86 in Fig 7. If this assumption is correct, the middle 1771 bp fragment should contain the complete HIS3 open reading frame and have one internal HpaII site. The cell below checks that the open reading frame is present, so the 1771 bp fragment should be fragment #86.

In [12]:
# check the middle BamHI fragment rather than the whole locus
str(HIS3.cds.seq) in HIS3_locus.cut(BamHI)[1]

True

In [13]:
stuffer1, fragment86, stuffer2 = HIS3_locus.cut(BamHI)

In [14]:
stuffer1, fragment86, stuffer2

(Dseqrecord(-535), Dseqrecord(-1771), Dseqrecord(-365))

In [15]:
from Bio.Restriction import HpaII, BglII

As the next cell shows, fragment #86 has the expected single internal HpaII site.

In [16]:
fragment86.cut(HpaII)

(Dseqrecord(-689), Dseqrecord(-1084))

BglII cuts the fragment into four pieces (three internal sites). The BglII sites are depicted in Fig 2 of [Orr-Weaver et al. 1983](Orr-Weaver and Szostak 1983 - Yeast recombination - the association between double-strand gap repair and crossing-over.pdf).

In [17]:
fragment86.cut(BglII)

(Dseqrecord(-872), Dseqrecord(-64), Dseqrecord(-745), Dseqrecord(-102))

The agreement of the BamHI, HpaII and BglII digests with the published maps makes it reasonably likely that we have the correct sequence. According to Fig 7, fragment #106 is the larger product of digesting fragment #86 with HpaII.

In [18]:
stuffer, fragment106 = fragment86.cut(HpaII)

In [19]:
stuffer, fragment106

(Dseqrecord(-689), Dseqrecord(-1084))

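The pRS5 vector results from ligation of the large pBR322 ClaI-BamHI fragment with fragment #106. The ClaI end and the HpaII end can be joined because both enzymes leave the same 5'-CG overhang.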

In [20]:
pRS5 = (pBR322_cla_bam + fragment106).looped()

In [21]:
pRS5.write("pRS5.gb")
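
Joining a ClaI end to an HpaII end requires matching overhangs. The sketch below derives the overhang of each enzyme from its recognition site and top-strand cut position and compares them. It is an illustration only and does not use pydna or Bio.Restriction; the table `ENZYMES` and the helpers `overhang` and `compatible` are hypothetical names. It assumes palindromic sites and compares overhang sequence only, which is sufficient here because all three enzymes leave 5' overhangs.

```python
# Recognition site and top-strand cut position for each enzyme.
# All sites are palindromic, so the bottom strand is cut at the
# mirror-image position.
ENZYMES = {
    "ClaI": ("ATCGAT", 2),   # AT^CGAT
    "HpaII": ("CCGG", 1),    # C^CGG
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def overhang(site, cut):
    """Return the single-stranded overhang left by a palindromic cutter."""
    lo, hi = sorted((cut, len(site) - cut))
    return site[lo:hi]


def compatible(enzyme_a, enzyme_b):
    """True if both enzymes leave the same overhang sequence."""
    return overhang(*ENZYMES[enzyme_a]) == overhang(*ENZYMES[enzyme_b])


print(overhang(*ENZYMES["ClaI"]), overhang(*ENZYMES["HpaII"]))  # CG CG
print(compatible("ClaI", "HpaII"))   # True
print(compatible("ClaI", "BamHI"))   # False
```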

pRS6 is made by combining a KpnI-XhoI fragment from the vector YEp13 (fragment #114) with a KpnI-SalI fragment from pRS5 (fragment #110) (Fig 8). XhoI and SalI leave compatible 5'-TCGA overhangs, so the XhoI end can be ligated to the SalI end. The YEp13 sequence is available from GenBank under accession number [U03498](https://www.ncbi.nlm.nih.gov/nuccore/U03498).

![fig8](fig8.png)

In [22]:
YEP13 = genbank("U03498")

In [23]:
YEP13

In [24]:
from Bio.Restriction import XhoI, KpnI

Fragment #114 is the product of a partial digest with XhoI and KpnI. Pydna has no partial digestion functionality, so we have to cut completely and then join the desired fragments again. XhoI cuts YEp13 once, while KpnI cuts twice.

In [25]:
YEP13.cut(XhoI)

(Dseqrecord(-10671),)

In [26]:
YEP13.cut(KpnI)

(Dseqrecord(-8186), Dseqrecord(-2489))

The complete digestion with both enzymes yields three fragments:

In [27]:
frag2, frag3, frag1 = YEP13.cut(XhoI, KpnI)

In [28]:
frag1, frag2, frag3

(Dseqrecord(-7291), Dseqrecord(-899), Dseqrecord(-2489))
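
The fragment counts above follow the rule for circular molecules: n cuts give n fragments, and a partial-digest product is the sum of adjacent complete-digest fragments. The sketch below reproduces this on a hypothetical 1000 bp plasmid. The function `circular_digest`, the plasmid size and the cut positions are made up for illustration, and overhangs are ignored.

```python
def circular_digest(length, cut_positions):
    """Return fragment lengths for a circular molecule cut at each position.

    Fragments are listed clockwise starting from the first cut;
    overhangs are ignored.
    """
    cuts = sorted(cut_positions)
    sizes = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        # a single cut linearises the plasmid: one full-length fragment
        sizes.append((nxt - cut) % length or length)
    return sizes


# Hypothetical plasmid: enzyme X cuts once, enzyme K cuts twice.
x_sites = [100]
k_sites = [300, 700]

print(circular_digest(1000, x_sites))  # [1000]
print(circular_digest(1000, k_sites))  # [400, 600]

complete = circular_digest(1000, x_sites + k_sites)
print(complete)  # [200, 400, 400]

# Partial-digest product running from the X site over the first K site
# (left uncut) to the second K site: two adjacent fragments joined.
partial = complete[0] + complete[1]
print(partial)  # 600
```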

Fragment #114 in Fig 8 is the 899 bp and 2489 bp fragments above joined together.

In [29]:
fragment114 = frag2 + frag3

In [30]:
fragment114.cut(KpnI)

(Dseqrecord(-899), Dseqrecord(-2489))

In [31]:
fragment114.seq

Dseq(-3384)
TCGAGGAG..GGTGGTAC
    CCTC..CCAC    

In [32]:
from Bio.Restriction import SalI, HindIII

In [33]:
pRS5.cut(KpnI, SalI)

(Dseqrecord(-956), Dseqrecord(-4142))

The KpnI-SalI digestion produces two fragments from pRS5. The larger one, 4142 bp, is fragment #110, which we need for the construction of pRS6 (Fig 8).

In [34]:
pRS5.cut(KpnI, SalI)

(Dseqrecord(-956), Dseqrecord(-4142))

In [35]:
stuffer, fragment110 = pRS5.cut(KpnI, SalI)

In [36]:
pRS6 = (fragment110 + fragment114.rc()).looped()

In [37]:
pRS6.write("pRS6.gb")

The plasmid pRS6 was digested with HindIII prior to integration into one of the parent strains of RS112 (Fig 9).

![fig9](fig9.png)

Quote from the patent: 

"Referring to FIG. 10, plasmid 116 was digested with HindIII to produce a small gap within the internal fragment of his3; the HindIII gap is illustrated in FIG. 6 at points 120 and 122. Digestion was conducted in substantial accordance with the procedure described above."

"The yeast strain S35/2-10C was transformed with the digested plasmid 116 (pRS6), and colonies able to grow on media lacking leucine were isolated. This isolation procedure was conducted in accordance with the procedure of the F. Sherman article ("Methods in yeast genetics . . . ", 1986) described above."


In [38]:
pRS6.cut(HindIII)

(Dseqrecord(-7332), Dseqrecord(-194))

The 194 bp fragment is the "small gap" described in the patent quote.

In [39]:
pRS6_hind, stuffer = pRS6.cut(HindIII)

In [40]:
pRS6_hind.name = "prs6"

In [41]:
HIS3_locus

Dseqrecord(-2663)

In [42]:
HIS3_locus.name = "HIS3 locus"

We use the pydna Assembly functionality to simulate the recombination between the HIS3 locus and the pRS6 plasmid.

In [43]:
asm = Assembly((HIS3_locus, pRS6_hind, HIS3_locus))

In [44]:
asm

Assembly
fragments..: 2663bp 7332bp 2663bp
limit(bp)..: 25
G.nodes....: 4
algorithm..: common_sub_strings

In [45]:
candidate = asm.assemble_linear()[0]
candidate.figure()

HIS3 locus|133
           \/
           /\
           133|prs6|91
                    \/
                    /\
                    91|HIS3 locus

The figure above indicates that the plasmid was inserted by recombination across two shared sequences of 133 bp and 91 bp. We call the resulting locus with the integrated plasmid "RS112_his3_locus".
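
The Assembly summary above lists `common_sub_strings` as its algorithm and a limit of 25 bp, which suggests that fragments are joined wherever they share a stretch of at least that length. The sketch below shows the underlying idea with the standard library's `difflib`. It is not pydna's implementation; the function `longest_shared_stretch` and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_stretch(a, b, limit):
    """Return (start_in_a, start_in_b, size) of the longest common
    substring of `a` and `b`, or None if it is shorter than `limit`."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return (m.a, m.b, m.size) if m.size >= limit else None


# Hypothetical toy sequences sharing a 12 nt stretch
homology = "GATTACAGATTC"
chromosome = "TTTTT" + homology + "CCCCC"
plasmid_end = "AGAGA" + homology + "TGTGT"

hit = longest_shared_stretch(chromosome, plasmid_end, limit=10)
print(hit)  # (5, 5, 12)

# Raising the limit above the length of the shared stretch gives no hit
print(longest_shared_stretch(chromosome, plasmid_end, limit=20))  # None
```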

In [46]:
RS112_his3_locus = candidate

In [47]:
RS112_his3_locus.write("RS112_his3_locus.gb")

The cassette can recombine between repeated sequences of the HIS3 ORF in such a way that the LEU2 marker and the pBR322 plasmid sequences are lost and a complete and active HIS3 marker is gained.

A DNA double-strand break somewhere between the repeated sequences is thought to induce such a recombination. Below we use the SalI restriction enzyme to simulate this break. SalI cuts the cassette once, in the pBR322-derived sequence.

In [48]:
from Bio.Restriction import SalI
RS112_his3_locus.cut(SalI)

(Dseqrecord(-2790), Dseqrecord(-7395))

In [49]:
before, after = RS112_his3_locus.cut(SalI)

We cut the cassette in two pieces called "before" and "after".

In [50]:
before.name = "before"
after.name = "after"

We use the pydna Assembly functionality to simulate this intramolecular recombination as well.

In [51]:
asm2 = Assembly((before, after), limit=400)
asm2

Assembly
fragments..: 2790bp 7395bp
limit(bp)..: 400
G.nodes....: 2
algorithm..: common_sub_strings

In [52]:
candidate = asm2.assemble_linear()[1]
candidate.figure()

before|410
       \/
       /\
       410|after

The second linear recombination product is the one we want since the two fragments appear in the order we expected. We can conclude that the recombination happens between two sequences that are 410 bp long.
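
Recombination between two direct repeats removes everything between them together with one copy of the repeat. The sketch below shows this on a toy cassette. The function `pop_out` and the sequences are hypothetical and unrelated to the real HIS3 repeats.

```python
def pop_out(seq, repeat):
    """Simulate recombination between the first and last copy of `repeat`.

    Everything between the two copies, plus one copy of the repeat, is
    deleted. Returns the recombined sequence and the deleted length.
    """
    first = seq.find(repeat)
    last = seq.rfind(repeat)
    if first == -1 or first == last:
        raise ValueError("need two copies of the repeat")
    recombined = seq[:first] + seq[last:]
    return recombined, len(seq) - len(recombined)


# Hypothetical toy cassette: left flank, repeat, insert, repeat, right flank
repeat = "ACGTTGCA"
cassette = "TT" + repeat + "GGGGGGGGGG" + repeat + "AA"

product, lost = pop_out(cassette, repeat)
print(product)  # TTACGTTGCAAA
print(lost)     # 18 = 10 nt insert + one 8 nt repeat copy
```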

It follows from this analysis that the region in which a double-strand break can be expected to induce recombination is the size of the integrated cassette (10181 bp) minus the size of the cassette after recombination (2663 bp), which is 7518 bp.
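
The subtraction can be checked against the fragment sizes reported in the cells above. The sketch assumes that the reported length of each sticky-ended fragment includes its 4 nt single-stranded overhang, so that an overhang shared by two neighbouring fragments is counted twice when their lengths are added. Under that assumption the lost region has the same size as circular pRS6, which would mean that exactly one plasmid copy is removed.

```python
OVERHANG = 4  # nt; SalI, XhoI and KpnI all leave 4 nt overhangs

# Sizes taken from the outputs above
before_len, after_len = 2790, 7395            # SalI fragments of the cassette
fragment110_len, fragment114_len = 4142, 3384  # the two parts of pRS6
recombined_len = 2663                          # cassette after recombination

# Integrated cassette: the two SalI fragments share one overhang
integrated_len = before_len + after_len - OVERHANG
print(integrated_len)  # 10181

lost = integrated_len - recombined_len
print(lost)  # 7518

# Circular pRS6: two junctions, each sharing one overhang
prs6_len = fragment110_len + fragment114_len - 2 * OVERHANG
print(prs6_len)  # 7518
```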

We also confirm below that the original HIS3 sequence is present in the recombined cassette.

In [53]:
str(HIS3.cds.seq) in candidate

True