# Gibson primer design & assembly

This notebook describes primer design for assembling linear DNA fragments by techniques such as homologous recombination or Gibson assembly. The goal of this experiment is to create a Saccharomyces cerevisiae vector that expresses the cytochrome c gene CYC1 with a C-terminal GFP tag, using the yeast expression vector p426GPD. We would also like to have a unique restriction site between the p426GPD promoter (the TDH3 promoter) and the CYC1 gene.

This notebook designs the necessary primers for this experiment. For more information on Gibson assembly, Addgene has a nice page [here](https://www.addgene.org/protocols/gibson-assembly/).

The first step is to read the sequences from local files. The sequences can also be read directly from GenBank using their accession numbers:

* [V01298](https://www.ncbi.nlm.nih.gov/nuccore/V01298)
* [AF298787](https://www.ncbi.nlm.nih.gov/nuccore/AF298787)
* [DQ019861](https://www.ncbi.nlm.nih.gov/nuccore/DQ019861)

In [1]:
from pydna.readers import read

In [2]:
cyc1 = read("cyc1.gb")

In [3]:
cyc1

The cyc1.gb sequence file contains only the ORF, so we can use it directly. The sequence file can be inspected using the link shown in the output of the cell above.

In [4]:
cyc1.isorf()

True
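The `isorf()` result above confirms that cyc1.gb holds a complete open reading frame. As a rough illustration of what such a check involves, here is a stdlib-only sketch that assumes the standard genetic code. `is_simple_orf()` is a hypothetical helper, and pydna's own `isorf()` may differ in its details.

```python
# Hypothetical helper illustrating an ORF check; assumes the standard
# genetic code (start ATG; stops TAA, TAG, TGA). Not pydna's implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_simple_orf(seq: str) -> bool:
    """True if seq is ATG ... stop, a multiple of 3 long, with no internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Every codon before the last one must be a sense codon
    return not any(codon in STOP_CODONS for codon in codons[:-1])


print(is_simple_orf("ATGAAATAA"))     # True
print(is_simple_orf("ATGTAAAAATAA"))  # False: TAA in frame before the end
```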

In [5]:
pUG35 = read("pUG35.gb")

In [6]:
pUG35

In [7]:
p426GPD = read("p426GPD.gb")

In [8]:
p426GPD

pUG35 is a plasmid containing the GFP gene. We have to find the exact DNA fragment we want. The pUG35 GenBank file contains features, one of which is the GFP ORF. Inspection in ApE showed that feature number 5 in the list below is the GFP ORF.

In [9]:
pUG35.list_features()

+----------+-----------+-------+------+--------+--------------+--------------+------+
| Feature# | Direction | Start | End  | Length | id           | type         | orf? |
+----------+-----------+-------+------+--------+--------------+--------------+------+
| 0        |    -->    |   0   | 6231 |   6231 | <unknown id> | source       |  no  |
| 1        |    -->    |  416  | 1220 |    804 | <unknown id> | gene         | yes  |
| 2        |    -->    |  416  | 1220 |    804 | <unknown id> | CDS          | yes  |
| 3        |    -->    |  2003 | 2262 |    259 | <unknown id> | terminator   |  no  |
| 4        |    <--    |  2270 | 2987 |    717 | <unknown id> | gene         | yes  |
| 5        |    <--    |  2270 | 2987 |    717 | <unknown id> | CDS          | yes  |
| 6        |    -->    |  3050 | 3443 |    393 | <unknown id> | promoter     |  no  |
| 7        |    -->    |  3881 | 3954 |     73 | <unknown id> | rep_origin   |  no  |
| 8        |    <--    |  4656 | 5517 |    861 | <unkn

We extract the GFP sequence from feature #5. The GFP gene is on the reverse strand (`<--` in the table), but `extract_feature` returns it in the correct orientation:

In [10]:
gfp = pUG35.extract_feature(5)

In [11]:
gfp.seq

Dseq(-717)
ATGT..ATAA
TACA..TATT

In [12]:
gfp.isorf()

True
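Conceptually, extracting a feature annotated on the reverse strand means slicing its coordinates and reverse-complementing the slice. The sketch below shows this with plain strings. `extract()` and `reverse_complement()` are hypothetical helpers, not pydna functions, and the toy sequence is made up.

```python
# Hypothetical helpers; coordinates are 0-based and end-exclusive,
# like the Start/End columns of the feature table.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq: str, start: int, end: int, strand: int) -> str:
    """Return the feature read 5'->3' on its own strand (strand is +1 or -1)."""
    fragment = seq[start:end]
    return reverse_complement(fragment) if strand == -1 else fragment


# Made-up plasmid with a 9 bp ORF on the bottom strand at [3:12)
toy = "GGG" + reverse_complement("ATGAAATAA") + "CCC"
print(toy[3:12])                # TTATTTCAT (top strand, as stored)
print(extract(toy, 3, 12, -1))  # ATGAAATAA (the ORF in its own orientation)
```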

We need to linearize the p426GPD vector before the assembly. The [SmaI](http://rebase.neb.com/rebase/enz/SmaI.html) restriction enzyme cuts p426GPD between the promoter and the terminator.

In [13]:
from Bio.Restriction import SmaI

In [14]:
linear_vector = p426GPD.linearize(SmaI)

In [15]:
linear_vector

Dseqrecord(-6606)
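SmaI recognizes CCCGGG and cuts bluntly in the middle (CCC^GGG). Opening a circular plasmid at a site that occurs once therefore amounts to rotating the sequence so that it begins just after the cut. A minimal string-based sketch follows; `linearize_circular()` is a hypothetical helper that searches only the top strand, which is enough for palindromic sites like SmaI's.

```python
# Hypothetical helper: open a circular sequence at a unique restriction site.
# Searches the top strand only, which suffices for palindromic sites such as
# SmaI (CCCGGG); a non-palindromic site would also need a bottom-strand search.


def linearize_circular(seq: str, site: str, cut_offset: int) -> str:
    """Rotate circular `seq` so it starts at the cut inside the single `site`.

    cut_offset is the top-strand cut position within the site (3 for CCC^GGG).
    """
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    # Append the first bases again so a site spanning the origin is found
    search_space = seq + seq[: len(site) - 1]
    hits = [i for i in range(n) if search_space.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"site found {len(hits)} times, expected exactly once")
    cut = (hits[0] + cut_offset) % n
    return seq[cut:] + seq[:cut]


linear = linearize_circular("AAAACCCGGGTTTT", "CCCGGG", cut_offset=3)
print(linear)  # GGGTTTTAAAACCC
```

Because the cut is blunt, no bases are lost: the linear molecule keeps the length of the circle, starts with GGG and ends with CCC.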

In [16]:
from pydna.design import primer_design

We will amplify most of the fragments by PCR, so we first have to design primers.

In [17]:
cyc1_amplicon = primer_design(cyc1)

The primer_design function returns an Amplicon object which describes a PCR amplification:

In [18]:
cyc1_amplicon.figure()

5ATGACTGAATTCAAGGC...GAAAAAAGCCTGTGAGTAA3
                     ||||||||||||||||||| tm 51.4 (dbd) 56.3
                    3CTTTTTTCGGACACTCATT5
5ATGACTGAATTCAAGGC3
 ||||||||||||||||| tm 50.2 (dbd) 55.0
3TACTGACTTAAGTTCCG...CTTTTTTCGGACACTCATT5
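The figure shows primers whose lengths were chosen to reach a melting temperature. The sketch below illustrates only that idea: it extends each primer from a template end until a target Tm is met. It uses the crude Wallace rule (2 °C per A/T, 4 °C per G/C) rather than the Tm calculation pydna reports, and `wallace_tm()`, `design_primer()` and `design_primer_pair()` are hypothetical helpers. The target of 48 °C in the usage line is arbitrary.

```python
# Hypothetical sketch of Tm-driven primer length selection. Uses the Wallace
# rule (2 degC per A/T, 4 degC per G/C), a crude estimate, not pydna's Tm model.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def design_primer(template: str, target_tm: float, min_len: int = 10) -> str:
    """Extend a primer from the 5' end of template until it reaches target_tm."""
    for length in range(min_len, len(template) + 1):
        primer = template[:length]
        if wallace_tm(primer) >= target_tm:
            return primer
    raise ValueError("template is too short to reach the target Tm")


def design_primer_pair(template: str, target_tm: float) -> tuple[str, str]:
    forward = design_primer(template, target_tm)
    # The reverse primer is the 5' end of the bottom strand
    reverse = design_primer(reverse_complement(template), target_tm)
    return forward, reverse


# Toy template made from the CYC1 ends shown in the figure, with a filler middle
template = "ATGACTGAATTCAAGGC" + "A" * 30 + "GAAAAAAGCCTGTGAGTAA"
print(design_primer_pair(template, target_tm=48))
```

With this target, the sketch picks the same 17-mer forward primer as the figure but an 18-mer reverse primer instead of a 19-mer, which illustrates how the choice of Tm formula affects primer length.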

In [19]:
gfp_amplicon = primer_design(gfp)

It is practical to collect all fragments to be assembled in a list or tuple. Note that below, linear_vector appears both at the beginning and at the end. We do this because we want a circular assembly.

In [20]:
fragments = (linear_vector, cyc1_amplicon, gfp_amplicon, linear_vector)

We would like to have a unique restriction site before the CYC1 gene, so we look for an enzyme that does not cut any of the fragments:

In [21]:
from Bio.Restriction import BamHI

In [22]:
if not any(x.cut(BamHI) for x in fragments):
    print("no cut!")
else:
    print("cuts!")

cuts!


In [23]:
from Bio.Restriction import NotI

BamHI cuts, so let's try NotI:

In [24]:
if not any(x.cut(NotI) for x in fragments):
    print("no cut!")
else:
    print("cuts!")

no cut!


NotI does not cut any of the fragments, so we will use it.
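The cut checks above rely on Biopython's restriction enzyme objects. Underneath, the question is whether an enzyme's recognition site, or its reverse complement, occurs in any fragment. A stdlib-only sketch with hypothetical helpers `has_site()` and `enzymes_absent_from()` follows; it is applied to two short excerpts taken from the primer output at the end of this notebook, not to the full fragments.

```python
# Hypothetical helpers: does an enzyme's recognition site occur on either strand?
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def enzymes_absent_from(fragments: list[str], sites: dict[str, str]) -> list[str]:
    """Names of enzymes whose site occurs in none of the fragments."""
    return [name for name, site in sites.items()
            if not any(has_site(fragment, site) for fragment in fragments)]


# Short excerpts from the primer output, not the full fragments
excerpts = [
    "TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCC",  # end of the SmaI-linearized vector
    "ATGACTGAATTCAAGGC",                    # start of the CYC1 ORF
]
sites = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "NotI": "GCGGCCGC"}
print(enzymes_absent_from(excerpts, sites))  # ['NotI']
```

On these excerpts, BamHI's GGATCC occurs in the vector end and EcoRI's GAATTC near the CYC1 start, leaving only NotI.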

In [25]:
from pydna.dseqrecord import Dseqrecord

In [26]:
site = Dseqrecord(NotI.site)

In [27]:
site.seq

Dseq(-8)
GCGGCCGC
CGCCGGCG

In [28]:
from pydna.design import assembly_fragments

In [29]:
linear_vector.locus = "p426GPD"
cyc1_amplicon.locus = "CYC1"
gfp_amplicon.locus = "GFP"

In [30]:
fragment_list = assembly_fragments((linear_vector, site, cyc1_amplicon, gfp_amplicon, linear_vector))

In [31]:
fragment_list

[Dseqrecord(-6606), Amplicon(391), Amplicon(770), Dseqrecord(-6606)]

The amplicons are now somewhat larger than before (391 bp rather than 330 bp for CYC1, and 770 bp rather than 717 bp for GFP). The assembly_fragments function adds tails to the primers of Amplicon objects so that adjacent fragments share terminal homology for the assembly. The NotI site is short, so it was incorporated into the forward PCR primer of the CYC1 amplicon rather than kept as a separate fragment. As a result, the CYC1 primers are considerably longer:

In [32]:
fragment_list[1].figure()

                                           5ATGACTGAATTCAAGGC...GAAAAAAGCCTGTGAGTAA3
                                                                ||||||||||||||||||| tm 51.4 (dbd) 56.3
                                                               3CTTTTTTCGGACACTCATTTACAGATTTCCACTTCTT5
5TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCCGCGGCCGCATGACTGAATTCAAGGC3
                                            ||||||||||||||||| tm 50.2 (dbd) 55.0
                                           3TACTGACTTAAGTTCCG...CTTTTTTCGGACACTCATT5
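The tailed forward primer in the figure can be read as three concatenated pieces: the last 35 bp of the linearized vector, the 8 bp NotI site, and the 17 nt that anneal to CYC1. The sketch below rebuilds primers that way. `tailed_forward_primer()` and `tailed_reverse_primer()` are hypothetical helpers, and the 35 bp overlap is inferred from the primer sequences in this notebook; pydna's `assembly_fragments()` may size or split tails differently (the CYC1/GFP junction, for instance, uses 18 nt tails on both primers).

```python
# Hypothetical helpers that add homology tails to annealing sequences.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tailed_forward_primer(upstream: str, anneal: str, linker: str = "",
                          overlap: int = 35) -> str:
    """5' tail = last `overlap` bp of the upstream fragment, then a short linker."""
    return upstream[-overlap:] + linker + anneal


def tailed_reverse_primer(downstream: str, anneal: str, linker: str = "",
                          overlap: int = 35) -> str:
    """5' tail = reverse complement of linker + first `overlap` bp downstream."""
    return reverse_complement(linker + downstream[:overlap]) + anneal


# Last 35 bp of the SmaI-linearized vector, as they appear in the CYC1 primer
vector_end = "TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCC"
fw = tailed_forward_primer(vector_end, "ATGACTGAATTCAAGGC", linker="GCGGCCGC")
print(fw)  # matches the fw330 CYC1 primer printed at the end of the notebook
```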

Finally, we assemble the fragments using the Assembly class

In [33]:
from pydna.assembly import Assembly

We remove the final fragment (the repeated vector), since we want a circular product.

In [34]:
fragment_list = fragment_list[:-1]

In [35]:
fragment_list

[Dseqrecord(-6606), Amplicon(391), Amplicon(770)]

In [36]:
asm = Assembly(fragment_list)

In [37]:
asm

Assembly:
Sequences........................: [6606] [391] [770]
Sequences with shared homologies.: [6606] [391] [770]
Homology limit (bp)..............: 25
Number of overlaps...............: 3
Nodes in graph(incl. 5' & 3')....: 5
Only terminal overlaps...........: No
Circular products................: [7661]
Linear products..................: [7697] [7696] [7696] [7341] [6962] [1125] [36] [35] [35]
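Assembly looks for shared terminal homology (at least 25 bp here, per the homology limit above) and joins fragments at those overlaps. The sketch below is a much simplified, stdlib-only version of the circular case: it assumes the fragments are already in the right order and orientation and joins them by exact suffix/prefix matches, whereas pydna's Assembly considers all overlaps between all fragments. `terminal_overlap()` and `assemble_circular()` are hypothetical helpers, tested on made-up toy sequences.

```python
# Hypothetical sketch: join ordered fragments by exact terminal overlaps.


def terminal_overlap(left: str, right: str, min_len: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`; 0 if < min_len."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble_circular(fragments: list[str], min_len: int = 25) -> str:
    """Join fragments in order, then close the circle onto the first fragment."""
    product = fragments[0]
    for fragment in fragments[1:]:
        k = terminal_overlap(product, fragment, min_len)
        if k == 0:
            raise ValueError("consecutive fragments do not overlap enough")
        product += fragment[k:]
    # The end of the product duplicates the start of the first fragment
    k = terminal_overlap(product, fragments[0], min_len)
    if k == 0:
        raise ValueError("fragments do not close into a circle")
    return product[:-k]


toy = ["ACGTAAAACCGG", "CCGGTTTTGATC", "GATCCCCCACGT"]
print(assemble_circular(toy, min_len=4))  # ACGTAAAACCGGTTTTGATCCCCC
```

The same bookkeeping explains the circular product size: the printed primers imply junction overlaps of 35 bp (vector/CYC1), 36 bp (18 + 18 nt tails at CYC1/GFP) and 35 bp (GFP/vector), and 6606 + 391 + 770 - 35 - 36 - 35 = 7661.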

In [38]:
candidate = asm.circular_products[0]

In [39]:
candidate

In [40]:
p426GPD_CYC1_GFP = candidate

In [41]:
p426GPD_CYC1_GFP.write("p426GPD_CYC1_GFP.gb")

In [42]:
from pydna.amplicon import Amplicon

In [43]:
amplicons1 = [x for x in fragment_list if isinstance(x, Amplicon)]

In [44]:
amplicons1

[Amplicon(391), Amplicon(770)]

In [45]:
# Get forward and reverse primer for each Amplicon
primers1 = [(y.forward_primer, y.reverse_primer) for y in amplicons1]

In [46]:
# print primer pairs:
for pair in primers1:
    print(pair[0].format("fasta"))
    print(pair[1].format("fasta"))
    print()

>fw330 CYC1
TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCCGCGGCCGCATGACTGAATTCAAGGC

>rv330 CYC1
TTCTTCACCTTTAGACATTTACTCACAGGCTTTTTTC


>fw717 AF298787.1_rc
AAAAAAGCCTGTGAGTAAATGTCTAAAGGTGAAGAATTATT

>rv717 AF298787.1_rc
GTATCGATAAGCTTGATATCGAATTCCTGCAGCCCTTATTTGTACAATTCATCCATAC
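Since the stated goal is a C-terminal GFP tag, a check that may be worth adding is whether the CYC1 stop codon survives at the CYC1/GFP junction. The hypothetical helper below scans a sequence codon by codon and reports in-frame stops. It is applied to the GFP forward primer printed above, whose 18 nt tail is the 3' end of the CYC1 ORF; because both 330 and 18 are multiples of three, the tail starts on a codon boundary. It reports CYC1's TAA directly upstream of the GFP ATG, so if a translational fusion is intended, that stop codon would presumably need to be left out of the CYC1 amplicon, which the notebook does not currently do.

```python
# Hypothetical helper: positions of in-frame stop codons (standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codons_in_frame(seq: str, frame: int = 0) -> list[int]:
    """0-based start positions of stop codons read in the given frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


# fw717 GFP primer: 18 nt CYC1 tail followed by the GFP start
junction = "AAAAAAGCCTGTGAGTAAATGTCTAAAGGTGAAGAATTATT"
print(stop_codons_in_frame(junction))  # [15]: the CYC1 TAA just before GFP's ATG
```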


