# Gibson assembly implementation

Based on conversations with Fred, Gibson assembly seems like the prudent route for building plasmids, so I want to add automated protocol design to the pipeline. When actually designing variable regions, I think an extra complementary region will be added to each one to avoid having to create primers for every region, but the concepts will be the same.

Here I am following along with the [pyDNA gibson assembly tutorial](https://github.com/BjornFJohansson/pydna-examples/blob/master/notebooks/gibson/gibson.ipynb) using pMAL-hRNASEH1.

![](files/pmal.png)

## Gibson assembly background and context

[Addgene Gibson assembly](https://www.addgene.org/protocols/gibson-assembly/)

![](https://media.addgene.org/data/easy-thumbnails/filer_public/cms/filer_public/15/c4/15c45cf9-3d03-4f61-93e9-c39159f6916e/gibson_assembly_overview_1.jpg__700x351_q85_crop_subsampling-2_upscale.png)

Here the linearized recipient plasmid would be the backbone we want to clone variable regions into, and the `PCR Product / DNA fragment` would be the variable region. Under this workflow I think we would want to design the A and B complementary regions into each fragment, which would avoid having to use primers for each VR.

These regions would then depend on the plasmid backbone selected and on the location where the insert is made. Additionally, this method seems best suited to a single-fragment insert. In reality we would ideally want to insert both initiator and terminator sequences at the same time, along with promoters.

```
----| Promoter -> |--| Initiator |--| Extension region | -- | Terminator | ------ | Extension region | -- | <- Promoter | 
```
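For synthesized fragments, the overlaps between neighbouring parts in a layout like the one above could be designed in directly rather than added by PCR. The following is a minimal sketch in plain Python, assuming each part is just a string. `add_overlaps`, the 10 bp overlap and the toy sequences are hypothetical placeholders, not real parts and not a pydna function.

```python
def add_overlaps(fragments, overlap=20):
    """Append the first `overlap` bases of the next fragment to each fragment.

    `fragments` is an ordered list of (name, sequence) tuples. Every fragment
    except the last gains a 3' tail copied from its downstream neighbour, so
    adjacent fragments share `overlap` bases of identical sequence.
    """
    extended = []
    for i, (name, seq) in enumerate(fragments):
        if i + 1 < len(fragments):
            next_name, next_seq = fragments[i + 1]
            if len(next_seq) < overlap:
                raise ValueError(f"{next_name} is shorter than the overlap")
            seq = seq + next_seq[:overlap]
        extended.append((name, seq))
    return extended


# Toy placeholder sequences, NOT real promoter/initiator/terminator parts.
parts = [
    ("promoter", "ATGCCGTAAGCTTGCAGTCCATGA"),
    ("initiator", "GGCCGGAAGGCCTTAACCGGTTAA"),
    ("extension", "ACACACTGTGTGACACTGTGACTG"),
    ("terminator", "CTAGCATAACCCCTTGGGGCCTCT"),
]

for name, seq in add_overlaps(parts, overlap=10):
    print(name, len(seq), seq)
```

The first and last fragments would still need arms matching the ends of the linearized backbone, which depend on where the backbone is cut.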

[OpenWetWare](https://openwetware.org/wiki/Janet_B._Matsen:Guide_to_Gibson_Assembly)

![](https://s3-us-west-2.amazonaws.com/oww-files-thumb/7/7b/Gibson_overview_cartoon_JM.png/900px-Gibson_overview_cartoon_JM.png)

## Make sure pyDNA is available

In [4]:
import os
from pydna.readers import read

## Read backbone plasmid

In [8]:
backbone = 'files/pMAL_RH1.gb'
assert os.path.isfile(backbone)
pMal = read(backbone)
pMal

Get plasmid features (the workflow should require the plasmid backbone in GenBank format).

In [9]:
pMal.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  |  Len | type         | orf? |
|-----|------------------|-----|------|------|------|--------------|------|
|   0 | nd               | --> | 0    | 7475 | 7475 | source       |  no  |
|   1 | L:lacI           | --> | 80   | 1163 | 1083 | CDS          |  no  |
|   2 | L:tac promoter t | --> | 1405 | 1433 |   28 | promoter     |  no  |
|   3 | L:MBP            | --> | 1527 | 2689 | 1162 | CDS          |  no  |
|   4 | L:Factor Xa site | --> | 2676 | 2688 |   12 | misc_signal  |  no  |
|   5 | L:RNaseH1 (-MLS) | --> | 2700 | 3516 |  816 | misc_feature | yes  |
|   6 | L:HA tag HA\tag  | --> | 2706 | 2733 |   27 | misc_feature |  no  |
|   7 | L:lacZ alpha lac | --> | 3563 | 3745 |  182 | CDS          |  no  |
|   8 | L:AmpR           | --> | 4246 | 5107 |  861 | CDS          | yes  |
|   9 | L:- M13 ori + -\ | <-- | 5148 | 5662 |  514 | rep_origin   |  no  |
|  10 | L:ori            | --> | 5772 | 6361 |  589 | rep_origin   |  no  |
|  11 | L:rop            | <-- | 6790 | 6982 |  192 | misc_feature |  no  |

Let's say we want to place the variable region where `L:RNaseH1 (-MLS)` currently is.

In [12]:
RNAseH = pMal.extract_feature(5)
RNAseH

Dseqrecord(-816)

## Read simulated variable region

For the variable region to insert, I am using the `cyc1.gb` file provided in the Gibson assembly example. This GenBank file contains only one feature, an ORF. When implementing this, the variable region FASTA files will need to be converted into this format.

Contents of the file shown below.

```

LOCUS       CYC1                     330 bp ds-DNA     linear       07-FEB-2017
DEFINITION  .
ACCESSION   
VERSION     
SOURCE      .
  ORGANISM  .
COMMENT     
COMMENT     ApEinfo:methylated:1
ORIGIN
        1 ATGACTGAAT TCAAGGCCGG TTCTGCTAAG AAAGGTGCTA CACTTTTCAA GACTAGATGT
       61 CTACAATGCC ACACCGTGGA AAAGGGTGGC CCACATAAGG TTGGTCCAAA CTTGCATGGT
      121 ATCTTTGGCA GACACTCTGG TCAAGCTGAA GGGTATTCGT ACACAGATGC CAATATCAAG
      181 AAAAACGTGT TGTGGGACGA AAATAACATG TCAGAGTACT TGACTAACCC AAAGAAATAT
      241 ATTCCTGGTA CCAAGATGGC CTTTGGTGGG TTGAAGAAGG AAAAAGACAG AAACGACTTA
      301 ATTACCTACT TGAAAAAAGC CTGTGAGTAA   
//
```

In [22]:
cyc1_path = 'files/cyc1.gb'
cyc1 = read(cyc1_path)

## Linearize backbone

In [26]:
from Bio.Restriction import SwaI

linear_backbone = pMal.linearize(SwaI)
linear_backbone

Dseqrecord(-7475)

## Design primers for Cyc1 (variable region)

This would be needed if we wanted to amplify the variable region prior to cloning, to increase its concentration.

In [29]:
from pydna.design import primer_design, assembly_fragments

In [20]:
cyc1_amplicon = primer_design(cyc1)
cyc1_amplicon.figure()

5ATGACTGAATTCAAGGCC...TGAAAAAAGCCTGTGAGTAA3
                      ||||||||||||||||||||
                     3ACTTTTTTCGGACACTCATT5
5ATGACTGAATTCAAGGCC3
 ||||||||||||||||||
3TACTGACTTAAGTTCCGG...ACTTTTTTCGGACACTCATT5

In [28]:
fragment_list = assembly_fragments(
    (linear_backbone, cyc1_amplicon, linear_backbone)
)

The linear backbone appears at both the front and the end of the fragment list because we want the final construct to be circular.

In [30]:
fragment_list

[Dseqrecord(-7475), Amplicon(400), Dseqrecord(-7475)]

In [31]:
fragment_list[1].figure()

                                   5ATGACTGAATTCAAGGCC...TGAAAAAAGCCTGTGAGTAA3
                                                         ||||||||||||||||||||
                                                        3ACTTTTTTCGGACACTCATTTTTAACATTTGCAATTATAAAACAATTTTAAGCGC5
5CCCCAAAAACAGGAAGATTGTATAAGCAAATATTTATGACTGAATTCAAGGCC3
                                    ||||||||||||||||||
                                   3TACTGACTTAAGTTCCGG...ACTTTTTTCGGACACTCATT5

In [32]:
from pydna.assembly import Assembly

In [33]:
fragment_list = fragment_list[:-1]

In [34]:
fragment_list

[Dseqrecord(-7475), Amplicon(400)]

In [36]:
asm = Assembly(fragment_list)
asm

Assembly
fragments..: 7475bp 400bp
limit(bp)..: 25
G.nodes....: 4
algorithm..: common_sub_strings

In [39]:
candidate = asm.assemble_circular()[0]
candidate

At this point it is not clear exactly what was created, or where the variable region ended up in the final assembly.

In [42]:
test_assembly = candidate
candidate.write('files/test_assembly.gb')

Visualize in SnapGene

![](files/test.1.png)

The insert was 400 bp, so it was definitely added between the lacZ alpha and AmpR genes, but it is not labeled and has removed the restriction site. It seems that linearizing the plasmid at that point clones the variable region in at that location, which is not really what I want to do.
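The insert position could also be checked without opening a viewer. Below is a rough sketch in plain Python, assuming the assembled plasmid and the insert are available as plain strings (for the pydna records above, presumably via `str(record.seq)`). `locate_on_circle` and the toy sequences are hypothetical. The search wraps around the origin and also checks the reverse strand.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def locate_on_circle(plasmid, insert):
    """Return (start, strand) for every exact match of `insert` on a circular plasmid.

    `start` is the 0-based index on the top strand where the match begins;
    `strand` is 1 for the insert as given and -1 for its reverse complement.
    """
    plasmid = plasmid.upper()
    insert = insert.upper()
    n = len(plasmid)
    # Append the start of the plasmid so matches spanning the origin are found.
    wrapped = plasmid + plasmid[: len(insert) - 1]
    hits = []
    for strand, query in ((1, insert), (-1, reverse_complement(insert))):
        start = wrapped.find(query)
        while start != -1 and start < n:
            hits.append((start, strand))
            start = wrapped.find(query, start + 1)
    return hits


# Toy circular sequence; the query spans the origin (end joins back to start).
toy_plasmid = "AAACCCGGGTTTACGA"
print(locate_on_circle(toy_plasmid, "TTACGAAAAC"))
print(locate_on_circle(toy_plasmid, "GTTTTCGTAA"))
```

Comparing the reported start with the feature table of the backbone would show which features the insert landed between.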

## Round 2: Changing plasmid backbones and going back to design

Switching to a more realistic plasmid choice, [pFC8](files/pFC8.gb). Not sure why I thought pMal was the way to go.

![](files/pFC8.png)

Let's say, just for simplicity, that we leave in the R-loop forming region (it could be cut out to reduce overall length) and use the T3 promoter. We then want to insert the construct (variable region) between `SNRPN` and the `T3` promoter by linearizing at the `KpnI` site.

Note that this will create sticky ends, which might matter because this approach does not use PCR to add homologous regions between the plasmid and the variable region. On reflection it should not matter, because the ends would be chewed back by the exonuclease.

Sequence at `KpnI` cut site.

![](files/linearization.png)

In [55]:
from Bio.Restriction import KpnI, Analysis, RestrictionBatch
from Bio.Seq import Seq

rb = RestrictionBatch([KpnI])

Sequence overlapping most of region shown above (including `KpnI` site)

In [56]:
s = Seq('tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg')
Analong = Analysis(rb, s)

In [61]:
Analong.print_as('map')
Analong.print_that()

                            29 KpnI
                            |                               
tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg
|||||||||||||||||||||||||||||||||||||||||||||||||||||
aggttctggagctcccccccgggccatgggtcgaaaacaagggaaatcactcc
1                                                  53




{KpnI: [29]}

Since we are trying to insert the variable region construct at this restriction site, the first and last inserted sequences should carry homology arms matching the sequences flanking the site.
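As a rough illustration of deriving those arms from the cut position reported above, here is a minimal sketch in plain Python. `add_homology_arms` and the 20 bp arm length are hypothetical choices, and the first 18 bases of CYC1 stand in for a real variable region. It also simplifies the cut to a single top-strand position and ignores the 3' overhang that KpnI leaves.

```python
def add_homology_arms(insert, backbone, cut_index, arm_len=20):
    """Flank `insert` with the backbone sequence on either side of the cut.

    `cut_index` is the 0-based index of the first base after the cut on the
    top strand of `backbone`.
    """
    if cut_index < arm_len or len(backbone) - cut_index < arm_len:
        raise ValueError("not enough backbone sequence on one side of the cut")
    left_arm = backbone[cut_index - arm_len:cut_index]
    right_arm = backbone[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


# Region around the KpnI site, copied from the analysis above.
site_region = "tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg"

# Bio.Restriction reported 29: 1-based, first base after the GGTAC^C cut.
cut_index = 29 - 1

# Stand-in for a variable region: the first 18 bases of CYC1.
variable_region = "ATGACTGAATTCAAGGCC"

construct = add_homology_arms(variable_region, site_region, cut_index)
print(construct)
```

Lower-case bases in the printed construct come from the backbone and upper-case bases from the insert, which makes the arm boundaries easy to check by eye.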

Below is what I guess would be the most extreme version of this, where you clone in both promoters plus the initiation, termination and extension regions. Regions of the same solid color indicate homology.

![](files/insert.png)

But each backbone, including `pFC8`, already has at least one usable promoter that could be taken advantage of. Additionally, it may not be best to clone in the initiation and termination regions at the same time, since we are not 100% sure how long the R-loops that form will be. This is critical for placing the termination region relative to the promoter: it needs to be far enough away that it would be reasonable for R-loops to terminate by the time they arrive, but close enough that R-loops are, at least on average, actually able to reach it.

That might look something more like the images below.

### Initiation construct

![](files/insert1.png)

The average length of R-loops formed using the initiation region would then inform termination region design.
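One way this could feed into the design is sketched below, assuming R-loop lengths from the initiation construct are available as a list of base-pair counts. The numbers are made-up placeholders, not measurements, and `fraction_reaching` is a hypothetical helper. For each candidate promoter-to-terminator distance, it reports what fraction of R-loops would be long enough to reach the terminator.

```python
from statistics import mean


def fraction_reaching(lengths, distance):
    """Fraction of R-loops that are at least `distance` bp long."""
    return sum(1 for n in lengths if n >= distance) / len(lengths)


# Made-up placeholder lengths in bp, NOT experimental data.
rloop_lengths = [120, 180, 200, 240, 260, 300, 310, 350, 400, 520]

average = mean(rloop_lengths)
print("mean length (bp):", average)

# Candidate distances between promoter and termination region (also placeholders).
for distance in (150, 250, 350):
    print(distance, fraction_reaching(rloop_lengths, distance))
```

Looking at the whole distribution rather than only the mean may help, since a terminator placed at the mean length would be missed by roughly half of the R-loops.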